Wikifunctions:Project chat
Welcome to the Project chat, a place to discuss any and all aspects of Wikifunctions: the project itself, policy and proposals, individual data items, technical issues, etc.
Other places to find help:
- Wikifunctions:Administrators' noticeboard
- Wikifunctions:Report a technical problem
- Wikifunctions:FAQ
SpBot archives all sections tagged with {{Section resolved|1=~~~~}} after 1 day and sections whose most recent comment is older than 30 days.
Removal of the general-ban for ClaudeBot
The result of phab:T374318 was the ban of ClaudeBot via MediaWiki:Robots.txt. I'm fairly certain that ClaudeBot was not the issue here, and we don't ban unless needed to prevent disruption. Therefore, I'd like to remove the ban of ClaudeBot, and am seeking opinions on doing so. @Jdforrester (WMF): courtesy ping. Thanks! Feeglgeef (talk) 03:36, 5 January 2025 (UTC)
- I'd want to see the team's response: it's not implausible that ClaudeBot is worse (100,000 per week is 9 per minute, or about once per 6 seconds). —Mdaniels5757 (talk • contribs) 04:04, 5 January 2025 (UTC)
- I'm fine with waiting for the team's response. I'll raise the issue at the Volunteer's corner and if nobody objects there or until then I'll remove it. Feeglgeef (talk) 04:28, 5 January 2025 (UTC)
- Also, not very relevant but can we also remove the sitenotice? It's very annoying. Feeglgeef (talk) 04:36, 5 January 2025 (UTC)
- You can "[hide]" it Nemoralis (talk) 17:35, 5 January 2025 (UTC)
- Fine for me, if James agrees with it --Ameisenigel (talk) 19:11, 5 January 2025 (UTC)
- @Feeglgeef: "I'm fairly certain that ClaudeBot was not the issue here". That's fascinating. I watched as ClaudeBot took down production in front of my eyes. How are you so certain? Jdforrester (WMF) (talk) 17:00, 6 January 2025 (UTC)
- Because of my <s>waste of resources</s> featured tool. Feeglgeef (talk) 19:34, 6 January 2025 (UTC)
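For reference, the rate quoted above (100,000 requests per week) can be converted with a few lines of Python. This is only a back-of-the-envelope check: it assumes the requests are spread evenly over the week, which real crawler traffic usually is not, and the variable names are made up for this illustration.

```python
# Rough conversion of a weekly request count into per-minute figures.
# Assumes a perfectly even spread over the week (an idealisation).
REQUESTS_PER_WEEK = 100_000
MINUTES_PER_WEEK = 7 * 24 * 60  # 10,080 minutes

per_minute = REQUESTS_PER_WEEK / MINUTES_PER_WEEK
seconds_between = 60 / per_minute

print(f"{per_minute:.2f} requests per minute")              # 9.92
print(f"one request every {seconds_between:.2f} seconds")   # 6.05
```

An average like this says nothing about bursts, so it cannot settle on its own whether a given crawler caused an outage.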
Naming conventions recommendations
As we get closer to integrating Wikifunctions with Wikipedia, the Abstract Wikipedia team wants to share a few suggestions to ensure that Wikifunctions content is accessible, inclusive, and cohesive across projects. You can read more on this page: Wikifunctions:Design/Naming conventions recommendations. Feel free to share your thoughts and ideas! AAlhazwani (WMF) (talk) 16:52, 10 January 2025 (UTC)
Data type issue
I've tried to fix the Python implementation of the existing function English verb (Z21890) by fixing inconsistent indentation and declaring the undeclared local variable "c". However, given my lack of familiarity with how Wikifunctions uses Wikidata lexemes, I'm not sure of the actual data type of "c", so I've declared it as a string for now. The test output now gives "quickly" instead of the expected "quickly ran". Sbb1413 (talk) 10:21, 30 January 2025 (UTC)
- Done Feeglgeef (talk) 13:58, 30 January 2025 (UTC)
- And thanks for cleaning up the implementation. Feeglgeef (talk) 14:00, 30 January 2025 (UTC)
Wikifunctions & Abstract Wikipedia Newsletter #187 is out: With 2000 Functions into the new year: time for stats
There is a new update for Abstract Wikipedia and Wikifunctions. Please, come and read it!
In this issue, we present some statistics about where we are as a project, we give some updates about our Types, and we take a look at the latest software developments.
Want to catch up with the previous updates? Check our archive!
Also, we remind you that if you have questions or ideas to discuss, the next Volunteers' Corner will be held on February 3, at 18:30 UTC (link to the meeting).
Enjoy the reading! -- User:Sannita (WMF) (talk) 14:05, 30 January 2025 (UTC)
Natural language functions
Hello everyone,
We’re working on functions that return natural-language outputs, and we should think about how to handle both these outputs and their inputs consistently. For example:
- How should we represent inputs like Wikidata’s “grammatical features” (gender, number, etc.) across different languages?
- How do we decide whether a function’s output should be a simple string or monolingual text?
- How can we create functions that work across languages and sentence structures, or that can be combined with other functions to do so?
What are your thoughts? Any suggestions or examples? GrounderUK (talk) 13:55, 1 February 2025 (UTC)
- Thanks. These are important questions. I don't have a good answer for now, but I have seen some strange and inconsistent things, so we need some clarity. For example, "[gender] is a [country] [professional]", English Lexemes (Z21765) uses the datatype Sign (Z16659) as the input for gender (it works in this case but it still feels wrong, and it won't work for most languages and grammatical features). There is also Conjugate regular -er verb (Z21617), which uses an arbitrary Natural number (Z13518) (see the two proposals WF:FRENCHSUBJ and WF:FRENCHTENSE by MolecularPilot).
- My two cents to help move forward:
- Grammatical features can be strange (natural languages are full of exceptions and unexpected things: in Breton, for example, prepositions are conjugated like verbs), so whatever we choose needs to be flexible enough.
- At the same time, I guess we don't want to recreate a list/datatype for each language, as most behave similarly and it would be redundant (6000+ languages, with most of the non-genderless ones having masculine/feminine, see https://wals.info/feature/30A for instance).
- Some of these functions will rely on Wikidata, so Wikifunctions should understand and accept the grammatical features used on Lexemes (and there are a lot, 989 right now: https://qlever.cs.uni-freiburg.de/wikidata/7n3eYj ).
- Right now, it seems most natural language functions return a simple string (String (Z6)). That's not wrong, but a monolingual text (Monolingual text (Z11)) would be cleaner and clearer. Removing the language tag afterwards is super easy (we already have string of monolingual text (Z14396)), but adding it could be trickier: when the label of a function says "English", is it American English, British English, both, or neither? A monolingual text would be explicit.
- Cheers, VIGNERON (talk) 14:27, 1 February 2025 (UTC)
- The 989 grammatical features are probably all useful, but the first few I looked at (e.g. "singular") would just be enumeration values for a type (in this case "grammatical number", of which there are currently 24 possible values). I think we should have a single type for grammatical number, and some languages will hardly use any values, but every language would find what they need. Similar for other categories of grammatical features. At present, this would result in long dropdowns (e.g. 24 items of which I might only know what two of them are), but if sorted reasonably well, I think that would be fine. 99of9 (talk) 10:55, 2 February 2025 (UTC)
- If we are ramping up the use of monolingual texts (I'm not opposed, when functions are returning sentences or phrases directly for a language), then we should start building quite a few more monolingual text helper functions (e.g. join texts - even the simple ones haven't been written). I'm a little bit concerned that we'll need separate functions to generate each of the monolingual texts for each of the English variants. It would be good to call a single one and share results whenever possible. 99of9 (talk) 11:03, 2 February 2025 (UTC)
- As I suggested on Telegram, I think the “gender” input for "[gender] is a [country] [professional]", English Lexemes (Z21765) is best understood as a placeholder for a noun phrase. The function supplies the copula and sentence complement for an indeterminate person who is the grammatical subject. This is an English language (family) function that assumes a third-person (semantically) singular subject (a living human being who is currently active in some profession or role). For English, such a context is not sufficient to determine the required placeholder (a pronoun), because third-person (semantically) singular pronouns are marked for “gender”. This additional context is therefore required as input. The use of Sign (Z16659) here is unfortunate, but we do not have a general-purpose Type to represent “one of three options” (and I’m not suggesting we should). A similar solution was not available for Conjugate regular -er verb (Z21617), hence the use of arbitrary natural numbers (and I’m not suggesting we should do that, either).
- Although I don’t object to "[gender] is a [country] [professional]", English Lexemes (Z21765), I don’t believe it provides a useful pattern for future functions. To be useful in a Wikipedia context, the end result (like “they are an American actor”) would need to change when the person gives up acting or dies, or changes their pronoun preference or nationality. Of course, we could have a separate function to handle the past tense or whatever and rely on prior functions to call a different function when the context changes, but I don’t think that would be sensible. In a more multilingual context, we would presumably characterise the context in a language-neutral way but expect (more) language-specific functions to determine the form of the copula (if any), and the required forms of any article, adjective or noun (or, indeed, a different sentence structure altogether). None of this is straightforward but it is characterising the context that poses a particular challenge, as more languages are considered.
- The “grammatical features” for a lexeme form on Wikidata suggest a way forward, since they can account for the variety of forms that are available. In effect, a particular function will produce sensible results for some subset of all supported contexts. However, we need to be able to handle the normal cases where the available context provides information that is unnecessary for the function, as well as the cases where the function supports distinctions that the context does not. This suggests the need for some intermediate interface function(s) that can reduce or extend the available context according to the expectations of the function being called. For example, if the mood is not available in context when calling a French conjugation function, it would default to the indicative mood. This implies that the user interface for such a function would support the provision of the context as an input object, presumably (at its most basic) as a list of grammatical features. How we could restrict such a list to values for relevant grammatical features is an open question (see, for example, phab:T379338 and Wikifunctions talk:Representing identity#Functionally constrained lists). GrounderUK (talk) 12:06, 2 February 2025 (UTC)
- Using grammatical features in this way has now been prototyped at Breton verb form (Z22097). This calls one of these existing functions based on a supplied list of grammatical features. It has two implementations but these are set up to call only a few functions while we evaluate this approach. GrounderUK (talk) 21:20, 2 February 2025 (UTC)
- I’ve also created grammatical features list from Wikidata items (Z22107) to demonstrate the expansion of composite items like first-person singular (Q51929218) into its basic components. It currently recognises only three such items and passes any other grammatical features through unaltered.
- After discussions with @99of9 and @Feeglgeef on Telegram, I created an implementation of Breton verb form (Z22097) that uses N-ifs (Z19601). This allows a flatter conditional structure similar to a case construct, which is easier to work with but doesn’t scale well to support a large number of function calls. This implementation currently supports calls to seven of the Breton conjugation functions. GrounderUK (talk) 13:25, 3 February 2025 (UTC)
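The context handling described in the comments above (expand composite features, ignore features the called function does not need, fall back to a default such as the indicative mood) could be sketched roughly as follows. This is a plain-Python illustration only and not how Breton verb form (Z22097) or grammatical features list from Wikidata items (Z22107) are implemented. Features are written as plain labels rather than Wikidata item references, and the names `COMPOSITES`, `CATEGORY`, `expand` and `normalise` are all hypothetical.

```python
# Hypothetical sketch: normalise a list of grammatical features before
# dispatching to a conjugation function. All names are illustrative.

# Composite features and the basic components they expand into.
COMPOSITES = {
    "first-person singular": ["first person", "singular"],
}

# Category of each basic feature this sketch knows about.
CATEGORY = {
    "first person": "person",
    "second person": "person",
    "third person": "person",
    "singular": "number",
    "plural": "number",
    "indicative": "mood",
    "subjunctive": "mood",
    "present": "tense",
    "masculine": "gender",
}


def expand(features):
    """Replace composite features by their components; pass others through."""
    result = []
    for feature in features:
        result.extend(COMPOSITES.get(feature, [feature]))
    return result


def normalise(features, expected, defaults):
    """Build a category -> feature context restricted to `expected`.

    Features in categories the function does not use are dropped,
    missing categories fall back to `defaults`, and a category with
    neither a value nor a default raises an error.
    """
    context = {}
    for feature in expand(features):
        category = CATEGORY.get(feature)
        if category in expected:
            context[category] = feature
    for category in expected:
        if category not in context:
            if category not in defaults:
                raise ValueError(f"missing grammatical feature: {category}")
            context[category] = defaults[category]
    return context


context = normalise(
    ["first-person singular", "present", "masculine"],
    expected=("person", "number", "tense", "mood"),
    defaults={"mood": "indicative"},
)
print(context)
```

Here the redundant "masculine" is silently dropped and the missing mood defaults to "indicative", while a missing category without a default raises an error. Whether deficiencies should be errors or defaults is exactly the kind of design choice still under discussion here.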
- No opinion
- I'd actually quite like to use monolingual texts for ones that we don't intend to use on Wikipedia
- Generally I think we should try to have the same input types, even if that means a lot of redundant inputs.
- Feeglgeef (talk) 16:50, 1 February 2025 (UTC)
- I prefer outputs to be simple strings. I usually try not to care about types while programming and leave that decision to the language interpreter, so this seems the easiest option to me. As different people implement functions, it will not be possible to be completely consistent here. I prefer referring to objects. As for using functions across languages, I need to think about how far that is possible. I may write something about it in the next few days. Hogü-456 (talk) 22:58, 1 February 2025 (UTC)
- Thanks.
- Do you have any concerns about the use of grammatical features?
- Same here, although my comment seems to have been overlooked (except by you).
- The problem I have with redundant inputs is that they are liable to be inconsistent with the grammatical features that are actually present on Wikidata. The approach I’ve adopted so far with Breton verb form (Z22097) and grammatical features list from Wikidata items (Z22107) is tolerant of redundancy and intolerant of deficiencies, but that is more “line of least resistance” than a firm conviction.
- GrounderUK (talk) 14:41, 3 February 2025 (UTC)
- Since Return monolingual text from grammatical features (Z19530) takes a list of Wikidata item reference (Z6091), I'd expect other functions to use that too (though I'm not sure how you'd include number). For languages with only a few cases, there could be persistent (named) lists for each as a shorthand. YoshiRulz (talk) 06:41, 2 February 2025 (UTC)
- Please see, for example, present indicative of “labour” 1st singular (Z22098) specifying grammatical number (Q104083) as singular (Q110786). We might also consider using expansions of items like first-person singular (Q51929218), as suggested by User:VIGNERON on Talk:Z22097 GrounderUK (talk) 22:58, 2 February 2025 (UTC)
- My preference would be to introduce precise enumerations for grammatical features. For example, we would have one enumeration for grammatical genders for languages that have feminine and masculine genders (e.g. for Spanish and French), and one for languages that have three grammatical genders such as German, and so on. Then there are individual enumerations for grammatical numbers: there's one for languages with singular and plural, one for singular, dual, and plural, etc. And each language-part of speech would only use the relevant enumerations.
- This means creating quite a few enumerations, but I think that's OK.
- Furthermore I think we should have individual types for each pair of language and part of speech, i.e. a type for English noun, a type for Breton verb, a type for Hausa verb, a type for Ukrainian adjective, etc. And each of these would be using the right enumerations as created above.
- I know that it is a bit of work, but in the end it allows the user experience to provide much more guidance.
- I think this is a really important discussion, and it would be good to get this right! --Denny (talk) 15:25, 3 February 2025 (UTC)
- I would be happy to create a few grammatical gender enumerations for now, e.g. one for feminine / masculine and one for feminine / masculine / neuter, and maybe a few more, depending on the languages people would like to work on. Creating enumerations is not that much work, and I agree that it would be good to get rid of using sign to represent gender sooner rather than later. --Denny (talk) 15:28, 3 February 2025 (UTC)
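To make the "precise enumerations plus per-language types" proposal above a bit more concrete, here is a minimal sketch in plain Python. It is not a Wikifunctions type definition: the class names (`GenderFM`, `GenderFMN`, `NumberSP`, `FrenchNoun`, `GermanNoun`) and the helper `french_indefinite_article` are invented for this illustration, and Python dataclasses do not enforce field types at runtime the way a real Type would.

```python
from dataclasses import dataclass
from enum import Enum


class GenderFM(Enum):
    """Grammatical gender for languages with feminine and masculine."""
    FEMININE = "feminine"
    MASCULINE = "masculine"


class GenderFMN(Enum):
    """Grammatical gender for languages with feminine, masculine, neuter."""
    FEMININE = "feminine"
    MASCULINE = "masculine"
    NEUTER = "neuter"


class NumberSP(Enum):
    """Grammatical number for languages with singular and plural."""
    SINGULAR = "singular"
    PLURAL = "plural"


@dataclass(frozen=True)
class FrenchNoun:
    """Hypothetical per-language type: only the enumerations French needs."""
    lemma: str
    gender: GenderFM
    number: NumberSP


@dataclass(frozen=True)
class GermanNoun:
    """Hypothetical per-language type using the three-gender enumeration."""
    lemma: str
    gender: GenderFMN
    number: NumberSP


def french_indefinite_article(noun: FrenchNoun) -> str:
    """Pick the French indefinite article from the noun's features."""
    if noun.number is NumberSP.PLURAL:
        return "des"
    return "une" if noun.gender is GenderFM.FEMININE else "un"


maison = FrenchNoun("maison", GenderFM.FEMININE, NumberSP.SINGULAR)
print(french_indefinite_article(maison), maison.lemma)
```

The point of the narrow enumerations is that a function taking a `FrenchNoun` never has to handle a neuter or dual value, and a user interface built on them would only offer the two or three options that apply.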
More detailed request
Moved from Administrators' noticeboard
Hi, I just noticed that the template here doesn't have any prominent information indicating exactly what permission is requested, which makes the sub-pages of the archive page here (all requests together) a bit less clear.
Maybe a line like this one used on the MediaWiki wiki would be helpful. --Mohanad (talk) 10:37, 3 February 2025 (UTC)