Closed
Description
I have a Wiktionary Zim file from December 2020, which I downloaded using the GUI kiwix-desktop interface (2020-12-10; "Pictures, Fulltext index"; 5.65 GB).
This works great for me but I'm not sure how to figure out which Wiktionary it is based on.
It lacks changes to Wiktionary made in August 2020, although it contains changes from May 2020.
Where can I find out which Wiktionary dump a Zim file is based on, and how do I find a Zim file which is based on a current version of Wiktionary?
(And where should I submit this issue?)
Activity
kelson42 commented on Feb 7, 2021
Are you talking about Wiktionary in English? Which content exactly is missing (two screenshots would be helpful)?
archenemies commented on Feb 7, 2021
Yes, English.
Here is an example of a diff from August which is missing from the December 2020 Kiwix Wiktionary Zim file. I picked it at random; so far the December Zim file seems to be missing everything since around June.
https://en.wiktionary.org/w/index.php?title=rocker&diff=prev&oldid=60027083
Someone added a sense to "rocker", number 4 here:
Here's the Kiwix screenshot where you can see that it's missing:
I guess the answer to my other question is that there is no reason for the Zim file to be out of date then? Certainly as a software developer I would expect the Zim file to have embedded in it a date corresponding to when it was compiled, so that this kind of ad-hoc testing would not be necessary. Or does it get updated one word at a time, so different dictionary entries are out of date by different amounts? But in that case I would expect each entry to come with a timestamp...
kelson42 commented on Feb 7, 2021
@archenemies I will have a look (and move the ticket), but it looks like a problem with a root cause in the Wikimedia infrastructure.
kelson42 commented on Feb 7, 2021
@archenemies BTW, the revision ID and the revision date are both available in the upstream link in the footer of each article.
archenemies commented on Feb 7, 2021
That's interesting about the upstream link in the footer. However, "rocker" has the wrong link:
https://en.wiktionary.org/wiki/?title=rocker&oldid=61038509
because it points to a revision from 4 November 2020 with the "breve below" sense #4 filled in, but the page that Kiwix serves me lacks that sense.
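To make this kind of check less ad hoc, the revision ID can be read straight out of the footer link. The following is only a minimal sketch: it assumes the link keeps the `?title=...&oldid=...` form shown above, and `parse_upstream_link` is a hypothetical helper name, not part of Kiwix or mwoffliner.

```python
from urllib.parse import urlparse, parse_qs


def parse_upstream_link(url):
    """Return (title, revision_id) from a footer link of the form
    https://en.wiktionary.org/wiki/?title=<title>&oldid=<revision id>.

    Hypothetical helper: missing parameters come back as None.
    """
    query = parse_qs(urlparse(url).query)
    title = query.get("title", [None])[0]
    oldid = query.get("oldid", [None])[0]
    revision_id = int(oldid) if oldid is not None else None
    return title, revision_id


title, revision_id = parse_upstream_link(
    "https://en.wiktionary.org/wiki/?title=rocker&oldid=61038509"
)
print(title, revision_id)
```

The extracted revision ID can then be compared by hand against the page history on Wiktionary to see whether the content Kiwix displays really corresponds to that revision.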
kelson42 commented on Feb 10, 2021
It looks like a bug in the Wikimedia REST API, because it simply does not deliver the latest version (as you reported). See: https://en.wiktionary.org/api/rest_v1/page/mobile-sections/rocker. This is the root of the bug.
On the mwoffliner side, there is a weakness: we don't request a specific revision ID, but just take the latest. If we had retrieved https://en.wiktionary.org/api/rest_v1/page/mobile-sections/rocker/61774146, we would have gotten the proper content. I will do what is necessary on both sides to improve the situation.
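The difference between the two requests can be sketched as a small URL builder. This is an illustration only, not mwoffliner's actual code: `REST_BASE` and `mobile_sections_url` are hypothetical names, the URL shape is taken from the two links quoted above, and no network request is made.

```python
from urllib.parse import quote

# Base path copied from the URLs quoted in this comment.
REST_BASE = "https://en.wiktionary.org/api/rest_v1/page/mobile-sections"


def mobile_sections_url(title, revision_id=None):
    """Build the REST URL for an article.

    Without revision_id the server decides what "latest" means;
    with revision_id the request is pinned to that exact revision.
    """
    url = f"{REST_BASE}/{quote(title, safe='')}"
    if revision_id is not None:
        url += f"/{revision_id}"
    return url


print(mobile_sections_url("rocker"))            # unpinned: whatever the API considers latest
print(mobile_sections_url("rocker", 61774146))  # pinned to a specific revision
```

Pinning the revision means that the content fetched and the revision ID recorded in the footer link refer to the same revision by construction, instead of depending on what the API serves as "latest".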
kelson42 commented on Feb 10, 2021
A bug ticket has been opened upstream at https://phabricator.wikimedia.org/T274359
kelson42 commented on Feb 10, 2021
@MananJethwani Here again this is "complicated" to change due to the architecture.