Wiktionary/Wikivoyage zim databases lag website by five months · Issue #1397 · openzim/mwoffliner · GitHub

Wiktionary/Wikivoyage zim databases lag website by five months #1397

Closed
Description

@archenemies

I have a Wiktionary Zim file from December 2020, which I downloaded using the GUI kiwix-desktop interface (2020-12-10; "Pictures, Fulltext index"; 5.65 GB).

This works great for me but I'm not sure how to figure out which Wiktionary it is based on.

It lacks changes to Wiktionary made in August 2020, although it contains changes from May 2020.

Where can I find out which Wiktionary dump a Zim file is based on, and how do I find a Zim file which is based on a current version of Wiktionary?

(And where should I submit this issue?)

Activity

self-assigned this on Feb 7, 2021

kelson42 (Collaborator) commented on Feb 7, 2021

Are you talking about Wiktionary in English? Which content exactly is missing (two screenshots would be helpful)?

archenemies (Author) commented on Feb 7, 2021

Yes, English.

Here is an example of a diff from August which is missing from the December 2020 Kiwix Wiktionary Zim file. I just picked it at random; so far the December Zim file seems to be missing everything since around June or so.

https://en.wiktionary.org/w/index.php?title=rocker&diff=prev&oldid=60027083

Someone added a sense to "rocker", number 4 here:

[Screenshot: screenshot-2021-02-06_20 12 34]

Here's the Kiwix screenshot where you can see that it's missing:

[Screenshot: screenshot-2021-02-06_20 12 47]

I guess the answer to my other question is that there is no reason for the Zim file to be out of date then? Certainly as a software developer I would expect the Zim file to have embedded in it a date corresponding to when it was compiled, so that this kind of ad-hoc testing would not be necessary. Or does it get updated one word at a time, so different dictionary entries are out of date by different amounts? But in that case I would expect each entry to come with a timestamp...

kelson42 (Collaborator) commented on Feb 7, 2021

@archenemies I will have a look (and move the ticket), but this looks like a problem with a root cause in the Wikimedia infrastructure.

added this to the 1.12 milestone on Feb 7, 2021

kelson42 (Collaborator) commented on Feb 7, 2021

@archenemies BTW, the revision ID, like the revision date, is available in the upstream link in the footer of each article.

archenemies (Author) commented on Feb 7, 2021

That's interesting about the upstream link in the footer. However, "rocker" has the wrong link:

https://en.wiktionary.org/wiki/?title=rocker&oldid=61038509

because it points to a revision from 4 November 2020 with the "breve below" sense #4 filled in, but the page that Kiwix serves me lacks that sense.
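
To illustrate the check described above, here is a minimal Python sketch that pulls the revision ID out of a footer link like the one quoted. The function name `parse_upstream_link` is hypothetical and not part of Kiwix or mwoffliner; it only assumes the link keeps the `title` and `oldid` query parameters shown in this thread.

```python
from urllib.parse import urlparse, parse_qs


def parse_upstream_link(url: str) -> dict:
    """Extract host, title and revision ID (oldid) from a MediaWiki permalink.

    Hypothetical helper for illustration only.
    """
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    oldid = query.get("oldid", [None])[0]
    return {
        "host": parsed.netloc,
        "title": query.get("title", [None])[0],
        # oldid is the revision the ZIM entry claims to be based on
        "oldid": int(oldid) if oldid is not None else None,
    }


# The footer link quoted in this thread
link = "https://en.wiktionary.org/wiki/?title=rocker&oldid=61038509"
info = parse_upstream_link(link)
print(info)
```

Opening the extracted `oldid` on the wiki and comparing it with what Kiwix displays is the manual comparison described in the comment above.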

kelson42 (Collaborator) commented on Feb 10, 2021

This looks like a bug in the Wikimedia REST API: it simply does not deliver the latest version (as you reported). See: https://en.wiktionary.org/api/rest_v1/page/mobile-sections/rocker. This is the root of the bug.

On the mwoffliner side, there is a weakness: we don't request a specific revision ID, but just take the latest. If we retrieved https://en.wiktionary.org/api/rest_v1/page/mobile-sections/rocker/61774146, we would get the proper content.

I will do what is necessary on both sides to improve the situation.
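
The idea of pinning a revision can be sketched roughly as follows. This is not mwoffliner's code (mwoffliner is written in TypeScript); it is a small Python illustration that only builds URLs. The function names are hypothetical. The first URL uses the standard MediaWiki Action API `prop=revisions` query to look up the newest revision ID, and the second follows the `mobile-sections/{title}/{revision}` pattern quoted above, which may no longer be served by Wikimedia.

```python
from urllib.parse import quote, urlencode


def latest_revision_query(host: str, title: str) -> str:
    """Action API URL asking for the ID and timestamp of the newest revision."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "ids|timestamp",
        "format": "json",
    }
    return f"https://{host}/w/api.php?{urlencode(params)}"


def pinned_mobile_sections_url(host: str, title: str, revision_id: int) -> str:
    """REST URL that names an explicit revision instead of relying on 'latest'."""
    # Assumption: same URL shape as the example in the comment above
    return (
        f"https://{host}/api/rest_v1/page/mobile-sections/"
        f"{quote(title, safe='')}/{revision_id}"
    )


lookup_url = latest_revision_query("en.wiktionary.org", "rocker")
pinned_url = pinned_mobile_sections_url("en.wiktionary.org", "rocker", 61774146)
print(lookup_url)
print(pinned_url)
```

A scraper following this approach would first resolve the revision ID through the lookup URL, then request the pinned URL, so a stale cached "latest" response could not silently end up in the ZIM file.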

kelson42 (Collaborator) commented on Feb 10, 2021

A bug ticket has been opened upstream at https://phabricator.wikimedia.org/T274359

kelson42 (Collaborator) commented on Feb 10, 2021

@MananJethwani Here again, this is "complicated" to change due to the architecture.

Labels: upstream, wikimedia (Direct impact on Wikimedia content scraping)