T365284 Linter error location offsets point to incorrect location when multi-byte characters are present

Closed, Resolved | Public | BUG REPORT

Description

Steps to replicate the issue (include links if applicable):

What happens?:
The interface that points to a specific location of a Linter error (for example, any of the "edit" links at https://en.wikipedia.org/wiki/Special:LintErrors/stripped-tag?namespace=10) is supposed to move the cursor in the editing window to the exact location of the Linter error, with the erroneous span, block of text, or tag highlighted.

Instead, since October 2023 (see the first discussion linked above), the offsets have been incorrect, overshooting the actual location of the error.

The problem appears to be present when multi-byte characters (or something like that; I don't know the technical details) are present. The location indexes do not appear to account for the size of those characters.
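
To illustrate the overshoot described above, here is a minimal Python sketch. It is not Parsoid or Linter code; the function name `offsets_of` and the sample wikitext are made up for illustration. It assumes that "ucs2" offsets count UTF-16 code units (as JavaScript string indices do) and that "byte" offsets count UTF-8 bytes.

```python
def offsets_of(text, needle):
    """Return the start of `needle` in `text` under three offset conventions.

    Hypothetical helper for illustration only; not part of Parsoid or Linter.
    """
    char_index = text.index(needle)  # index in Unicode code points
    prefix = text[:char_index]
    return {
        "char": char_index,
        # UTF-16 code units: 2 bytes each in UTF-16-LE
        "ucs2": len(prefix.encode("utf-16-le")) // 2,
        # UTF-8 bytes: non-ASCII characters take 2 to 4 bytes each
        "byte": len(prefix.encode("utf-8")),
    }


# Sample text with two non-ASCII characters before the tag.
wikitext = "Zo\u00eb caf\u00e9 <b>bold"
result = offsets_of(wikitext, "<b>")
print(result)
```

In this sample the byte offset is larger than the other two by one for each two-byte character that precedes the tag, which matches the "overshooting" behaviour reported here: an editor that interprets a byte offset as a character position lands past the real error.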

What should have happened instead?:
Any "edit" link on a Special:LintErrors page should jump to the exact location of the Linter error, as it did from 2018 until October 2023.

Software version (on Special:Version page; skip for WMF-hosted wikis like Wikipedia):

Other information (browser name/version, screenshots, etc.):

Event Timeline

ABreault-WMF subscribed.

Right, it looks like this was broken by https://github.com/wikimedia/mediawiki/commit/c8d0470f4b1055dfd5b1551e79a8a2f5fe994afc where offsetType stopped being passed to the environment.

The default for the lint path should be ( $opts['format'] === ParsoidFormatHelper::FORMAT_LINT ? 'ucs2' : 'byte' )
https://github.com/wikimedia/mediawiki/blob/master/includes/Rest/Handler/ParsoidHandler.php#L245-L246
but it's now falling back to Parsoid's default, byte.
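
For a client that receives byte offsets, the difference between the two offset types can be sketched as a conversion. This is only an illustration under assumptions: `byte_to_ucs2_offset` is a hypothetical name, the text is assumed to be UTF-8 on the byte side, and "ucs2" is assumed to mean UTF-16 code units. It does not reproduce how Parsoid converts offsets internally.

```python
def byte_to_ucs2_offset(text, byte_offset):
    """Convert a UTF-8 byte offset into `text` to a UTF-16 code-unit offset.

    Hypothetical helper for illustration only. Raises UnicodeDecodeError if
    `byte_offset` falls in the middle of a multi-byte character.
    """
    prefix = text.encode("utf-8")[:byte_offset].decode("utf-8")
    return len(prefix.encode("utf-16-le")) // 2


# Sample text: one 2-byte character and one 4-byte character before the tag.
text = "na\u00efve \U0001F600 <i>x"
byte_offset = text.encode("utf-8").index(b"<i>")
ucs2_offset = byte_to_ucs2_offset(text, byte_offset)
print(byte_offset, ucs2_offset)
```

A character outside the Basic Multilingual Plane takes four UTF-8 bytes but two UTF-16 code units, so the two offset types diverge by a different amount than for two-byte characters; a fixed correction factor would not work.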

Parsoid should still be passing the ucs2 offsets to the Linter extension though,
https://github.com/wikimedia/mediawiki-services-parsoid/blob/master/src/Logger/LintLogger.php#L97-L101
and it seems like that's confirmed in the link above:

"I've created a minimal reproducible example in my sandbox, see en:Special:Permalink/1181833694. It seems to be caused by non-ASCII characters. It looks like the edit link from Special:LintErrors does go to the correct place, so I guess this is an actual bug in LintHint?"

https://de.wikipedia.org/wiki/Benutzer_Diskussion:PerfektesChaos/js/lintHint#c-Rchard2scout-20231025141800-Bruce1ee-20231021063600

So, a problem for lintHint and any other client hitting the lint endpoints.

Change #1037871 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/core@master] Set ucs2 offsetType on lint paths

https://gerrit.wikimedia.org/r/1037871

ABreault-WMF moved this task from Backlog to Code Review on the Content-Transform-Team-WIP board.

Change #1037871 merged by jenkins-bot:

[mediawiki/core@master] Don't ignore offsetType attribute on lint API paths

https://gerrit.wikimedia.org/r/1037871