[css-pseudo-4] Fine-tuning ::first-letter punctuation pattern matching · Issue #5830 · w3c/csswg-drafts · GitHub

[css-pseudo-4] Fine-tuning ::first-letter punctuation pattern matching #5830

Closed
fantasai opened this issue Dec 31, 2020 · 2 comments
Labels
Closed Accepted by CSSWG Resolution · css-pseudo-4 (Current Work) · i18n-tracker (Group bringing to attention of Internationalization, or tracked by i18n but not needing response) · Needs Testcase (WPT)

Comments

@fantasai
Collaborator

fantasai commented Dec 31, 2020

In #5154 we updated the spec to include "intervening" white space. Certainly for spaces before the first letter this is correct, but the example of “A "b" will have A " selected” in @csnardi’s #2164 (comment) shows a real problem with this approach: since we reach out to following punctuation, allowing intervening white space after the first letter means we'll jump a word space and scoop up opening punctuation on the other side. :( I think we definitely need to make this smarter.
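To make the failure mode concrete, here is a minimal Python sketch of the matching described above. It is a simplified model, not the spec algorithm: the function name is hypothetical, the "letter" is assumed to be a single code point, punctuation is taken to mean any Unicode P* category, and white space is taken to mean category Zs. Spaces after the letter count only when punctuation follows them.

```python
import unicodedata


def is_punct(ch):
    # Any Unicode punctuation class (Ps, Pe, Pi, Pf, Pd, Pc, Po).
    return unicodedata.category(ch).startswith("P")


def is_space(ch):
    # Assumption: "white space" is modelled as category Zs only.
    return unicodedata.category(ch) == "Zs"


def first_letter_with_intervening_space(text):
    """Hypothetical model of matching that allows intervening spaces on both sides."""
    i = 0
    # Leading punctuation and any spaces between it and the letter.
    while i < len(text) and (is_punct(text[i]) or is_space(text[i])):
        i += 1
    if i >= len(text):
        return text  # no letter found; nothing more to model here
    i += 1  # the first letter itself (one code point in this sketch)
    end = i
    j = i
    # Following punctuation, reaching across spaces to find it.
    while j < len(text) and (is_punct(text[j]) or is_space(text[j])):
        if is_punct(text[j]):
            end = j + 1  # a space is kept only if punctuation follows it
        j += 1
    return text[:end]


print(repr(first_letter_with_intervening_space('A "b"')))  # → 'A "'
```

The match jumps the word space and takes the opening quotation mark that belongs to the next word, which is the problem reported in #2164.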

A few rules we could adopt that could help:

  • Break on normal word spaces and nbsp, at least on the following side of the first letter. If people want to scoop up subsequent punctuation after a space, they'll have to use typographically correct space codepoints such as thinsp.
  • Exclude opening punctuation following the first letter. In writing systems without a word space, we shouldn't be picking up the opening parens after the first "letter"! Po/Pi/Pf are ambiguous, but Ps is not.
  • If there's an element boundary after the first letter, require that the UA close ::first-letter before that boundary, excluding the content after it, rather than allowing the UA to create ::first-letter after the boundary (excluding the first letter itself!) or on both sides of it, as we do currently.

I think we also need to do some more thinking about following Po/Pi/Pf/Pd. E.g. CSS2 and Selectors 3 excluded Pd entirely but we generalized to all of P*. And we actually do need to include Pd before the letter because dashes are frequently used as an opening quotation mark, but is slurping them up into the ::first-letter after the letter correct?
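For reference, the general categories mentioned above can be inspected with Python's standard `unicodedata` module. The sample characters below are chosen for illustration only and are not taken from the issue; the `describe` helper is a hypothetical name.

```python
import unicodedata

# Illustrative samples (an assumption of this sketch, not a list from the spec).
SAMPLES = [
    "(",       # LEFT PARENTHESIS
    "\u300C",  # LEFT CORNER BRACKET
    '"',       # QUOTATION MARK
    "\u00AB",  # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
    "\u00BB",  # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
    "-",       # HYPHEN-MINUS
    "\u2013",  # EN DASH
    " ",       # SPACE
    "\u00A0",  # NO-BREAK SPACE
    "\u2009",  # THIN SPACE
]


def describe(ch):
    """Return 'U+XXXX <category> <name>' for one character."""
    return f"U+{ord(ch):04X} {unicodedata.category(ch)} {unicodedata.name(ch)}"


for ch in SAMPLES:
    print(describe(ch))
```

Opening brackets report Ps, the ASCII quotation mark reports Po, guillemets report Pi and Pf, and dashes report Pd. All three spaces report Zs, so a rule that separates word spaces from thin spaces has to list code points rather than rely on the category.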

CC @dauwhe @r12a @faceless2 @johannesodland

@css-meeting-bot
Member

The CSS Working Group just discussed [css-pseudo-4] Fine-tuning ::first-letter punctuation pattern matching, and agreed to the following:

  • RESOLVED: Exclude word break and no break spaces on either side of the letter
  • RESOLVED: Exclude dashes and opening punctuation (ps and pd in unicode) from following
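A rough Python sketch of how the two resolutions could be applied to a plain string follows. It rests on several assumptions that are not in the resolutions themselves: the function name is hypothetical, the first letter is one code point of category L* or N*, element boundaries are ignored, and the case where an excluded space sits between leading punctuation and the letter is left undecided (the sketch returns `None`).

```python
import unicodedata

WORD_SPACES = {"\u0020", "\u00A0"}   # excluded on either side (first resolution)
EXCLUDED_FOLLOWING = {"Ps", "Pd"}    # excluded after the letter (second resolution)


def attaches_before(ch):
    # Leading side: any punctuation, plus Zs spaces other than the word spaces.
    cat = unicodedata.category(ch)
    return cat.startswith("P") or (cat == "Zs" and ch not in WORD_SPACES)


def first_letter_after_resolutions(text):
    """Hypothetical model of the resolved matching rules on a plain string."""
    i = 0
    while i < len(text) and attaches_before(text[i]):
        i += 1
    if i >= len(text) or unicodedata.category(text[i])[0] not in "LN":
        return None  # undecided in this sketch, e.g. a word space before the letter
    end = i + 1
    j = end
    while j < len(text):
        ch = text[j]
        cat = unicodedata.category(ch)
        if ch in WORD_SPACES or cat in EXCLUDED_FOLLOWING:
            break  # word space, opening punctuation, or dash ends the match
        if cat.startswith("P"):
            end = j + 1  # following punctuation is included
        elif cat != "Zs":
            break  # next letter or any other character
        j += 1
    return text[:end]


print(repr(first_letter_after_resolutions('A "b"')))   # → 'A'
print(repr(first_letter_after_resolutions("A-b")))     # → 'A'
print(repr(first_letter_after_resolutions("A.")))      # → 'A.'
```

In this model a thin space (U+2009) still lets following punctuation attach, in line with the suggestion that authors who want that behaviour use typographically correct space code points.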
The full IRC log of that discussion
<dael> Topic: [css-pseudo-4] Fine-tuning ::first-letter punctuation pattern matching
<tantek> +1 jensimmons. agreed that's a good general methodology
<dael> github: https://github.com//issues/5830
<dael> fantasai: Problems raised in terms of matching ::first-letter. Example in an issue a " b". When we added ability to include whitespace we chose to include word spaces
<dael> fantasai: Problem is we aren't considering what's on other side of punct. If spec only used to separate makes sense, but not always. We're picking up punct that should be attached to word following. That's a problem.
<dael> fantasai: I think exclude word and no-break spaces on following side of letter. Maybe leading side but definitely following
<dael> Rossen_: Feedback on this one?
<dael> Rossen_: No feedback on it, I guess
<tantek> +1 reasoning makes sense
<dael> fantasai: Objections to making word and no break spaces not included?
<dael> jfkthame: No objection but thinking it should be same both before and after letter since that's less confusing. If they want a word space they need to use a different character no matter what
<dael> fantasai: Happy to exclude from both sides for consistency. word space isn't the correct character to use typographically anyway
<dael> Rossen_: Prop: Exclude word break and no break spaces on either side of the letter
<dael> Rossen_: Objections?
<dael> RESOLVED: Exclude word break and no break spaces on either side of the letter
<dael> fantasai: There are several classes of punct. One we don't want to include is opening punct after first letter. first-letter then { doesn't make sense to include. In european lang there's spaces so it doesn't matter much. For other lang there is no such thing and we're trying to get first letter. We have opening punct class and that should be excluded
<dael> fantasai: Two ambig classes that are common where I don't know how to handle cleanly. Least we can do is exclude the opening punct
<dael> Rossen_: Any feedback?
<tantek> +1 on excluding opening puncts
<dael> bradk: Sounds reasonable
<dael> fantasai: Similar case with dashes. Not quite sure the conventions for dash after first letter. I suspect we want to exclude on following side, but I'm not entirely familiar. On leading side long dashes mark quotations. Not sure on following. Inclination is also exclude dashes from following punctuation
<bradk> I don’t have opinion about dashes
<dael> Rossen_: One resolution here?
<dael> fantasai: Sure
<dael> fantasai: Prop: Exclude dashes and opening punct (ps and pd in unicode) from following
<dael> RESOLVED: Exclude dashes and opening punctuation (ps and pd in unicode) from following
<dael> fantasai: I think that's it for now on this issue. Will need to come back, I think

fantasai added a commit that referenced this issue Dec 28, 2021
Backslash might be nice alternative, but turns into a yen sign in some fonts. :/
fantasai added a commit that referenced this issue Dec 28, 2021
@johannesodland

johannesodland commented Sep 18, 2023

@fantasai Upon revisiting the resolution on this issue, I've identified a potential oversight that could significantly affect Norwegian typography.

The decision to exclude both the regular space (U+0020) and the no-break space (U+00A0) from either side of the letter inadvertently limits the utility of the ::first-letter pseudo-element for Norwegian texts. In Norwegian typography, direct speech is often indicated with an n-dash that's separated from the subsequent character by a space. (As discussed in #5154) Due to the challenges of entering special characters on many devices, authors often resort to using a regular space or, at best, a no-break space between the n-dash and the first letter.

By excluding these spaces before the letter, we risk rendering the ::first-letter pseudo-element less practical for Norwegian content. Given that other languages, such as French, also utilize spaces between punctuation and the initial letter, this decision might have broader implications (See quotation dash on Wikipedia for a list of some of the languages using this style of quotations). I genuinely believe this deserves another look. Would it be possible to reevaluate this decision?
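As a small illustration of the pattern in question, the Python sketch below checks which space follows a leading quotation dash and whether that space is one of the two code points the resolution excludes. The helper name and the sample word "Hei" are hypothetical, and the check only looks at the first two code points of the text.

```python
import unicodedata

# The two spaces excluded by the resolution: U+0020 and U+00A0.
EXCLUDED_SPACES = {"\u0020", "\u00A0"}


def space_after_leading_dash(text):
    """Return (space, excluded_by_resolution) for text such as a dash, a space, then a word.

    Returns None when the text does not start with a dash (Pd) followed by a Zs space.
    """
    if len(text) < 2 or unicodedata.category(text[0]) != "Pd":
        return None
    space = text[1]
    if unicodedata.category(space) != "Zs":
        return None
    return space, space in EXCLUDED_SPACES


for sample in ("\u2013 Hei", "\u2013\u00A0Hei", "\u2013\u2009Hei"):
    space, excluded = space_after_leading_dash(sample)
    print(f"U+{ord(space):04X} excluded={excluded}")
```

The first two samples use the spaces authors can type most easily, and both are excluded. Only the thin space variant remains eligible as an intervening space.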

Edit: Opened a separate issue here: #9413
