Technical Notes |
Version | 1 |
Authors | Ken Whistler |
Date | 2014-04-01 |
This Version | http://www.unicode.org/notes/tn39/tn39-1.html |
Previous Version | n/a |
Latest Version | http://www.unicode.org/notes/tn39/ |
This document is the Bidi Brackets for Dummies course.
This document is a Unicode Technical Note. Sole responsibility for its contents rests with the author(s). Publication does not imply any endorsement by the Unicode Consortium. This document is not subject to the Unicode Patent Policy.
For information on Unicode Technical Notes including criteria for acceptance, see http://www.unicode.org/notes/.
This is the Bidi Brackets for Dummies course. When you finish this course, you will be able to identify bracket pairs according to the Unicode Bidirectional Algorithm (UBA for short). You will also be able to add to your resume that you have completed the BFD course.
There are no real prerequisites for this course. You don't even have to understand the concept of bidirectional text, or why the UBA even exists. We'll just focus on what you need to know to match up bracket pairs for rule N0 in the UBA. (And no, that is not en oh "NO" — it is en zero "N0".)
So what is a "bracket", anyway?
— Um, a "["?
Yes, correct. But actually, in the UBA, many other things are also identified as "brackets". So a parenthesis "(" is considered a bracket, and a curly brace "{" is also considered a bracket. The "[" bracket is called a square bracket, so we can tell it apart from all the other "brackets".
Basically, any of the punctuation marks that come in pairs, and which are used to enclose stuff in text, to set it off from other stuff outside the marks, are considered brackets. There are many others, besides the ones people typically recognize on their keyboard, including such oddities as U+29FC ⧼ LEFT-POINTING CURVED ANGLE BRACKET and U+2E26 ⸦ LEFT SIDEWAYS U BRACKET. There are even brackets in Unicode that aren't even called brackets or parentheses or braces — characters like U+29DA ⧚ LEFT DOUBLE WIGGLY FENCE and U+27C5 ⟅ LEFT S-SHAPED BAG DELIMITER. Don't ask me why — they are all just a bunch of weird squigglies used in pairs as punctuation.
— I don't see any weird squigglies there. That's just a bunch of boxes!
Get yourself some better fonts.
— So is the Arabic U+FDE3 ORNATE LEFT PARENTHESIS a bracket, too?
No.
— Why not?
Because I said so.
— How about quotation marks? They come in pairs and enclose stuff inside, separating it from all the stuff outside. Some of them, like that French thingie ‹ even look like brackets.
No, they aren't "brackets", either.
— Why not?
Because I said so.
— How about U+300C 「 LEFT CORNER BRACKET? Is that a "bracket"?
Yes, that is a bracket.
— But I thought that was a Japanese quotation mark. How come it is a bracket, but the other quotation marks aren't?
Because I said so. Oh, and U+230A ⌊ LEFT FLOOR is a "bracket", too.
— But that doesn't make any sense. What does "FLOOR" have to do with brackets?
It's probably something to do with maths, but it's a "bracket", anyway. Besides, if you were a decent carpenter, you would know what a floor bracket was.
— How about the left angle bracket "<"? Is that at least a "bracket"?
No.
— Why not?
Because it's a LESS-THAN SIGN. And enough with all these silly "Why?" and "Why not?" questions.
Your homework is to memorize the list of code points and character names in BidiBrackets.txt [Brackets]. That list defines all the brackets for UBA. And it doesn't matter what else you might think is a "bracket". That's the list and that's just the way it is.
If you were paying attention in the Chapter 1 lesson, you may have noticed that all the "brackets" had "LEFT" in their names. It turns out, not too surprisingly, that there are also corresponding "brackets" with "RIGHT" in their names.
The left and right versions of brackets form pairs with each other. So "[" is the left version of "]", and vice versa. They form a pair. And usually they are graphical mirrors of each other. The left of the pair points right, and the right of the pair points left.
— Wait! That's too confusing.
O.k., think of it this way. The left of the pair occurs to the left side of the text it encloses, and the right of the pair occurs to the right side of the text it encloses. Is that clearer?
— Yes, much!
Only, sometimes in bidirectional text, it is just the opposite.
— What?!
Well, don't worry about that now. Remember, you don't need to understand bidirectional text to understand the identification of bracket pairs. So forget I even mentioned that. The important thing is to know that the "LEFT" ones pair with the "RIGHT" ones, and it doesn't really matter, in the end, which way they point or where the text they enclose actually is, o.k.?
— Now wait a minute. I did my homework, and there was something in that list called TIBETAN MARK GUG RTAGS GYON. That doesn't have "LEFT" in its name. What's up with that?
Well, "GYON" means left in Tibetan. Same difference.
— You can't fool me though — I didn't see "LEFT" in OGHAM FEATHER MARK, either. And you can't convince me that "FEATHER" means left.
That one forms a pair with the OGHAM REVERSED FEATHER MARK. And you know what they say about foolish consistency, right?
Anyway, the point is that all of the "brackets" come in pairs. And even if they don't all actually have "LEFT" or "RIGHT" in their names, you can always tell which one is paired with which other one. If this isn't always obvious, then the way to figure it out is to go back to BidiBrackets.txt. The second number in each line of that file is the character that forms the pair. So you look at a line like:
005B; 005D; o # LEFT SQUARE BRACKET
And that tells you that U+005D is the pair for U+005B LEFT SQUARE BRACKET.
The next line in the file is:
005D; 005B; c # RIGHT SQUARE BRACKET
And that tells you that U+005B is the pair for U+005D RIGHT SQUARE BRACKET.
— But isn't that redundant? Why use two lines to tell me the same thing?
Yes. And because if I tell you something two times, it is true.
— Where did the "U+" come from? I don't see that in BidiBrackets.txt.
Go back to the Unicode for Dummies course.
So, for those of you who are still following along, the next thing to know about is that the "LEFT" brackets in the pairs are also known as "opening" and the "RIGHT" brackets in the pairs are also known as "closing". The mysterious "o" and "c" that you see on each line of BidiBrackets.txt stand for "o"pening and for "c"losing. The "o" or "c" values are how you tell which one of a pair is which, even if the characters don't actually have "LEFT" or "RIGHT" in their names. It is really, really important, it turns out, to know which is the opening one and which is the closing one, because figuring out which particular bracket can pair up with which particular other bracket in a long line of text depends on being able to find all the opening ones and all the closing ones.
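For anyone who would rather let a machine do the memorizing, here is a minimal sketch of reading lines in that format. The function name `parse_bidi_brackets` and the two-line `SAMPLE` string are made up for illustration; the sketch assumes only the three semicolon-separated fields and the trailing "#" comment seen in the two lines quoted above.

```python
# Hypothetical sample: just the two lines quoted in the text, not the full file.
SAMPLE = """\
005B; 005D; o # LEFT SQUARE BRACKET
005D; 005B; c # RIGHT SQUARE BRACKET
"""


def parse_bidi_brackets(text):
    """Map each bracket character to (paired character, 'o' or 'c')."""
    table = {}
    for line in text.splitlines():
        # Drop the trailing comment, then skip lines with no data left.
        data = line.split("#", 1)[0].strip()
        if not data:
            continue
        code, paired, kind = (field.strip() for field in data.split(";"))
        table[chr(int(code, 16))] = (chr(int(paired, 16)), kind)
    return table


table = parse_bidi_brackets(SAMPLE)
print(table["["])  # the pair for U+005B and its opening/closing value
```

Looking up a character in `table` then answers both questions at once: what it pairs with, and whether it is the opening or the closing one.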
But there is another complication here. It turns out that some brackets are canonical equivalents of other brackets, and those count as pairs, too.
— Come again?
Well, according to the Unicode Normalization Algorithm, a few of the angle brackets are canonically equivalent to other angle brackets, and because canonical equivalents cannot really be distinguished in most text processing, it isn't a good idea to separate them when it comes to identifying bracket pairs.
— Wait, I thought angle brackets weren't actually "brackets"!
Whatever. I'm not talking about "angle brackets" — I'm talking about U+2329 〈 LEFT-POINTING ANGLE BRACKET and U+3008 〈 LEFT ANGLE BRACKET, which are the same as each other, but which aren't the "<" angle bracket you are thinking about.
— Ugh. I hated the Unicode Normalization for Dummies course — it was so confusing.
Well, you are right about that, at least! For now, just memorize that U+3008 pairs with U+3009 and U+2329 pairs with U+232A, but that U+3008 also pairs with U+232A and U+3009 also pairs with U+2329. It is just like having to know that "pair" is also spelled "pear" — only that the pairs we are talking about here aren't pears, but angle brackets, even though they aren't "angle brackets".
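If memorizing is not your thing, the four-way pairing can be sketched in a few lines. This is only an illustration: the `PAIRED` mini-table and the helper names `canonical` and `forms_bracket_pair` are hypothetical, and the sketch leans on Python's standard `unicodedata.normalize` to fold U+2329 and U+232A onto their canonical equivalents U+3008 and U+3009.

```python
import unicodedata

# Hypothetical mini-table: each opener mapped to the closer listed as its pair.
PAIRED = {"\u2329": "\u232A", "\u3008": "\u3009"}


def canonical(ch):
    """Fold a character onto its canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", ch)


def forms_bracket_pair(opener, closer):
    """True if closer is the opener's listed pair, or canonically equivalent to it."""
    partner = PAIRED.get(opener)
    return partner is not None and canonical(partner) == canonical(closer)


print(forms_bracket_pair("\u3008", "\u232A"))  # U+3008 with U+232A
print(forms_bracket_pair("\u2329", "\u3009"))  # U+2329 with U+3009
```

Comparing the normalized forms, rather than the raw code points, is what lets the "mixed" combinations count as pairs without listing each one by hand.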
Your homework for this chapter is to go back to BidiBrackets.txt [Brackets] and memorize all the "o" and "c" values in the list, so you know which character in each pair is the opening or the closing one.
— Will that be on the midterm?
Yes.
In Chapter 2 we learned all about which brackets pair with which brackets. But it turns out that so far that information is all abstract. We can think of that as the "semantics" of bracket pairs, because it is the truth of their meaning, independent of any use of them in text. An "opening bracket" is opening, just because of what it is and what it means. It doesn't matter where it occurs in text.
But determining which bracket pairs with which other bracket is more complicated than just having memorized all the information in BidiBrackets.txt and knowing which ones are opening and which ones are closing.
So what am I talking about? Let's start by looking at a few examples. In the following short text:
Voltage equals current multiplied by resistance (Ohm's Law)
The opening "(" is paired with the closing ")". And we then interpret the text "Ohm's Law" as being "inside the parentheses" — or, more generally as "inside the brackets", since we are talking generically about bracket pairs here. Our concern is not the Chicago Manual of Style [CMS], or exactly which brackets should be used where, but rather just figuring out which opening bracket goes with which closing bracket. It doesn't matter whether they are the right (or, um, "correct") ones or not. So for our purposes, it would be equally valid to consider:
Voltage equals current multiplied by resistance [Ohm's Law]
And see that the opening "[" is paired with the closing "]".
It is crucial, however, to note the exact order in which these pairs occur. If the text is changed to:
Voltage equals current multiplied by resistance ]Ohm's Law[
We would no longer consider that particular "]" to be paired with that particular "[". They are in the wrong order here. In principle, the closing "]" might pair with some opening "[" off to the left of the line, but not visible here. And the "[" might pair with some other "]" from somewhere continuing on ahead in the text. But these two do not pair with each other, because they appear here in the wrong order to be considered a matching pair.
— Wait! In this case, can't I just redefine "]" as opening and "[" as closing? I kinda like the way "]Ohm's Law[" looks!
No. Well, maybe you could in an alternative universe, I suppose, but not with the UBA we have. Remember, I told you to memorize the opening and closing values in BidiBrackets.txt — not to question them.
At any rate, the point here is that for any particular bracket in any particular text, it totally matters which order you encounter them in. "(^.^)" contains a bracket pair — well, it might also contain an emoticon, but that is the topic for another course. ")^.^(", on the other hand, does not contain a bracket pair (but might contain an Anpanman).
Also, if I have two lefts, as in "(^.^(", those two brackets (well, parentheses, but whatever!) also do not pair with each other. They might pair up with some closing parentheses much later on in your text, but not with each other in just the text as we have it there. Remember, the general principle here is that two lefts do not make a right and two rights do not make a left. :-) :-)
Your homework for this chapter is to meditate on the epistemological implications of missing text and on The Order of Things.
Now that we have come to grips with the fact that the order of brackets matters, we need to move on to dealing with the implications of more than two brackets occurring together. When people use multiple brackets all together in a line, whether they are individually matching pairs or not, how do we figure out which particular bracket pairs up with which particular other bracket?
In the case of just two brackets, this is pretty simple to determine. But once the brackets start multiplying (with or without multiplication signs), the situation gets exponentially more complicated.
I'll start the discussion first using just sets of three brackets, which I like to call "trials" — because, after all, they are sets of three, kinda like triads. But also because we have to try various combinations to figure them out, and because it is a trial to explain all this to you.
And before I go further, there is something else you need to know. All of this work to match up brackets only counts if you are inside a bidirectional "isolating run sequence". Those are kinda like long distance training runs, only different. If you don't know what they are, don't worry — you can always buy my two-part course, Unicode Bidirectional Runs for Dummies and the sequel, Unicode Bidirectional Isolating Run Sequences for Dummies VI.III: The X10 Also Rises. For now, we'll just consider that somebody has already figured out what they are, and all the examples here are inside a single one of them, whatever they are.
So let's look at some examples. We'll stay simple to start with and use brackets all of the same kind — just square brackets.
Bud [Gus [Hank [Xerxes
O.k., there are three opening brackets in that line, but no closing brackets. So there are no bracket pairs, which require an opening bracket and a closing bracket in the correct order. How about:
Bud ]Gus [Hank [Xerxes
Still three brackets, but now one is closing and two are opening. They are still in the wrong order to make any pairs, however. But look at:
Bud [Gus ]Hank [Xerxes
Now the first two brackets are opening and closing in the correct order, so we can figure out that "[Gus ]" contains a bracket pair, which, in turn encloses the text "Gus ", including the space. The last opening bracket has no closing bracket to match with. Similarly:
Bud ]Gus [Hank ]Xerxes
Now the first closing bracket has nothing to match, but "[Hank ]" contains a bracket pair, enclosing the text "Hank ". Where things start to get interesting is in the following:
Bud [Gus [Hank ]Xerxes
In this case "[Hank ]" still contains a bracket pair. The first opening bracket on the line doesn't match the closing bracket, because there is an opening bracket after it which is closer to the closer. So the closer opener wins... wait, now I'm confusing myself! In any case, the outside brackets in "[Gus [Hank ]" don't match for a bracket pair — only the inside ones do.
Bud [Gus ]Hank ]Xerxes
That example is similar to the last one, only this time the bracket pair is in "[Gus ]", and the second closing bracket doesn't match any opening bracket.
Now let's throw a curveball in here — in particular, a curly brace. This changes things, and if you just swing away without keeping your eye on the ball, you are going to miss how things work:
Bud [Gus {Hank ]Xerxes
That example has the same pattern of opening-opening-closing, but now the second opening bracket (the left curly brace) no longer can match the closing bracket, because it is a different kind of bracket. So in this case, the outer pair matches, and we find a bracket pair in "[Gus {Hank ]", enclosing the text "Gus {Hank ", including the left curly brace, as well as the spaces.
As we'll see in the next chapter, the effect on that curly brace is pretty drastic, because the bracket pair (of square brackets) has taken square aim at and effectively snuffed the curly brace they enclose. That curly brace can't match any other curly brace which might have come later, because it has been enclosed already [in a {-coffin, as it were].
Be that as it may, remember that this behavior isn't inherent to square brackets and curly braces. It isn't as if square brackets are inherently "stronger" than curly braces and always win when they come into contention for matching. If the situation were reversed, the curly braces could turn the tables and ice the bracket, instead:
Bud {Gus [Hank }Xerxes
In this case there is a "bracket" pair in "{Gus [Hank }", enclosing the text "Gus [Hank ", including the left square bracket, as well as the spaces. So here the curly braces bury the single opening square bracket — along with Gus and Hank.
So as far as finding bracket pairs go, no single type of "bracket" has priority over any other. They are all equals. The important thing is which one comes first and then in what order all the rest occur on the line.
Your homework is to try to dig up more triads with square brackets and curly braces. If you find any "little problems" I have forgotten about, it doesn't matter.
O.k., in the last chapter we move on to the most difficult part of matching bracket pairs for UBA: what to do for long sequences of multiple brackets of different types. If you haven't washed out of the course by now, you might have what it takes to join Unicode Bracket Club.
So what is the first rule of Unicode Bracket Club?
— Um..., don't talk about brackets?
No. The first rule of Unicode Bracket Club is: Matching bracket pairs for UBA is not expression evaluation.
— Huh?
Well, yes. Most people's notion of how multiple parenthesis matching works comes from their high school algebra class. Remember those equations:
((a + b) * (b + 9a)) + ((2a – 1) * (2b + 17)) = –1
You had to go through and evaluate all the expressions and combine the inner expressions by applying the operators to form other expressions, and so on.
And what happened if you were missing a balancing parenthesis?
— Uh, the algebra teacher hit us with a ruler.
Right, and the equations didn't work, because they had what is called an expression error. If you use parentheses (or brackets for matrices, and so forth), they have to balance, or the expression has a syntax error, and it cannot be evaluated.
But in Unicode Bracket Club, if a "balancing parenthesis" is missing, meh! It just changes what "bracket" matches what other "bracket" in the line. And a missing match isn't an error — it's just "interesting".
So if you're all ready with your BNF for brackets and have an LL-grammar ready and want to get busy with an LALR parsing strategy for bidirectional brackets, you're probably in the wrong class. Maybe Compilers for Dummies is what you wanted.
Instead, in Unicode Bracket Club, we need to be able to figure out what to do with:
(((a[(])b
We need to identify the matching bracket pairs and the brackets that don't match, but always come out smiling. Because there are no errors, and everyone is a winner in Unicode Bracket Club!
So how do we find matching bracket pairs? You already have some clues from Chapter 4 and the earlier chapters:
1. An opener for a match has to come before its closer.
2. An opener for a match has to be the same kind of bracket as its closer.
3. If an unmatched opener or closer ends up inside a different matched pair, then it is erased from contention and stays unmatched.
So let's extend the examples for bracket trials and move into full-blown bracket matches. How can we describe the moves which will lead to a good match?
First of all, always start from the left and move systematically to the right. This is known as the logical order for a match. (Members of Unicode Bracket Club living in the Middle East tend to think of this as always starting from the right and moving systematically to the left, but they are still using the same logical order — it just looks different when you watch their matches. Think of it as watching from the other side of the ring.)
Then take the following steps:
1. If you encounter a closer before you have any opener to match it, just discard it. That is known as a feint.
2. If you encounter an opener, remember it. Those count for scoring. Also remember what kind of opener it was and exactly where you encountered it.
3. Move on.
4. If the next bracket you encounter is a closer, check to see if it matches one of the openers you remembered. You start from the most recently encountered opener and think back to the first opener that is of the same type. If you find one, congratulations, you have identified a bracket pair. Mark it down for keeping. This is known as a combination. And by the way, if you identified a bracket pair, but while you were remembering back to it, you passed over any brackets of different types, whether opening or closing, you can now forget about them. Those are all known as misses.
5. However, if the next bracket you encounter is another opener, just remember it, too, and where you found it. Those are known as keepers.
6. If the next character you encounter is not a bracket at all, just skip on by. That is known as fancy footwork.
7. Now check if you have moved all the way systematically in logical order to the end. If so, you are done, and it is time for scoring. But if you aren't at the end, then go back to step 3 and move on again. Remember to keep bobbing and weaving as you continue the match.
Once the match is done, it is time for scoring. Bring all the combinations you found to the scoring table and lay them out for the judges to examine. The judges will then reorder all the combinations by the positions they occurred in. The number of combinations you found, reordered in neat ascending order by position, constitute your score. Congratulations!
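For those who prefer to let a machine do the bobbing and weaving, the moves and the scoring above can be sketched roughly as follows. This is a minimal sketch, not an official implementation: the names `OPENERS`, `CLOSERS`, and `find_bracket_pairs` are made up here, only three ASCII bracket kinds are modeled, and the whole string is assumed to sit inside a single isolating run sequence.

```python
# Hypothetical table: only parentheses, square brackets, and curly braces.
OPENERS = {"(": ")", "[": "]", "{": "}"}
CLOSERS = set(OPENERS.values())


def find_bracket_pairs(text):
    """Return combinations as (opener position, closer position), 1-based."""
    remembered = []    # openers seen so far: (closer they need, position)
    combinations = []
    for pos, ch in enumerate(text, start=1):
        if ch in OPENERS:
            # A keeper: remember what kind it was and where it was found.
            remembered.append((OPENERS[ch], pos))
        elif ch in CLOSERS:
            # Think back from the most recent opener to the first of this kind.
            for i in range(len(remembered) - 1, -1, -1):
                if remembered[i][0] == ch:
                    combinations.append((remembered[i][1], pos))
                    # Forget the matched opener and every miss passed over.
                    del remembered[i:]
                    break
            # If nothing matched, the closer was a feint: discard it.
        # Anything else is fancy footwork: skip on by.
    # Scoring: the judges reorder the combinations by position.
    return sorted(combinations)


print(find_bracket_pairs("Bud [Gus {Hank ]Xerxes"))
```

The list of remembered openers behaves like a stack, which is why a combination can erase several misses in one move.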
O.k., let's watch the progress of a particular match closely, to see all the moves one-by-one. The contender is:
(((a[(])b
123456789
Note that the positions are labeled, starting at 1, beneath each character. We are going to work our way through, systematically, in logical order, from 1 until we get all the way to the end at 9.
Position 1: Found an opener "(". Remember it: "(" at 1.
Position 2: Found an opener "(". Remember it: "(" at 2.
Position 3: Found an opener "(". Remember it: "(" at 3.
Position 4: Not a bracket. Skip on by.
Position 5: Found an opener "[". Remember it: "[" at 5.
Position 6: Found an opener "(". Remember it: "(" at 6.
Position 7: Found a closer (finally!) "]". Think back to the first matching opener before. That would be "[" at 5. Save the combination: "[" at 5 and "]" at 7. And since we passed over an unmatched opener "(" at 6, just forget that one now. It is a miss.
Position 8: Found a closer ")". Think back to the first matching opener before. Since we already forgot about "(" at 6, the correct match will be "(" at 3. Save the combination: "(" at 3 and ")" at 8.
Position 9: Not a bracket. Skip on by. But hey, we are at the end, so we're done with this match.
Scoring:
Lay out the combinations you found:
"[" at 5 and "]" at 7
"(" at 3 and ")" at 8
The judges rearrange those to:
"(" at 3 and ")" at 8
"[" at 5 and "]" at 7
And count up your matches: your score is 2!
All the other brackets you found during the match don't count for matching bracket pairs. They can be ignored from now on, because you have your neatly arranged and scored list of actual matched bracket pairs.
So let's go back to the start. The contender was:
(((a[(])b
And we have discovered that the first bracket pair in it is "(a[(])", which encloses the text "a[(]".
But that enclosed text itself contains the second bracket pair: "[(]", which encloses the text "(", itself an unmatched "bracket".
And the two parentheses at the very start are also unmatched "brackets".
So matched bracket pairs can be inside other matched bracket pairs. We just have to be very careful in finding them, because they aren't always immediately obvious. In particular, for a different contender:
[(])
The "[(]" contains a matching bracket pair, but "(])" does not.
— Well, I guess I understand sorta. But I've suffered all the way through these explanations and I still don't know what a matching bracket pair is when I see one. You've given me instructions for how to join Unicode Bracket Club and then find combinations that count as matched bracket pairs as I step through all the moves to deal with a contender. Why can't you just define what a goldarn matching bracket pair is, instead of sending me through all this Bracket Club folderol to try to find them??
Fair enough. I suppose we could try it that way. First add a few definitions:
BD14. An opening paired bracket is a character whose Bidi_Paired_Bracket_Type property value is Open.
BD15. A closing paired bracket is a character whose Bidi_Paired_Bracket_Type property value is Close.
BD16a. A bracket pair is a pair of an opening paired bracket and a closing paired bracket character such that the Bidi_Paired_Bracket property value of the former character or its canonical equivalent equals the latter character or its canonical equivalent.
Note that you already memorized the list of those for your homework for Chapter 2.
BD16b. A resolved bracket pair is a bracket pair that has been selected from among possible bracket pairs in an isolating run sequence.
Now we've already agreed to ignore the fact that we don't know exactly how to find an "isolating run sequence". And this new term "resolved bracket pair" is just another way of saying "a bracket pair which has been resolved by some selection process to be a pair that matches". In other words, it means a "matching bracket pair" in a syntactic context.
— But that just gets me back to where I was before. I know what "matching bracket pair" means — I just don't know how to find the darn things! Now you're just saying they are "selected from among possible bracket pairs". How do I find them?
Relax, relax. The answer to that is coming. We'll just define a rule to select them:
Rx. For each isolating run sequence, bracket characters are selected into resolved bracket pairs as follows:
Starting at the beginning of the run sequence, when a closing bracket character is encountered, find the nearest preceding opening character that forms a bracket pair, but is not already part of a resolved bracket pair, and not ignored for bracket pair selection.
If one exists, resolve the pair, and mark any enclosed opening brackets of any kind as not part of a bracket pair and ignored for further bracket pair selection. Otherwise, if no pair can be selected, mark the closing bracket as not part of a pair and ignored for further pair selection.
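One possible reading of rule Rx, sketched in code, looks something like this. It is an assumption-laden illustration rather than the rule itself: the names `resolve_bracket_pairs`, `in_pair`, and `ignored` are invented here, only three ASCII bracket kinds are modeled, and three liberties are taken that the rule as written does not spell out (all flags start out cleared, enclosed openers that already belong to a resolved pair are left alone, and the pairs are recorded and sorted by position).

```python
# Hypothetical table: only parentheses, square brackets, and curly braces.
OPENERS = {"(": ")", "[": "]", "{": "}"}
CLOSERS = set(OPENERS.values())


def resolve_bracket_pairs(text):
    """Return resolved pairs as (opener position, closer position), 1-based."""
    in_pair = [False] * len(text)   # assumption: every flag starts cleared
    ignored = [False] * len(text)
    resolved = []
    for j, ch in enumerate(text):
        if ch not in CLOSERS:
            continue
        # Scan backwards for the nearest usable opener of the right kind.
        for i in range(j - 1, -1, -1):
            if OPENERS.get(text[i]) == ch and not in_pair[i] and not ignored[i]:
                in_pair[i] = in_pair[j] = True
                resolved.append((i + 1, j + 1))
                # Assumption: enclosed openers already in a pair are untouched.
                for k in range(i + 1, j):
                    if text[k] in OPENERS and not in_pair[k]:
                        ignored[k] = True
                break
        else:
            # No pair could be selected for this closer.
            ignored[j] = True
    return sorted(resolved)


print(resolve_bracket_pairs("[(])"))
```

Note the bookkeeping: two flag arrays, a backward rescan for every closer, and another pass over each newly enclosed span.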
— Gah! That's it?! How is that any clearer than the rules for Unicode Bracket Club matches? Can you show me how that works?
Sure. Let's go back to our original contender:
(((a[(])b
123456789
Let's use Rule Rx and select all the resolved bracket pairs.
1. Scan forward to the first closing bracket character. That is "]" at 7.
2. Scan backwards to the nearest opening bracket character that forms a bracket pair. That is "[" at 5.
3. Is "[" at 5 already part of a resolved bracket pair? (Oops! Reminder to self: remember to first scan through and set all brackets to "not-in-resolved-bracket-pair" before starting to apply Rx.) Let's go with the answer: No.
4. Is "[" at 5 ignored for bracket pair selection? (Oops! Reminder to self: remember to first scan through and set all brackets to "not-ignored-for-bracket-selection" before starting to apply Rx.) Let's go with the answer: No.
5. O.k., we've determined that "[" at 5 meets the criterion, so we now have our first resolved bracket pair. Set "[" at 5 and "]" at 7 to "in-resolved-bracket-pair".
6. Find any enclosed opening bracket in the resolved bracket pair. That means scanning between 5 and 7. We find "(" at 6. Mark that as "not-in-resolved-bracket-pair" and "ignored-for-bracket-selection".
7. Go back to where we left off at step #1 and scan forward to the next closing bracket character. That is ")" at 8.
8. Scan backwards to the nearest opening bracket character that forms a bracket pair. That is "(" at 6.
9. Is "(" at 6 part of a resolved bracket pair? No.
10. Is "(" at 6 ignored for bracket pair selection? Yes. O.k., then we need to keep scanning back.
11. Scan backwards to the next nearest opening bracket character that forms a bracket pair. That is "(" at 3.
12. Is "(" at 3 part of a resolved bracket pair? No.
13. Is "(" at 3 ignored for bracket pair selection? No.
14. O.k., we've determined that "(" at 3 meets the criterion, so we now have our second resolved bracket pair. Set "(" at 3 and ")" at 8 to "in-resolved-bracket-pair".
15. Find any enclosed opening bracket in the resolved bracket pair. That means scanning between 3 and 8. We find "[" at 5 and "(" at 6. Mark them as "not-in-resolved-bracket-pair" and as "ignored-for-bracket-selection". (Oops! Reminder to self: Update rule Rx, because we really didn't want to change "[" at 5 to "not-in-resolved-bracket-pair" and "ignored-for-bracket-selection". For now, we'll just pretend the rule is already patched up.)
16. Go back to where we left off at step #7 and scan forward to the next closing bracket character. O.k., there aren't any. We are done.
Now we can go on to scoring.
What are all the brackets identified as "in-resolved-bracket-pair"? Those would be:
"(" at 3, "[" at 5, "]" at 7, and ")" at 8
But what are the actual resolved pairs? Oops! We forgot to keep track. Reminder to self: Add requirement in rule to keep exact list of each resolved bracket pair as it is identified, for later reference.
O.k., let's go back and keep track as we resolve them, and we get:
"[" at 5 and "]" at 7
"(" at 3 and ")" at 8
That's two resolved bracket pairs, so our final score is 2!
We're done, right? Oops! We forgot to reorder the list in ascending order by position, rather than in order of selection by rule Rx. Reminder to self: Add requirement in rule to do post-selection reordering by position of the first character in each resolved bracket pair in the list.
Well, there were a few things that needed patching up here and there, but the process of selection is ever so much clearer expressed this way, right? We don't have to "remember" a stack of openers. All we have to "remember" is the process status of all the brackets, where we left off to restart the forward scan, the spans we have to check (and recheck) each time we identify a new resolved bracket pair, and the list of resolved bracket pairs as we select them.
Find the bracket pairs in this new contender:
({([{([){()([])]}}[{(X)})[)])})}}{([])[]})
For extra credit do the exercise twice, once with each method described here, and compare and contrast your results.
"[" is a bracket.
"(" is a "bracket", too.
"[" is opening, and pairs with "]", which is closing.
"{..}" contains a bracket pair. "}..{" does not.
"(..[)..]" → (..[), but neither "[" nor "]" is part of a pair.
"[(]x[)]" → [(] and [)], but neither "(" nor ")" is part of a pair.
[Brackets] | BidiBrackets.txt http://www.unicode.org/Public/UCD/latest/ucd/BidiBrackets.txt |
[CMS] | The Chicago Manual of Style: The Essential Guide for Writers, Editors, and Publishers (14th Edition) University of Chicago Press (Trd); ISBN: 0226103897 Also see their FAQ at http://www.press.uchicago.edu/Misc/Chicago/cmosfaq.html |
[FAQ] | Unicode Frequently Asked Questions http://www.unicode.org/faq/ For answers to common questions on technical issues. |
[Glossary] | Unicode Glossary http://www.unicode.org/glossary/ For explanations of terminology used in this and other documents. |
[Reports] | Unicode Technical Reports http://www.unicode.org/reports/ For information on the status and development process for technical reports, and for a list of technical reports. |
[Versions] | Versions of the Unicode Standard http://www.unicode.org/versions/ For details on the precise contents of each version of the Unicode Standard, and how to cite them. |
The following summarizes modifications from the previous version of this document.
1 | First version |
Copyright © 2014 Ken Whistler and Unicode, Inc. All Rights Reserved. The Unicode Consortium and the authors make no expressed or implied warranty of any kind, and assume no liability for errors or omissions. No liability is assumed for incidental and consequential damages in connection with or arising out of the use of the information or programs contained or accompanying this technical note. The Unicode Terms of Use apply.
Unicode and the Unicode logo are trademarks of Unicode, Inc., and are registered in some jurisdictions.