This post looks at how the format methods of String and MessageFormat differ, and how each one is implemented.
About String and MessageFormat
1. Use cases
String.format: for layout justification and alignment, common formats for numeric, string, and date/time data, and locale-specific output.
MessageFormat.format: to produce concatenated messages in a language-neutral way.
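To make the two use cases concrete, here is a minimal sketch. The class name FormatUseCases, the method names, and the sample patterns are made up for illustration; Locale.US is passed explicitly so the output does not depend on the default locale.

```java
import java.text.MessageFormat;
import java.util.Locale;

class FormatUseCases {
    // String.format: column layout through width and alignment flags.
    static String aligned(String name, int age) {
        // %-10s left-justifies in 10 columns, %5d right-justifies in 5 columns.
        return String.format(Locale.US, "|%-10s|%5d|", name, age);
    }

    // MessageFormat: arguments are referenced by index, so a translated
    // pattern is free to put them in a different order.
    static String sentence(String pattern, String name, int age) {
        return new MessageFormat(pattern, Locale.US).format(new Object[] {name, age});
    }

    public static void main(String[] args) {
        System.out.println(aligned("huhx", 25));
        System.out.println(sentence("{0} is {1} years old", "huhx", 25));
        System.out.println(sentence("Age {1}: {0}", "huhx", 25));
    }
}
```

The same argument array serves both patterns in main; only the pattern text decides the order in which the values appear.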
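2. Performance comparison
Because MessageFormat first parses the pattern and then inserts the values at the positions it recorded, it performs better than String.format, which (in the Formatter source quoted in this post) locates placeholders with a regular expression. In short: MessageFormat > String.format.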
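One practical consequence of the parse-first design: the static MessageFormat.format(pattern, args) constructs a new MessageFormat, and so re-parses the pattern, on every call. When one pattern is formatted many times, the parsing can be done once by keeping an instance. The sketch below assumes a hypothetical class ReusedMessageFormat; it makes no timing claims, and since MessageFormat is not synchronized, the instance should not be shared between threads without external locking.

```java
import java.text.MessageFormat;
import java.util.Locale;

class ReusedMessageFormat {
    // The pattern is parsed once here, when the constructor calls applyPattern.
    private final MessageFormat format =
            new MessageFormat("name={1}, age={0}", Locale.US);

    // Each call only assembles the output from the already parsed pattern.
    String render(String name, int age) {
        return format.format(new Object[] {age, name});
    }

    public static void main(String[] args) {
        ReusedMessageFormat r = new ReusedMessageFormat();
        System.out.println(r.render("huhx", 25));
        System.out.println(r.render("linux", 30));
    }
}
```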
3. Cases that throw exceptions
String message = MessageFormat.format("name={0}, age={}", 25, "huhx"); // java.lang.IllegalArgumentException: can't parse argument number:
String string = String.format("name=%s, age=%d", "huhx"); // java.util.MissingFormatArgumentException: Format specifier '%d'
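Both failures are unchecked exceptions, and MissingFormatArgumentException is itself a subclass of IllegalArgumentException. The following sketch wraps the two calls above so the exception types can be observed; the class and method names are hypothetical.

```java
import java.text.MessageFormat;
import java.util.MissingFormatArgumentException;

class FormatFailures {
    // "{}" has no argument number, so parsing the pattern fails.
    static String messageFormatFailure() {
        try {
            MessageFormat.format("name={0}, age={}", 25, "huhx");
            return "none";
        } catch (IllegalArgumentException e) {
            return e.getClass().getSimpleName();
        }
    }

    // Two specifiers but only one argument, so %d has nothing to consume.
    static String stringFormatFailure() {
        try {
            String.format("name=%s, age=%d", "huhx");
            return "none";
        } catch (MissingFormatArgumentException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(messageFormatFailure());
        System.out.println(stringFormatFailure());
    }
}
```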
How the two are implemented
We use the simple example below to analyze how each one works:
public void messageFormat() {
    String string = String.format("name=%s, age=%d", "huhx", 25);
    String message = MessageFormat.format("name={1}, age={0}, {1}", 25, "huhx");
    System.out.println(string);
    System.out.println(message);
}
// name=huhx, age=25
// name=huhx, age=25, huhx
1. How String.format is implemented
Internally, String.format delegates to a Formatter, which uses a regular expression to find the placeholders. Its source code is reproduced here.
 1  public Formatter format(Locale l, String format, Object ... args) {
 2      ensureOpen();
 3      // index of last argument referenced
 4      int last = -1;
 5      // last ordinary index
 6      int lasto = -1;
 7
 8      FormatString[] fsa = parse(format);
 9      for (int i = 0; i < fsa.length; i++) {
10          FormatString fs = fsa[i];
11          int index = fs.index();
12          try {
13              switch (index) {
14              case -2: // fixed string, "%n", or "%%"
15                  fs.print(null, l);
16                  break;
17              case -1: // relative index
18                  if (last < 0 || (args != null && last > args.length - 1))
19                      throw new MissingFormatArgumentException(fs.toString());
20                  fs.print((args == null ? null : args[last]), l);
21                  break;
22              case 0: // ordinary index
23                  lasto++;
24                  last = lasto;
25                  if (args != null && lasto > args.length - 1)
26                      throw new MissingFormatArgumentException(fs.toString());
27                  fs.print((args == null ? null : args[lasto]), l);
28                  break;
29              default: // explicit index
30                  last = index - 1;
31                  if (args != null && last > args.length - 1)
32                      throw new MissingFormatArgumentException(fs.toString());
33                  fs.print((args == null ? null : args[last]), l);
34                  break;
35              }
36          } catch (IOException x) {
37              lastException = x;
38          }
39      }
40      return this;
41  }
Below is the regular expression used inside Formatter:
private static final String formatSpecifier = "%(\\d+\\$)?([-#+ 0,(\\<]*)?(\\d+)?(\\.\\d+)?([tT])?([a-zA-Z%])";
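Applying the formatSpecifier regular expression to name=%s, age=%d produces an array, which is the result of line 8 in the code above (the parse(format) call). It holds roughly the following four entries:
1. Type FixedString, content name=
2. Type FormatSpecifier, content %s
3. Type FixedString, content , age=
4. Type FormatSpecifier, content %d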
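The split into four entries can be reproduced outside the JDK with java.util.regex and the formatSpecifier expression quoted above. This is only a sketch of the idea: the class FormatSpecifierSplit and the "fixed:" / "spec:" labels are invented here, and unlike the real parse method it does not validate the text between matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FormatSpecifierSplit {
    // Same regular expression as the formatSpecifier constant quoted above.
    private static final Pattern SPEC = Pattern.compile(
            "%(\\d+\\$)?([-#+ 0,(\\<]*)?(\\d+)?(\\.\\d+)?([tT])?([a-zA-Z%])");

    // Splits a format string into literal text and format specifiers, in order.
    static List<String> split(String format) {
        List<String> parts = new ArrayList<>();
        Matcher m = SPEC.matcher(format);
        int pos = 0;
        while (m.find()) {
            if (m.start() > pos) {
                // Text between two specifiers plays the role of a FixedString.
                parts.add("fixed:" + format.substring(pos, m.start()));
            }
            // The matched text plays the role of a FormatSpecifier.
            parts.add("spec:" + m.group());
            pos = m.end();
        }
        if (pos < format.length()) {
            parts.add("fixed:" + format.substring(pos));
        }
        return parts;
    }

    public static void main(String[] args) {
        for (String part : split("name=%s, age=%d")) {
            System.out.println(part);
        }
    }
}
```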
A brief note on FixedString and FormatSpecifier: both implement the FormatString interface, which exposes the following three methods.
private interface FormatString {
    int index();
    void print(Object arg, Locale l) throws IOException;
    String toString();
}
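For a FixedString, index() is -2. For a FormatSpecifier with no explicit argument index, as in this example, index() is 0; as the switch in the format method shows, it is -1 for a relative index and a positive number for an explicit index.
1. FixedString: fs.print writes the string content to the StringBuilder held by the Formatter.
2. FormatSpecifier: the implementation of fs.print is more involved and handles precision, alignment, layout adjustment, and so on.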
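The three kinds of index that the switch in the format method distinguishes can be seen in a single format string. A small sketch, with a hypothetical class name; the expected result follows from tracing last and lasto through the source above.

```java
import java.util.Locale;

class ArgumentIndexKinds {
    static String demo() {
        // %2$s : explicit index, selects the second argument ("b")
        // %s   : ordinary index, selects the first argument ("a"); ordinary
        //        indexing advances independently of explicit indexes
        // %<s  : relative index, reuses the argument of the previous specifier ("a")
        return String.format(Locale.ROOT, "%2$s %s %<s", "a", "b");
    }

    public static void main(String[] args) {
        System.out.println(demo()); // b a a
    }
}
```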
Finally, Formatter's toString method is called and returns the contents of the StringBuilder it maintains.
public String toString() {
    ensureOpen();
    return a.toString();
}
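2. How MessageFormat.format is implemented
Put simply, MessageFormat walks through the characters of its first argument (the pattern), maintains arrays describing the {} placeholders, and records the position of each placeholder together with its index (the argument subscript). We again use the following code for the analysis: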
String message = MessageFormat.format("name={1}, age={0}, {1}", 25, "huhx");
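It first calls the applyPattern method. Once that call returns, the following useful information has been produced.
offsets is an int array that now holds 5, 11, and 13: the positions of {1}, {0}, and {1} within the pattern once the placeholders have been removed (name=, age=, ). maxOffset is 2, meaning the pattern contains 3 {n} placeholders. argumentNumbers holds 1, 0, 1, the value of n in each {n} of the pattern. The code below shows the process in detail.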
public void applyPattern(String pattern) {
    StringBuilder[] segments = new StringBuilder[4];
    // Allocate only segments[SEG_RAW] here. The rest are
    // allocated on demand.
    segments[SEG_RAW] = new StringBuilder();

    int part = SEG_RAW;
    int formatNumber = 0;
    boolean inQuote = false;
    int braceStack = 0;
    maxOffset = -1;
    for (int i = 0; i < pattern.length(); ++i) {
        char ch = pattern.charAt(i);
        if (part == SEG_RAW) {
            if (ch == '\'') {
                if (i + 1 < pattern.length()
                    && pattern.charAt(i+1) == '\'') {
                    segments[part].append(ch); // handle doubles
                    ++i;
                } else {
                    inQuote = !inQuote;
                }
            } else if (ch == '{' && !inQuote) {
                part = SEG_INDEX;
                if (segments[SEG_INDEX] == null) {
                    segments[SEG_INDEX] = new StringBuilder();
                }
            } else {
                segments[part].append(ch);
            }
        } else {
            if (inQuote) { // just copy quotes in parts
                segments[part].append(ch);
                if (ch == '\'') {
                    inQuote = false;
                }
            } else {
                switch (ch) {
                case ',':
                    if (part < SEG_MODIFIER) {
                        if (segments[++part] == null) {
                            segments[part] = new StringBuilder();
                        }
                    } else {
                        segments[part].append(ch);
                    }
                    break;
                case '{':
                    ++braceStack;
                    segments[part].append(ch);
                    break;
                case '}':
                    if (braceStack == 0) {
                        part = SEG_RAW;
                        makeFormat(i, formatNumber, segments);
                        formatNumber++;
                        // throw away other segments
                        segments[SEG_INDEX] = null;
                        segments[SEG_TYPE] = null;
                        segments[SEG_MODIFIER] = null;
                    } else {
                        --braceStack;
                        segments[part].append(ch);
                    }
                    break;
                case ' ':
                    // Skip any leading space chars for SEG_TYPE.
                    if (part != SEG_TYPE || segments[SEG_TYPE].length() > 0) {
                        segments[part].append(ch);
                    }
                    break;
                case '\'':
                    inQuote = true;
                    // fall through, so we keep quotes in other parts
                default:
                    segments[part].append(ch);
                    break;
                }
            }
        }
    }
    if (braceStack == 0 && part != 0) {
        maxOffset = -1;
        throw new IllegalArgumentException("Unmatched braces in the pattern.");
    }
    this.pattern = segments[0].toString();
}
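The bookkeeping that applyPattern and makeFormat perform can be sketched in a much reduced form. The class SimplePatternParser below is hypothetical and deliberately limited: it understands only plain {n} placeholders and ignores quoting, nested braces, and format types, but it derives the same stripped pattern, offsets, and argument numbers as described for the example pattern.

```java
import java.util.ArrayList;
import java.util.List;

class SimplePatternParser {
    final String strippedPattern;                             // pattern with every {n} removed
    final List<Integer> offsets = new ArrayList<>();          // insertion points in strippedPattern
    final List<Integer> argumentNumbers = new ArrayList<>();  // the n of each {n}

    SimplePatternParser(String pattern) {
        StringBuilder raw = new StringBuilder();
        int i = 0;
        while (i < pattern.length()) {
            char ch = pattern.charAt(i);
            if (ch == '{') {
                int close = pattern.indexOf('}', i);
                if (close < 0) {
                    throw new IllegalArgumentException("Unmatched braces in the pattern.");
                }
                // Integer.parseInt rejects an empty "{}" with a NumberFormatException,
                // which is a kind of IllegalArgumentException.
                argumentNumbers.add(Integer.parseInt(pattern.substring(i + 1, close)));
                // The placeholder sits where the raw text currently ends.
                offsets.add(raw.length());
                i = close + 1;
            } else {
                raw.append(ch);
                i++;
            }
        }
        strippedPattern = raw.toString();
    }

    public static void main(String[] args) {
        SimplePatternParser p = new SimplePatternParser("name={1}, age={0}, {1}");
        System.out.println("[" + p.strippedPattern + "]");
        System.out.println(p.offsets);
        System.out.println(p.argumentNumbers);
    }
}
```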
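The format step then works from the information that applyPattern extracted. Roughly, it loops i from 0 to maxOffset, looks up offsets[i], and inserts the matching argument at that position. For example, the first argument, the number 25, is inserted at offset 11 of the stored pattern, while the string huhx is inserted at offsets 5 and 13 (the 12th, 6th, and 14th characters when counting from 1). The assembled string is then returned. Below is the source of subformat, which does this work for format.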
private StringBuffer subformat(Object[] arguments, StringBuffer result,
        FieldPosition fp, List<AttributedCharacterIterator> characterIterators) {
    // note: this implementation assumes a fast substring & index.
    // if this is not true, would be better to append chars one by one.
    int lastOffset = 0;
    int last = result.length();
    for (int i = 0; i <= maxOffset; ++i) {
        result.append(pattern.substring(lastOffset, offsets[i]));
        lastOffset = offsets[i];
        int argumentNumber = argumentNumbers[i];
        if (arguments == null || argumentNumber >= arguments.length) {
            result.append('{').append(argumentNumber).append('}');
            continue;
        }
        // int argRecursion = ((recursionProtection >> (argumentNumber*2)) & 0x3);
        if (false) { // if (argRecursion == 3){
            // prevent loop!!!
            result.append('\uFFFD');
        } else {
            Object obj = arguments[argumentNumber];
            String arg = null;
            Format subFormatter = null;
            if (obj == null) {
                arg = "null";
            } else if (formats[i] != null) {
                subFormatter = formats[i];
                if (subFormatter instanceof ChoiceFormat) {
                    arg = formats[i].format(obj);
                    if (arg.indexOf('{') >= 0) {
                        subFormatter = new MessageFormat(arg, locale);
                        obj = arguments;
                        arg = null;
                    }
                }
            } else if (obj instanceof Number) {
                // format number if can
                subFormatter = NumberFormat.getInstance(locale);
            } else if (obj instanceof Date) {
                // format a Date if can
                subFormatter = DateFormat.getDateTimeInstance(
                        DateFormat.SHORT, DateFormat.SHORT, locale);//fix
            } else if (obj instanceof String) {
                arg = (String) obj;

            } else {
                arg = obj.toString();
                if (arg == null) arg = "null";
            }

            // At this point we are in two states, either subFormatter
            // is non-null indicating we should format obj using it,
            // or arg is non-null and we should use it as the value.

            if (characterIterators != null) {
                // If characterIterators is non-null, it indicates we need
                // to get the CharacterIterator from the child formatter.
                if (last != result.length()) {
                    characterIterators.add(
                        createAttributedCharacterIterator(result.substring
                                                          (last)));
                    last = result.length();
                }
                if (subFormatter != null) {
                    AttributedCharacterIterator subIterator =
                        subFormatter.formatToCharacterIterator(obj);

                    append(result, subIterator);
                    if (last != result.length()) {
                        characterIterators.add(
                            createAttributedCharacterIterator(
                                subIterator, Field.ARGUMENT,
                                Integer.valueOf(argumentNumber)));
                        last = result.length();
                    }
                    arg = null;
                }
                if (arg != null && arg.length() > 0) {
                    result.append(arg);
                    characterIterators.add(
                        createAttributedCharacterIterator(
                            arg, Field.ARGUMENT,
                            Integer.valueOf(argumentNumber)));
                    last = result.length();
                }
            }
            else {
                if (subFormatter != null) {
                    arg = subFormatter.format(obj);
                }
                last = result.length();
                result.append(arg);
                if (i == 0 && fp != null && Field.ARGUMENT.equals(
                        fp.getFieldAttribute())) {
                    fp.setBeginIndex(last);
                    fp.setEndIndex(result.length());
                }
                last = result.length();
            }
        }
    }
    result.append(pattern.substring(lastOffset, pattern.length()));
    if (characterIterators != null && last != result.length()) {
        characterIterators.add(createAttributedCharacterIterator(
                                   result.substring(last)));
    }
    return result;
}
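Stripped of sub-formatters and character iterators, the core of subformat is a short loop. The sketch below hard-codes the values that parsing name={1}, age={0}, {1} yields (stored pattern, offsets, argument numbers) and assembles the result the same way, including the fallback that writes {n} back when an argument is missing. The class SimpleSubformat is hypothetical, and it converts arguments with String.valueOf instead of NumberFormat or DateFormat.

```java
class SimpleSubformat {
    // Values produced by parsing "name={1}, age={0}, {1}" as described in the text.
    static final String PATTERN = "name=, age=, ";
    static final int[] OFFSETS = {5, 11, 13};
    static final int[] ARGUMENT_NUMBERS = {1, 0, 1};

    static String format(Object... arguments) {
        StringBuilder result = new StringBuilder();
        int lastOffset = 0;
        for (int i = 0; i < OFFSETS.length; i++) {
            // Copy the literal text that precedes this placeholder.
            result.append(PATTERN, lastOffset, OFFSETS[i]);
            lastOffset = OFFSETS[i];
            int n = ARGUMENT_NUMBERS[i];
            if (arguments == null || n >= arguments.length) {
                // Missing argument: emit the placeholder itself, as subformat does.
                result.append('{').append(n).append('}');
            } else {
                // Simplification: no NumberFormat/DateFormat sub-formatters here.
                result.append(String.valueOf(arguments[n]));
            }
        }
        // Copy whatever literal text follows the last placeholder.
        result.append(PATTERN, lastOffset, PATTERN.length());
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(format(25, "huhx")); // name=huhx, age=25, huhx
        System.out.println(format(25));         // name={1}, age=25, {1}
    }
}
```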