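In most cases, natural sorting, i.e. sorting lexicographically, is a useful default in Java. This includes sorting file names, which are sorted lexicographically as well.
However, when our files contain version numbers (such as a set of SQL migration scripts), we prefer a more intuitive ordering, in which the version numbers contained in the strings are treated "semantically". In the following example, we have a set of versions, once sorted "naturally" and once "semantically":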
Natural sorting
version-1
version-10
version-10.1
version-2
version-21

Semantic sorting
version-1
version-2
version-10
version-10.1
version-21
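The "natural" ordering above is simply what String.compareTo produces: it compares characters position by position, so "version-10" sorts before "version-2" because '1' comes before '2'. A minimal sketch that reproduces the first list (the class and method names are illustrative only, not part of any library):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class NaturalOrderDemo {
    // Sorts a copy of the input using String's natural (lexicographic) order
    static List<String> naturalOrder(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        copy.sort(null); // a null comparator means "use the elements' natural ordering"
        return copy;
    }

    public static void main(String[] args) {
        List<String> versions = Arrays.asList(
            "version-10", "version-2", "version-21", "version-1", "version-10.1");
        // Prints [version-1, version-10, version-10.1, version-2, version-21]
        System.out.println(naturalOrder(versions));
    }
}
```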
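Semantic sorting, Windows style
Windows Explorer does this as well, with a subtle difference: because the "." character separates the file name from its ending, Explorer ends up comparing a version sub-number (1) with a file ending (sql).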
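The JDK doesn't seem to ship a built-in Comparator that implements this ordering, but we can easily roll our own. The idea is simple: we split a file name into chunks, where a chunk is either a string (sorted lexicographically) or an integer number (sorted numerically). We split the file name using a regular expression: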
Pattern.compile("(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)");
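Before this pattern is used inside a comparator, it helps to see what split() actually returns for it. A small sketch (the SplitDemo class and its segments method are hypothetical names chosen for illustration):

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitDemo {
    // Zero-width match at every boundary between a non-digit and a digit, in either direction
    static final Pattern NUMBERS =
        Pattern.compile("(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)");

    static String[] segments(String name) {
        return NUMBERS.split(name);
    }

    public static void main(String[] args) {
        for (String name : new String[] { "version-10.1.sql", "v007b", "42" })
            System.out.println(name + " -> " + Arrays.toString(segments(name)));
    }
}
```

For version-10.1.sql this prints [version-, 10, ., 1, .sql]: every segment is either all digits or all non-digits, and the dot between 10 and 1 becomes a segment of its own.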
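This expression matches the boundary between a non-digit and a digit (in either direction) without consuming any characters, so we can use it for split() operations. The idea was inspired by a Stack Exchange answer. Here's the logic of the comparator, annotated with comments: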
import java.math.BigInteger;
import java.util.Comparator;
import java.util.regex.Pattern;

public final class FilenameComparator implements Comparator<String> {
    private static final Pattern NUMBERS =
        Pattern.compile("(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)");

    @Override
    public int compare(String o1, String o2) {
        // Optional "NULLS LAST" semantics:
        if (o1 == null || o2 == null)
            return o1 == null ? (o2 == null ? 0 : 1) : -1;

        // Splitting both input strings by the above pattern
        String[] split1 = NUMBERS.split(o1);
        String[] split2 = NUMBERS.split(o2);
        int length = Math.min(split1.length, split2.length);

        // Looping over the individual segments
        for (int i = 0; i < length; i++) {
            int cmp = 0;

            // If both segments are digit runs, sort them numerically,
            // using BigInteger so that long digit runs cannot overflow
            if (startsWithDigit(split1[i]) && startsWithDigit(split2[i]))
                cmp = new BigInteger(split1[i]).compareTo(
                      new BigInteger(split2[i]));

            // If we haven't sorted numerically, or if numeric
            // sorting yielded equality (e.g. 007 and 7),
            // then sort lexicographically
            if (cmp == 0)
                cmp = split1[i].compareTo(split2[i]);

            // Abort once some prefix has unequal ordering
            if (cmp != 0)
                return cmp;
        }

        // If we reach this, then both strings have equally
        // ordered prefixes, but maybe one string is longer than
        // the other (i.e. has more segments)
        return split1.length - split2.length;
    }

    // Segments are either all digits or all non-digits, so the first
    // character decides; an empty segment (empty input) is not a number
    private static boolean startsWithDigit(String segment) {
        return !segment.isEmpty()
            && segment.charAt(0) >= '0' && segment.charAt(0) <= '9';
    }
}
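The comparator uses BigInteger rather than Integer.parseInt or Long.parseLong because digit runs in file names can be arbitrarily long. The following standalone sketch isolates the numeric-then-lexicographic comparison of two digit segments; compareDigits is a hypothetical helper extracted for illustration, not part of the comparator's public API:

```java
import java.math.BigInteger;

class SegmentCompareDemo {
    // Compares two all-digit segments the way the comparator does:
    // numerically first, then lexicographically as a tie-breaker
    static int compareDigits(String a, String b) {
        int cmp = new BigInteger(a).compareTo(new BigInteger(b));
        return cmp != 0 ? cmp : a.compareTo(b);
    }

    public static void main(String[] args) {
        // 20 digits: larger than Long.MAX_VALUE (9223372036854775807)
        String huge = "12345678901234567890";
        try {
            Long.parseLong(huge);
        } catch (NumberFormatException e) {
            System.out.println("Long.parseLong rejects " + huge);
        }
        System.out.println(compareDigits(huge, "9") > 0);  // true
        System.out.println(compareDigits("10", "9") > 0);  // true: 10 > 9 numerically
        System.out.println(compareDigits("007", "7") < 0); // true: same value, "007" < "7" lexicographically
    }
}
```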
That's it. Here's an example of how to use it:
import java.io.File;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import static java.util.Arrays.asList;

// Random order
List<String> list = asList(
    "version-10",
    "version-2",
    "version-21",
    "version-1",
    "version-10.1"
);

// Turn versions into files
// ("\\" is a single backslash inside a Java string literal)
List<File> l2 = list
    .stream()
    .map(s -> "C:\\temp\\" + s + ".sql")
    .map(File::new)
    .collect(Collectors.toList());

System.out.println("Natural sorting");
l2.stream()
  .sorted()
  .forEach(System.out::println);

System.out.println();
System.out.println("Semantic sorting");
l2.stream()
  .sorted(Comparator.comparing(
      File::getName,
      new FilenameComparator()))
  .forEach(System.out::println);
The output is:
Natural sorting
C:\temp\version-1.sql
C:\temp\version-10.1.sql
C:\temp\version-10.sql
C:\temp\version-2.sql
C:\temp\version-21.sql

Semantic sorting
C:\temp\version-1.sql
C:\temp\version-2.sql
C:\temp\version-10.1.sql
C:\temp\version-10.sql
C:\temp\version-21.sql
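Again, the algorithm is rather simplistic: it doesn't distinguish file endings from "segments", so the version sub-number (1) effectively gets compared with the file ending (sql), which may not be the desired behavior. This can be fixed by identifying the actual file endings and excluding them from the comparison logic; file names without an ending then need special handling. The comparator would then look like this: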
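import java.math.BigInteger;
import java.util.Comparator;
import java.util.regex.Pattern;

public final class FilenameComparator implements Comparator<String> {
    private static final Pattern NUMBERS =
        Pattern.compile("(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)");

    @Override
    public int compare(String o1, String o2) {
        // Optional "NULLS LAST" semantics:
        if (o1 == null || o2 == null)
            return o1 == null ? (o2 == null ? 0 : 1) : -1;

        // The file ending is whatever follows the LAST '.';
        // names without a '.' get an empty ending
        int dot1 = o1.lastIndexOf('.');
        int dot2 = o2.lastIndexOf('.');
        String name1 = dot1 < 0 ? o1 : o1.substring(0, dot1);
        String name2 = dot2 < 0 ? o2 : o2.substring(0, dot2);
        String ending1 = dot1 < 0 ? "" : o1.substring(dot1 + 1);
        String ending2 = dot2 < 0 ? "" : o2.substring(dot2 + 1);

        // Only the names (without endings) are split into segments
        String[] split1 = NUMBERS.split(name1);
        String[] split2 = NUMBERS.split(name2);
        int length = Math.min(split1.length, split2.length);

        // Looping over the individual segments
        for (int i = 0; i < length; i++) {
            int cmp = 0;
            if (startsWithDigit(split1[i]) && startsWithDigit(split2[i]))
                cmp = new BigInteger(split1[i]).compareTo(
                      new BigInteger(split2[i]));
            if (cmp == 0)
                cmp = split1[i].compareTo(split2[i]);
            if (cmp != 0)
                return cmp;
        }

        // Equally ordered prefixes: fewer segments sort first
        int cmp = split1.length - split2.length;
        if (cmp != 0)
            return cmp;

        // Only names that compare equal are ordered by their endings
        return ending1.compareTo(ending2);
    }

    // Segments are either all digits or all non-digits, so the first
    // character decides; an empty segment (empty name) is not a number
    private static boolean startsWithDigit(String segment) {
        return !segment.isEmpty()
            && segment.charAt(0) >= '0' && segment.charAt(0) <= '9';
    }
}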
The semantic sorting output is now:
C:\temp\version-1.sql
C:\temp\version-2.sql
C:\temp\version-10.sql
C:\temp\version-10.1.sql
C:\temp\version-21.sql
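Identifying the actual file ending matters for names like version-10.1.sql that contain more than one dot. The sketch below contrasts splitting before every dot (a plain lookahead, (?=\.)) with splitting at the last dot only. The class, field and method names are hypothetical, chosen for illustration:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class FileEndingDemo {
    // Zero-width split before every '.'
    static final Pattern BEFORE_EACH_DOT = Pattern.compile("(?=\\.)");

    // Splits into {name, ending} at the LAST '.', or {name, ""} if there is no '.'
    static String[] splitAtLastDot(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0
            ? new String[] { fileName, "" }
            : new String[] { fileName.substring(0, dot), fileName.substring(dot + 1) };
    }

    public static void main(String[] args) {
        for (String fileName : new String[] { "version-10.1.sql", "README" }) {
            System.out.println(fileName);
            System.out.println("  before each dot: "
                + Arrays.toString(BEFORE_EACH_DOT.split(fileName)));
            System.out.println("  at last dot:     "
                + Arrays.toString(splitAtLastDot(fileName)));
        }
    }
}
```

Splitting before every dot yields [version-10, .1, .sql], so the sub-version .1 would be treated as part of the ending. Splitting at the last dot yields [version-10.1, sql] and keeps the sub-version in the name, while README simply gets an empty ending.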