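Natural sorting, which orders strings lexicographically, is a useful default in Java for most cases. This includes sorting file names, which are ordered lexicographically as well.
However, when our file names contain version numbers (as in a set of SQL migration scripts), we prefer a more intuitive ordering in which the version numbers inside the string become "semantic". In the following example, the same set of versions is sorted once "naturally" and once "semantically":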

Natural sorting

version-1
version-10
version-10.1
version-2
version-21

Semantic sorting

version-1
version-2
version-10
version-10.1
version-21
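
To make the difference between the two orderings concrete, here is a minimal sketch that compares the same values once as strings and once as numbers. The class name OrderingDemo and its two helper methods are hypothetical and only serve this illustration.

```java
import java.math.BigInteger;

public class OrderingDemo {

    // Lexicographic comparison, character by character,
    // as done by String.compareTo()
    static int lexicographic(String a, String b) {
        return a.compareTo(b);
    }

    // Numeric comparison of two digit-only strings
    static int numeric(String a, String b) {
        return new BigInteger(a).compareTo(new BigInteger(b));
    }

    public static void main(String[] args) {
        // '1' sorts before '2', so "version-10" precedes "version-2"
        System.out.println(lexicographic("version-10", "version-2") < 0); // true

        // Compared as numbers, 10 is greater than 2
        System.out.println(numeric("10", "2") > 0); // true
    }
}
```

The first comparison is why version-10 ends up before version-2 in the natural ordering, and the second is the behaviour we want for the numeric parts of a file name.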

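Semantic ordering, Windows style

Windows Explorer does this as well, although with a subtle difference: because the "." character separates the file name from its ending, we are now comparing a version sub-number (1) with a file ending (sql)…
The JDK doesn't seem to have a built-in Comparator that implements this ordering, but we can easily roll our own. The idea is simple. We want to split a file name into several chunks, where a chunk is either a string (sorted lexicographically) or an integer (sorted numerically). We split the file name using a regular expression: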

Pattern.compile("(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)");
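
As a quick illustration of what splitting with this pattern produces, here is a small sketch. The class name SplitDemo and the helper method segments() are hypothetical, and the sample file name is taken from the example further down.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitDemo {

    // The same pattern as above: zero-width boundaries between
    // digit and non-digit characters
    static final Pattern NUMBERS =
        Pattern.compile("(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)");

    // Splits a name into alternating text and number segments
    static String[] segments(String name) {
        return NUMBERS.split(name);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(segments("version-10.1.sql")));
        // prints [version-, 10, ., 1, .sql]
    }
}
```

Each segment is then either purely numeric or contains no digits at all, which is what allows a comparator to decide per segment how to compare.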


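This expression matches the boundaries between strings and numbers without actually capturing anything, so we can use it for split() operations. The idea was inspired by this Stack Exchange answer. Here is the logic of the comparator, annotated with comments: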

import java.math.BigInteger;
import java.util.Comparator;
import java.util.regex.Pattern;

public final class FilenameComparator
implements Comparator<String> {

    private static final Pattern NUMBERS =
        Pattern.compile("(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)");

    @Override
    public final int compare(String o1, String o2) {

        // Optional null handling (as written, nulls sort first):
        if (o1 == null || o2 == null)
            return o1 == null ? o2 == null ? 0 : -1 : 1;

        // Splitting both input strings by the above pattern
        String[] split1 = NUMBERS.split(o1);
        String[] split2 = NUMBERS.split(o2);
        int length = Math.min(split1.length, split2.length);

        // Looping over the individual segments
        for (int i = 0; i < length; i++) {
            int cmp = 0;

            // If both segments start with a digit, sort them
            // numerically using BigInteger to stay safe
            if (startsWithDigit(split1[i]) && startsWithDigit(split2[i]))
                cmp = new BigInteger(split1[i]).compareTo(
                      new BigInteger(split2[i]));

            // If we haven't sorted numerically before, or if
            // numeric sorting yielded equality (e.g. 007 and 7)
            // then sort lexicographically
            if (cmp == 0)
                cmp = split1[i].compareTo(split2[i]);

            // Abort once some prefix has unequal ordering
            if (cmp != 0)
                return cmp;
        }

        // If we reach this, then both strings have equally
        // ordered prefixes, but maybe one string is longer than
        // the other (i.e. has more segments)
        return split1.length - split2.length;
    }

    // An empty input string yields a single empty segment,
    // hence the isEmpty() check before looking at the first char
    private static boolean startsWithDigit(String s) {
        return !s.isEmpty() && s.charAt(0) >= '0' && s.charAt(0) <= '9';
    }
}
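That's it. Here's an example of how to use it:

import java.io.File;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class FilenameComparatorExample {
    public static void main(String[] args) {

        // Random order
        List<String> list = Arrays.asList(
            "version-10",
            "version-2",
            "version-21",
            "version-1",
            "version-10.1"
        );

        // Turn versions into files
        List<File> l2 = list
            .stream()
            .map(s -> "C:\\temp\\" + s + ".sql")
            .map(File::new)
            .collect(Collectors.toList());

        System.out.println("Natural sorting");
        l2.stream()
          .sorted()
          .forEach(System.out::println);

        System.out.println();
        System.out.println("Semantic sorting");
        l2.stream()
          .sorted(Comparator.comparing(
              File::getName,
              new FilenameComparator()))
          .forEach(System.out::println);
    }
}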
The output is:

Natural sorting
C:\temp\version-1.sql
C:\temp\version-10.1.sql
C:\temp\version-10.sql
C:\temp\version-2.sql
C:\temp\version-21.sql

Semantic sorting
C:\temp\version-1.sql
C:\temp\version-2.sql
C:\temp\version-10.1.sql
C:\temp\version-10.sql
C:\temp\version-21.sql
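
Again, the algorithm is rather simple because it does not distinguish between file endings and "segments", so (1) is compared with (sql), which may not be the desired behaviour. This can easily be fixed by recognising the actual file endings and excluding them from the comparison logic, at the price of not being able to sort files without file endings… The comparator would then look like this: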

import java.math.BigInteger;
import java.util.Comparator;
import java.util.regex.Pattern;

public final class FilenameComparator
implements Comparator<String> {

    private static final Pattern NUMBERS =
        Pattern.compile("(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)");

    // Zero-width split position in front of every '.'
    private static final Pattern FILE_ENDING =
        Pattern.compile("(?<=.*)(?=\\..*)");

    @Override
    public final int compare(String o1, String o2) {
        if (o1 == null || o2 == null)
            return o1 == null ? o2 == null ? 0 : -1 : 1;

        // name[0] is the part before the first '.',
        // the remaining elements are the '.'-prefixed parts
        String[] name1 = FILE_ENDING.split(o1);
        String[] name2 = FILE_ENDING.split(o2);

        String[] split1 = NUMBERS.split(name1[0]);
        String[] split2 = NUMBERS.split(name2[0]);
        int length = Math.min(split1.length, split2.length);

        // Looping over the individual segments
        for (int i = 0; i < length; i++) {
            int cmp = 0;

            // Numeric comparison only if both segments start with a digit
            if (startsWithDigit(split1[i]) && startsWithDigit(split2[i]))
                cmp = new BigInteger(split1[i]).compareTo(
                      new BigInteger(split2[i]));

            if (cmp == 0)
                cmp = split1[i].compareTo(split2[i]);

            if (cmp != 0)
                return cmp;
        }

        int cmp = split1.length - split2.length;
        if (cmp != 0)
            return cmp;

        // Fewer '.'-separated parts sort first (10.sql before 10.1.sql)
        cmp = name1.length - name2.length;
        if (cmp != 0)
            return cmp;

        // This assumes that both names have a file ending
        return name1[1].compareTo(name2[1]);
    }

    // An empty segment never counts as numeric
    private static boolean startsWithDigit(String s) {
        return !s.isEmpty() && s.charAt(0) >= '0' && s.charAt(0) <= '9';
    }
}

The output is now:

C:\temp\version-1.sql
C:\temp\version-2.sql
C:\temp\version-10.sql
C:\temp\version-10.1.sql
C:\temp\version-21.sql
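
To see why version-10.sql now sorts before version-10.1.sql, and why names without a file ending are a problem, it may help to look at how a name is split at its dots. The following is only a sketch: the class name FileEndingDemo and the method parts() are hypothetical, and it uses a simplified look-ahead pattern that is assumed here to split at the same positions as the FILE_ENDING pattern in the comparator.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class FileEndingDemo {

    // Assumption: a plain look-ahead for '.', standing in for
    // the comparator's FILE_ENDING pattern
    static final Pattern DOT = Pattern.compile("(?=\\.)");

    // Splits a name in front of every '.'
    static String[] parts(String name) {
        return DOT.split(name);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parts("version-10.sql")));
        // prints [version-10, .sql]

        System.out.println(Arrays.toString(parts("version-10.1.sql")));
        // prints [version-10, .1, .sql]

        System.out.println(Arrays.toString(parts("README")));
        // prints [README]
    }
}
```

Both version-10 names share the same first part, so the name with fewer parts comes first. A name without any dot yields a single part, so there is no second element that could be compared as a file ending.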