Official site: https://jsoup.org/

Maven dependency

<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.13.1</version>
</dependency>

Usage example

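import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

String html = "<div><p>this is a text</p></div>";

// Parse an HTML string into a full document
Document doc = Jsoup.parse(html);
System.out.println(doc);

// Parse an HTML body fragment
Document fragment = Jsoup.parseBodyFragment(html);
System.out.println(fragment);

// Fetch and parse a page from a URL (get() throws IOException)
Document page = Jsoup.connect("https://www.baidu.com/").get();
System.out.println(page.title());

// Query with a CSS selector (selectFirst returns null when nothing matches)
Element element = page.selectFirst("title");
System.out.println(element.text());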
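
The snippet above needs network access and an enclosing method to run. As a self-contained variant, the following sketch wraps the same calls (Jsoup.parse, Jsoup.parseBodyFragment, selectFirst) in a small class that works offline. The class name JsoupSelectorSketch, the sample markup in HTML, and the base URI https://example.com/ are assumptions made for illustration and do not come from the original note.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

public class JsoupSelectorSketch {

    // Hypothetical sample markup, used only for this illustration.
    static final String HTML =
            "<html><head><title>Demo page</title></head><body>"
          + "<div id=\"links\">"
          + "<a href=\"/docs\">Docs</a>"
          + "<a href=\"https://jsoup.org/\">jsoup</a>"
          + "</div></body></html>";

    /** Returns the text of the <title> element, or "" when there is none. */
    static String titleOf(String html) {
        Document doc = Jsoup.parse(html);
        Element title = doc.selectFirst("title"); // null when nothing matches
        return title == null ? "" : title.text();
    }

    /** Parses a body fragment and returns the combined text of the body. */
    static String bodyTextOf(String fragment) {
        Document doc = Jsoup.parseBodyFragment(fragment);
        return doc.body().text();
    }

    /** Collects link targets, resolving relative hrefs against baseUri. */
    static List<String> linksOf(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        Elements anchors = doc.select("a[href]"); // every <a> with an href attribute
        List<String> urls = new ArrayList<>();
        for (Element a : anchors) {
            urls.add(a.absUrl("href"));
        }
        return urls;
    }

    public static void main(String[] args) {
        System.out.println(titleOf(HTML));
        System.out.println(bodyTextOf("<div><p>this is a text</p></div>"));
        System.out.println(linksOf(HTML, "https://example.com/"));
    }
}
```

Checking the result of selectFirst for null matters here: the fragment from the usage example has no title element, so calling text() on the result directly would throw a NullPointerException.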

References

  1. Jsoup, an HTML parsing tool for Java web crawlers
  2. https://www.open-open.com/jsoup/parsing-a-document.htm