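I. Concepts
XML (eXtensible Markup Language) is an extensible markup language. Tags are user-defined and self-describing, documents are plain text, the format is independent of platform, operating system, and programming language, and it follows the W3C standard.
Representation: markup (language) + meaning.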
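II. Basic Syntax
- Every start tag must have a matching end tag;
- Shorthand for an empty element: <name></name> is equivalent to <name/>;
- Names are case-sensitive, so <name> and <Name> are different tags;
- Every document needs exactly one root element;
- Tags must be nested in order and must not overlap;
- Attributes must have a value, and the value must be quoted;
- Characters such as "<" must be escaped;
- Comment format: <!-- comment -->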
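Symbol | XML representation | Meaning |
< | &lt; | less than |
> | &gt; | greater than |
>= | &gt;= | greater than or equal to |
<= | &lt;= | less than or equal to |
<> | &lt;&gt; | not equal to |
& | &amp; | ampersand (and) |
' | &apos; | single quote |
" | &quot; | double quote |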
<bookStore>
    <book category="COOKING">
        <title>J k</title>
        <year>2005</year>
        <price>29.5</price>
    </book>
    <book category="WEB">
        <title>A D</title>
        <year>2007</year>
        <price>19.5</price>
    </book>
</bookStore>
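The syntax rules and the escape table above can be checked mechanically. The following is a minimal sketch, not a definitive implementation: `XmlSyntaxDemo`, `escapeXml`, and `isWellFormed` are hypothetical names introduced for illustration. It escapes the five predefined characters by hand and uses the JDK's `DocumentBuilder` only to test whether a string is well-formed.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, introduced only for this illustration.
final class XmlSyntaxDemo {

    // Replace the five characters that have predefined XML entities.
    // Every character is handled in a single pass, so nothing is escaped twice.
    static String escapeXml(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '&':  sb.append("&amp;");  break;
                case '\'': sb.append("&apos;"); break;
                case '"':  sb.append("&quot;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // Returns true if the parser accepts the string as well-formed XML.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // a syntax rule was violated
        } catch (Exception e) {
            throw new IllegalStateException("parser setup or I/O failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<name></name>"));   // true
        System.out.println(isWellFormed("<name/>"));         // true: shorthand for an empty element
        System.out.println(isWellFormed("<name></Name>"));   // false: names are case-sensitive
        System.out.println(isWellFormed("<a><b></a></b>"));  // false: tags overlap
        System.out.println(isWellFormed("<a/><b/>"));        // false: two root elements
        System.out.println(isWellFormed("<a x=1/>"));        // false: attribute value not quoted

        String raw = "1 < 2 & 3 > 2";
        System.out.println(isWellFormed("<a>" + raw + "</a>"));             // false: "<" and "&" not escaped
        System.out.println(isWellFormed("<a>" + escapeXml(raw) + "</a>"));  // true
    }
}
```

In real code a DOM or StAX writer normally does the escaping for you; the hand-written `escapeXml` is only there to make the table concrete.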
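III. XML Parsing Methods
- Tree model
  - DOM: Document Object Model; well suited to reading and writing small documents
- Stream model
  - SAX: Simple API for XML; a streaming parser (push mode), well suited to reading
  - StAX: Streaming API for XML; a streaming parser (pull mode), well suited to reading (since JDK 6)
- Built-in libraries: DOM, SAX, and StAX all ship with the JDK
- Third-party libraries:
  - JDOM: www.jdom.org
  - DOM4J: dom4j.github.io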
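1. DOM API
DOM is the W3C standard API for processing XML. It suits reading and writing small XML documents.
- How it works: the whole XML document is read into memory as a tree, which is then parsed and modified
- Pros: intuitive and easy to use
- Cons: the entire document is held in memory, so parsing a very large file risks exhausting the heap and crashing the program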
<bookStore>
    <book category="COOKING">
        <title>J k</title>
        <year>2005</year>
        <price>29.5</price>
    </book>
    <book category="WEB">
        <title>A D</title>
        <year>2007</year>
        <price>19.5</price>
    </book>
</bookStore>
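Notes:
1. Because the XML is read as a text stream, the whitespace between tags also counts as a node, of type #text. For example, in the pretty-printed file above, the 1st, 3rd, and 5th child nodes of a <book> element are #text nodes.
2. An element is itself a node whose value is null. To get the text of the title node, read its child node and that child's value:
title.getFirstChild().getNodeValue()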
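The two notes above can be observed directly on a small in-memory document. This is a minimal sketch under stated assumptions: `DomTextNodes`, `parseRoot`, `childNames`, and the shortened sample XML are hypothetical and introduced only for illustration. It lists the child nodes of a pretty-printed element and compares `getNodeValue()` on the element with the value of its text child.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class, introduced only for this illustration.
final class DomTextNodes {

    // A shortened, pretty-printed sample: the line breaks and indentation matter here.
    static final String XML =
            "<book category=\"WEB\">\n"
          + "  <title>A D</title>\n"
          + "  <year>2007</year>\n"
          + "</book>";

    // Parse a string and return its root element.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("cannot parse XML", e);
        }
    }

    // Names of all direct children, whitespace text nodes included.
    static List<String> childNames(Node parent) {
        List<String> names = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            names.add(children.item(i).getNodeName());
        }
        return names;
    }

    public static void main(String[] args) {
        Element book = parseRoot(XML);
        // The whitespace between tags shows up as #text nodes.
        System.out.println(childNames(book)); // [#text, title, #text, year, #text]

        Node title = book.getElementsByTagName("title").item(0);
        System.out.println(title.getNodeValue());                 // null: an element has no value
        System.out.println(title.getFirstChild().getNodeValue()); // A D: the text child holds it
        System.out.println(title.getTextContent());               // A D: standard DOM shortcut
    }
}
```

`getTextContent()` is part of the standard `org.w3c.dom.Node` interface and avoids the null check that `getFirstChild()` needs when an element is empty.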
import org.w3c.dom.*;
import javax.xml.parsers.*;

/**
 * @author: Shism
 * @Date: Created in 16:25 2023/3/21
 * @Description:
 **/
public class DomReader {
    public static void main(String[] args) {
        recurseXml();
    }

    public static void recurseXml() {
        try {
            // Parse the XML with DOM
            DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
            DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
            // Parse the file into a Document object, i.e. the tree model
            Document document = documentBuilder.parse("./src/main/resources/book.xml");
            // Read the first-level node, i.e. #document
            NodeList docList = document.getChildNodes();
            System.out.println("Node Name1:" + document.getNodeName());
            // Iterate over the children of the Document
            for (int docItem = 0; docItem < docList.getLength(); docItem++) {
                // Second-level node, i.e. bookStore; fetch its children
                NodeList bookStoreList = docList.item(docItem).getChildNodes();
                System.out.println("Node Name2:" + docList.item(docItem).getNodeName());
                for (int bookStoreItem = 0; bookStoreItem < bookStoreList.getLength(); bookStoreItem++) {
                    // Third-level nodes, i.e. the book elements (whitespace #text nodes are skipped)
                    Node book = bookStoreList.item(bookStoreItem);
                    if (book.getNodeName().equals("book")) {
                        System.out.println("Node Name3:" + book.getNodeName());
                        NodeList node = book.getChildNodes();
                        for (int nodeItem = 0; nodeItem < node.getLength(); nodeItem++) {
                            if (!node.item(nodeItem).getNodeName().equals("#text")) {
                                // An element is a node whose value is null; its text lives in the child node
                                System.out.println(node.item(nodeItem).getNodeName() + ":" + node.item(nodeItem).getNodeValue());
                                System.out.println(node.item(nodeItem).getNodeName() + ":" + node.item(nodeItem).getFirstChild().getNodeValue());
                            }
                        }
                    }
                }
            }
            System.out.println("---------------------------------------------");
            // Look the book elements up directly from the document
            NodeList bookList = document.getElementsByTagName("book");
            for (int bookItem = 0; bookItem < bookList.getLength(); bookItem++) {
                NodeList nodeList = bookList.item(bookItem).getChildNodes();
                for (int nodeItem = 0; nodeItem < nodeList.getLength(); nodeItem++) {
                    if (!nodeList.item(nodeItem).getNodeName().equals("#text")) {
                        System.out.println(nodeList.item(nodeItem).getNodeName() + ":" + nodeList.item(nodeItem).getNodeValue());
                        System.out.println(nodeList.item(nodeItem).getNodeName() + ":" + nodeList.item(nodeItem).getFirstChild().getNodeValue());
                    }
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Output:
Node Name1:#document
Node Name2:bookStore
Node Name3:book
title:null
title:J k
year:null
year:2005
price:null
price:29.5
Node Name3:book
title:null
title:A D
year:null
year:2007
price:null
price:19.5
---------------------------------------------
title:null
title:J k
year:null
year:2005
price:null
price:29.5
title:null
title:A D
year:null
year:2007
price:null
price:19.5
Process finished with exit code 0
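The DOM section describes the API as suitable for both reading and writing, while the example above only reads. The following is a rough sketch of the writing side, assuming the JDK's built-in `Transformer` is used for serialization; `DomWriter` and `addBook` are hypothetical names introduced for illustration. It appends a new book element to the in-memory tree and turns the tree back into text.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class, introduced only for this illustration.
final class DomWriter {

    // Parse the XML, append <book><title>..</title><price>..</price></book>
    // to the root element, and return the serialized document.
    static String addBook(String xml, String title, String price) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // Build the new subtree; setTextContent stores raw text,
            // and the serializer escapes characters such as "&" on output.
            Element book = doc.createElement("book");
            Element titleElement = doc.createElement("title");
            titleElement.setTextContent(title);
            Element priceElement = doc.createElement("price");
            priceElement.setTextContent(price);
            book.appendChild(titleElement);
            book.appendChild(priceElement);
            doc.getDocumentElement().appendChild(book);

            // Serialize the modified tree with an identity transform.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("cannot update XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(addBook("<bookStore></bookStore>", "A & D", "19.5"));
    }
}
```

Because the whole tree is in memory, inserting or changing a node anywhere in the document is a plain method call, which is the flexibility the streaming parsers give up.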
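2. SAX: Simple API for XML
SAX parses an XML document with an event/stream model. It is faster and lighter than DOM and suits reading large XML documents.
Pros:
- Selective access: the whole document never has to be loaded, so memory requirements are low
- Push model: every node triggers an event, and you write a handler for the events you care about; the parser reports all events
Cons:
- Data is read as a stream, so it is hard to access several parts of the document at the same time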
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/**
 * @author: Shism
 * @Date: Created in 10:35 2023/3/24
 * @Description:
 **/
public class SAXReader {
    public static void main(String[] args) throws SAXException, IOException {
        // Note: XMLReaderFactory is deprecated since Java 9; SAXParserFactory is the current alternative
        XMLReader parser = XMLReaderFactory.createXMLReader();
        BookHandler bookHandler = new BookHandler();
        parser.setContentHandler(bookHandler);
        parser.parse("./src/main/resources/book.xml");
        System.out.println(bookHandler.getBookList());
    }
}

class BookHandler extends DefaultHandler {
    private List<String> bookList = new ArrayList<>();
    // True while the parser is inside a <title> element
    private boolean inTitle = false;

    // Returns the collected book titles
    public List<String> getBookList() {
        return bookList;
    }

    // Called when parsing starts
    @Override
    public void startDocument() throws SAXException {
        System.out.println("start parse XML");
    }

    // Called when parsing ends
    @Override
    public void endDocument() throws SAXException {
        System.out.println("end parse XML");
    }

    // Called at the start tag of an element
    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes attributes) throws SAXException {
        if (qName.equals("title"))
            inTitle = true;
    }

    // Called for character data; a parser may deliver one text run in several chunks
    @Override
    public void characters(char ch[], int start, int length) throws SAXException {
        String str = new String(ch, start, length);
        if (inTitle) {
            System.out.println("book name:" + str);
            bookList.add(str);
        }
    }

    // Called at the end tag of an element
    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        inTitle = false;
    }
}
Output:
start parse XML
book name:saddfljk
book name:wwwswwwwww
end parse XML
[saddfljk, wwwswwwwww]
Process finished with exit code 0
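SAX parsers are allowed to deliver the text of one element through several `characters()` calls, for instance around an entity reference, so a handler that adds each chunk to the list can split a title in two. The following is a minimal sketch of one common way around that, not the author's method: `SaxTitleCollector` and `titlesOf` are hypothetical names, and the parser is obtained from `SAXParserFactory` rather than `XMLReaderFactory`. Text is buffered between `startElement` and `endElement`.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler, introduced only for this illustration.
final class SaxTitleCollector extends DefaultHandler {
    private final List<String> titles = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private boolean inTitle = false;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if ("title".equals(qName)) {
            inTitle = true;
            text.setLength(0); // start a fresh buffer for this title
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called more than once per element, so only append here.
        if (inTitle) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("title".equals(qName)) {
            titles.add(text.toString()); // the title is complete only now
            inTitle = false;
        }
    }

    // Parse an XML string and return all title texts in document order.
    static List<String> titlesOf(String xml) {
        try {
            SaxTitleCollector handler = new SaxTitleCollector();
            SAXParserFactory.newInstance()
                    .newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.titles;
        } catch (Exception e) {
            throw new IllegalStateException("cannot parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<bookStore>"
                + "<book><title>J k</title></book>"
                + "<book><title>A &amp; D</title></book>"
                + "</bookStore>";
        System.out.println(titlesOf(xml)); // [J k, A & D]
    }
}
```

Buffering until `endElement` costs one `StringBuilder` per handler and keeps the result independent of how the parser happens to chunk the text.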
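3. StAX: Streaming API for XML
- The pull mode of the stream model
- The application walks the document and pulls the parts it is interested in from the reader
- Two sets of APIs
  - Cursor-based API: XMLStreamReader
  - Iterator-based API: XMLEventReader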
import javax.xml.stream.*;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;
import java.io.FileNotFoundException;
import java.io.FileReader;

/**
 * @author: Shism
 * @Date: Created in 16:00 2023/3/24
 * @Description:
 **/
public class StaxReader {
    public static final String xml = "./src/main/resources/book.xml";

    public static void main(String[] args) throws XMLStreamException, FileNotFoundException {
        System.out.println("-----------pointer Type------------");
        StaxReader.readByStream();
        System.out.println("-----------Iterator Type-----------");
        StaxReader.readByEvent();
    }

    // Cursor-based API
    public static void readByStream() throws FileNotFoundException, XMLStreamException {
        XMLInputFactory xmlInputFactory = XMLInputFactory.newFactory();
        XMLStreamReader xmlStreamReader = xmlInputFactory.createXMLStreamReader(new FileReader(xml));
        // Walk the document by moving the cursor
        while (xmlStreamReader.hasNext()) {
            // next() advances the cursor and returns the type of the event it now points at;
            // react only when it reaches an element's start tag
            if (xmlStreamReader.next() == XMLStreamConstants.START_ELEMENT) {
                if ("title".equalsIgnoreCase(xmlStreamReader.getLocalName()))
                    System.out.println("title:" + xmlStreamReader.getElementText());
            }
        }
        xmlStreamReader.close();
    }

    // Iterator-based (event) API
    public static void readByEvent() throws FileNotFoundException, XMLStreamException {
        XMLInputFactory xmlInputFactory = XMLInputFactory.newFactory();
        // Create the event reader
        XMLEventReader xmlEventReader = xmlInputFactory.createXMLEventReader(new FileReader(xml));
        boolean titleFlag = false;
        while (xmlEventReader.hasNext()) {
            // Pull the next event from the input
            XMLEvent event = xmlEventReader.nextEvent();
            // Start-tag event
            if (event.isStartElement()) {
                StartElement element = event.asStartElement();
                if (element.getName().getLocalPart().equals("title")) {
                    titleFlag = true;
                    System.out.print("title:");
                }
            }
            // Character-data event
            if (event.isCharacters() && titleFlag) {
                titleFlag = false;
                System.out.println(event.asCharacters().getData());
            }
        }
        xmlEventReader.close();
    }
}
Output (the book titles in the XML were changed):
-----------pointer Type------------
title:book one
title:book two
-----------Iterator Type-----------
title:book one
title:book two
Process finished with exit code 0
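Pull mode means the application decides when to read the next event, so it can also stop as soon as it has what it needs. The following is a minimal sketch of that idea using the cursor API; `StaxFirstMatch` and `firstTitleInCategory` are hypothetical names introduced for illustration, and the document is read from a string rather than from book.xml. It reads the `category` attribute of each book and returns the first matching title without consuming the rest of the input.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class, introduced only for this illustration.
final class StaxFirstMatch {

    // Returns the title of the first book whose category attribute matches,
    // or null if there is none.
    static String firstTitleInCategory(String xml, String category) {
        try {
            XMLStreamReader reader = XMLInputFactory.newFactory()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                boolean inMatchingBook = false;
                while (reader.hasNext()) {
                    if (reader.next() != XMLStreamConstants.START_ELEMENT) {
                        continue; // only start tags are interesting here
                    }
                    String name = reader.getLocalName();
                    if ("book".equals(name)) {
                        // A null namespace means the namespace is not checked.
                        inMatchingBook = category.equals(reader.getAttributeValue(null, "category"));
                    } else if (inMatchingBook && "title".equals(name)) {
                        return reader.getElementText(); // stop pulling: the rest is never read
                    }
                }
                return null;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("cannot read XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<bookStore>"
                + "<book category=\"COOKING\"><title>J k</title><year>2005</year></book>"
                + "<book category=\"WEB\"><title>A D</title><year>2007</year></book>"
                + "</bookStore>";
        System.out.println(firstTitleInCategory(xml, "WEB"));     // A D
        System.out.println(firstTitleInCategory(xml, "COOKING")); // J k
    }
}
```

With SAX the parser would keep pushing events until the end of the document unless the handler aborts by throwing an exception; with StAX, returning from the loop is enough.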