周五的时候,同事突然问我有没有做过多线程写Excel的数据,看我一时没理解,同事说就是多线程往workbook中写数据。说起来Excel的操作之前做的很多了,但是重来没考虑过这么做,不过既然提起了,而且网上也有相关内容,何不自己尝试一下?于是自己便尝试用自己微薄的技术水平来实现下这个逻辑。
首先考虑需要哪些东西:
首先我们需要一个生成和处理数据的类:WriteDataUtils;
package dai.learn.write.utils;
import java.util.ArrayList;
import java.util.List;
/**
* 写数据的工具
* @author daify
* @create 2019-03-22 11:23
**/
public class WriteDataUtils {
/**
* 获得表格头
* @return
*/
public static List<String> getTitle() {
List<String> titles = new ArrayList <String>();
titles.add("序号");
titles.add("名称");
titles.add("占位符");
return titles;
}
/**
* 获得假数据
* @param max
* @return
*/
public static List<List<String>> getValues(int max) {
List<List<String>> rest = new ArrayList <List <String>>();
for (int i = 0; i < max; i++) {
List<String> item = new ArrayList <String>();
item.add(String.valueOf(i));
item.add("名称" + String.valueOf(i));
item.add("占位符" + String.valueOf(i));
rest.add(item);
}
return rest;
}
/**
* 将数据进行分组
* @param data
* @param groupNum
* @return
*/
public static List<List <List <String>>> groupData(List<List<String>> data,
Integer groupNum) {
int all = data.size();
int other = all%groupNum;
int groupItemNum = all/groupNum;
List<List <List <String>>> runList = new ArrayList <List <List <String>>>();
while (data == null || data.size() > 0) {
if (data.size() < other + groupItemNum) {
List <List <String>> lists = data.subList(0, data.size());
List <List <String>> item = new ArrayList <List <String>>();
item.addAll(lists);
runList.add(item);
data.removeAll(item);
} else {
List <List <String>> lists = data.subList(0, groupItemNum);
List <List <String>> item = new ArrayList <List <String>>();
item.addAll(lists);
runList.add(item);
data.removeAll(item);
}
}
return runList;
}
}
然后我们需要一个操作POI的相关工具:WritePOIUtils;
public class WritePOIUtils {
public static XSSFRow getRow(XSSFSheet sheetAt,Integer i) {
return sheetAt.getRow(i) == null ? sheetAt.createRow(i) : sheetAt.getRow(i);
}
public static void setWorkbookData(XSSFWorkbook workbook,
List<List<String>> data,
Integer startNum) {
XSSFSheet sheetAt = workbook.getSheetAt(0);
Integer endNum = data.size() + startNum;
Integer index = 0;
for (int i = startNum; i < endNum; i++) {
XSSFRow row = getRow(sheetAt,i);
List <String> values = data.get(index);
for (int j = 0; j < values.size(); j++) {
String s = values.get(j);
XSSFCell cell =
row.getCell(j) == null ? row.createCell(j) : row.getCell(j);
cell.setCellValue(s);
}
index++;
}
}
public static void writeFile (XSSFWorkbook workbook) throws IOException {
FileOutputStream out = null;
try {
out = new FileOutputStream("D:\\project_dai\\test.xlsx");
//向d://test.xls中写数据
out.flush();
workbook.write(out);
out.close();
} catch (Exception e) {
e.printStackTrace();
} finally {
if (out != null) {
out.close();
}
}
}
}
我们还需要一个workbook的封装:XSSFWorkbookWrapper;
public class XSSFWorkbookWrapper {
private XSSFWorkbook workbook;
private XSSFSheet sheetAt;
public XSSFWorkbookWrapper () {
workbook = new XSSFWorkbook();
sheetAt = workbook.createSheet("main");
}
public void initTitile(List<String> titles) {
XSSFRow row = sheetAt.createRow(0);
AtomicInteger index = new AtomicInteger(0);
titles.forEach(title -> {
XSSFCell cell =
row.getCell(index.get()) == null ? row.createCell(index.get()) : row.getCell(index.get());
cell.setCellValue(title);
index.addAndGet(0);
});
}
public XSSFWorkbook getWorkbook() {
return workbook;
}
}
一个执行具体业务的类:MultiWrite;
public class MultiWrite {
public static void exec(int max,int threadMax) throws Exception {
XSSFWorkbookWrapper workbookWrapper = new XSSFWorkbookWrapper();
workbookWrapper.initTitile(WriteDataUtils.getTitle());
final List <List <String>> values = WriteDataUtils.getValues(max);
List <List <List <String>>> item = WriteDataUtils.groupData(values, threadMax);
Executor executor = new ThreadPoolExecutor(threadMax, threadMax,
0L, TimeUnit.MILLISECONDS,
new LinkedBlockingQueue<Runnable>());
AtomicInteger integer = new AtomicInteger(0);
for (int i = 0; i < item.size(); i++) {
final List <List <String>> lists = item.get(i);
int finalI = i * lists.size() + 1;
Runnable runnable = new Runnable() {
@Override
public void run() {
System.out.println(Thread.currentThread().getName() + "执行");
WritePOIUtils.setWorkbookData(workbookWrapper.getWorkbook(),lists, finalI);
integer.addAndGet(1);
}
};
executor.execute(runnable);
}
while (integer.get() < threadMax) {
}
WritePOIUtils.writeFile(workbookWrapper.getWorkbook());
System.out.println("执行完毕");
}
}
一个测试main方法的类:POITest;
public class POITest {
public static void main(String[] args) throws Exception {
long l = System.currentTimeMillis();
//1W 3375 3w 17416
MultiWrite.exec(30000,8);
//1W 3925 3w 20891
//MultiWrite.exec(30000,1);
long l1 = System.currentTimeMillis();
System.out.println("总用时");
System.out.println(l1 - l);
}
}
emmm…………看起来有些简陋和粗糙,但是意思应该是这样的,但是刚跑一下就出现了问题…………
Exception in thread "pool-1-thread-5" Exception in thread "pool-1-thread-4" java.util.ConcurrentModificationException
at java.util.TreeMap$NavigableSubMap$SubMapIterator.nextEntry(TreeMap.java:1703)
at java.util.TreeMap$NavigableSubMap$SubMapEntryIterator.next(TreeMap.java:1751)
at java.util.TreeMap$NavigableSubMap$SubMapEntryIterator.next(TreeMap.java:1745)
at java.util.TreeMap$NavigableSubMap$EntrySetView.size(TreeMap.java:1637)
at java.util.TreeMap$NavigableSubMap.size(TreeMap.java:1507)
at org.apache.poi.xssf.usermodel.XSSFSheet.createRow(XSSFSheet.java:707)
at dai.learn.write.utils.WritePOIUtils.getRow(WritePOIUtils.java:20)
at dai.learn.write.utils.WritePOIUtils.setWorkbookData(WritePOIUtils.java:31)
at dai.learn.write.MultiWrite$1.run(MultiWrite.java:39)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
错误地点是
public static XSSFRow getRow(XSSFSheet sheetAt,Integer i) {
return sheetAt.getRow(i) == null ? sheetAt.createRow(i) : sheetAt.getRow(i);
}
为什么会这个地方错误?然后查看下源码看看。然后看到了这
private SortedMap<Integer, XSSFRow> _rows;
@Override
public XSSFRow createRow(int rownum) {
......
XSSFRow r = new XSSFRow(ctRow, this);
r.setRowNum(rownum);
_rows.put(rownum, r);
return r;
}
莫非是使用了非线程安全map的这个原因?我们尝试把getRow方法改成同步的,结果代码成功执行,执行看起来没问题,可以比较性能了。
、但是当我点开导出的结果时候发现结果是这样的
???WTF这是什么鬼?是我逻辑写错了么?后来我尝试把线程数改为1或者把setCellValue方法设为同步,就没有出现这种问题。
莫非setCellValue方法有什么奇怪的么?
ps.本身是准备查看poi的相关代码,但是poi底层代码源码缺失,里面内容对于我来说也相当困难。只能从一些测试结果来揣测。
首先:
synchronized (LOCK) {
cell.setCellValue(s);
}
将此方法加锁后,导入数据BUG则不存在。
另外一种解决方法:
之前数据都是我批量生成的,每一行,每一个单元格数据都不同,但是当我尝试用非常少量的几个字符串按照一定规律放入cell中的时候,问题也不再重现。
而源码中他们会将字符串创建缓存起来然后设置统一个索引,不知道是不是这里除了问题。
int sRef = _sharedStringSource.addEntry(rt.getCTRst());
public int addEntry(CTRst st) {
String s = getKey(st);
count++;
if (stmap.containsKey(s)) {
return stmap.get(s);
}
uniqueCount++;
//create a CTRst bean attached to this SstDocument and copy the argument CTRst into it
CTRst newSt = _sstDoc.getSst().addNewSi();
newSt.set(st);
int idx = strings.size();
stmap.put(s, idx);
strings.add(newSt);
return idx;
}
/**
* Array of individual string items in the Shared String table.
*/
private final List<CTRst> strings = new ArrayList<CTRst>();
/**
* Maps strings and their indexes in the <code>strings</code> arrays
*/
private final Map<String, Integer> stmap = new HashMap<String, Integer>();
总结
- 事后我通过加锁后进行测试,多线程的确比单线程有优势,但是感觉并不明显,可能是个人水平限制无法发挥多线程的优势,毕竟多线程这一块个人积累经验很少。
- 创建行的时候,因为并没有使用线程安全的集合,所以需要加锁。
- 在赋值操作的时候,因为目前我还未认识到的原因导致赋值错误,需要加锁。当然希望有了解这一块的朋友指点迷津。
- 代码项目在这里(一些内容有些改动,文章上代码有些许BUG): https://gitee.com/daifylearn/multi_thread_poi
- 这并不是一个很好的可以正常使用的一个代码,有相关需求的同学,还是需要根据自己理解去实现自己的业务。