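Wrapping a stream to upload large Excel files to COS in parts
- BufferedMultipleOutputStream: splitting an output stream into parts
- Background
- I. Approach 1 (failed)
- II. Approach 2: SXSSF + splitting the stream (not recommended)
- III. Approach 3: SXSSF + buffered stream + COS multipart upload
BufferedMultipleOutputStream: splitting an output stream into parts
Background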
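In B2B (toB) business development, generating, uploading and downloading Excel files is unavoidable. The usual flow is:
- generate the Excel file with POI or a wrapper around it
- query the data page by page
- write it into the workbook
- write the workbook to an OutputStream (in memory), then convert that into an InputStream
- upload it to COS
This works while the data volume is small, but once it exceeds about ten thousand records an OOM is inevitable. The servers are underpowered and will not be given more resources, so the problem is how to produce the export file safely and efficiently while using as little memory as possible.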
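To make the OOM concrete, here is a minimal sketch of the in-memory flow from the list above. It assumes a hypothetical writeFakeWorkbook that writes raw bytes in place of POI, so it only models the stream side. The whole file sits in the ByteArrayOutputStream, and toByteArray() returns a second full-size copy before it can be wrapped as an InputStream. Peak memory for the stream alone is therefore at least twice the file size.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

public class InMemoryExportSketch {
    // Hypothetical stand-in for workbook.write(out): writes rowCount rows of rowBytes bytes each.
    static void writeFakeWorkbook(ByteArrayOutputStream out, int rowCount, int rowBytes) {
        byte[] row = new byte[rowBytes];
        for (int i = 0; i < rowCount; i++) {
            out.write(row, 0, row.length);
        }
    }

    // The naive flow: generate into memory, copy to a byte[], wrap it for the upload call.
    static ByteArrayInputStream exportInMemory(int rowCount, int rowBytes) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(); // holds the entire file
        writeFakeWorkbook(out, rowCount, rowBytes);
        byte[] copy = out.toByteArray(); // a second full-size copy while out is still alive
        return new ByteArrayInputStream(copy); // wraps copy without copying again
    }

    public static void main(String[] args) {
        ByteArrayInputStream in = exportInMemory(10_000, 200);
        System.out.println(in.available()); // 2000000
    }
}
```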
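I. Approach 1 (failed)
- Query the data page by page and write it into the POI workbook in batches; this part is easy.
- My first idea was to write in batches and flush into the stream after each batch: whenever the stream exceeded 1 MB (COS requires every part of a multipart upload to be at least 1 MB), upload it as a part; then query the next batch, write it into the workbook and flush into the stream again, until all data had been read.
Problem: Hutool threw an error when flushing into the stream in batches because the stream was already closed. Reading the source showed that POI closed the stream during the flush, and none of the workarounds I tried helped. I also tried SXSSF to get around it, but that had problems too. The code is not included here because none of it worked. Even without the closed stream, this could not work: an .xlsx file is a ZIP package, and each write of the workbook produces one complete archive, so the file cannot be appended to batch by batch.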
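II. Approach 2: SXSSF + splitting the stream (not recommended)
- Continuing from the steps above, query the data page by page and write it into the workbook. This time I used SXSSF, which flushes rows to temporary files on disk and therefore uses little memory (SXSSF is limiting for some Excel features, but that does not matter for a pure export).
- Flush the whole workbook into the OutputStream in one go.
- Then split the stream into several pieces of equal size (the last piece holds the remainder).
- Upload each piece as one part as soon as it is produced, which solves the multipart upload problem.
Problem: this approach solves the splitting problem and avoids turning one huge OutputStream into one huge InputStream, but the OutputStream still has to hold the entire file, which can be very large.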
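// Hypothetical setup: bos is the ByteArrayOutputStream the whole workbook was flushed into
byte[] bytes = bos.toByteArray();
int l = bytes.length;
final int _M = 1024 * 1024; // part size: 1 MB, the COS minimum for every part except the last
InitiateMultipartUploadResult initiateMultipartUploadResult = tencentCOSService.initiateMultipartUpload(key);
String uploadId = initiateMultipartUploadResult.getUploadId();
logger.info("{} finished multipart upload step {}", key, "initiateMultipartUpload");
List<PartETag> partETagList = Lists.newArrayList();
int partNumber = 1; // COS part numbers start at 1
// ceil(l / _M) parts: the last one holds the remainder, and no empty part is sent when l is a multiple of _M
for (int offset = 0; offset < l; offset += _M) {
    int partSize = Math.min(_M, l - offset);
    ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(bytes, offset, partSize);
    // pass the size of this part, not the size of the whole stream
    UploadPartResult uploadPartResult = tencentCOSService.uploadPart(key, uploadId, partNumber, partSize, byteArrayInputStream);
    partETagList.add(uploadPartResult.getPartETag());
    logger.info("{} finished multipart upload step {}, partNumber={}", key, "uploadPart", partNumber);
    partNumber++;
}
CompleteMultipartUploadResult completeMultipartUploadResult = tencentCOSService.completeMultipartUpload(key, uploadId, partETagList);
logger.info("{} finished multipart upload step {}", key, "completeMultipartUpload");
url = tencentCOSService.getFileUrl(completeMultipartUploadResult);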
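The core of approach 2 is cutting one byte array into parts. A minimal sketch of that arithmetic in plain Java, assuming a hypothetical PartRanges.split helper (not part of any SDK): it yields ceil(length / partSize) parts, numbered from 1, all of partSize bytes except the last, which holds the remainder.

```java
import java.util.ArrayList;
import java.util.List;

public class PartRanges {
    // One part of a multipart upload: its 1-based number and where it sits in the byte array.
    static final class Part {
        final int partNumber;
        final int offset;
        final int size;

        Part(int partNumber, int offset, int size) {
            this.partNumber = partNumber;
            this.offset = offset;
            this.size = size;
        }

        @Override
        public String toString() {
            return "Part{number=" + partNumber + ", offset=" + offset + ", size=" + size + "}";
        }
    }

    // Splits totalLength bytes into ceil(totalLength / partSize) parts; only the last may be shorter.
    // A loop bounded by "i <= totalLength / partSize" would instead add an empty part
    // whenever totalLength is an exact multiple of partSize.
    static List<Part> split(int totalLength, int partSize) {
        if (totalLength < 0 || partSize <= 0) {
            throw new IllegalArgumentException("totalLength must be >= 0 and partSize > 0");
        }
        int partCount = (int) ((totalLength + (long) partSize - 1) / partSize); // integer ceil without overflow
        List<Part> parts = new ArrayList<>();
        for (int i = 0; i < partCount; i++) {
            int offset = i * partSize; // always < totalLength, so it fits in an int
            int size = Math.min(partSize, totalLength - offset);
            parts.add(new Part(i + 1, offset, size));
        }
        return parts;
    }

    public static void main(String[] args) {
        int mb = 1024 * 1024;
        System.out.println(split(2 * mb + mb / 2, mb).size()); // 3
        System.out.println(split(2 * mb, mb).size()); // 2
    }
}
```

Each Part maps directly to new ByteArrayInputStream(bytes, offset, size) and to the size argument of the upload call, so no part is empty and the size passed to COS always matches the bytes sent.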
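III. Approach 3: SXSSF + buffered stream + COS multipart upload
The idea is a custom stream: the trick lies in the OutputStream passed to writer.flush(outputStream);. To solve the problem I went back over the stream source code and my notes and found BufferedOutputStream, which was a lifesaver. It did not fit completely, so I subclassed it and made it create a new underlying stream whenever needed.
Here is the code.
BufferedMultipleOutputStream
The idea is to exploit the buffer: each time the buffer is full, the current underlying stream is handed off for processing and a new one is created for the following bytes. With the buffer set to 1024*1024 bytes (1 MB), every stream produced when the buffer fills can be uploaded to COS directly as one part.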
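import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class BufferedMultipleOutputStream extends BufferedOutputStream {
    private final IStreamHandler iStreamHandler;
    // Number of streams handed to the handler so far, i.e. how many data streams (parts) there are
    private int pushCount = 0;
    // 1 MB buffer: every output stream holds this much; when one is full, a new one is created
    public static final int bufferSize = 1024 * 1024;

    public BufferedMultipleOutputStream(OutputStream out, IStreamHandler iStreamHandler) {
        super(out, bufferSize);
        this.iStreamHandler = iStreamHandler;
    }

    // Hands the current stream to the handler and returns a fresh instance of the same class.
    // The underlying stream class (e.g. ByteArrayOutputStream) needs an accessible no-arg constructor.
    private OutputStream nextOutputStream() throws IOException {
        out.flush();
        iStreamHandler.handler(out, pushCount);
        pushCount++;
        try {
            return out.getClass().getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            // fail loudly instead of returning null and hitting a NullPointerException on the next write
            throw new IOException("cannot create a new " + out.getClass().getName(), e);
        }
    }

    /**
     * Writes the internal buffer to the current underlying stream
     * (BufferedOutputStream's own flushBuffer() is private, so it is re-declared here).
     */
    private void flushBuffer() throws IOException {
        if (count > 0) {
            out.write(buf, 0, count);
            count = 0;
        }
    }

    @Override
    public synchronized void write(int b) throws IOException {
        if (count >= buf.length) {
            flushBuffer();
            out = nextOutputStream();
        }
        buf[count++] = (byte) b;
    }

    @Override
    public synchronized void write(byte[] b, int off, int len) throws IOException {
        int written = 0;
        while (written < len) {
            // copy as much of the remaining input as still fits into the buffer
            int chunk = Math.min(len - written, buf.length - count);
            System.arraycopy(b, off, buf, count, chunk);
            written += chunk;
            off += chunk;
            count += chunk;
            if (count == buf.length) {
                flushBuffer();
                out = nextOutputStream();
            }
        }
    }

    @Override
    public synchronized void flush() throws IOException {
        // Intentionally a no-op: the writer calls flush() many times while writing,
        // and parts must only be cut when the buffer is full.
    }

    // Because flush() is called many times, the caller must call lastFlush() once after all data is written
    public synchronized void lastFlush() throws IOException {
        // Data ended exactly on a part boundary: nothing is left, so do not upload an empty part
        if (count == 0 && pushCount > 0) {
            return;
        }
        flushBuffer(); // write the remaining buffered bytes to the current stream
        out.flush();
        iStreamHandler.handler(out, pushCount);
        pushCount++;
    }
}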
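The same rotate-when-full idea can be sketched without reflection. This hypothetical ChunkingOutputStream is an illustrative variant, not the class above: it hands each full chunk to a callback as a byte[] copy together with its zero-based index (mirroring pushCount), never emits an empty chunk, and emits the last short chunk from close() instead of a separate lastFlush().

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiConsumer;

// Hypothetical simplified variant: full chunks go to a callback instead of a rotated OutputStream.
public class ChunkingOutputStream extends OutputStream {
    private final byte[] buf;
    private final BiConsumer<byte[], Integer> onChunk; // receives (chunk bytes, zero-based index)
    private int count = 0;
    private int chunkIndex = 0;

    public ChunkingOutputStream(int chunkSize, BiConsumer<byte[], Integer> onChunk) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.buf = new byte[chunkSize];
        this.onChunk = onChunk;
    }

    @Override
    public void write(int b) {
        buf[count++] = (byte) b;
        if (count == buf.length) {
            emit(); // cut a chunk as soon as the buffer is full
        }
    }

    @Override
    public void write(byte[] b, int off, int len) {
        if (off < 0 || len < 0 || len > b.length - off) {
            throw new IndexOutOfBoundsException();
        }
        while (len > 0) {
            int n = Math.min(len, buf.length - count); // what still fits in the buffer
            System.arraycopy(b, off, buf, count, n);
            count += n;
            off += n;
            len -= n;
            if (count == buf.length) {
                emit();
            }
        }
    }

    // No-op on purpose: writers may flush many times, but chunks are cut only when full.
    @Override
    public void flush() {
    }

    // Emits the final, possibly shorter chunk; an empty tail produces no chunk.
    @Override
    public void close() {
        if (count > 0) {
            emit();
        }
    }

    private void emit() {
        onChunk.accept(Arrays.copyOf(buf, count), chunkIndex++);
        count = 0;
    }

    public static void main(String[] args) {
        byte[] data = new byte[25];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        List<Integer> sizes = new ArrayList<>();
        ByteArrayOutputStream reassembled = new ByteArrayOutputStream();
        try (ChunkingOutputStream out = new ChunkingOutputStream(10, (chunk, index) -> {
            sizes.add(chunk.length);
            reassembled.write(chunk, 0, chunk.length);
        })) {
            out.write(data, 0, data.length);
        }
        System.out.println(sizes); // [10, 10, 5]
        System.out.println(Arrays.equals(data, reassembled.toByteArray())); // true
    }
}
```

Because the callback receives a plain byte[], the upload side can wrap it in a ByteArrayInputStream and knows the exact part size without calling available(). Emitting the tail from close() is a design choice; an explicit lastFlush() keeps the final upload under the caller's control just as well.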
IStreamHandler
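import java.io.OutputStream;

public interface IStreamHandler {
    // Called once per finished stream; count is the zero-based index of that stream
    void handler(OutputStream outputStream, int count);
    // Completes the multipart upload; call once after the last stream has been handled
    void submitCos();
    // URL of the uploaded file, available after submitCos()
    String getFileUlr();
}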
CosIStreamHandler
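import java.io.InputStream;
import java.io.OutputStream;
import java.util.List;

// Assumed third-party libraries: Hutool, fastjson, Guava, Tencent COS Java SDK, commons-lang3, SLF4J.
// ITencentCOSService, StreamUtils and BusinessErrorException are project classes that are not shown.
import cn.hutool.core.io.IoUtil;
import com.alibaba.fastjson.JSON;
import com.google.common.collect.Lists;
import com.qcloud.cos.model.CompleteMultipartUploadResult;
import com.qcloud.cos.model.InitiateMultipartUploadResult;
import com.qcloud.cos.model.PartETag;
import com.qcloud.cos.model.UploadPartResult;
import org.apache.commons.lang3.StringUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class CosIStreamHandler implements IStreamHandler {
    private final Logger logger = LoggerFactory.getLogger(CosIStreamHandler.class);
    private final String key;
    private final ITencentCOSService tencentCOSService;
    private InitiateMultipartUploadResult initiateMultipartUploadResult;
    private final List<PartETag> partETags = Lists.newArrayList();
    private String uploadId = null;
    private String fileUrl;

    public CosIStreamHandler(String key, ITencentCOSService tencentCOSService) {
        this.key = key;
        this.tencentCOSService = tencentCOSService;
    }

    @Override
    public void handler(OutputStream outputStream, int count) {
        // StreamUtils.parse is a project helper that turns the finished OutputStream into an InputStream
        InputStream inputStream = StreamUtils.parse(outputStream);
        try {
            // for an in-memory stream, available() is the exact size of this part
            int available = inputStream.available();
            if (count == 0) {
                // first stream: start the multipart upload
                logger.debug("initiateMultipartUpload to COS started, key={}", key);
                initiateMultipartUploadResult = tencentCOSService.initiateMultipartUpload(key);
                uploadId = initiateMultipartUploadResult.getUploadId();
                logger.debug("initiateMultipartUpload to COS finished, uploadId={}, key={}", uploadId, key);
            }
            if (StringUtils.isBlank(uploadId)) {
                // silently skipping would drop this part and corrupt the file
                throw new IllegalStateException("no uploadId for key=" + key);
            }
            int partNumber = count + 1; // COS part numbers start at 1
            logger.debug("upload to COS started, key={}, partNumber={}, size={}, uploadId={}", key, partNumber, available, uploadId);
            UploadPartResult uploadPartResult = tencentCOSService.uploadPart(key, uploadId, partNumber, available, inputStream);
            logger.debug("upload to COS finished, key={}, partNumber={}, size={}, uploadId={}, UploadPartResult={}", key, partNumber, available, uploadId, JSON.toJSONString(uploadPartResult));
            partETags.add(new PartETag(uploadPartResult.getPartNumber(), uploadPartResult.getETag()));
        } catch (Exception e) {
            logger.error("multipart upload to COS failed, key={}", key, e);
            if (StringUtils.isNotBlank(uploadId)) {
                tencentCOSService.abortMultipartUpload(key, uploadId);
            }
            throw new BusinessErrorException("multipart upload to COS failed, key=" + key);
        } finally {
            IoUtil.close(outputStream);
            IoUtil.close(inputStream);
        }
    }

    // The handler cannot tell which stream is the last one, so the caller completes the upload manually
    @Override
    public void submitCos() {
        logger.debug("completeMultipartUpload to COS started, key={}", key);
        CompleteMultipartUploadResult completeMultipartUploadResult = tencentCOSService.completeMultipartUpload(key, uploadId, partETags);
        fileUrl = tencentCOSService.getFileUrl(completeMultipartUploadResult);
        logger.debug("completeMultipartUpload to COS finished, key={}, result={}, fileUrl={}", key, JSON.toJSONString(completeMultipartUploadResult), fileUrl);
    }

    @Override
    public String getFileUlr() {
        return fileUrl;
    }
}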
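Before submitCos() completes the upload, it can help to check the invariants this handler is supposed to keep. A minimal sketch, assuming a hypothetical PartListCheck.check that receives the part numbers and sizes in upload order: numbers run 1, 2, 3, ... as count + 1 produces them (an invariant of this code, not a COS rule), no part is empty, and every part except the last meets the 1 MB minimum mentioned in approach 1.

```java
import java.util.Arrays;
import java.util.List;

public class PartListCheck {
    // Minimum size for every part except the last (1 MB, per the COS requirement quoted in the text).
    static final long MIN_PART_SIZE = 1024L * 1024;

    // Returns null if the parts look consistent, otherwise a human-readable reason.
    // partNumbers.get(i) and partSizes.get(i) describe the i-th part in upload order.
    static String check(List<Integer> partNumbers, List<Long> partSizes) {
        if (partNumbers.size() != partSizes.size()) {
            return "part numbers and sizes differ in length";
        }
        if (partNumbers.isEmpty()) {
            return "no parts uploaded";
        }
        int last = partNumbers.size() - 1;
        for (int i = 0; i <= last; i++) {
            int partNumber = partNumbers.get(i);
            long size = partSizes.get(i);
            // count + 1 numbering should give 1, 2, 3, ... with no gaps
            if (partNumber != i + 1) {
                return "expected part " + (i + 1) + " but found " + partNumber;
            }
            if (size <= 0) {
                return "part " + partNumber + " is empty";
            }
            if (i < last && size < MIN_PART_SIZE) {
                return "part " + partNumber + " is smaller than 1 MB";
            }
        }
        return null;
    }

    public static void main(String[] args) {
        long mb = MIN_PART_SIZE;
        System.out.println(check(Arrays.asList(1, 2, 3), Arrays.asList(mb, mb, 300L))); // null
        System.out.println(check(Arrays.asList(1, 2), Arrays.asList(mb, 0L))); // part 2 is empty
        System.out.println(check(Arrays.asList(1, 2), Arrays.asList(300L, mb))); // part 1 is smaller than 1 MB
    }
}
```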
Usage (pseudocode)
OutputStream outputStream = new ByteArrayOutputStream();
IStreamHandler iStreamHandler = new CosIStreamHandler(key, cosServierBuilder.routeService(context));
BufferedMultipleOutputStream bufferedMultipleOutputStream = new BufferedMultipleOutputStream(outputStream, iStreamHandler);
try {
    // Query the data page by page and write each page into the SXSSF-backed writer
    for (int i = 0; i < pageTotal; i++) {
        HutoolUtil.listToExcleBigDataOneStream(writer, handlerDataExportService, headerDTO);
    }
    // Write the whole workbook once; each full 1 MB buffer is uploaded as a part while this runs
    writer.flush(bufferedMultipleOutputStream);
    // COS end: upload the remaining bytes as the last part, then complete the multipart upload
    bufferedMultipleOutputStream.lastFlush();
    iStreamHandler.submitCos();
} catch (Exception e) {
    logger.error("COS multipart upload task failed", e);
    throw new BusinessErrorException("COS multipart upload task failed");
} finally {
    IoUtil.close(bufferedMultipleOutputStream);
    IoUtil.close(writer);
}
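The usage code above only logs and rethrows in its catch block, and CosIStreamHandler aborts only when the failure happens inside handler. A failure elsewhere after the upload was initiated (for example inside writer.flush between two parts) therefore leaves an unfinished multipart upload behind. A minimal sketch of a lifecycle that always aborts, assuming a hypothetical MultipartTarget interface rather than the real COS SDK or the project's ITencentCOSService:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MultipartLifecycleSketch {
    // Hypothetical upload interface; method names are illustrative, not the COS SDK API.
    interface MultipartTarget {
        String initiate(String key); // returns an upload id
        String uploadPart(String key, String uploadId, int partNumber, byte[] data); // returns an ETag
        void complete(String key, String uploadId, List<String> eTags);
        void abort(String key, String uploadId);
    }

    // One multipart upload: initiated on the first chunk, then completed or aborted.
    static final class UploadSession {
        private final MultipartTarget target;
        private final String key;
        private final List<String> eTags = new ArrayList<>();
        private String uploadId; // null until the first chunk arrives

        UploadSession(MultipartTarget target, String key) {
            this.target = target;
            this.key = key;
        }

        void onChunk(byte[] chunk) {
            if (uploadId == null) {
                uploadId = target.initiate(key);
            }
            eTags.add(target.uploadPart(key, uploadId, eTags.size() + 1, chunk)); // parts numbered from 1
        }

        void complete() {
            if (uploadId == null) {
                throw new IllegalStateException("nothing was uploaded for " + key);
            }
            target.complete(key, uploadId, eTags);
        }

        // Safe to call from a catch block, even when nothing was initiated yet.
        void abortQuietly() {
            if (uploadId == null) return;
            try {
                target.abort(key, uploadId);
            } catch (RuntimeException ignored) {
                // keep the original failure as the one that propagates
            }
        }
    }

    // Mirrors the usage flow: feed chunks, complete, and abort on any failure.
    static void export(UploadSession session, List<byte[]> chunks) {
        try {
            for (byte[] chunk : chunks) {
                session.onChunk(chunk); // in the real flow chunks arrive one by one from the stream
            }
            session.complete();
        } catch (RuntimeException e) {
            session.abortQuietly();
            throw e;
        }
    }

    // In-memory fake that records calls; uploading part failOnPart throws (0 = never fail).
    static final class RecordingTarget implements MultipartTarget {
        final List<String> calls = new ArrayList<>();
        private final int failOnPart;

        RecordingTarget(int failOnPart) {
            this.failOnPart = failOnPart;
        }

        public String initiate(String key) { calls.add("initiate"); return "upload-1"; }
        public String uploadPart(String key, String uploadId, int partNumber, byte[] data) {
            if (partNumber == failOnPart) {
                throw new IllegalStateException("part " + partNumber + " failed");
            }
            calls.add("part" + partNumber);
            return "etag-" + partNumber;
        }
        public void complete(String key, String uploadId, List<String> eTags) { calls.add("complete:" + eTags.size()); }
        public void abort(String key, String uploadId) { calls.add("abort"); }
    }

    public static void main(String[] args) {
        List<byte[]> chunks = Arrays.asList(new byte[4], new byte[4], new byte[2]);
        RecordingTarget ok = new RecordingTarget(0);
        export(new UploadSession(ok, "demo.xlsx"), chunks);
        System.out.println(ok.calls); // [initiate, part1, part2, part3, complete:3]
        RecordingTarget broken = new RecordingTarget(2);
        try {
            export(new UploadSession(broken, "demo.xlsx"), chunks);
        } catch (IllegalStateException expected) {
            System.out.println(broken.calls); // [initiate, part1, abort]
        }
    }
}
```

Wiring onChunk into the chunk callback and calling complete() after the last chunk gives the same flow as lastFlush() plus submitCos(), with the abort also covering failures outside the upload calls.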
https://github.com/dawuti/wuti-common-code
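This does not solve the root problem, but of the three approaches this one (SXSSF + wrapped stream + COS multipart upload) uses the least memory: the stream side only ever holds a few 1 MB chunks at a time instead of the whole file.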