因为做股票金融的,每天产生的数据量是很大的,一个月几十亿的交易记录,也常有出现,特别是今年大跌之前大涨那会。


作为程序员,问题来了,有时需要将一些并不是特别符合规范的csv文件导入数据库中,而且每个文件有十几万行,而这样的文件几万个,于是几十亿的记录如何导入数据库呢?很多想着常用的方法,就是将csv文件读出来,然后一条条插到数据库,或者批量插到数据库,或者开个多线程,然而,花费的时间必须是几十天,而且不好控制,万一中断,那又不知该如何是好了。。。


对于我,我最先想到是开多线程,每次1000条批量插入数据库,然而,由于我犯了一个致命的错误:对于大量数据修改,数据库表先不要建索引,因为每次改动表,索引都会修改,这在数据量大的时候,消耗的时间也是比较大的。所以,我效果并不好,12个小时也才存了4千万数据。看到这个效果,我感觉不对劲,因为我要存的数据,至少是几十亿的,于是我觉得肯定有更效率的方法,直到 load data infile才让我觉得人生处处有惊喜!!


用LOAD DATA INFILE将1.csv导入数据库表d201505的语句如下:

LOAD DATA INFILE 'E:/1.csv'
INTO TABLE d201505
FIELDS TERMINATED BY ','
ENCLOSED BY '\"'
ESCAPED BY '\\'
LINES TERMINATED BY '\n'
(stockID,date,price,buysell,volume);


就这么一条语句,一个含有十几万数据的导入表中基本是秒级,于是我决定先将我的不规则的csv文件读写成合乎数据表字段的csv,然后LOAD DATA INFILE导入到数据表,并且我开了一百个多线程对文件进行写,开了三个线程对数据导入,效果很乐观,完整一个流程下来,10多亿数据花了八个小时,平均十分钟左右就能导入1500万数据,所以,在此记录下,我的程序如下:

import java.io.File;
import java.nio.charset.Charset;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
//用了javacsv这个插件包
import com.csvreader.CsvReader;
import com.csvreader.CsvWriter;

public class CSVReadAndWrite {
    // 写cvs文件
    public static void writeToCsv(String dirPath, String ym, String d,
            String[] cvsfile) {
        ExecutorService pool = Executors.newFixedThreadPool(100);
        for (int j = 0; j < cvsfile.length; j++) {
            Thread thread = new MDownThread(dirPath, cvsfile[j], ym, d);
            pool.execute(thread);

        }
        pool.shutdown();
        try {
            pool.awaitTermination(Long.MAX_VALUE, TimeUnit.DAYS);
        } catch (InterruptedException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }

    // 我导入的是201504这一个月的数据,所以是1-30,按天导入
    public static void main(String[] args) throws Exception {
        String ym = "201504";
        String d;

        for (int i = 1; i <= 30; i++) {
            Date t1 = new Date();
            String dirPath = "E:\\" + ym + "/" + ym;
            d = String.format("%02d", i);
            dirPath += d + "/";
            File file = new File(dirPath);
            String cvsfile[];
            cvsfile = file.list();
            if (cvsfile == null || cvsfile.length <= 0) {
                continue;
            }
            System.out.println(dirPath + "开始导入!");
            // 写入文件
            writeToCsv(dirPath, ym, d, cvsfile);
            ExecutorService pool2 = Executors.newFixedThreadPool(3);
            for (int j = 0; j < cvsfile.length; j++) {
                if (!new File("E:/" + ym + "/" + ym + d + "/"
                        + cvsfile[j]).exists()) {
                    writeToCsv(dirPath, ym, d, cvsfile);
                }
                Thread thread2 = new LoadMysqlThread(dirPath, cvsfile[j], ym, d);
                pool2.execute(thread2);
            }
            pool2.shutdown();
            try {
                pool2.awaitTermination(Long.MAX_VALUE, TimeUnit.DAYS);
            } catch (InterruptedException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            }
            Date t2 = new Date();
            System.out.println(dirPath + "导入成功,用时:"
                    + (t2.getTime() - t1.getTime()) / 1000 + "s");

        }

    }

}
//线程写csv文件
class MDownThread extends Thread {
    SimpleDateFormat sdf1 = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
    SimpleDateFormat sdf = new SimpleDateFormat("yyyyMMddHHmmss");
    public String dirPath;
    public String cvsfile;
    public String ym;
    public String d;
    public int dj;

    public MDownThread(String dirPath, String cvsfile, String ym, String d) {
        this.dirPath = dirPath;
        this.cvsfile = cvsfile;
        this.ym = ym;
        this.d = d;
        this.dj = dj;
    }

    @Override
    public void run() {
        try {
            // 生成CsvReader对象,以,为分隔符,utf-8编码方式
            CsvReader r = new CsvReader(dirPath + cvsfile, ',',
                    Charset.forName("utf-8"));
            CsvWriter wr = new CsvWriter("E:\\" + ym + "\\"
                    + ym + d + "\\" + cvsfile, ',', Charset.forName("utf-8"));
            // System.out.println(dirPath+cvsfile[j]);
            // 逐条读取记录,直至读完
            while (r.readRecord()) {
                String regex = "(\\d+)";
                Pattern pattern = Pattern.compile(regex);
                Matcher matcher = pattern.matcher(cvsfile);
                String stockID = "";
                while (matcher.find()) {
                    stockID = matcher.group(1);
                }
                String date = sdf1.format(sdf.parse(ym + d + r.get(0)));
                File f = new File("E:\\shamrock\\project\\" + ym + "\\" + ym
                        + d + "\\" + cvsfile);
                if (!f.getParentFile().exists()) {
                    if (!f.getParentFile().mkdirs()) {
                        return;
                    }
                }
                String[] contents = { stockID, date, r.get(1), r.get(2),
                        r.get(3) };
                wr.writeRecord(contents);

            }

            r.close();
            wr.close();
        } catch (Exception e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }

}

class LoadMysqlThread extends Thread {
    SimpleDateFormat sdf1 = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
    SimpleDateFormat sdf = new SimpleDateFormat("yyyyMMddHHmmss");
    public String dirPath;
    public String cvsfile;
    public String ym;
    public String d;
    public int dj;

    public LoadMysqlThread(String dirPath, String cvsfile, String ym, String d) {
        this.dirPath = dirPath;
        this.cvsfile = cvsfile;
        this.ym = ym;
        this.d = d;
        this.dj = dj;
    }
    //线程导入数据库
    @Override
    public void run() {
        try {
            
            Connection conn = null;
            String driver = "com.mysql.jdbc.Driver";
            String url = "jdbc:mysql://localhost:3306/tickdata";
            Class.forName(driver);
            conn = DriverManager.getConnection(url, "root", "123456");
            PreparedStatement pstmt = conn
                    .prepareStatement(" LOAD DATA INFILE 'E:/project/"
                            + ym
                            + "/"
                            + ym
                            + d
                            + "/"
                            + cvsfile
                            + "'   INTO TABLE d201505 FIELDS TERMINATED BY ',' ENCLOSED BY '\"' ESCAPED BY '\\\\' LINES TERMINATED BY '\\n' (stockID,date,price,buysell,volume);");
            pstmt.execute();
            conn.close();
        } catch (ClassNotFoundException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } catch (SQLException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }
}



在此再说一个问题,比如上面的程序在自己电脑,导入本地mysql数据库没问题,但问题来了,因为数据量较大,想让别的电脑也来帮忙,然后存在我的电脑上,换句话说就是电脑A上的数据想导入到电脑B上的mysql中,如果你用上面的命令,肯定会报错找不到文件,因为那语句指定的文件是以mysql的安装路径为相对路径的,所以如果不在B电脑的数据自然会报错,于是此时你要LOCAL来指定是将本地A的数据导入到B mysql中,语法如下:

LOAD DATA LOCAL INFILE 'E:/1.csv'   
INTO TABLE d201505 
FIELDS TERMINATED BY ',' 
ENCLOSED BY '\"' 
ESCAPED BY '\\' 
LINES TERMINATED BY '\n'
(stockID,date,price,buysell,volume);



也就是LOAD DATA INFILE中加入LOCAL.....

好了,如果有遇到我这样问题的,希望能帮助到你哦!!!