Notes on Some Methods Used When Importing mysqldump Data into Oracle


Author: 乄尐, 2013-12-13 16:53:00


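© Copyright belongs to the author: this is an original work by 乄尐, published on the 51CTO blog. Reposting is not permitted and will be pursued legally.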

Environment: Python 2.7

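I previously wrote a small tool for moving data between MySQL and Oracle in both directions, but it relied on holding open connections to both databases and was heavily limited by network speed. That led to the idea of importing straight from the backup (dump) file into the target database.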
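As a rough sketch of the "read the dump file directly" idea: the snippet below pulls the table name and the raw `(...),(...)` text out of each INSERT line. It assumes the dump was written with extended inserts and without column lists, so each data line looks like INSERT INTO `t` VALUES (...),(...); and it would need adjusting for other dump options. The names `INSERT_RE`, `iter_insert_lines` and `iter_dump_file` are hypothetical helpers, not part of the original tool, and the code is written to run on Python 3 as well as 2.7.

```python
# -*- coding: utf-8 -*-
import io
import re

# Assumption: one INSERT statement per line, no column list, e.g.
#   INSERT INTO `t_demo` VALUES (1,'a',NULL),(2,'b,c','d');
INSERT_RE = re.compile(r"^INSERT INTO `([^`]+)` VALUES (.*);\s*$", re.S)


def iter_insert_lines(lines):
    """Yield (table_name, values_text) for every INSERT line in `lines`."""
    for line in lines:
        m = INSERT_RE.match(line)
        if m:
            yield m.group(1), m.group(2)


def iter_dump_file(path, encoding="utf-8"):
    """Same as iter_insert_lines, but streams lines from a dump file on disk."""
    with io.open(path, "r", encoding=encoding) as f:
        for item in iter_insert_lines(f):
            yield item


# Small in-memory sample instead of a real dump file.
sample = [
    u"-- MySQL dump\n",
    u"INSERT INTO `t_demo` VALUES (1,'a',NULL),(2,'b,c','d');\n",
]
rows = list(iter_insert_lines(sample))
```

The `values_text` part is the kind of string that the functions in the rest of this post take apart.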

Because I was careless about edge cases at first, I ended up patching the string handling again and again along the way.

Fortunately, after many rounds of testing and real use, it is good enough for the data I have at hand, at least for now.

Enough preamble. Below are the core methods used to process the strings in a file produced by MySQL dump.

The rest of the string handling is much the same; anyone who is interested or needs it is welcome to get in touch and discuss.

# -*- coding: utf-8 -*-
import sys
# Python 2 only: make implicit str/unicode conversions use UTF-8
reload(sys)
sys.setdefaultencoding('utf8')
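def enEstr(t):
    # Pre-process some escape sequences in MySQL dump field values by masking
    # them with placeholder tokens. The escaped backslash is replaced first so
    # that its second backslash is not mistaken for the start of \' or \".
    t = t.replace(u'\\\\',u'AJeenA')
    t = t.replace(u"\\'",u'BJeenB')
    t = t.replace(u'\\"',u'CJeenC')
    return t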
def deEstr(t):
    # Restore the masked MySQL escape sequences as the plain characters usable in Oracle
    if isinstance(t, basestring):  # str or unicode; other types (e.g. int) pass through
        t = t.replace(u'AJeenA',u'\\')
        t = t.replace(u'BJeenB',u"'")
        t = t.replace(u'CJeenC',u'"')
    return t
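def dlsTostrlist(t):
    # Split "(...),(...)" into a list of "(...)" strings, one per row
    temp = [] # stack used to pair up parentheses and single quotes
    fa = [] # final return value
    data = [] # characters of the element currently being built for fa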
    for i in xrange(len(t)):
        c = t[i]
        if len(temp) == 0:
            if len(data) > 0:
                fa.append(''.join(data))
                data = []
        elif i == len(t) - 1:
            if c == u')' and len(temp) > 0 and temp[-1] == u'(':
                data.append(c)
                fa.append(''.join(data))
                data = []
        if c == u'(' :
            if len(temp) == 0:
                temp.append(c)
            elif temp[-1] == u'(':
                temp.append(c)
            data.append(c)
            continue
        elif c == u"'" :
            if len(temp) > 0 :
                if temp[-1] == "'":
                    temp.pop()
                else:
                    temp.append(c)
            data.append(c)
            continue
        elif c == u')' :
            if len(temp) > 0 :
                if temp[-1] == u'(':
                    temp.pop()
            data.append(c)
            continue
        elif c == u',' and len(temp) == 0:
            continue
        else:
            data.append(c)
    del temp,data
    return fa
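def strTolist(t):
    # Split one "(v1,v2,...)" row string into a list of values
    t = t[1:-1]  # strip the enclosing parentheses
    ds = t.split(u',') # split on commas
    fa = [] # return value
    data = [] # buffer for the pieces of a value that contains commas, e.g. 'a,bb,ccc' splits into ["'a","bb","ccc'"]
    temp = u'' # the buffered pieces joined back together, used to test whether the value is complete
    for d in ds:
        if d.startswith(u"'") and d.endswith(u"'") and len(d) > 1: # a complete value is at least '' (empty string); len > 1 excludes the lone ' left over when a value such as ',abcd' is split
            fa.append(d[1:-1]) # strip the surrounding single quotes
        elif len(data) == 0 and d == u'NULL': # NULL value (not a piece of a quoted value such as 'a,NULL,b')
            fa.append(u'')
        elif len(data) == 0 and d.isdigit(): # all digits, and not a piece of a quoted value such as the '2345' in 'aa,2345,bbcc'
            fa.append(int(d))
        elif len(data) == 0 and not d.startswith(u"'"): # any other unquoted token (e.g. -5 or 3.14) is kept as text instead of being buffered forever
            fa.append(d)
        else: # a piece of a value that needs to be joined back together
            data.append(d)
            temp = u','.join(data)
            if temp.startswith(u"'") and temp.endswith(u"'") and len(temp) > 1: # the value is now complete
                fa.append(temp[1:-1]) # strip the surrounding single quotes
                data = []
    del temp,data,ds
    return fa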
s = u"(1,'2013-12-10 15:06:21','Tcom_id',NULL,'tiBJeenBmCJeenCa)ge','ti,543,m)()aBJeenBg,e_src'),(2,'2013-12-10 15:11:09','Tcom_id','Tp_id','\u963f\u65af\u8482\u82ac','\u963f\u8428\u5fb7\u2019BJeenB')"
'''
This requirement arose while importing a file generated by MySQL dump into Oracle.
 A unicode string like s is first converted into a list of unicode strings like:
    [
        u"(1,'2013-12-10 15:06:21','Tcom_id',NULL,'tiBJeenBmCJeenCa)ge','ti,543,m)()aBJeenBg,e_src')",
        u"(2,'2013-12-10 15:11:09','Tcom_id','Tp_id','\u963f\u65af\u8482\u82ac','\u963f\u8428\u5fb7\u2019BJeenB')"
    ]
Each string in that list is then converted into a list like:
    [1, u'2013-12-10 15:06:21', u'Tcom_id', u'', u'tiBJeenBmCJeenCa)ge', u'ti,543,m)()aBJeenBg,e_src']
# Note # The string must not contain escaped single quotes. Run enEstr() on it before converting, then run deEstr() on each resulting value to restore the original characters.
'''
t = dlsTostrlist(s)
print t
print '\n'
l = t[0]
print l
b = strTolist(l)
print b
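
The mask-then-restore step described in the note above can be tried in isolation. This is a minimal sketch, not the author's code: it reuses the three placeholder tokens from the post, but `PLACEHOLDERS`, `mask_escapes` and `unmask_escapes` are hypothetical names, the fields are split naively on commas for brevity, and it assumes the tokens never occur in real data. It is written to run on Python 3 as well as 2.7.

```python
# -*- coding: utf-8 -*-
# (escaped form in the dump, placeholder token, plain character to restore)
PLACEHOLDERS = [
    (u"\\\\", u"AJeenA", u"\\"),  # escaped backslash, must be handled first
    (u"\\'", u"BJeenB", u"'"),    # escaped single quote
    (u'\\"', u"CJeenC", u'"'),    # escaped double quote
]


def mask_escapes(text):
    """Replace escape sequences with tokens, in list order."""
    for escaped, token, _ in PLACEHOLDERS:
        text = text.replace(escaped, token)
    return text


def unmask_escapes(value):
    """Turn the tokens back into the plain characters."""
    for _, token, plain in PLACEHOLDERS:
        value = value.replace(token, plain)
    return value


# Three quoted fields as they would appear in a dump: 'it\'s','C:\\','x'
raw = u"'it\\'s','C:\\\\','x'"
masked = mask_escapes(raw)
fields = [f[1:-1] for f in masked.split(u",")]  # naive split, fine for this sample
restored = [unmask_escapes(f) for f in fields]

# Replacing \' before \\ would pair the second backslash of C:\\ with the
# closing quote and corrupt the field boundary:
wrong_order = raw.replace(u"\\'", u"BJeenB")
```

If the restored values are spliced into SQL text rather than passed as bind variables, a literal single quote still has to be doubled for Oracle.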

This really bears out the saying: "bugs can only be discovered, never eliminated."
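
One bug the placeholder approach leaves open is data that already contains text such as AJeenA. A possible alternative, sketched here under assumptions and not taken from the original tool, is a single-pass scanner that tracks quote state and decodes backslash escapes as it goes, so no placeholders are needed. `parse_values`, `convert_bare` and `ESCAPES` are hypothetical names, the escape table is simplified, and NULL is returned as `None` where the post uses an empty string. It is written to run on Python 3 as well as 2.7.

```python
# -*- coding: utf-8 -*-
# Simplified escape table (assumption); any other escaped character maps to itself.
ESCAPES = {u"n": u"\n", u"r": u"\r", u"t": u"\t", u"0": u"\0", u"Z": u"\x1a"}


def convert_bare(token):
    """Convert an unquoted token: NULL -> None, integers -> int, else keep text."""
    if token == u"NULL":
        return None
    try:
        return int(token)
    except ValueError:
        return token


def parse_values(text):
    """Parse "(..),(..)" into a list of rows in one pass over the characters."""
    rows, row, buf = [], None, []
    in_quote = quoted = False
    i, n = 0, len(text)
    while i < n:
        c = text[i]
        if in_quote:
            if c == u"\\" and i + 1 < n:
                nxt = text[i + 1]  # decode the escape and skip its second char
                buf.append(ESCAPES.get(nxt, nxt))
                i += 1
            elif c == u"'":
                in_quote = False
            else:
                buf.append(c)
        elif c == u"'":
            in_quote = quoted = True
        elif c == u"(" and row is None:
            row = []  # start of a new row
        elif c in u",)" and row is not None:
            token = u"".join(buf)
            row.append(token if quoted else convert_bare(token))
            buf, quoted = [], False
            if c == u")":
                rows.append(row)  # end of the current row
                row = None
        elif row is not None:
            buf.append(c)  # part of a bare token such as 42 or NULL
        i += 1
    return rows


# Sample text as it would appear in a dump: (1,'a\'b',NULL,'x,(y)'),(-2,'C:\\','')
result = parse_values(u"(1,'a\\'b',NULL,'x,(y)'),(-2,'C:\\\\','')")
```

The trade-off is a slightly longer loop in exchange for not depending on the tokens being absent from the data.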

Pointers and criticism are both welcome.