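Requirement: the raw data is shown below. Each record must be written to Redis with start_city_id and end_city_id as the key.
The value is a list holding the full details (all fields) of every record for that city pair.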
|-- first_traffic_type: string (nullable = true)
|-- first_traffic_code: string (nullable = true)
|-- first_price: double (nullable = true)
|-- start_city_id: integer (nullable = true)
|-- start_city_name: string (nullable = true)
|-- start_station_code: string (nullable = true)
|-- start_station_name: string (nullable = true)
|-- transfer_arrive_station_code: string (nullable = true)
|-- transfer_arrive_station_name: string (nullable = true)
|-- transfer_city_id: integer (nullable = true)
|-- transfer_city_name: string (nullable = true)
|-- second_traffic_type: string (nullable = true)
|-- second_traffic_code: string (nullable = true)
|-- second_price: double (nullable = true)
|-- transfer_leave_station_code: string (nullable = true)
|-- transfer_leave_station_name: string (nullable = true)
|-- end_city_id: integer (nullable = true)
|-- end_city_name: string (nullable = true)
|-- end_station_code: string (nullable = true)
|-- end_station_name: string (nullable = true)
|-- total_num: long (nullable = true)
+------------------+------------------+-----------+-------------+---------------+------------------+------------------+----------------------------+----------------------------+----------------+------------------+-------------------+-------------------+------------+---------------------------+---------------------------+-----------+-------------+----------------+----------------+---------+
|first_traffic_type|first_traffic_code|first_price|start_city_id|start_city_name|start_station_code|start_station_name|transfer_arrive_station_code|transfer_arrive_station_name|transfer_city_id|transfer_city_name|second_traffic_type|second_traffic_code|second_price|transfer_leave_station_code|transfer_leave_station_name|end_city_id|end_city_name|end_station_code|end_station_name|total_num|
+------------------+------------------+-----------+-------------+---------------+------------------+------------------+----------------------------+----------------------------+----------------+------------------+-------------------+-------------------+------------+---------------------------+---------------------------+-----------+-------------+----------------+----------------+---------+
| T| K511| 19200.0| 917| 杭州市| HGH| 杭州东| GZQ| 广州| 1932| 广州市| T| K485| 5450.0| GZQ| 广州| 2056| 东莞市| RTQ| 东莞| 2|
| T| D3623| 16900.0| 2075| 南宁市| NFZ| 南宁东| IZQ| 广州南| 1932| 广州市| T| G6519| 21500.0| IZQ| 广州南| 3218| 香港| XJA| 香港西九龙| 9|
| T| D7265| 1000.0| 1976| 佛山市| FOQ| 佛山西| IZQ| 广州南| 1932| 广州市| T| G1004| 24400.0| IZQ| 广州南| 1821| 衡阳市| HVQ| 衡阳东| 6|
| T| D5124| 2400.0| 2335| 遂宁市| NIW| 遂宁| NCE| 南充北| 2359| 南充市| T| D1969| 6950.0| NCE| 南充北| 2327| 广元市| GYW| 广元| 6|
| T| D2223| 7700.0| 1792| 天门市| TNN| 天门南| HAN| 宜昌东| 1708| 宜昌市| T| K934| 4050.0| HAN| 宜昌东| 1857| 常德市| VGQ| 常德| 2|
| T| Z182| 1650.0| 1955| 深圳市| BJQ| 深圳东| HCQ| 惠州| 2015| 惠州市| T| T8368| 5850.0| HCQ| 惠州| 2029| 兴宁市| ENQ| 兴宁| 8|
| T| D8482| 6300.0| 2146| 玉林市| YLZ| 玉林| UCZ| 来宾北| 2185| 来宾市| T| D1794| 10600.0| UCZ| 来宾北| 2098| 三江侗族自治县| SWZ| 三江南| 4|
| T| Z291| 1250.0| 1742| 孝感市| XGN| 孝感| HKN| 汉口| 1678| 武汉市| T| D353| 10800.0| HKN| 汉口| 1708| 宜昌市| HAN| 宜昌东| 10|
| T| G822| 9950.0| 1955| 深圳市| IOQ| 深圳北| IZQ| 广州南| 1932| 广州市| T| C7639| 1000.0| IZQ| 广州南| 1976| 佛山市| ORQ| 顺德| 2|
| T| D3398| 20800.0| 1256| 九江市| JJG| 九江| HGH| 杭州东| 917| 杭州市| T| D3101| 9900.0| HGH| 杭州东| 1005| 临海市| UFH| 临海| 2|
| T| G2930| 7450.0| 1955| 深圳市| IOQ| 深圳北| IZQ| 广州南| 1932| 广州市| T| D2844| 7300.0| IZQ| 广州南| 2167| 贺州市| HXZ| 贺州| 20|
| T| D3760| 12800.0| 1932| 广州市| IZQ| 广州南| GGZ| 贵港| 2140| 贵港市| T| D8389| 2600.0| GGZ| 贵港| 2146| 玉林市| YLZ| 玉林| 32|
| T| D6051| 4400.0| 1415| 高密市| GMK| 高密| LYK| 莱阳| 1398| 莱阳市| T| K1162| 4150.0| LYK| 莱阳| 1453| 沂南县| YNK| 沂南| 2|
| T| D2332| 10700.0| 1955| 深圳市| IOQ| 深圳北| CBQ| 潮汕| 2058| 潮州市| T| D7437| 1000.0| CBQ| 潮汕| 1968| 汕头市| OTQ| 汕头| 4|
| T| T254| 2450.0| 82| 邯郸市| HDP| 邯郸| SJP| 石家庄| 36| 石家庄市| T| Z236| 18050.0| SJP| 石家庄| 648| 哈尔滨市| VAB| 哈尔滨西| 4|
| T| G6035| 3950.0| 2056| 东莞市| IUQ| 虎门| IOQ| 深圳北| 1955| 深圳市| T| D2288| 34200.0| IOQ| 深圳北| 949| 苍南县| CEH| 苍南| 6|
| T| G1417| 1550.0| 1337| 玉山县| YGG| 玉山南| SRG| 上饶| 1333| 上饶市| T| G322| 3950.0| SRG| 上饶| 1344| 婺源县| WYG| 婺源| 5|
| T| G429| 7100.0| 2940| 秦安县| QGJ| 秦安| LAJ| 兰州西| 2917| 兰州市| T| D2697| 5800.0| LAJ| 兰州西| 3018| 西宁市| XNO| 西宁| 7|
| T| D3627| 4250.0| 2068| 云浮市| IXQ| 云浮东| IZQ| 广州南| 1932| 广州市| T| G1015| 7450.0| IZQ| 广州南| 1955| 深圳市| IOQ| 深圳北| 6|
| T| G6222| 3450.0| 2056| 东莞市| IUQ| 虎门| IZQ| 广州南| 1932| 广州市| T| D1876| 6300.0| IZQ| 广州南| 2011| 怀集县| FAQ| 怀集| 5|
+------------------+------------------+-----------+-------------+---------------+------------------+------------------+----------------------------+----------------------------+----------------+------------------+-------------------+-------------------+------------+---------------------------+---------------------------+-----------+-------------+----------------+----------------+---------+
only showing top 20 rows
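Add a new column with the struct function, which packs all fields of a row into one struct. Then group by start_city_id and end_city_id and gather each group's structs with collect_list: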
import static org.apache.spark.sql.functions.collect_list;
import static org.apache.spark.sql.functions.struct;

// Pack every column of a row into a single struct column named "combined",
// then group by the city pair and collect each group's structs into an array.
Dataset<Row> combineData = dataset.withColumn("combined", struct("first_traffic_type", "first_traffic_code", "first_price", "start_city_id", "start_city_name",
        "start_station_code", "start_station_name", "transfer_arrive_station_code", "transfer_arrive_station_name", "transfer_city_id", "transfer_city_name",
        "second_traffic_type", "second_traffic_code", "second_price", "transfer_leave_station_code", "transfer_leave_station_name",
        "end_city_id", "end_city_name", "end_station_code", "end_station_name", "total_num"))
        .groupBy("start_city_id", "end_city_id").agg(collect_list("combined").as("combined_list"));
combineData.show(false);
combineData.printSchema();
The result:

|start_city_id|end_city_id|combined_list |

|1 |46 |[[T,T231,2350.0,1,北京市,BXP,北京西,BDP,保定,121,保定市,T,D6735,3400.0,BDP,保定,46,正定县,ZDP,正定,4]] |
|1 |49 |[[T,G521,12850.0,1,北京市,BXP,北京西,SJP,石家庄,36,石家庄市,T,G6287,2250.0,SJP,石家庄,49,高邑县,GNP,高邑西,4]] |
|1 |73 |[[T,C2035,5450.0,1,北京市,VNP,北京南,TJP,天津,18,天津市,T,K1301,2950.0,TJP,天津,73,迁安市,QQP,迁安,3]] |
|1 |232 |[[T,K695,5350.0,1,北京市,BJP,北京,DTV,大同,227,大同市,T,K616,1100.0,DTV,大同,232,阳高县,YOV,阳高,2]] |
|1 |504 |[[T,G237,29500.0,1,北京市,VNP,北京南,SYT,沈阳,463,沈阳市,T,G787,2800.0,SYT,沈阳,504,本溪市,BXT,本溪,5], [T,D17,20600.0,1,北京市,BJP,北京,SYT,沈阳,463,沈阳市,T,C1133,2500.0,SYT,沈阳,504,本溪市,BXT,本溪,2], [T,D6619,8750.0,1,北京市,BJP,北京,QTP,秦皇岛,74,秦皇岛市,T,G1229,15300.0,QTP,秦皇岛,504,本溪市,BXT,本溪,2]] |
|1 |511 |[[T,D17,20600.0,1,北京市,BJP,北京,SYT,沈阳,463,沈阳市,T,D7625,7000.0,SYT,沈阳,511,丹东市,DUT,丹东,9], [T,D9,8750.0,1,北京市,BJP,北京,QTP,秦皇岛,74,秦皇岛市,T,G397,19800.0,QTP,秦皇岛,511,丹东市,DUT,丹东,2], [T,D6615,5200.0,1,北京市,BJP,北京,TSP,唐山,59,唐山市,T,K27,19400.0,TSP,唐山,511,丹东市,DUT,丹东,2], [T,D6615,5200.0,1,北京市,BJP,北京,TSP,唐山,59,唐山市,T,K27,11200.0,TSP,唐山,511,丹东市,DUT,丹东,2], [T,D25,20600.0,1,北京市,BJP,北京,SBT,沈阳北,463,沈阳市,T,D7633,7000.0,SBT,沈阳北,511,丹东市,DUT,丹东,2], [T,G237,29500.0,1,北京市,VNP,北京南,SYT,沈阳,463,沈阳市,T,G787,7300.0,SYT,沈阳,511,丹东市,DUT,丹东,3]] |
|1 |526 |[[T,K39,8600.0,1,北京市,BJP,北京,JZD,锦州,518,锦州市,T,T5315,4150.0,JZD,锦州,526,营口市,XYT,熊岳城,4], [T,D6619,8750.0,1,北京市,BJP,北京,QTP,秦皇岛,74,秦皇岛市,T,G2627,16050.0,QTP,秦皇岛,526,营口市,BYT,鲅鱼圈,2]] |
|1 |589 |[[T,K53,17100.0,1,北京市,BJP,北京,SBT,沈阳北,463,沈阳市,T,K1571,6250.0,SBT,沈阳北,589,吉林市,JLL,吉林,2], [T,Z61,22250.0,1,北京市,BJP,北京,CCT,长春,578,长春市,T,C1203,3150.0,CCT,长春,589,吉林市,JLL,吉林,2]] |
|1 |605 |[[T,D101,20600.0,1,北京市,BJP,北京,SBT,沈阳北,463,沈阳市,T,K7565,4150.0,SBT,沈阳北,605,双辽市,ZJD,双辽,2], [T,D17,20600.0,1,北京市,BJP,北京,SYT,沈阳,463,沈阳市,T,2667,3650.0,SYT,沈阳,605,双辽市,ZJD,双辽,3]] |
|1 |648 |[[T,Z203,3650.0,1,北京市,BJP,北京,FUP,唐山北,59,唐山市,T,Z83,13850.0,FUP,唐山北,648,哈尔滨市,HBB,哈尔滨,2], [T,T17,4350.0,1,北京市,BJP,北京,QTP,秦皇岛,74,秦皇岛市,T,K547,22300.0,QTP,秦皇岛,648,哈尔滨市,HBB,哈尔滨,2]] |
|1 |658 |[[T,Z83,15250.0,1,北京市,BJP,北京,HBB,哈尔滨,648,哈尔滨市,T,D7811,9000.0,HBB,哈尔滨,658,依兰县,YEB,依兰,3]] |
|1 |855 |[[T,T215,14850.0,1,北京市,BJP,北京,AFH,盐城,879,盐城市,T,K563,2450.0,AFH,盐城,855,南通市,NUH,南通,17], [T,D705,23300.0,1,北京市,BJP,北京,NJH,南京,807,南京市,T,K724,4350.0,NJH,南京,855,南通市,NUH,南通,3], [T,D705,19700.0,1,北京市,BJP,北京,NJH,南京,807,南京市,T,K724,4350.0,NJH,南京,855,南通市,NUH,南通,4], [T,D715,19600.0,1,北京市,VNP,北京南,NJH,南京,807,南京市,T,D5512,10550.0,NJH,南京,855,南通市,NUH,南通,2]] |
|1 |889 |[[T,D705,19700.0,1,北京市,BJP,北京,NJH,南京,807,南京市,T,K724,1650.0,NJH,南京,889,扬州市,YLH,扬州,8], [T,D715,19600.0,1,北京市,VNP,北京南,NJH,南京,807,南京市,T,D5512,3750.0,NJH,南京,889,扬州市,YLH,扬州,3], [T,D711,19700.0,1,北京市,BJP,北京,NJH,南京,807,南京市,T,D5506,3750.0,NJH,南京,889,扬州市,YLH,扬州,6], [T,D705,23300.0,1,北京市,BJP,北京,NJH,南京,807,南京市,T,K724,1650.0,NJH,南京,889,扬州市,YLH,扬州,5], [T,G11,44350.0,1,北京市,VNP,北京南,NKH,南京南,807,南京市,T,D5546,3750.0,NJH,南京,889,扬州市,YLH,扬州,2], [T,D715,19600.0,1,北京市,VNP,北京南,NJH,南京,807,南京市,T,T152,1650.0,NJH,南京,889,扬州市,YLH,扬州,4], [T,G137,44350.0,1,北京市,VNP,北京南,NKH,南京南,807,南京市,T,D5542,3750.0,NJH,南京,889,扬州市,YLH,扬州,2], [T,T109,14850.0,1,北京市,BJP,北京,NJH,南京,807,南京市,T,K8392,1650.0,NJH,南京,889,扬州市,YLH,扬州,3], [T,D701,19700.0,1,北京市,BJP,北京,NJH,南京,807,南京市,T,D5562,3150.0,NJH,南京,889,扬州市,YLH,扬州,2]]|
|1 |957 |[[T,G57,53850.0,1,北京市,VNP,北京南,HGH,杭州东,917,杭州市,T,D3104,3200.0,HGH,杭州东,957,嘉善县,EAH,嘉善南,3], [T,G145,55300.0,1,北京市,VNP,北京南,AOH,上海虹桥,789,上海市,T,G7555,2950.0,AOH,上海虹桥,957,嘉善县,EAH,嘉善南,3], [T,G7,55300.0,1,北京市,VNP,北京南,AOH,上海虹桥,789,上海市,T,D3101,2400.0,AOH,上海虹桥,957,嘉善县,EAH,嘉善南,2], [T,G139,55300.0,1,北京市,VNP,北京南,AOH,上海虹桥,789,上海市,T,G7529,2950.0,AOH,上海虹桥,957,嘉善县,EAH,嘉善南,2], [T,G165,53850.0,1,北京市,VNP,北京南,HGH,杭州东,917,杭州市,T,G7542,4200.0,HGH,杭州东,957,嘉善县,EAH,嘉善南,2]] |
|1 |1053 |[[T,G155,44350.0,1,北京市,VNP,北京南,NKH,南京南,807,南京市,T,G7273,1550.0,NKH,南京南,1053,马鞍山市,OMH,马鞍山东,9], [T,G411,44350.0,1,北京市,VNP,北京南,NKH,南京南,807,南京市,T,D5607,2500.0,NKH,南京南,1053,马鞍山市,OMH,马鞍山东,2], [T,G127,74850.0,1,北京市,VNP,北京南,NKH,南京南,807,南京市,T,D9517,2500.0,NKH,南京南,1053,马鞍山市,OMH,马鞍山东,2], [T,G113,44350.0,1,北京市,VNP,北京南,NKH,南京南,807,南京市,T,G7283,1550.0,NKH,南京南,1053,马鞍山市,OMH,马鞍山东,2]] |
|1 |1235 |[[T,Z43,4350.0,1,北京市,BXP,北京西,VVP,石家庄北,36,石家庄市,T,K730,28950.0,VVP,石家庄北,1235,南昌市,NCG,南昌,2], [T,Z37,40950.0,1,北京市,BXP,北京西,WCN,武昌,1678,武汉市,T,K903,5150.0,WCN,武昌,1235,南昌市,NCG,南昌,2], [T,Z1,15250.0,1,北京市,BXP,北京西,WCN,武昌,1678,武汉市,T,K1091,14950.0,WCN,武昌,1235,南昌市,NCG,南昌,3], [T,C2111,5450.0,1,北京市,VNP,北京南,TJP,天津,18,天津市,T,Z102,17750.0,TJP,天津,1235,南昌市,NXG,南昌西,3], [T,Z1,15250.0,1,北京市,BXP,北京西,WCN,武昌,1678,武汉市,T,K1091,5150.0,WCN,武昌,1235,南昌市,NCG,南昌,5], [T,Z65,16350.0,1,北京市,BXP,北京西,JJG,九江,1256,九江市,T,K161,2150.0,JJG,九江,1235,南昌市,NCG,南昌,2]] |
|1 |1333 |[[T,K105,17350.0,1,北京市,BXP,北京西,NCG,南昌,1235,南昌市,T,G1394,10700.0,NCG,南昌,1333,上饶市,SRG,上饶,2], [T,Z133,17750.0,1,北京市,BXP,北京西,NXG,南昌西,1235,南昌市,T,G1460,11050.0,NXG,南昌西,1333,上饶市,SRG,上饶,2]] |
|1 |1344 |[[T,Z67,16350.0,1,北京市,BXP,北京西,JJG,九江,1256,九江市,T,D6262,7400.0,JJG,九江,1344,婺源县,WYG,婺源,2]] |
|1 |1427 |[[T,K101,7200.0,1,北京市,BJP,北京,JNK,济南,1347,济南市,T,G63,5950.0,JNK,济南,1427,曲阜市,QAK,曲阜东,2], [T,G15,18450.0,1,北京市,VNP,北京南,JGK,济南西,1347,济南市,T,G1567,5950.0,JGK,济南西,1427,曲阜市,QAK,曲阜东,3], [T,G137,18450.0,1,北京市,VNP,北京南,JGK,济南西,1347,济南市,T,G43,9950.0,JGK,济南西,1427,曲阜市,QAK,曲阜东,2], [T,G195,18450.0,1,北京市,VNP,北京南,JGK,济南西,1347,济南市,T,G1231,5950.0,JGK,济南西,1427,曲阜市,QAK,曲阜东,2], [T,G37,18450.0,1,北京市,VNP,北京南,JGK,济南西,1347,济南市,T,G1231,5950.0,JGK,济南西,1427,曲阜市,QAK,曲阜东,2], [T,G337,18450.0,1,北京市,VNP,北京南,JGK,济南西,1347,济南市,T,G1833,5950.0,JGK,济南西,1427,曲阜市,QAK,曲阜东,2]] |
|1 |1439 |[[T,G183,25900.0,1,北京市,VNP,北京南,WFK,潍坊,1404,潍坊市,T,G1845,13200.0,WFK,潍坊,1439,荣成市,RCK,荣成,2], [T,G105,18450.0,1,北京市,VNP,北京南,JGK,济南西,1347,济南市,T,D1621,20550.0,JGK,济南西,1439,荣成市,RCK,荣成,4], [T,G469,39800.0,1,北京市,VNP,北京南,WKK,威海,1436,威海市,T,C6507,1250.0,WKK,威海,1439,荣成市,RCK,荣成,3], [T,D335,46000.0,1,北京市,BJP,北京,QHK,青岛北,1358,青岛市,T,C6557,9300.0,QHK,青岛北,1439,荣成市,RCK,荣成,2], [T,D335,27000.0,1,北京市,BJP,北京,QHK,青岛北,1358,青岛市,T,C6557,9300.0,QHK,青岛北,1439,荣成市,RCK,荣成,3]] |

only showing top 20 rows
root
|-- start_city_id: integer (nullable = true)
|-- end_city_id: integer (nullable = true)
|-- combined_list: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- first_traffic_type: string (nullable = true)
| | |-- first_traffic_code: string (nullable = true)
| | |-- first_price: double (nullable = true)
| | |-- start_city_id: integer (nullable = true)
| | |-- start_city_name: string (nullable = true)
| | |-- start_station_code: string (nullable = true)
| | |-- start_station_name: string (nullable = true)
| | |-- transfer_arrive_station_code: string (nullable = true)
| | |-- transfer_arrive_station_name: string (nullable = true)
| | |-- transfer_city_id: integer (nullable = true)
| | |-- transfer_city_name: string (nullable = true)
| | |-- second_traffic_type: string (nullable = true)
| | |-- second_traffic_code: string (nullable = true)
| | |-- second_price: double (nullable = true)
| | |-- transfer_leave_station_code: string (nullable = true)
| | |-- transfer_leave_station_name: string (nullable = true)
| | |-- end_city_id: integer (nullable = true)
| | |-- end_city_name: string (nullable = true)
| | |-- end_station_code: string (nullable = true)
| | |-- end_station_name: string (nullable = true)
| | |-- total_num: long (nullable = true)
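The groupBy + collect_list step can be pictured with the plain-Java analogy below. It does not use Spark. Rows that share start_city_id and end_city_id end up in one list under one key, and that key uses the same "hotPlan:start@end" format as the Redis key later on. The `Route` class is hypothetical, as is the `CollectListSketch` name. `Route` keeps only five of the 21 fields. The sample values are taken from the output above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class CollectListSketch {
    // Hypothetical, trimmed-down stand-in for one row (the real struct has 21 fields).
    static final class Route {
        final int startCityId;
        final int endCityId;
        final String firstTrafficCode;
        final String secondTrafficCode;
        final long totalNum;

        Route(int startCityId, int endCityId, String firstTrafficCode, String secondTrafficCode, long totalNum) {
            this.startCityId = startCityId;
            this.endCityId = endCityId;
            this.firstTrafficCode = firstTrafficCode;
            this.secondTrafficCode = secondTrafficCode;
            this.totalNum = totalNum;
        }
    }

    // Analogue of groupBy("start_city_id", "end_city_id").agg(collect_list("combined")):
    // one map entry per city pair, holding every route of that pair.
    static Map<String, List<Route>> groupByCityPair(List<Route> routes) {
        return routes.stream().collect(Collectors.groupingBy(
                r -> String.format("hotPlan:%d@%d", r.startCityId, r.endCityId),
                LinkedHashMap::new,
                Collectors.toList()));
    }

    public static void main(String[] args) {
        List<Route> routes = List.of(
                new Route(1, 504, "G237", "G787", 5),
                new Route(1, 504, "D17", "C1133", 2),
                new Route(1, 658, "Z83", "D7811", 3));
        // Prints one line per city pair with the number of collected routes
        groupByCityPair(routes).forEach((key, list) ->
                System.out.println(key + " -> " + list.size() + " route(s)"));
    }
}
```

This sketch keeps insertion order. Spark's collect_list does not guarantee element order, because row order can change after the shuffle that groupBy triggers.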
How to read the nested data (and write it to Redis):
import com.alibaba.fastjson.JSONObject;
import com.ly.tcbase.cacheclient.CacheClientHA;
import com.tc.base.AbstractSparkSql;
import com.tc.bean.es.RouteMd5Value;
import com.tc.conf.RedisConfig;
import com.tc.demo.EsDataOffline;
import com.tc.util.DESUtils;
import com.tc.util.DateUtil;
import com.tc.util.EsDataUtil;
import org.apache.spark.api.java.function.ForeachPartitionFunction;
import org.apache.spark.sql.*;
import redis.clients.jedis.Jedis;
import scala.Tuple2;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
...
// The cast selects the ForeachPartitionFunction overload; without it a Java lambda
// can be ambiguous between the two foreachPartition overloads (e.g. on Scala 2.12 builds).
combineData.foreachPartition((ForeachPartitionFunction<Row>) data -> {
    // One Redis connection per partition; try-with-resources closes it afterwards.
    try (Jedis jedis = new Jedis("localhost")) {
        while (data.hasNext()) {
            Row record = data.next();
            List<RouteMd5Value> md5ValueList = new ArrayList<>();
            // start_city_id and end_city_id are integer columns, so read them as Integer
            String key = String.format("hotPlan:%s@%s", record.<Integer>getAs("start_city_id"), record.<Integer>getAs("end_city_id"));
            // array<struct> is stored as a Scala Seq; Row.getList returns it as java.util.List<Row>
            List<Row> rowList = record.getList(record.fieldIndex("combined_list"));
            for (Row row : rowList) {
                Integer startCityId = row.<Integer>getAs("start_city_id");
                Integer endCityId = row.<Integer>getAs("end_city_id");
                Integer transferCityId = row.<Integer>getAs("transfer_city_id");
                String startStationCode = row.<String>getAs("start_station_code");
                String transferArriveStationCode = row.<String>getAs("transfer_arrive_station_code");
                String firstTrafficType = row.<String>getAs("first_traffic_type");
                String transferLeaveStationCode = row.<String>getAs("transfer_leave_station_code");
                String endStationCode = row.<String>getAs("end_station_code");
                String secondTrafficType = row.<String>getAs("second_traffic_type");
                String firstTrafficCode = row.<String>getAs("first_traffic_code");
                String secondTrafficCode = row.<String>getAs("second_traffic_code");
                Long totalNum = row.<Long>getAs("total_num");
                // Hash the fields that identify one transfer plan
                String md5 = DESUtils.encoderByMd5(new StringBuilder()
                        .append(startCityId).append(endCityId).append(transferCityId)
                        .append(startStationCode).append(transferArriveStationCode)
                        .append(firstTrafficType).append(transferLeaveStationCode)
                        .append(endStationCode).append(secondTrafficType)
                        .append(firstTrafficCode).append(secondTrafficCode)
                        .toString());
                RouteMd5Value routeMd5Value = new RouteMd5Value();
                routeMd5Value.setTotalNum(totalNum);
                routeMd5Value.setPlanKey(md5);
                routeMd5Value.setTransferCity(transferCityId);
                routeMd5Value.setTransferType(firstTrafficType + secondTrafficType);
                md5ValueList.add(routeMd5Value);
            }
            String value = JSONObject.toJSONString(md5ValueList);
            // Runs on the executors, so this goes to the executor logs
            System.out.println("key:" + key + ",value:" + value);
            jedis.set(key, value);
        }
    }
});
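In the loop above, the fields are appended back to back before hashing, with no separator between them. As a result, different field tuples can produce the same input string. For example, start_city_id 1 with end_city_id 46 and start_city_id 14 with end_city_id 6 both become "146".

The sketch below joins the fields with a delimiter instead. It rests on several assumptions:
- `DESUtils.encoderByMd5` is project-specific, so the sketch assumes it is a plain MD5 hex digest and uses `java.security.MessageDigest` as a stand-in.
- `PlanKeySketch`, `md5Hex`, `planKey` and the `|` delimiter are hypothetical.
- The `|` delimiter is assumed never to appear inside the ids or codes.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.stream.Collectors;

public class PlanKeySketch {
    // Stand-in for DESUtils.encoderByMd5 (assumption: MD5 rendered as lowercase hex).
    static String md5Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b)); // two hex chars per byte
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Joins the fields with '|' before hashing, so "1","46" and "14","6" give different inputs.
    static String planKey(Object... fields) {
        String joined = Arrays.stream(fields)
                .map(String::valueOf)
                .collect(Collectors.joining("|"));
        return md5Hex(joined);
    }

    public static void main(String[] args) {
        // Appending without a separator, as in the original loop
        String a = new StringBuilder().append(1).append(46).toString();
        String b = new StringBuilder().append(14).append(6).toString();
        System.out.println(a.equals(b));                            // true: both are "146"
        System.out.println(planKey(1, 46).equals(planKey(14, 6))); // false: "1|46" vs "14|6"
    }
}
```

Changing how the input string is built changes every resulting hash. Plan keys already stored in Redis would therefore no longer match newly computed ones.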