如果我们有一些按照时间顺序发生的无线网络测量数据,每个测量数据有一个对应的坐标点,那么我们如何利用Power BI来进行可视化分析呢。这里记录一下我的作法。
我的原始数据如下:
Datetime | Latitude | Longitude | CellID | Operator | NetworkType | RSSI | RxLevel | RSRQ | RSRP |
|
|
|
|
|
|
|
|
|
|
从数据可以看到,这些测量数据发生在不同的时间点,分布在不同的地点,我们可以利用Power BI来进行可视化的分析,例如在地图上显示不同区域的网络测量值,分析网络测量值的分布,分析在每天不同时间段的测量值的分布等等。
这里我用到了Google的S2地理数据库来进行地理区域的聚合,用时序数据库来进行时间段的聚合。
地理区域的聚合
我打算按照网格来进行聚合,例如把地图切分为200*200米的方格,把每个测量点归属到这个方格里面。
首先我们需要把地图切分为网格,在我之前的博客有介绍,可以利用Turf这个开源的JavaScript来完成。这里给出代码:
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="viewport" content="initial-scale=1.0, user-scalable=no" />
<style type="text/css">
body, html,#allmap {width: 100%;height: 100%;overflow: hidden;margin:0;font-family:"微软雅黑";}
</style>
<script src='https://unpkg.com/@turf/turf/turf.min.js'></script>
<title>turf</title>
</head>
<body>
<script type="text/javascript">
var bbox = [113.2078, 23.0797, 113.3766, 23.1591];
var cellSide = 0.2;
var options = {units: 'kilometers'};
var squareGrid = turf.squareGrid(bbox, cellSide, options);
console.log(squareGrid.features.length);
console.log(JSON.stringify(squareGrid));
</script>
</body>
把这个HTML文件放到Web服务器上,然后访问这个文件,在控制台上可以看到网格数据的输出,把这些输出数据拷贝到一个文本文件中,例如guangzhou.geojson
有了网格数据之后,我们处理这个文件,给每个网格赋予一个ID,并稍微修改一下,然后上传到Mapbox网站生成tileset,代码如下:
with open('guangzhou_grids_200m.geojson', 'r') as f:
grids_content = json.loads(f.read())
i = 0
for feature in grids_content['features']:
i += 1
feature['geometry']['type'] = 'MultiPolygon'
feature['geometry']['coordinates'] = [feature['geometry']['coordinates']]
feature['properties']['gridid'] = 'grid_'+str(i)
with open('guangzhou_grids_mapbox_200m.geojson', 'w') as f:
f.write(json.dumps(grids_content))
之后我们用S2来为每个网格生成一个S2Loop的对象,代码如下:
with open('guangzhou_grids_mapbox_200m.geojson', 'r') as f:
grids_content = json.loads(f.read())
grids_loop = {}
for feature in grids_content['features']:
coordinates = feature['geometry']['coordinates'][0][0]
s2points = []
for point in coordinates[:-1]:
s2points.append(s2.S2LatLng.FromDegrees(point[1], point[0]).Normalized().ToPoint())
s2loop = s2.S2Loop(s2points)
s2loop.Normalize()
grids_loop[feature['properties']['gridid']] = s2loop
在PG数据库安装TimeScaleDB插件,具体可见官网介绍https://docs.timescale.com/, 在PG上创建一个数据库和数据表,如下:
create database nir;
\c nir
CREATE EXTENSION IF NOT EXISTS timescaledb CASCADE;
CREATE TABLE "nir" (ts TIMESTAMPTZ, lat decimal(9,6), lng decimal(9,6), cellid text, operator text, networktype text, rssi int, rxlevel int, rsrq int, rsrp int, gridid text);
下一步就是把原始测量数据的坐标读出来,用S2来判断是否落在某个网格中,把网格ID也关联的测量数据中,然后把数据写入PG。代码如下:
with open('nir_processed.csv', 'r') as f:
nir = f.readlines()
newfields = []
for record in tqdm(nir[1:], total=len(nir[1:])):
fields = record.strip().split(',')
s2point = s2.S2LatLng.FromDegrees(float(fields[1]), float(fields[2])).ToPoint()
for gridid in grids_loop:
if grids_loop[gridid].Contains(s2point):
fields.append(gridid)
newfields.append(fields)
break
conn = psycopg2.connect("dbname=nir user=abc password=123456")
cursor=conn.cursor()
for record in newfields:
record[1] = float(record[1])
record[2] = float(record[2])
record[6] = int(record[6])
record[7] = int(record[7])
record[8] = int(record[8])
record[9] = int(record[9])
testsql = "insert into nir values('%s', '%f', '%f', '%s', '%s', '%s', '%d', '%d', '%d', '%d', '%s')" \
%(record[0], record[1], record[2], record[3], record[4], record[5], record[6], record[7], record[8], record[9], record[10])
cursor.execute(testsql)
conn.commit()
conn.close()
时间段的聚合
数据保存在时序数据库之后,我们就可以按照时间段来聚合了,这里按照每15分钟来进行聚合,并且把RSRP,RSRQ的数值进行分组,把结果写入到一个新的表中,在PG里面输入以下的SQL语句:
create table nir_view as
select
extract(minute from time_bucket('15 minutes', ts)) as minute,
date(ts) as date,
cast(extract(hour from ts) as int) as hour,
rsrp,
rsrq,
operator,
gridid,
(CASE
WHEN rsrp<=-44 and rsrp>-65 THEN '(-65,-44]'
WHEN rsrp<=-65 and rsrp>-75 THEN '(-75,-65]'
WHEN rsrp<=-75 and rsrp>-85 THEN '(-85,-75]'
WHEN rsrp<=-85 and rsrp>-95 THEN '(-95,-85]'
WHEN rsrp<=-95 and rsrp>-105 THEN '(-105,-95]'
ELSE '(-140,-105]' END
) as rsrp_range,
(CASE
WHEN rsrp<=-44 and rsrp>-65 THEN '6'
WHEN rsrp<=-65 and rsrp>-75 THEN '5'
WHEN rsrp<=-75 and rsrp>-85 THEN '4'
WHEN rsrp<=-85 and rsrp>-95 THEN '3'
WHEN rsrp<=-95 and rsrp>-105 THEN '2'
ELSE '1' END
) as rsrp_level,
(CASE
WHEN rsrq<=-18 and rsrq>-20 THEN '(-20,-18]'
WHEN rsrq<=-16 and rsrq>-18 THEN '(-18,-16]'
WHEN rsrq<=-14 and rsrq>-16 THEN '(-16,-14]'
WHEN rsrq<=-12 and rsrq>-14 THEN '(-14,-12]'
WHEN rsrq<=-10 and rsrq>-12 THEN '(-12,-10]'
WHEN rsrq<=-8 and rsrq>-10 THEN '(-10,-8]'
WHEN rsrq<=-6 and rsrq>-8 THEN '(-8,-6]'
WHEN rsrq<=-4 and rsrq>-6 THEN '(-6,-4]'
ELSE '(-4,-2]' END
) as rsrq_range,
(CASE
WHEN rsrq<=-18 and rsrq>-20 THEN '1'
WHEN rsrq<=-16 and rsrq>-18 THEN '2'
WHEN rsrq<=-14 and rsrq>-16 THEN '3'
WHEN rsrq<=-12 and rsrq>-14 THEN '4'
WHEN rsrq<=-10 and rsrq>-12 THEN '5'
WHEN rsrq<=-8 and rsrq>-10 THEN '6'
WHEN rsrq<=-6 and rsrq>-8 THEN '7'
WHEN rsrq<=-4 and rsrq>-6 THEN '8'
ELSE '9' END
) as rsrq_level
from nir;
Power BI报表制作
设计两页报表,一页是分析RSRP的,另外一页是分析RSRQ的。两份报表的布局是一样的。
以RSRP报表为例,里面包括了3个子报表,一个是在地图上按照方格来展示不同区域的RSRP的平均值,一个是按照RSRP的分组来展示其分布,还有一个是按照不同的时间段来展示RSRP的分布。
其中地图报表需要用到Mapbox控件,具体过程和我上一个博客介绍的类似,这里就不再重复了。
最后的效果如下: