检查数据库分片
mongos> use config
switched to db config
mongos> db.databases.find()
{ "_id" : "recommend", "primary" : "hdshard1", "partitioned" : true, "version" : { "uuid" : UUID("cb833b8e-cc4f-4c52-83c3-719aa383bac4"), "lastMod" : 1 } }
{ "_id" : "db1", "primary" : "hdshard3", "partitioned" : true, "version" : { "uuid" : UUID("71bb472c-7896-4a31-a77c-e3aaf723be3c"), "lastMod" : 1 } }
可以看到数据库分片已经打开。
查看集合分片情况
mongos> db.rcmd_1_min_tag_mei_rong.getShardDistribution()
Collection recommend.rcmd_1_min_tag_mei_rong is not sharded.
接下来我们对该集合进行分片。
查看索引
mongos> db.rcmd_1_min_tag_mei_rong.getIndexes()
[
{
"v" : 2,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "recommend.rcmd_1_min_tag_mei_rong"
}
]
创建哈希索引
mongos> db.rcmd_1_min_tag_mei_rong.createIndex({_id:'hashed'})
{
"raw" : {
"hdshard1/172.16.254.136:40001,172.16.254.137:40001,172.16.254.138:40001" : {
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 1,
"numIndexesAfter" : 2,
"ok" : 1
}
},
"ok" : 1,
"operationTime" : Timestamp(1618801258, 1),
"$clusterTime" : {
"clusterTime" : Timestamp(1618801258, 1),
"signature" : {
"hash" : BinData(0,"Ls7hIrcqTdWwfI49kxXo6NQtAk4="),
"keyId" : NumberLong("6941260985399246879")
}
}
}
检查索引
mongos> db.rcmd_1_min_tag_mei_rong.getIndexes()
[
{
"v" : 2,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "recommend.rcmd_1_min_tag_mei_rong"
},
{
"v" : 2,
"key" : {
"_id" : "hashed"
},
"name" : "_id_hashed",
"ns" : "recommend.rcmd_1_min_tag_mei_rong"
}
]
可以看到id字段的哈希索引已经创建。
进行哈希分片
mongos> sh.shardCollection( "recommend.rcmd_1_min_tag_mei_rong", { _id: "hashed" } )
{
"collectionsharded" : "recommend.rcmd_1_min_tag_mei_rong",
"collectionUUID" : UUID("ccac34f4-59e4-4048-971b-644b454adc13"),
"ok" : 1,
"operationTime" : Timestamp(1618801293, 5),
"$clusterTime" : {
"clusterTime" : Timestamp(1618801293, 5),
"signature" : {
"hash" : BinData(0,"bs4MPk31IZlYGMw3c2JN6eJ3pSc="),
"keyId" : NumberLong("6941260985399246879")
}
}
}
查看分片情况
mongos> db.rcmd_1_min_tag_mei_rong.getShardDistribution()
Shard hdshard1 at hdshard1/172.16.254.136:40001,172.16.254.137:40001,172.16.254.138:40001
data : 72.94MiB docs : 166775 chunks : 3
estimated data per chunk : 24.31MiB
estimated docs per chunk : 55591
Shard hdshard2 at hdshard2/172.16.254.136:40002,172.16.254.137:40002,172.16.254.138:40002
data : 96.13MiB docs : 219788 chunks : 3
estimated data per chunk : 32.04MiB
estimated docs per chunk : 73262
Shard hdshard3 at hdshard3/172.16.254.136:40003,172.16.254.137:40003,172.16.254.138:40003
data : 64.11MiB docs : 146526 chunks : 2
estimated data per chunk : 32.05MiB
estimated docs per chunk : 73263
Totals
data : 233.18MiB docs : 533089 chunks : 8
Shard hdshard1 contains 31.27% data, 31.28% docs in cluster, avg obj size on shard : 458B
Shard hdshard2 contains 41.22% data, 41.22% docs in cluster, avg obj size on shard : 458B
Shard hdshard3 contains 27.49% data, 27.48% docs in cluster, avg obj size on shard : 458B
哈希分片已经完成。
范围分片和哈希分片对比
1)基于范围的分片方式提供了更高效的范围查询,给定一个片键的范围,分发路由可以很简单地确定哪个数据块存储了请求需要的数据,并将请求转发到相应的分片中。
2)不过,基于范围的分片会导致数据在不同分片上的不均衡,有时候,带来的消极作用会大于查询性能的积极作用.比如,如果片键所在的字段是线性增长的,一定时间内的所有请求都会落到某个固定的数据块中,最终导致分布在同一个分片中.在这种情况下,一小部分分片承载了集群大部分的数据,系统并不能很好地进行扩展。
3)与此相比,基于哈希的分片方式以范围查询性能的损失为代价,保证了集群中数据的均衡.哈希值的随机性使数据随机分布在每个数据块中,因此也随机分布在不同分片中.但是也正由于随机性,一个范围查询很难确定应该请求哪些分片,通常为了返回需要的结果,需要请求所有分片。