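1. Modify the configuration file
Copy the existing non-cluster configuration file and edit the copy, for example as cluster_redis_6379.conf. All other settings can stay as they are; just uncomment (remove the leading #) the following three options: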
cluster-enabled yes
cluster-config-file nodes-6379.conf //Redis generates this file itself, in the data persistence directory
cluster-node-timeout 15000
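2. Start every instance. At least 6 instances are needed. I used two hosts with 3 instances each: the three instances on host A are intended as master nodes and the three on host B as replica nodes. The start command is the same as for a normal server process, for example: ./redis-server ../conf/cluster_redis_6379.conf
After startup, check the processes with ps: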
[root@slave6 zrf]# ps -ef | grep redis
root 7143 1 0 Jul31 ? 00:00:46 ./redis-server 0.0.0.0:6379 [cluster]
root 7160 1 0 Jul31 ? 00:00:50 ./redis-server 0.0.0.0:6380 [cluster]
root 7165 1 0 Jul31 ? 00:00:46 ./redis-server 0.0.0.0:6381 [cluster]
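Note when starting: if a slaveof option was configured earlier, comment it out. Also, if an instance already holds data, run flushdb on each instance to clear it.
3. Perform the handshake between nodes and assign the slots
There are two ways to do this step: do the handshake and slot assignment by hand, or use the Ruby script that Redis ships, which does both automatically.
The script needs a Ruby environment, so install Ruby first.
Download Ruby: https://cache.ruby-lang.org/pub/ruby/2.3/ruby-2.3.1.tar.gz
Install Ruby: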
tar xvf ruby-2.3.1.tar.gz
cd ruby-2.3.1
./configure --prefix=/usr/local/ruby
make
make install
cd /usr/local/ruby
cp bin/ruby /usr/local/bin
cp bin/gem /usr/local/bin
Download the redis gem dependency: http://rubygems.org/downloads/redis-3.3.0.gem
Install the dependency:
gem install -l redis-3.3.0.gem
gem list --check redis gem
Once Ruby is installed, find the redis-trib.rb script in the src directory of Redis and run it:
./redis-trib.rb
If it prints its usage text, the Ruby environment is installed correctly.
Then run the following command:
./redis-trib.rb create --replicas 1 132.121.127.31:6379 132.121.127.31:6381 132.121.127.32:6380 132.121.127.31:6380 132.121.127.32:6379 132.121.127.32:6381
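--Two things to note here. (1) If a previous create attempt failed, go to the persistence directory and delete the file named by cluster-config-file (nodes-6379.conf and its counterparts), restart all 6 instances, and run redis-trib.rb create again. Otherwise some slots may fail to be assigned, which you can see in the script's output.
(2) Many write-ups online say that the first half of the ip:port list becomes the masters and the second half becomes the replicas, but this ordering is not guaranteed. The script's choice of masters and replicas is not strictly positional: with 6 nodes on a single host it may well match what those write-ups describe, but with nodes spread over several hosts the order gets mixed up.
When the cluster is about to be created, the script prints the proposed master/replica layout and slot assignment and asks for confirmation; type yes.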
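4. Start the client
Note that the command differs from the usual one: add the -c option to indicate that you are connecting to a cluster rather than to a single instance, like this:
./redis-cli -h 132.121.127.31 -p 6379 -c
Run: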
132.121.127.31:6379> cluster nodes
This shows the assignment:
19df511fe6a7aa5f168a847326202c3c88xxxxxxf0 132.121.127.31:6381@16381 master - 0 1501554999749 2 connected 10923-16383
d34623e2b78dcddf53aa994647ffbdxxxxxxxxxxb 132.121.127.32:6381@16381 slave 19df511fe6a7aa5f168a847326202c3cxxxxx 0 1501555000751 6 connected
85c23ca8159e628e6c582202104461xxxxxxx0cec 132.121.127.31:6380@16380 slave 6b0bb05a7b3602b75ba3922c1b356c859xxxxxxx32 0 1501554997745 4 connected
c0d76777exxxxxxxxxxxxxxxxxxxxxxxxxxxxxx8 132.121.127.32:6379@16379 slave c10df7bb44bd62fa2e396897b0b44a3xxxxxxxxxx9 0 1501554997000 5 connected
6b0bb05a7b3602b75ba3922c1b356c8xxxxxxxx2 132.121.127.32:6380@16380 master - 0 1501554999000 4 connected 5461-10922
c10df7bb44bd62fa2e396897b0b44xxxxxxxxxxxxx9 132.121.127.31:6379@16379 myself,master - 0 1501554996000 1 connected 0-5460
--At this point the whole cluster is set up.
Let's try it out:
132.121.127.31:6379> set key1111 value11111
-> Redirected to slot [10696] located at 132.121.127.32:6380
OK
132.121.127.32:6380> set key2222 value22222
-> Redirected to slot [3700] located at 132.121.127.31:6379
OK
132.121.127.31:6379> set key3333 value33333
-> Redirected to slot [11488] located at 132.121.127.31:6381
OK
--As you can see, the client picks the instance that owns each key, and that instance is always a master.
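To make the redirects above less mysterious, here is a minimal sketch of how a key is mapped to a slot, following the Redis Cluster specification: the slot is CRC16 of the key modulo 16384, and if the key contains a non-empty {hash tag}, only the tag is hashed. The class and method names (KeySlot, crc16, slot) are made up for this illustration and are not part of Redis or Jedis.

```java
import java.nio.charset.StandardCharsets;

class KeySlot {
    // CRC16 as used by Redis Cluster (XMODEM variant):
    // polynomial 0x1021, initial value 0, no bit reflection.
    static int crc16(byte[] data) {
        int crc = 0;
        for (byte b : data) {
            crc ^= (b & 0xFF) << 8;
            for (int i = 0; i < 8; i++) {
                crc = ((crc & 0x8000) != 0) ? ((crc << 1) ^ 0x1021) : (crc << 1);
                crc &= 0xFFFF; // keep the register at 16 bits
            }
        }
        return crc;
    }

    // Slot = CRC16(key) mod 16384. If the key has a non-empty {hash tag},
    // only the text between the first '{' and the next '}' is hashed.
    static int slot(String key) {
        int open = key.indexOf('{');
        if (open >= 0) {
            int close = key.indexOf('}', open + 1);
            if (close > open + 1) {
                key = key.substring(open + 1, close);
            }
        }
        return crc16(key.getBytes(StandardCharsets.UTF_8)) % 16384;
    }

    public static void main(String[] args) {
        // The keys from the session above; compare with the slot numbers
        // printed in the "Redirected to slot" messages.
        for (String key : new String[] {"key1111", "key2222", "key3333"}) {
            System.out.println(key + " -> slot " + slot(key));
        }
    }
}
```

Running main prints the slot for each key used in the session, which can be checked against the slot ranges listed by cluster nodes to see which master owns the key.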
5. Using the Java API:
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;
import redis.clients.jedis.HostAndPort;
import redis.clients.jedis.JedisCluster;
import redis.clients.jedis.JedisPoolConfig;

// The six cluster nodes created above
Set<HostAndPort> hostInfo = new HashSet<HostAndPort>();
HostAndPort hostAndPort1 = new HostAndPort("132.121.127.31", 6379);
HostAndPort hostAndPort2 = new HostAndPort("132.121.127.31", 6380);
HostAndPort hostAndPort3 = new HostAndPort("132.121.127.31", 6381);
HostAndPort hostAndPort4 = new HostAndPort("132.121.127.32", 6379);
HostAndPort hostAndPort5 = new HostAndPort("132.121.127.32", 6380);
HostAndPort hostAndPort6 = new HostAndPort("132.121.127.32", 6381);
hostInfo.add(hostAndPort1);
hostInfo.add(hostAndPort2);
hostInfo.add(hostAndPort3);
hostInfo.add(hostAndPort4);
hostInfo.add(hostAndPort5);
hostInfo.add(hostAndPort6);
JedisCluster cluster = new JedisCluster(hostInfo, new JedisPoolConfig());
// Write 100 keys; each one is routed to the master that owns its slot
for (int i = 0; i < 100; i++) {
    cluster.set("key" + i, "value" + i);
}
try {
    cluster.close();
} catch (IOException e) {
    e.printStackTrace();
}
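--This is about the simplest possible example. It uses the constructor that takes a connection pool configuration; the constructors without a pool configuration work as well.
6. The nodes-6379.conf file generated by the cluster
Go to the directory that holds the persistence files: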
[root@slave6 redisdb]# more nodes-6379.conf
c10df7bb44bd62fa2e396897b0b44xxxxxxxxx 132.121.127.31:6379@16379 master - 0 1501491730353 1 connected 0-5460
85c23ca8159e628e6c582202104461xxxxxxxxx 132.121.127.31:6380@16380 slave 6b0bb05a7b3602b75ba3922c1b356cxxxxxxxxx 0 1501491729000 4 connected
c0d76777eb398766ab8e2563a8xxxxxxxxx 132.121.127.32:6379@16379 myself,slave c10df7bb44bd62fa2e396897b0b44xxxxxxxxx 0 1501491727000 5 connected
6b0bb05a7b3602b75ba3922c1b356cxxxxxxxxx 132.121.127.32:6380@16380 master - 0 1501491727347 4 connected 5461-10922
d34623e2b78dcddf53aa994647ffbxxxxxxxxx 132.121.127.32:6381@16381 slave 19df511fe6a7aa5f168a847326202xxxxxxxxx 0 1501491728000 6 connected
19df511fe6a7aa5f168a847326202xxxxxxxxx 132.121.127.31:6381@16381 master - 0 1501491729350 2 connected 10923-16383
vars currentEpoch 6 lastVoteEpoch 0
--As you can see, this matches what cluster nodes prints in the client: node IDs, master/replica relationships, and slot assignments.
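Since both cluster nodes and nodes-6379.conf use the same line layout, a small parser can help when scripting against them. The following is a rough sketch that assumes the whitespace-separated field order id, ip:port@cport, flags, master id, ping-sent, pong-recv, config-epoch, link-state, then zero or more slot ranges. The class name ClusterNodeLine and the placeholder node IDs are invented for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ClusterNodeLine {
    final String id;
    final String address;      // ip:port with the @cluster-bus-port suffix removed
    final List<String> flags;  // e.g. [myself, master]
    final String masterId;     // "-" when the node is itself a master
    final List<String> slots;  // e.g. [0-5460]; empty for replicas

    ClusterNodeLine(String line) {
        String[] f = line.trim().split("\\s+");
        if (f.length < 8) {
            // Also rejects the trailing "vars currentEpoch ..." line of nodes-6379.conf
            throw new IllegalArgumentException("not a node line: " + line);
        }
        id = f[0];
        int at = f[1].indexOf('@');
        address = at >= 0 ? f[1].substring(0, at) : f[1];
        flags = Arrays.asList(f[2].split(","));
        masterId = f[3];
        // f[4] = ping-sent, f[5] = pong-recv, f[6] = config-epoch, f[7] = link-state
        slots = new ArrayList<>(Arrays.asList(f).subList(8, f.length));
    }

    boolean isMaster() {
        return flags.contains("master");
    }

    public static void main(String[] args) {
        // "node-id-1" is a placeholder, not a real node ID
        ClusterNodeLine n = new ClusterNodeLine(
            "node-id-1 132.121.127.31:6379@16379 myself,master - 0 1501554996000 1 connected 0-5460");
        System.out.println(n.address + " master=" + n.isMaster() + " slots=" + n.slots);
    }
}
```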
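7. Scaling the cluster out and in
(1) Adding new nodes
Adding a master node: copy a configuration file, replace every port number in it with the new instance's port, leave the other options unchanged, and start the instance. Once started it is an isolated node with no connection to the cluster.
The second step is to join this isolated node to the cluster.
Run cluster meet 132.121.127.31 6382 in the client, or use the Ruby script:
redis-trib.rb add-node 132.121.127.31:6382 132.121.127.31:6379 //the first address is the node to add, the second is any existing node in the cluster
If it returns ok, the node has joined.
Running cluster nodes in the client then shows that the node has joined with the master role, but the cluster has not assigned it any slots yet, so no data can be written to it. If you hit the following error:
[ERR] Node 132.121.127.31:6382 is not empty. Either the node already knows other nodes (check with CLUSTER NODES) or contains some key in database 0.
--it means an earlier failed attempt already produced a node_xxxxx.conf file; delete it, restart the instance, and add the node again.
Adding a replica node: again copy a configuration file and start the replica process; at first it is also an isolated node.
redis-trib.rb add-node --slave --master-id cebf26ee2809612be474c1016396f4f86xxxxxxxx 132.121.127.32:6385 132.121.127.31:6379
On success: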
[OK] All 16384 slots covered.
>>> Send CLUSTER MEET to node 132.121.127.32:6382 to make it join the cluster.
Waiting for the cluster to join.
>>> Configure node as replica of 132.121.127.31:6382.
[OK] New node added correctly.
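(2) The next step is to assign slots to the new node and migrate data
Use the Ruby script:
redis-trib.rb reshard 132.121.127.31:6382 //132.121.127.31:6382 is the newly added master that has no slots yet
How many slots do you want to move (from 1 to 16384)? usually 16384 divided by the number of master nodes
What is the receiving node ID? enter the node ID of 132.121.127.31:6382
Type 'all' to use all the nodes as source nodes for the hash slots.
Type 'done' once you entered all the source nodes IDs.
Source node# enter all so that every node gives up part of its slots to the new master
Do you want to proceed with the proposed reshard plan (yes/no)? enter yes to confirm the reshard plan
--Note: to keep the amount of migrated data small, the slots are not reshuffled from scratch, which would move a very large amount of data. Instead, each existing master hands a short segment of its own slots to the new master, so the only data migrated is the keys that fall in the new master's slots. For example:
cebf26ee2809612be474c1016396f4f868xxxxxxxx 132.121.127.31:6382@16382 master - 0 1501574076072 8 connected 0-332 5461-5794 10923-11255
--As you can see, the new master's slots are not one contiguous range but three short segments.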
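The three segments above add up to 333 + 334 + 333 = 1000 slots, which suggests 1000 was the number entered at the first prompt. The sketch below shows one way such a split can come about: every source gives a share proportional to the slots it owns, taken from the low end of its range, with the largest source rounded up and the others rounded down. This is modelled on my understanding of how redis-trib.rb plans a reshard and should be read as an assumption, not as the script's actual code; ReshardPlan, Source and plan are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class ReshardPlan {
    // A source master owning one contiguous slot range [first, last].
    static final class Source {
        final int first, last;
        Source(int first, int last) { this.first = first; this.last = last; }
        int count() { return last - first + 1; }
    }

    // Returns the "a-b" ranges taken from each source, largest source first.
    static List<String> plan(List<Source> sources, int numSlots) {
        List<Source> sorted = new ArrayList<>(sources);
        // Largest source first; List.sort is stable, so ties keep their input order.
        sorted.sort((a, b) -> Integer.compare(b.count(), a.count()));
        int total = 0;
        for (Source s : sorted) {
            total += s.count();
        }
        List<String> moved = new ArrayList<>();
        int remaining = numSlots;
        for (int i = 0; i < sorted.size(); i++) {
            Source s = sorted.get(i);
            double share = (double) numSlots / total * s.count();
            // Round up for the largest source, down for the rest.
            int n = (int) (i == 0 ? Math.ceil(share) : Math.floor(share));
            n = Math.min(n, remaining);
            if (n > 0) {
                moved.add(s.first + "-" + (s.first + n - 1));
                remaining -= n;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        List<Source> masters = new ArrayList<>();
        masters.add(new Source(0, 5460));
        masters.add(new Source(5461, 10922));
        masters.add(new Source(10923, 16383));
        System.out.println(plan(masters, 1000)); // prints [5461-5794, 0-332, 10923-11255]
    }
}
```

With the three original masters and 1000 slots, the sketch yields the same three segments that cluster nodes reports for the new master.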
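8. How the Redis Cluster API manages the connection pool
As section 5 shows, the cluster object was created with the constructor that takes a pool configuration: JedisCluster cluster = new JedisCluster(hostInfo, new JedisPoolConfig());. So how does a JedisCluster object acquire and release connections? I was curious, skimmed the source, and found roughly the following flow.
Start from cluster.set("key"+i, "value"+i);
In the JedisCluster source you will find: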
public String set(final String key, final String value) {
    return new JedisClusterCommand<String>(connectionHandler, maxAttempts) {
        @Override
        public String execute(Jedis connection) {
            return connection.set(key, value);
        }
    }.run(key);
}
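--This creates an object from an anonymous inner class. By the anonymous-class syntax, that class extends JedisClusterCommand and overrides its execute method.
Next, find the run method in the JedisClusterCommand source: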
public T run(String key) {
    if (key == null) {
        throw new JedisClusterException(NO_DISPATCH_MESSAGE);
    }
    return runWithRetries(SafeEncoder.encode(key), this.maxAttempts, false, false);
}
--Then look at the source of JedisClusterCommand's runWithRetries method:
private T runWithRetries(byte[] key, int attempts, boolean tryRandomNode, boolean asking) {
    if (attempts <= 0) {
        throw new JedisClusterMaxRedirectionsException("Too many Cluster redirections?");
    }
    Jedis connection = null;
    try {
        if (asking) {
            // TODO: Pipeline asking with the original command to make it
            // faster....
            connection = askConnection.get();
            connection.asking();
            // if asking success, reset asking flag
            asking = false;
        } else {
            if (tryRandomNode) {
                connection = connectionHandler.getConnection();
            } else {
                connection = connectionHandler.getConnectionFromSlot(JedisClusterCRC16.getSlot(key));
            }
        }
        return execute(connection);
    } catch (JedisNoReachableClusterNodeException jnrcne) {
        throw jnrcne;
    } catch (JedisConnectionException jce) {
        // release current connection before recursion
        releaseConnection(connection);
        connection = null;
        if (attempts <= 1) {
            //We need this because if node is not reachable anymore - we need to finally initiate slots renewing,
            //or we can stuck with cluster state without one node in opposite case.
            //But now if maxAttempts = 1 or 2 we will do it too often. For each time-outed request.
            //TODO make tracking of successful/unsuccessful operations for node - do renewing only
            //if there were no successful responses from this node last few seconds
            this.connectionHandler.renewSlotCache();
            //no more redirections left, throw original exception, not JedisClusterMaxRedirectionsException, because it's not MOVED situation
            throw jce;
        }
        return runWithRetries(key, attempts - 1, tryRandomNode, asking);
    } catch (JedisRedirectionException jre) {
        // if MOVED redirection occurred,
        if (jre instanceof JedisMovedDataException) {
            // it rebuilds cluster's slot cache
            // recommended by Redis cluster specification
            this.connectionHandler.renewSlotCache(connection);
        }
        // release current connection before recursion or renewing
        releaseConnection(connection);
        connection = null;
        if (jre instanceof JedisAskDataException) {
            asking = true;
            askConnection.set(this.connectionHandler.getConnectionFromNode(jre.getTargetNode()));
        } else if (jre instanceof JedisMovedDataException) {
        } else {
            throw new JedisClusterException(jre);
        }
        return runWithRetries(key, attempts - 1, false, asking);
    } finally {
        releaseConnection(connection);
    }
}
--This code is fairly long; we only need the lines that obtain and release the Jedis connection:
Jedis connection = null;
connection = connectionHandler.getConnection();
releaseConnection(connection);
--connectionHandler is declared as private JedisClusterConnectionHandler connectionHandler;. JedisClusterConnectionHandler is an abstract class, and what is actually passed in at runtime is an instance of its subclass JedisSlotBasedConnectionHandler. Here is the getConnection method of JedisSlotBasedConnectionHandler:
public Jedis getConnection() {
    // In antirez's redis-rb-cluster implementation,
    // getRandomConnection always return valid connection (able to
    // ping-pong)
    // or exception if all connections are invalid
    List<JedisPool> pools = cache.getShuffledNodesPool();
    for (JedisPool pool : pools) {
        Jedis jedis = null;
        try {
            jedis = pool.getResource();
            if (jedis == null) {
                continue;
            }
            String result = jedis.ping();
            if (result.equalsIgnoreCase("pong")) return jedis;
            jedis.close();
        } catch (JedisException ex) {
            if (jedis != null) {
                jedis.close();
            }
        }
    }
    throw new JedisNoReachableClusterNodeException("No reachable node in cluster");
}
--Note the line jedis = pool.getResource();. This makes it clear that JedisCluster borrows its connection objects from a pool.
Now look at releaseConnection(connection);, the logic that returns the connection:
private void releaseConnection(Jedis connection) {
    if (connection != null) {
        connection.close();
    }
}
--which is simply Jedis.close():
public void close() {
    if (dataSource != null) {
        if (client.isBroken()) {
            this.dataSource.returnBrokenResource(this);
        } else {
            this.dataSource.returnResource(this);
        }
    } else {
        client.close();
    }
}
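--If dataSource has been set to a pool object, close() hands the connection back to the pool: returnBrokenResource for a broken connection, returnResource otherwise.
Summary: as the flow above shows, every read or write against the cluster borrows a connection object from a pool and automatically returns it to the pool when the operation finishes. This looks inefficient, but there seems to be no way around it: each operation's key may map to a different instance, so the connection borrowed for the previous operation may be of no use for the next one.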
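The borrow-and-return pattern summarized above can be illustrated without Jedis. The sketch below keeps one tiny pool per node, borrows a connection for each command, and always returns it in a finally block, in the same spirit as releaseConnection(connection) in runWithRetries. Everything here (PerNodePools, Conn, run) is a hypothetical stand-in written for this explanation, not Jedis code.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class PerNodePools {
    // Hypothetical stand-in for a pooled Jedis connection to one node.
    static final class Conn {
        final String node;
        Conn(String node) { this.node = node; }
    }

    private final Map<String, Deque<Conn>> idle = new HashMap<>();
    int borrows = 0; // how many times a connection was borrowed

    // Start with one idle connection per known node.
    PerNodePools(String... nodes) {
        for (String n : nodes) {
            Deque<Conn> pool = new ArrayDeque<>();
            pool.push(new Conn(n));
            idle.put(n, pool);
        }
    }

    int idleCount(String node) {
        return idle.get(node).size();
    }

    // Borrow from the pool of the node that owns the key, run the command,
    // and return the connection even if the command throws.
    <T> T run(String node, Function<Conn, T> command) {
        Deque<Conn> pool = idle.get(node);
        Conn c = pool.isEmpty() ? new Conn(node) : pool.pop();
        borrows++;
        try {
            return command.apply(c);
        } finally {
            pool.push(c);
        }
    }

    public static void main(String[] args) {
        PerNodePools pools = new PerNodePools("132.121.127.31:6379", "132.121.127.32:6380");
        // Two commands whose keys live on different nodes use different pools.
        System.out.println(pools.run("132.121.127.31:6379", c -> "OK from " + c.node));
        System.out.println(pools.run("132.121.127.32:6380", c -> "OK from " + c.node));
    }
}
```

The point of the sketch is that the pool is chosen per command from the key's owner, so holding on to a connection between commands would not help when consecutive keys belong to different masters.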
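9. Some notes on Redis Cluster mode
(1) Redis Cluster is a cluster without a central node. When a client sets data, it first sends the command to the instance it accessed last. If the key's hash slot is not in that instance's slot range, the instance replies with the address of the correct instance, and the client performs a redirect and sends the command for that key to the correct instance.
(2) On high availability: in Redis Cluster mode there is no visible sentinel process, but every master node in effect takes on the role of a sentinel, so the masters together act like a sentinel group. As with sentinels, when a master goes down, a failover only takes place once more than half of all masters (counting the one that went down) agree. After the failover the replica becomes a master and takes over all of the old master's duties, including the sentinel role. If the old master is started again, it joins as a replica of the new master.
(3) A failover takes tens of seconds. During that time, clients that access the failed master get the error redis.clients.jedis.exceptions.JedisClusterMaxRedirectionsException: Too many Cluster redirections? Client code can allow for this by catching JedisClusterMaxRedirectionsException, waiting a few tens of seconds, and retrying.
(4) Personally, I think the arbitration mechanism of cluster mode could be improved further: let replica nodes vote as well, and lower the threshold so that agreement from half of the voting nodes is enough to trigger a failover.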
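For the retry suggested in point (3), a minimal sketch of a wait-and-retry helper could look like the following. It is written against RuntimeException so that it compiles without Jedis on the classpath; in real client code you would presumably narrow the catch to JedisClusterMaxRedirectionsException. The class name, method name, and parameters (RetryOnFailover, retry, maxAttempts, waitMillis) are assumptions made for this example.

```java
import java.util.function.Supplier;

class RetryOnFailover {
    // Runs op up to maxAttempts times, sleeping waitMillis between failed attempts.
    // Rethrows the last failure if every attempt fails.
    static <T> T retry(Supplier<T> op, int maxAttempts, long waitMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.get();
            } catch (RuntimeException e) {
                // With Jedis, catch JedisClusterMaxRedirectionsException here instead.
                last = e;
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(waitMillis);
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                        throw new IllegalStateException("interrupted while waiting to retry", ie);
                    }
                }
            }
        }
        throw last;
    }
}
```

A call might then look like RetryOnFailover.retry(() -> cluster.set("key1", "value1"), 3, 30000L), where cluster is the JedisCluster object from section 5 and the 30-second wait is only a guess at a value long enough to cover a failover.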
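10. Additional notes on node migration
(1) Adding and removing nodes, redistributing slots, and similar operations can all be done with the redis-trib.rb script.
Adding nodes is covered in detail above. Removing a node (taking it offline) is also somewhat fiddly.
If the node has not been assigned any slots yet, just run the delete command:
./redis-trib.rb del-node 132.121.127.31:6382 cebf26ee2809612be474c1016396f4f8xxxxxxx
In practice, though, the node to be removed usually does own slots, so first move its slots to another master, for example:
./redis-trib.rb reshard --from cebf26ee2809612be474c1016396f4f868xxxxx --to 19df511fe6a7aa5f168a847326202c3c882xxxxx --slots 4096 132.121.127.31:6379
--After the move, node cebf26ee2809612be474c1016396f4f868xxxxx owns no slots, and the delete command can then be run.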
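(2) Over time the slot ranges can end up distributed rather unevenly. To rebalance them so that every master holds about the same number of slots, redis-trib.rb provides the rebalance option.
(3) When you run cluster nodes in the client, one master's line carries the myself flag, unlike the other masters:
85c23ca8159e628e6c5822021044617f234xxxx 132.121.127.31:6380@16380 myself,master - 0 1501661758000 10 connected 6827-10922
--I long wondered whether this was some special node, like a seed node in Cassandra. It turns out it is not: the flag only indicates that 132.121.127.31:6380 is the instance the client is currently connected to. So Redis Cluster has no special master node.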