Major pitfall: if a JedisShardInfo is instantiated without a node name (the name attribute), any change in the order of the Redis node list causes "key rehashing"
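We used BTrace to trace the live state of redis.clients.util.Sharded and to verify how Jedis's sharding mechanism implements consistent hashing.
This uncovered a critical pitfall: if JedisShardInfo has no node name (name attribute), any change in the order of the Redis node list causes "key rehashing", i.e. keys are remapped to different nodes. See the implementation of Sharded's initialize(...) method: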
(I) this.algo.hash("SHARD-" + i + "-NODE-" + n)
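Major pitfall: the node's position index i is part of the hashed string! If the node order is changed, even unintentionally, keys are rehashed onto different nodes, which is a disaster (the "rehash caused by node reordering" problem).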
(II) this.algo.hash(shardInfo.getName() + "*" + shardInfo.getWeight() + n)
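[Pro] This design avoids the "rehash caused by node reordering" problem above.
[Con] Pitfall: "node name + weight" must be unique, otherwise nodes overlap and overwrite each other on the ring! Also, "node name + weight" must never be changed once the cluster is in use!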
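The following minimal sketch shows why duplicate "name + weight" values are dangerous. It builds a ring the way strategy (II) does. It assumes a stand-in 64-bit hash (the first 8 bytes of MD5, not Jedis's default MurmurHash), hypothetical addresses, and a ShardSpec holder class that is not part of Jedis. TreeMap.put replaces the value stored under an existing key. So the second node with the same name and weight overwrites every virtual node of the first one, and the first Redis node never receives a key.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;

class NameWeightCollision {
    // Hypothetical shard description: connection address, logical name, weight.
    static final class ShardSpec {
        final String address;
        final String name;
        final int weight;

        ShardSpec(String address, String name, int weight) {
            this.address = address;
            this.name = name;
            this.weight = weight;
        }
    }

    // Stand-in 64-bit hash: first 8 bytes of MD5 (Jedis defaults to MurmurHash).
    static long hash(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            return ByteBuffer.wrap(digest).getLong();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Strategy (II): 160 * weight virtual nodes keyed by name + "*" + weight + n.
    static TreeMap<Long, String> buildRing(List<ShardSpec> shards) {
        TreeMap<Long, String> ring = new TreeMap<>();
        for (ShardSpec s : shards) {
            for (int n = 0; n < 160 * s.weight; n++) {
                // Identical name + weight produces identical keys, so put() silently overwrites.
                ring.put(hash(s.name + "*" + s.weight + n), s.address);
            }
        }
        return ring;
    }

    // Addresses that still own at least one virtual node and can therefore receive keys.
    static Set<String> reachableAddresses(TreeMap<Long, String> ring) {
        return new HashSet<>(ring.values());
    }

    public static void main(String[] args) {
        List<ShardSpec> shards = Arrays.asList(
                new ShardSpec("10.0.0.1:6379", "Shard-01", 1),
                new ShardSpec("10.0.0.2:6379", "Shard-01", 1)); // name copied by mistake
        TreeMap<Long, String> ring = buildRing(shards);
        // prints "160 virtual nodes, owners: [10.0.0.2:6379]"
        System.out.println(ring.size() + " virtual nodes, owners: " + reachableAddresses(ring));
    }
}
```

Nothing fails loudly here: the ring simply has half the expected virtual nodes, and one Redis server sits idle.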
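(III) Node IP:port + number
Memcached Java Client uses this strategy.
[Con] Node IPs can change, for example during a data-center migration!
(IV) Unique node name + number
A better consistent-hashing strategy is unique node name + number, leaving weight out of the hash entirely!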
long hash = algo.hash(shardInfo.getName() + "*" + n)
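A small simulation makes the contrast between strategy (I) and strategy (IV) checkable. This sketch rests on several assumptions:

- a stand-in 64-bit hash (the first 8 bytes of MD5 rather than Jedis's MurmurHash);
- 160 virtual nodes per node;
- hypothetical node names;
- a buildRing helper that only mimics how Sharded fills its TreeMap.

The simulation lists the same four nodes in reverse order and counts the virtual nodes whose owner changes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ReorderRehash {
    // Stand-in 64-bit hash: first 8 bytes of MD5 (Jedis defaults to MurmurHash).
    static long hash(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            return ByteBuffer.wrap(digest).getLong();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // byIndex == true:  strategy (I),  key = "SHARD-" + i + "-NODE-" + n
    // byIndex == false: strategy (IV), key = name + "*" + n
    static TreeMap<Long, String> buildRing(List<String> names, boolean byIndex) {
        TreeMap<Long, String> ring = new TreeMap<>();
        for (int i = 0; i < names.size(); i++) {
            String name = names.get(i);
            for (int n = 0; n < 160; n++) {
                String vnode = byIndex ? "SHARD-" + i + "-NODE-" + n : name + "*" + n;
                ring.put(hash(vnode), name);
            }
        }
        return ring;
    }

    // Counts positions in `before` that are missing from `after` or owned by another node there;
    // every key falling into such a position's arc is served by a different Redis node.
    static int changedVirtualNodes(TreeMap<Long, String> before, TreeMap<Long, String> after) {
        int changed = 0;
        for (Map.Entry<Long, String> e : before.entrySet()) {
            if (!e.getValue().equals(after.get(e.getKey()))) {
                changed++;
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        List<String> before = Arrays.asList("Shard-01", "Shard-02", "Shard-03", "Shard-04");
        List<String> after = new ArrayList<>(before);
        Collections.reverse(after); // same nodes, listed in the opposite order
        System.out.println("strategy (I):  " + changedVirtualNodes(buildRing(before, true), buildRing(after, true)));   // 640
        System.out.println("strategy (IV): " + changedVirtualNodes(buildRing(before, false), buildRing(after, false))); // 0
    }
}
```

With strategy (I), the ring positions stay identical but their owners are permuted. So all 640 virtual nodes, and every key in their arcs, move. With strategy (IV), the ring does not depend on list order at all.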
Therefore, when configuring the Redis server list, you must give every node a logical name (the name attribute), for example:
redis.server.list=192.168.6.35:6379:Shard-01,192.168.6.36:6379:Shard-02,192.168.6.37:6379:Shard-03,192.168.6.38:6379:Shard-04
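Here is a minimal sketch of reading that property. It assumes every entry has exactly the form host:port:name. The ServerListParser and ServerEntry classes are hypothetical, not part of Jedis.

The parser rejects two kinds of entries:

- Entries without a name: Sharded would fall back to index-based hashing for them.
- Duplicate names: they would overlap on the ring.

Each parsed entry could then be passed to the JedisShardInfo(host, port, name) constructor.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ServerListParser {
    // Hypothetical holder for one "host:port:name" entry.
    static final class ServerEntry {
        final String host;
        final int port;
        final String name;

        ServerEntry(String host, int port, String name) {
            this.host = host;
            this.port = port;
            this.name = name;
        }
    }

    // Parses "host:port:name,host:port:name,..." and fails fast on missing or duplicate names.
    static List<ServerEntry> parse(String serverList) {
        List<ServerEntry> entries = new ArrayList<>();
        Set<String> seenNames = new HashSet<>();
        for (String raw : serverList.split(",")) {
            String[] parts = raw.trim().split(":");
            if (parts.length != 3 || parts[2].isEmpty()) {
                throw new IllegalArgumentException("expected host:port:name but got '" + raw + "'");
            }
            if (!seenNames.add(parts[2])) {
                throw new IllegalArgumentException("duplicate node name: " + parts[2]);
            }
            entries.add(new ServerEntry(parts[0], Integer.parseInt(parts[1]), parts[2]));
        }
        return entries;
    }

    public static void main(String[] args) {
        String conf = "192.168.6.35:6379:Shard-01,192.168.6.36:6379:Shard-02,"
                + "192.168.6.37:6379:Shard-03,192.168.6.38:6379:Shard-04";
        for (ServerEntry e : parse(conf)) {
            System.out.println(e.name + " -> " + e.host + ":" + e.port);
        }
    }
}
```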
The relevant code is shown below:
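public class Sharded<R, S extends ShardInfo<R>> {

    public static final int DEFAULT_WEIGHT = 1;
    private TreeMap<Long, S> nodes;
    private final Hashing algo;
    private final Map<ShardInfo<R>, R> resources = new LinkedHashMap<ShardInfo<R>, R>();

    public Sharded(List<S> shards) {
        this(shards, Hashing.MURMUR_HASH); // MD5 is really not good as we works with 64-bits not 128
    }

    public Sharded(List<S> shards, Hashing algo) {
        this.algo = algo;
        initialize(shards);
    }

    private void initialize(List<S> shards) {
        nodes = new TreeMap<Long, S>();

        for (int i = 0; i != shards.size(); ++i) {
            final S shardInfo = shards.get(i);
            // No name: strategy (I), the list index i is part of every virtual-node key
            if (shardInfo.getName() == null) for (int n = 0; n < 160 * shardInfo.getWeight(); n++) {
                nodes.put(this.algo.hash("SHARD-" + i + "-NODE-" + n), shardInfo);
            }
            // Name set: strategy (II), name + weight is part of every virtual-node key
            else for (int n = 0; n < 160 * shardInfo.getWeight(); n++) {
                nodes.put(this.algo.hash(shardInfo.getName() + "*" + shardInfo.getWeight() + n), shardInfo);
            }
            resources.put(shardInfo, shardInfo.createResource());
        }
    }

    ...

}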
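The excerpt stops before the lookup side. To route a key, Jedis's getShardInfo uses TreeMap.tailMap to take the first ring entry whose hash is greater than or equal to the key's hash. If there is none, it wraps around to the smallest entry. The sketch below reproduces that lookup on a plain TreeMap. The stand-in MD5-based hash, the shard names and the weights are illustrative assumptions.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

class ShardLookup {
    // Stand-in 64-bit hash: first 8 bytes of MD5 (Jedis defaults to MurmurHash).
    static long hash(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            return ByteBuffer.wrap(digest).getLong();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Jedis-style lookup: first ring entry >= keyHash, otherwise wrap around to the first entry.
    static <S> S shardFor(TreeMap<Long, S> ring, long keyHash) {
        SortedMap<Long, S> tail = ring.tailMap(keyHash);
        if (tail.isEmpty()) {
            return ring.get(ring.firstKey());
        }
        return tail.get(tail.firstKey());
    }

    public static void main(String[] args) {
        String[] names = {"Shard-01", "Shard-02"};
        int[] weights = {1, 2};
        TreeMap<Long, String> ring = new TreeMap<>();
        for (int i = 0; i < names.length; i++) {
            // Same virtual-node scheme as initialize() when a name is set.
            for (int n = 0; n < 160 * weights[i]; n++) {
                ring.put(hash(names[i] + "*" + weights[i] + n), names[i]);
            }
        }
        Map<String, Integer> counts = new HashMap<>();
        for (int k = 0; k < 30000; k++) {
            counts.merge(shardFor(ring, hash("key:" + k)), 1, Integer::sum);
        }
        System.out.println(counts);
    }
}
```

With weights 1 and 2, Shard-02 owns twice as many virtual nodes. So it should receive roughly two thirds of the keys, with the exact split depending on the hash.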
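The "Redis client connection count never drops" problem
This problem had two causes:
- The object pool's idle-queue behavior (LIFO, i.e. "last in, first out" stack order) was not used correctly
- The "connection leak caused by an exception while closing cluster connections" problem (see the first problem in this article)
For the detailed analysis, see "[Production issue] Solving the 'Redis client connection count never drops' problem" (《[线上问题] "Redis客户端连接数一直降不下来"的问题解决》).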
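The following small, illustrative model (not commons-pool2 itself) shows why the idle-queue order matters. It rests on these assumptions:

- The pool borrows from the head of its idle deque.
- As with commons-pool2's lifo setting, connections are returned to the head (LIFO) or to the tail (FIFO).
- A light load borrows one connection per tick.
- A simplified evictor drops any connection idle for more than maxIdleTicks, loosely standing in for minEvictableIdleTimeMillis.

The Conn class and all numbers are hypothetical. The model ignores minIdle and the evictor's schedule.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class IdleQueueOrder {
    // Hypothetical pooled connection; only the tick at which it last became idle matters.
    static final class Conn {
        int lastReturnedTick;
    }

    // Returns how many idle connections survive after `ticks` rounds of borrow/return/evict.
    static int survivingConnections(boolean lifo, int poolSize, int maxIdleTicks, int ticks) {
        Deque<Conn> idle = new ArrayDeque<>();
        for (int i = 0; i < poolSize; i++) {
            idle.addLast(new Conn()); // all connections start idle at tick 0
        }
        for (int tick = 1; tick <= ticks; tick++) {
            Conn conn = idle.pollFirst();  // borrow from the head
            conn.lastReturnedTick = tick;  // used and returned within the same tick
            if (lifo) {
                idle.addFirst(conn);       // LIFO: the same connection is borrowed next time
            } else {
                idle.addLast(conn);        // FIFO: connections are used in rotation
            }
            final int now = tick;
            idle.removeIf(c -> now - c.lastReturnedTick > maxIdleTicks); // simplified evictor
        }
        return idle.size();
    }

    public static void main(String[] args) {
        System.out.println("LIFO keeps " + survivingConnections(true, 10, 20, 100) + " connection(s)");  // 1
        System.out.println("FIFO keeps " + survivingConnections(false, 10, 20, 100) + " connection(s)"); // 10
    }
}
```

Under LIFO, the same connection keeps being reused, so the surplus connections age out. Under FIFO, every connection is touched in rotation. None ever looks idle long enough to be evicted, so the connection count never drops.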
2. Jedis "Socket read timeout" leads to a "wrong reply type" error
The exception log:
[2015-02-07 09:17:47] WARN c.f.f.b.s.r.i.CustomShardedJedisFactory -quit jedis connection for server fail: xxx.xxx.xxx.xxx:xxx
java.lang.ClassCastException: java.lang.Long cannot be cast to [B
    at redis.clients.jedis.Connection.getStatusCodeReply(Connection.java:181) ~[jedis-2.6.2.jar:na]
    at redis.clients.jedis.BinaryJedis.quit(BinaryJedis.java:136) ~[jedis-2.6.2.jar:na]
    at ...:116) ~[forseti-biz-service-1.0-SNAPSHOT.jar:na]
    at org.apache.commons.pool2.impl.GenericObjectPool.destroy(GenericObjectPool.java:848) [commons-pool2-2.0.jar:2.0]
    at org.apache.commons.pool2.impl.GenericObjectPool.invalidateObject(GenericObjectPool.java:626) [commons-pool2-2.0.jar:2.0]
    at redis.clients.util.Pool.returnBrokenResourceObject(Pool.java:83) [jedis-2.6.2.jar:na]
    at ...:121) [forseti-biz-service-1.0-SNAPSHOT.jar:na]
    at ...:337) [forseti-biz-service-1.0-SNAPSHOT.jar:na]
    at ...:319) [forseti-biz-service-1.0-SNAPSHOT.jar:na]
    ...
[2015-02-07 09:17:47] ERROR c.f.f.b.s.r.i.RedisServiceImpl -'zadd'

[2015-02-07 09:17:47] ERROR c.f.f.b.s.r.i.RedisServiceImpl -java.net.SocketTimeoutException: Read timed out

redis.clients.jedis.exceptions.JedisConnectionException: java.net.SocketTimeoutException: Read timed out
    at ...:201) ~[jedis-2.6.2.jar:na] (blocked at 'limit = in.read(buf);', i.e. java.io.InputStream.read(InputStream.java:100): this blocking read is what hits the socket read timeout!)
    at ...:40) ~[jedis-2.6.2.jar:na]
    at ...:128) ~[jedis-2.6.2.jar:na]
    at ...:192) ~[jedis-2.6.2.jar:na]
    at ...:282) ~[jedis-2.6.2.jar:na]
    at ...:207) ~[jedis-2.6.2.jar:na]
    at ...:1293) ~[jedis-2.6.2.jar:na]
    at ...:364) ~[jedis-2.6.2.jar:na]
    at ...:328) [forseti-biz-service-1.0-SNAPSHOT.jar:na]
    at ...:319) [forseti-biz-service-1.0-SNAPSHOT.jar:na]
    ...
Caused by: java.net.SocketTimeoutException: Read timed out
    at ... ~[na:1.7.0_51]
    at ...:152) ~[na:1.7.0_51]
    at ...:122) ~[na:1.7.0_51]
    at ...:108) ~[na:1.7.0_51]
    at ...:195) ~[jedis-2.6.2.jar:na]
    ... 38
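The log shows two failures in sequence:

1. The 'zadd' operation hit a "socket read timeout": "JedisConnectionException: java.net.SocketTimeoutException: Read timed out".
2. When the pool then destroyed that connection and sent QUIT, a type-cast exception followed: "ClassCastException: java.lang.Long cannot be cast to [B".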
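Others have run into this problem before. Their explanation:
When the exception occurs, the buffer still holds the previous command, unsent or only partially sent. If nothing is done and the connection goes straight back to the pool, the next command on that reused connection is sent along with the leftover one. That is why the "wrong reply type" error above appears.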
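The mechanism can be reproduced without Redis. This sketch models one socket as a queue of replies that must be read in order. FakeConnection is hypothetical, not a Jedis class. The model follows the author's own note in the fixed destroyObject that the previous reply was never read.

A timed-out ZADD leaves its integer reply unread. A QUIT sent later on the same connection then reads that stale Long where it expects bytes. This is the same failing cast that the stack traces show in Connection.getStatusCodeReply. Switching to a fresh connection avoids the mix-up.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Deque;

class StaleReplyDemo {
    // Hypothetical connection: replies must be read strictly in the order the server sent them,
    // as on a single Redis socket.
    static final class FakeConnection {
        private final Deque<Object> unreadReplies = new ArrayDeque<>();

        // ZADD gets an integer reply; on a read timeout it stays unread in the buffer.
        Long zadd(boolean readTimesOut) {
            unreadReplies.addLast(1L);
            if (readTimesOut) {
                throw new IllegalStateException("Read timed out"); // stand-in for JedisConnectionException
            }
            return (Long) unreadReplies.pollFirst();
        }

        // QUIT gets a status reply ("OK") that is read back as raw bytes.
        byte[] quit() {
            unreadReplies.addLast("OK".getBytes(StandardCharsets.US_ASCII));
            return (byte[]) unreadReplies.pollFirst(); // reads the OLDEST pending reply
        }
    }

    // A timed-out ZADD followed by QUIT, either on the same connection or on a new one.
    static String runScenario(boolean reuseAfterError) {
        FakeConnection conn = new FakeConnection();
        try {
            conn.zadd(true); // the read times out and the Long reply is left behind
        } catch (IllegalStateException timeout) {
            if (!reuseAfterError) {
                conn = new FakeConnection(); // destroy the broken connection, open a new one
            }
        }
        try {
            return new String(conn.quit(), StandardCharsets.US_ASCII);
        } catch (ClassCastException e) {
            return "ClassCastException";
        }
    }

    public static void main(String[] args) {
        System.out.println("reuse after timeout:   " + runScenario(true));  // ClassCastException
        System.out.println("destroy after timeout: " + runScenario(false)); // OK
    }
}
```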
So the correct approach is this: when an exception occurs, destroy the connection. It must never be reused!
Reference:
The "client connection leak" problem
The exception log:
[2015-01-28 15:33:51] ERROR c.f.f.b.s.r.i.RedisServiceImpl -ShardedJedis close fail

redis.clients.jedis.exceptions.JedisException: Could not return the resource to the pool
    at redis.clients.util.Pool.returnBrokenResourceObject(Pool.java:85) ~[jedis-2.6.2.jar:na]
    at cn.fraudmetrix.forseti.biz.service.redis.impl.CustomShardedJedisPool.returnBrokenResource(CustomShardedJedisPool.java:120) ~[forseti-biz-service-1.0-SNAPSHOT.jar:na]
    at cn.fraudmetrix.forseti.biz.service.redis.impl.CustomShardedJedisPool.returnBrokenResource(CustomShardedJedisPool.java:26) ~[forseti-biz-service-1.0-SNAPSHOT.jar:na]
    at redis.clients.jedis.ShardedJedis.close(ShardedJedis.java:638) ~[jedis-2.6.2.jar:na]
    at cn.fraudmetrix.forseti.biz.service.redis.impl.RedisServiceImpl.close(RedisServiceImpl.java:90) [forseti-biz-service-1.0-SNAPSHOT.jar:na]
    at cn.fraudmetrix.forseti.biz.service.redis.impl.RedisServiceImpl.zadd(RedisServiceImpl.java:380) [forseti-biz-service-1.0-SNAPSHOT.jar:na]
    at cn.fraudmetrix.forseti.biz.service.redis.impl.RedisServiceImpl.zadd(RedisServiceImpl.java:346) [forseti-biz-service-1.0-SNAPSHOT.jar:na]
    ...
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_51]
    at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_51]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_51]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_51]
    at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
Caused by: java.lang.ClassCastException: java.lang.Long cannot be cast to [B
    at redis.clients.jedis.Connection.getStatusCodeReply(Connection.java:181) ~[jedis-2.6.2.jar:na]
    at redis.clients.jedis.BinaryJedis.quit(BinaryJedis.java:136) ~[jedis-2.6.2.jar:na]
    at redis.clients.jedis.BinaryShardedJedis.disconnect(BinaryShardedJedis.java:35) ~[jedis-2.6.2.jar:na]
    at cn.fraudmetrix.forseti.biz.service.redis.impl.CustomShardedJedisFactory.destroyObject(CustomShardedJedisFactory.java:106) ~[forseti-biz-service-1.0-SNAPSHOT.jar:na]
    at org.apache.commons.pool2.impl.GenericObjectPool.destroy(GenericObjectPool.java:848) ~[commons-pool2-2.0.jar:2.0]
    at org.apache.commons.pool2.impl.GenericObjectPool.invalidateObject(GenericObjectPool.java:626) ~[commons-pool2-2.0.jar:2.0]
    at redis.clients.util.Pool.returnBrokenResourceObject(Pool.java:83) ~[jedis-2.6.2.jar:na]
    ... 37
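The runtime ClassCastException ("java.lang.ClassCastException: java.lang.Long cannot be cast to [B") is never caught, so the close operation aborts midway. The root cause lies in "BinaryShardedJedis.disconnect(BinaryShardedJedis.java:35)", reached from "CustomShardedJedisFactory.destroyObject(CustomShardedJedisFactory.java:106)".
destroyObject simply calls ShardedJedis.disconnect(), and BinaryShardedJedis.disconnect() catches only JedisConnectionException, as shown below: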
public void destroyObject(PooledObject<ShardedJedis> pooledShardedJedis) throws Exception {
    final ShardedJedis shardedJedis = pooledShardedJedis.getObject();

    shardedJedis.disconnect(); // an exception thrown here leaves the "connection resources" unreleased: a leak
}
// BinaryShardedJedis.disconnect() in jedis-2.6.2: only JedisConnectionException is caught
public void disconnect() {
    for (Jedis jedis : getAllShards()) {
        try {
            jedis.quit();
        } catch (JedisConnectionException e) {
            // ignore the exception node, so that all other normal nodes can release all connections.
        }
        try {
            jedis.disconnect();
        } catch (JedisConnectionException e) {
            // ignore the exception node, so that all other normal nodes can release all connections.
        }
    }
}
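The fixed code catches every Exception. An uncaught exception can therefore no longer abort the release of connections partway through. The fixed code is shown below: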
public void destroyObject(PooledObject<ShardedJedis> pooledShardedJedis) throws Exception {
    final ShardedJedis shardedJedis = pooledShardedJedis.getObject();

    // Not calling shardedJedis.disconnect() here: an exception there leaves the "connection resources" unreleased (a leak)
    for (Jedis jedis : shardedJedis.getAllShards()) {
        try {
            // 1. Ask the server to close the connection
            jedis.quit();
        } catch (Exception e) {
            // ignore the exception node, so that all other normal nodes can release all connections.

            // java.lang.ClassCastException: java.lang.Long cannot be cast to [B
            // (zadd/zcard return a long while quit returns a string, which shows the previous reply was never read)
            // Reconstructed log call: prints host:port, matching the WARN line in the first log above
            logger.warn("quit jedis connection for server fail: " + jedis.getClient().getHost() + ":" + jedis.getClient().getPort(), e);
        }

        try {
            // 2. Close the connection from the client side
            jedis.disconnect();
        } catch (Exception e) {
            // ignore the exception node, so that all other normal nodes can release all connections.

            logger.warn("disconnect jedis connection fail: " + jedis.getClient().getHost() + ":" + jedis.getClient().getPort(), e);
        }
    }
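A small model of the two loops makes the difference concrete. Shard and ConnectionError are hypothetical stand-ins, not Jedis types; ConnectionError plays the role of JedisConnectionException.

- With the narrow catch, a ClassCastException from one shard's quit() escapes the loop, and every later shard keeps its socket open.
- With catch (Exception e), as in the fixed destroyObject, all shards are closed.

```java
import java.util.ArrayList;
import java.util.List;

class DisconnectAll {
    // Stand-in for JedisConnectionException; only the catch scope matters here.
    static final class ConnectionError extends RuntimeException {
        ConnectionError(String message) {
            super(message);
        }
    }

    // Hypothetical shard connection whose quit() may fail.
    static final class Shard {
        private final RuntimeException quitFailure; // null means quit() succeeds
        boolean socketClosed;

        Shard(RuntimeException quitFailure) {
            this.quitFailure = quitFailure;
        }

        void quit() {
            if (quitFailure != null) {
                throw quitFailure;
            }
        }

        void disconnect() {
            socketClosed = true;
        }
    }

    // Same shape as the original disconnect(): only ConnectionError is swallowed.
    static void disconnectNarrow(List<Shard> shards) {
        for (Shard s : shards) {
            try {
                s.quit();
            } catch (ConnectionError e) {
                // ignored so the remaining shards can be released
            }
            try {
                s.disconnect();
            } catch (ConnectionError e) {
                // ignored so the remaining shards can be released
            }
        }
    }

    // Same shape as the fixed destroyObject(): any Exception is swallowed per step.
    static void disconnectBroad(List<Shard> shards) {
        for (Shard s : shards) {
            try {
                s.quit();
            } catch (Exception e) {
                // log and continue with disconnect()
            }
            try {
                s.disconnect();
            } catch (Exception e) {
                // log and continue with the next shard
            }
        }
    }

    static long closedCount(List<Shard> shards) {
        return shards.stream().filter(s -> s.socketClosed).count();
    }

    // Three shards; the middle one fails in quit() as in the stale-reply case.
    static List<Shard> threeShardsWithBadMiddle() {
        List<Shard> shards = new ArrayList<>();
        shards.add(new Shard(null));
        shards.add(new Shard(new ClassCastException("java.lang.Long cannot be cast to [B")));
        shards.add(new Shard(null));
        return shards;
    }

    public static void main(String[] args) {
        List<Shard> narrow = threeShardsWithBadMiddle();
        try {
            disconnectNarrow(narrow);
        } catch (ClassCastException e) {
            System.out.println("narrow catch aborted: " + e.getMessage());
        }
        System.out.println("narrow: " + closedCount(narrow) + " of 3 sockets closed"); // 1 of 3
        List<Shard> broad = threeShardsWithBadMiddle();
        disconnectBroad(broad);
        System.out.println("broad: " + closedCount(broad) + " of 3 sockets closed");   // 3 of 3
    }
}
```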