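The previous article covered how to set up a replica set in detail. This article covers automatic failover and adding, removing and modifying nodes.
Setting up a MongoDB replica set: http://1413570.blog.51cto.com/1403570/1337619
Example 1: simulated failover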
res1:PRIMARY> rs.conf();
{
"_id" : "res1",
"version" : 1,
"members" : [
{
"_id" : 0,
"host" : "192.168.1.248:27017",
"priority" : 2
},
{
"_id" : 1,
"host" : "192.168.1.247:27018",
"priority" : 0
},
{
"_id" : 2,
"host" : "192.168.1.250:27019"
}
]
}
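As you can see, the primary is host 192.168.1.248 because it has the highest priority, followed by host 192.168.1.250 (no priority set, so the default of 1). Host 192.168.1.247 has priority 0 and can never become primary. When host 192.168.1.248 goes down, host 192.168.1.250 takes over as primary.
Suppose we stop the main mongod process on host 192.168.1.248: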
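Which member takes over can be pictured with a small JavaScript function that runs in the mongo shell or in Node.js. This is a simplified model under stated assumptions, not MongoDB's actual election algorithm: it only looks at reachability, the majority rule and priority (default 1, priority 0 never eligible), and ignores optime, vote settings and ties. `expectedPrimary` is a hypothetical helper name.

```javascript
// Simplified model (assumption): ignores optime, vote settings and ties,
// which real replica set elections also take into account.
function expectedPrimary(members, downHosts) {
  // Members that are still reachable.
  var up = members.filter(function (m) {
    return downHosts.indexOf(m.host) === -1;
  });
  // A primary needs a strict majority of the set to be reachable.
  if (up.length * 2 <= members.length) return null;

  var best = null, bestPriority = 0;
  up.forEach(function (m) {
    // Priority defaults to 1; priority 0 members can never win.
    var p = (m.priority === undefined) ? 1 : m.priority;
    if (p > bestPriority) { best = m.host; bestPriority = p; }
  });
  return best;
}

// The configuration from rs.conf() above.
var members = [
  { _id: 0, host: "192.168.1.248:27017", priority: 2 },
  { _id: 1, host: "192.168.1.247:27018", priority: 0 },
  { _id: 2, host: "192.168.1.250:27019" }
];

// expectedPrimary(members, [])                      -> "192.168.1.248:27017"
// expectedPrimary(members, ["192.168.1.248:27017"]) -> "192.168.1.250:27019"
```

If both 192.168.1.248 and 192.168.1.250 were down, only one of three members would remain, which is not a majority, so the model returns null (no primary).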
ps -ef | grep mongodb
kill 8665
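Try to avoid kill -9, which can corrupt the MongoDB data files; a plain kill sends SIGTERM and lets mongod shut down cleanly.
OK, the logs on the other two servers now report: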
Fri Dec 6 16:36:10.522 [rsHealthPoll] couldn't connect to 192.168.1.248:27017: couldn't connect to server 192.168.1.248:27017
After that, host 192.168.1.250 takes over as primary:
Fri Dec 6 16:36:40.707 [conn248] end connection 192.168.1.250:46500 (1 connection now open)
Fri Dec 6 16:36:40.708 [initandlisten] connection accepted from 192.168.1.250:46592 #249 (2 connections now open)
Fri Dec 6 16:36:40.710 [conn249] authenticate db: local { authenticate: 1, nonce: "f70f5a8aea558178", user: "__system", key: "19fb73382ae940816c685b2561b0a76e" }
Now log in with the mongo shell:
[root@anenjoy ~]# /usr/local/mongodb/bin/mongo --port 27019
MongoDB shell version: 2.4.8
connecting to: 127.0.0.1:27019/test
res1:PRIMARY>
The prompt shows PRIMARY.
Then check with rs.status():
res1:PRIMARY> rs.status();
{
"set" : "res1",
"date" : ISODate("2013-12-06T08:44:01Z"),
"myState" : 1,
"members" : [
{
"_id" : 0,
"name" : "192.168.1.248:27017",
"health" : 0,
"state" : 8,
"stateStr" : "(not reachable/healthy)",
"uptime" : 0,
"optime" : Timestamp(1386118280, 1),
"optimeDate" : ISODate("2013-12-04T00:51:20Z"),
"lastHeartbeat" : ISODate("2013-12-06T08:44:00Z"),
"lastHeartbeatRecv" : ISODate("2013-12-06T08:41:32Z"),
"pingMs" : 0
},
{
"_id" : 1,
"name" : "192.168.1.247:27018",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 3790,
"optime" : Timestamp(1386118280, 1),
"optimeDate" : ISODate("2013-12-04T00:51:20Z"),
"lastHeartbeat" : ISODate("2013-12-06T08:44:00Z"),
"lastHeartbeatRecv" : ISODate("2013-12-06T08:44:01Z"),
"pingMs" : 0,
"syncingTo" : "192.168.1.250:27019"
},
{
"_id" : 2,
"name" : "192.168.1.250:27019",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 4958,
"optime" : Timestamp(1386118280, 1),
"optimeDate" : ISODate("2013-12-04T00:51:20Z"),
"self" : true
}
],
"ok" : 1
}
res1:PRIMARY>
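You can see that server 192.168.1.248 is unhealthy, and the logs on the other two members keep reporting that they cannot connect to host 192.168.1.248 on port 27017.
Once the mongod process on host 192.168.1.248 is running again, it automatically switches back to primary, because it has the highest priority: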
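To read rs.status() output like the one above without scanning every field, a small helper can pull out the primary and the unhealthy members. `summarizeStatus` is a hypothetical name; it relies only on the `set`, `members`, `name`, `health` and `stateStr` fields visible in the output above.

```javascript
// Pull the primary and any unhealthy members out of an rs.status() document.
// In the mongo shell: summarizeStatus(rs.status())
function summarizeStatus(status) {
  var primary = null;
  var unhealthy = [];
  status.members.forEach(function (m) {
    if (m.stateStr === "PRIMARY") primary = m.name;
    // health is 1 for reachable members and 0 otherwise (see the output above).
    if (m.health !== 1) unhealthy.push(m.name + ": " + m.stateStr);
  });
  return { set: status.set, primary: primary, unhealthy: unhealthy };
}
```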
Fri Dec 6 16:48:35.325 [conn246] SocketException handling request, closing client connection: 9001 socket exception [SEND_ERROR] server [192.168.1.247:27047]
Fri Dec 6 16:48:35.388 [rsHealthPoll] replSet member 192.168.1.248:27017 is now in state PRIMARY
[root@test02 bin]# /usr/local/mongodb/bin/mongo --port 27017
MongoDB shell version: 2.4.8
connecting to: 127.0.0.1:27017/test
res1:PRIMARY>
While host 192.168.1.248 is down, host 192.168.1.250 acts as primary and accepts writes:
res1:PRIMARY> db.appstore.save({'e_name':'xiaowang','e_id':1103,'class_id':2});
res1:PRIMARY> db.appstore.find();
{ "_id" : ObjectId("529e7c88d4d317e4bd3eece9"), "e_name" : "frank", "e_id" : 1101, "class_id" : 1 }
{ "_id" : ObjectId("52a18f3bd36b29b9c78be267"), "e_name" : "xiaowang", "e_id" : 1103, "class_id" : 2 }
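Later, when host 192.168.1.248 becomes primary again, the data inserted in the meantime is synced to it as well, much like MySQL master-slave replication.
Example 2: adding, removing and modifying replica set nodes
Now suppose the primary, host 192.168.1.248, has gone down and I want to remove this node.
First find the process with ps aux | grep mongodb, then kill it.
Host 192.168.1.250 has now become primary: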
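This catch-up works through the oplog. The toy model below illustrates the idea under explicit assumptions: integer timestamps instead of real Timestamp values, an oplog sorted by timestamp, and only inserts replayed. A member that comes back replays the oplog entries newer than the last one it applied. `catchUp` is a hypothetical helper, not a MongoDB API.

```javascript
// Toy model (assumption): timestamps are plain integers, the oplog is sorted by ts,
// and only inserts ("i") are replayed; real oplog entries use Timestamp values
// and more operation types.
function catchUp(localDocs, lastAppliedTs, oplog) {
  oplog.forEach(function (entry) {
    if (entry.ts <= lastAppliedTs) return;         // already applied before the outage
    if (entry.op === "i") localDocs.push(entry.o); // replay the insert
    lastAppliedTs = entry.ts;                      // remember how far we got
  });
  return lastAppliedTs;
}
```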
[root@anenjoy ~]# /usr/local/mongodb/bin/mongo --port 27019
MongoDB shell version: 2.4.8
connecting to: 127.0.0.1:27019/test
res1:PRIMARY>
Check the node configuration with rs.conf():
res1:PRIMARY> rs.conf();
{
"_id" : "res1",
"version" : 1,
"members" : [
{
"_id" : 0,
"host" : "192.168.1.248:27017",
"priority" : 2
},
{
"_id" : 1,
"host" : "192.168.1.247:27018",
"priority" : 0
},
{
"_id" : 2,
"host" : "192.168.1.250:27019"
}
]
}
res1:PRIMARY> rs.remove('192.168.1.248:27017');
Fri Dec 6 16:59:01.480 DBClientCursor::init call() failed
Fri Dec 6 16:59:01.482 Error: error doing query: failed at src/mongo/shell/query.js:78
Fri Dec 6 16:59:01.482 trying reconnect to 127.0.0.1:27019
Fri Dec 6 16:59:01.482 reconnect 127.0.0.1:27019 ok
Check again: OK, the node has been removed.
res1:PRIMARY> rs.conf();
{
"_id" : "res1",
"version" : 2,
"members" : [
{
"_id" : 1,
"host" : "192.168.1.247:27018",
"priority" : 0
},
{
"_id" : 2,
"host" : "192.168.1.250:27019"
}
]
}
The log no longer prints "[rsHealthPoll] couldn't connect to 192.168.1.248:27017: couldn't connect to server 192.168.1.248:27017" messages.
Adding a node:
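The change rs.remove() made to the configuration is visible by comparing the two rs.conf() outputs above: the member disappears and version goes from 1 to 2. The sketch below only builds that new document in plain JavaScript and does not contact the server; `removeMember` is a hypothetical helper, not how the shell implements rs.remove().

```javascript
// Build the configuration that results from removing one member (sketch only;
// it does not talk to the server).
function removeMember(cfg, host) {
  var next = JSON.parse(JSON.stringify(cfg)); // deep copy, leave cfg untouched
  next.members = next.members.filter(function (m) {
    return m.host !== host;
  });
  if (next.members.length === cfg.members.length) {
    throw new Error("no member " + host);
  }
  next.version = cfg.version + 1; // the new config needs a higher version
  return next;
}
```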
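Adding a node purely through the oplog is simple and needs little manual intervention. However, the oplog is a capped collection that is reused in a circular fashion, so relying on the oplog alone can leave the data inconsistent: the entries the new node needs may already have been overwritten.
Instead, you can combine a database snapshot (--fastsync) with the oplog to add the node. The general procedure is:
take the physical data files of an existing replica set member as the initial data, then apply the remaining changes from the oplog until the data is consistent.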
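Why the oplog alone may not be enough can be shown with a toy capped log. Assumptions: capacity is counted in entries (the real --oplogSize is in megabytes) and timestamps are integers; `CappedLog` and `canCatchUpFrom` are hypothetical names. A member can catch up from the oplog only if its last applied entry is still in the log; otherwise it first needs a copy of the data files, which is what the snapshot step provides.

```javascript
// Toy capped collection (assumption: fixed number of entries, integer timestamps).
function CappedLog(capacity) {
  this.capacity = capacity;
  this.entries = [];
}

// Append a timestamp; once full, the oldest entry is overwritten.
CappedLog.prototype.append = function (ts) {
  this.entries.push(ts);
  if (this.entries.length > this.capacity) this.entries.shift();
};

// True if a member whose data ends at lastTs can still catch up from this log,
// i.e. the oldest entry kept is not newer than the member's last applied entry.
CappedLog.prototype.canCatchUpFrom = function (lastTs) {
  return this.entries.length > 0 && this.entries[0] <= lastTs;
};
```

With a capacity of 3 and timestamps 1 to 5 appended, only 3, 4 and 5 remain, so a member that stopped at 4 can catch up while one that stopped at 1 cannot.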
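The preparation steps are the same as before:
create the database storage directory and the key file, and set the key file permissions to 600.
Step 1: configure the storage path (the --dbpath parameter).
Everything goes under /data/mon_db, owned by the mongodb user: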
mkdir -p /data/mon_db
chown -R mongodb:mongodb /data/mon_db/
Create the log file (the --logpath parameter); the location is up to you. I put it under /usr/local/mongodb/log:
mkdir -p /usr/local/mongodb/log
touch /usr/local/mongodb/log/mongodb.log
chown -R mongodb:mongodb /usr/local/mongodb/
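Step 2: create the replica set key file. --keyFile takes the full path to the private key that identifies the cluster; if the key file contents differ between instances, they will not work together.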
[root@test02 ~]# mkdir -p /data/mon_db/key
[root@test02 ~]# echo "this is res key" > /data/mon_db/key/res1
[root@test02 ~]# chmod 600 /data/mon_db/key/res1
The key file permissions must be 600, otherwise mongod logs an error like this:
Wed Dec 4 06:22:36.413 permissions on /data/mon_db/key/res1 are too open
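On the new node, just use different names (the startup command below uses res4 for the key file and the data directory).
Suppose we copy the physical data files from host 192.168.1.247:
scp -r /data/mongodb/res2/ root@ip:/data/mon_db/res4
After that, you can insert new data on the primary (for verification).
Start mongod: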
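The permission check behind the "too open" error can be sketched in Node.js. This is an assumption for illustration only (the article itself uses shell commands), and `writeKeyFile` and `keyFileTooOpen` are hypothetical helpers: the key file must not be accessible to group or others, which chmod 600 ensures.

```javascript
// Node.js sketch (assumption: run with Node, not inside the mongo shell).
var fs = require("fs");

// Write the shared key and restrict it to the owner, like `chmod 600`.
function writeKeyFile(path, content) {
  fs.writeFileSync(path, content + "\n");
  fs.chmodSync(path, 0o600);
}

// True if group or others have any permission bits set, which is the case
// mongod reports as "permissions on <file> are too open".
function keyFileTooOpen(path) {
  return (fs.statSync(path).mode & 0o077) !== 0;
}
```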
/usr/local/mongodb/bin/mongod --port 27020 --replSet res1 --keyFile /data/mon_db/key/res4 --oplogSize 100 --dbpath=/data/mon_db/res4/ --logpath=/usr/local/mongodb/log/mongodb.log --logappend --fastsync --fork
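Then add the node on the primary:
rs.add('192.168.1.x:27020')
Then log in to mongod on the newly added node, allow reads on the secondary (rs.slaveOk() in the 2.4 shell), and check whether the data has been synced.
Modifying a node
Modifying a node basically means changing its host, port or priority; this article briefly describes how to do that.
My replica set currently has three nodes: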
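rs.add('host:port') extends the configuration in a predictable way. As far as I understand the 2.4 shell helper, the new member gets the largest existing _id plus one and the version is incremented. That matches the output further down, where 192.168.1.248:27017 came back as _id 3 and the config is at version 3. `addMember` is a hypothetical sketch of that config change only; it does not contact the server.

```javascript
// Sketch (assumption): build the config rs.add('host:port') would produce.
function addMember(cfg, host) {
  var next = JSON.parse(JSON.stringify(cfg)); // deep copy, leave cfg untouched
  var maxId = -1;
  next.members.forEach(function (m) {
    if (m.host === host) throw new Error(host + " is already a member");
    if (m._id > maxId) maxId = m._id;
  });
  next.members.push({ _id: maxId + 1, host: host }); // next free _id
  next.version = cfg.version + 1;
  return next;
}
```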
/usr/local/mongodb/bin/mongo --port 27019
rs.status();
{
"set" : "res1",
"date" : ISODate("2013-12-06T11:56:42Z"),
"myState" : 1,
"members" : [
{
"_id" : 1,
"name" : "192.168.1.247:27018",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 10661,
"optime" : Timestamp(1386330980, 1),
"optimeDate" : ISODate("2013-12-06T11:56:20Z"),
"lastHeartbeat" : ISODate("2013-12-06T11:56:42Z"),
"lastHeartbeatRecv" : ISODate("2013-12-06T11:56:40Z"),
"pingMs" : 0,
"syncingTo" : "192.168.1.250:27019"
},
{
"_id" : 2,
"name" : "192.168.1.250:27019",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 16519,
"optime" : Timestamp(1386330980, 1),
"optimeDate" : ISODate("2013-12-06T11:56:20Z"),
"self" : true
},
{
"_id" : 3,
"name" : "192.168.1.248:27017",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 22,
"optime" : Timestamp(1386330980, 1),
"optimeDate" : ISODate("2013-12-06T11:56:20Z"),
"lastHeartbeat" : ISODate("2013-12-06T11:56:42Z"),
"lastHeartbeatRecv" : ISODate("2013-12-06T11:56:41Z"),
"pingMs" : 0,
"lastHeartbeatMessage" : "syncing to: 192.168.1.250:27019",
"syncingTo" : "192.168.1.250:27019"
}
],
"ok" : 1
}
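I want to change the priorities between the nodes. Host 192.168.1.250 is currently primary with the default priority of 1 (its entry below has no priority field). To make host 192.168.1.248 primary, just give it a higher priority, for example 3: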
res1:PRIMARY> cfg=rs.conf();
{
"_id" : "res1",
"version" : 3,
"members" : [
{
"_id" : 1,
"host" : "192.168.1.247:27018",
"priority" : 0
},
{
"_id" : 2,
"host" : "192.168.1.250:27019"
},
{
"_id" : 3,
"host" : "192.168.1.248:27017"
}
]
}
res1:PRIMARY>cfg.members[2].priority=3;
rs.reconfig() works somewhat like re-initializing the set with the new configuration:
res1:PRIMARY> rs.reconfig(cfg);
Fri Dec 6 20:00:29.788 DBClientCursor::init call() failed
Fri Dec 6 20:00:29.792 trying reconnect to 127.0.0.1:27019
Fri Dec 6 20:00:29.793 reconnect 127.0.0.1:27019 ok
reconnected to server after rs command (which is normal)
Press Enter a couple of times and you will see that the former primary has become a secondary, while host 192.168.1.248 has become primary.
Reposted from: https://blog.51cto.com/caibird/1337622
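Note that cfg.members[2] above is an array index, not the member's _id (here index 2 holds _id 3), so it is easy to edit the wrong member. A safer variant looks the member up by host. `setPriority` is a hypothetical helper; in the shell you would still call rs.reconfig(cfg) afterwards.

```javascript
// Set a member's priority by host instead of by array index.
// In the mongo shell: cfg = rs.conf(); setPriority(cfg, "192.168.1.248:27017", 3); rs.reconfig(cfg);
function setPriority(cfg, host, priority) {
  for (var i = 0; i < cfg.members.length; i++) {
    if (cfg.members[i].host === host) {
      cfg.members[i].priority = priority; // modifies cfg in place
      return cfg;
    }
  }
  throw new Error("no member " + host);
}
```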