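I have recently been working on web crawlers and needed to deploy an IP proxy pool in front of them, so I picked up proxy pool, which I found on OSChina (开源中国). It automatically scrapes IPs from several free Chinese IP proxy sites and checks their availability in real time. The database is SSDB.
IP proxy sites it scrapes:
http://www.data5u.com/
http://www.data5u.com/free/
http://www.data5u.com/free/gngn/index.shtml
http://www.data5u.com/free/gnpt/index.shtml
http://www.66ip.cn/
http://www.ip181.com/
http://www.xicidaili.com/nn
http://www.xicidaili.com/nt
http://www.goubanjia.com/free/gngn/index.shtml
http://www.xdaili.cn/ipagent/freeip/getFreeIps?page=1&rows=10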

Installation
yum -y install git
yum -y install wget
yum install curl-devel expat-devel gettext-devel openssl-devel zlib-devel
yum install gcc perl-ExtUtils-MakeMaker epel-release gcc-c++
cd /usr/src/
wget https://www.kernel.org/pub/software/scm/git/git-2.9.5.tar.gz  # download git, the tool used for cloning
tar -xzf git-2.9.5.tar.gz
cd git-2.9.5
make prefix=/usr/local/git all
make prefix=/usr/local/git install
echo 'export PATH=$PATH:/usr/local/git/bin' >> /etc/bashrc  # single quotes keep $PATH unexpanded until login
source /etc/bashrc
cd ..
git clone https://github.com/jhao104/proxy_pool.git  # clone proxy_pool
cd proxy_pool/
python -V  # check the Python version (2.7.5)
yum -y install python34  # install Python 3.4
wget --no-check-certificate https://bootstrap.pypa.io/get-pip.py
python3 get-pip.py  # install pip
pip install -r requirements.txt  # install the packages proxy_pool depends on
cd /usr/local/
git clone https://github.com/ideawu/ssdb.git  # clone SSDB
cd ssdb
yum -y install autoconf
cd deps/snappy-1.1.0/  # build Snappy
./configure
make
cd /usr/local/ssdb
make  # install SSDB
make install
ln -sf /usr/local/ssdb/ssdb-server /usr/local/bin/ssdb-server
ln -sf /usr/local/ssdb/tools/ssdb-cli /usr/local/bin/ssdb-cli
ln -sf /usr/local/ssdb/tools/ssdb-dump /usr/local/bin/ssdb-dump
ln -sf /usr/local/ssdb/tools/ssdb-repair /usr/local/bin/ssdb-repair
ln -sf /usr/local/ssdb/tools/ssdb.sh /etc/rc.d/init.d/ssdb
chkconfig --add ssdb
chkconfig ssdb on
systemctl stop firewalld.service  # turn off the firewall
systemctl disable firewalld.service
firewall-cmd --state
pip install --upgrade pyssdb
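Startup order for proxy_pool
cd /usr/local/ssdb
./ssdb-server -d ./ssdb.conf -s start
Note: when the ssdb service starts, it creates an ssdb.pid file in the var directory. If ssdb shuts down unexpectedly, this file is not deleted, so the next attempt to start the service fails with an error. In that case, the following two commands restart it manually.
./ssdb-server ssdb.conf -s stop
./ssdb-server -d ./ssdb.conf -s restart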
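The stale ssdb.pid situation described in the note can also be detected from a script before restarting. The following is only a minimal sketch, not part of SSDB or proxy_pool: the helper names `pid_is_running` and `remove_stale_pidfile` are hypothetical, and the path in the usage comment assumes the pid file lives at var/ssdb.pid under the /usr/local/ssdb install directory used in this post. It relies on POSIX signal 0, so it is meant for Linux hosts such as the CentOS 7 machine here.

```python
import os
from pathlib import Path


def pid_is_running(pid: int) -> bool:
    """Return True if a process with this PID currently exists (POSIX only)."""
    if pid <= 0:
        return False
    try:
        # Signal 0 sends nothing; it only checks that the process exists.
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


def remove_stale_pidfile(pidfile: Path) -> bool:
    """Delete pidfile if it does not point at a live process.

    Returns True if the file was deleted, False if it was absent or still valid.
    """
    if not pidfile.exists():
        return False
    try:
        pid = int(pidfile.read_text().strip())
    except ValueError:
        pid = -1  # unreadable content is treated as stale
    if pid_is_running(pid):
        return False
    pidfile.unlink()
    return True


# Usage (assumed path, adjust to your ssdb.conf):
#   remove_stale_pidfile(Path("/usr/local/ssdb/var/ssdb.pid"))
```

If the function returns True, the leftover file is gone and a plain start should work again. If it returns False while the file exists, ssdb is most likely still running.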

cd /usr/src/proxy_pool/Run
python3.4 main.py  # start with python3.4
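Because ssdb has to be up before main.py is started, it can help to confirm that each service is actually accepting connections. The sketch below is one possible check using plain TCP connects. The function names are hypothetical; port 5010 is the one used by the client URLs in this post, while 8888 for SSDB is an assumption (check the port in your own ssdb.conf).

```python
import socket
import time


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def wait_for_port(host: str, port: int, attempts: int = 10, delay: float = 0.5) -> bool:
    """Poll host:port up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if port_is_open(host, port):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


# Usage (ports are assumptions, see the paragraph above):
#   wait_for_port("127.0.0.1", 8888)  # SSDB
#   wait_for_port("127.0.0.1", 5010)  # proxy_pool web API
```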

Client access
http://ip:5010/
http://ip:5010/get (returns one random IP and port)
http://ip:5010/get_all (returns all available IPs and ports)
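From a crawler, the /get endpoint above can be called over plain HTTP. The following is a rough sketch using only the standard library. It assumes the endpoint answers with a bare `ip:port` string in the response body, which may differ between proxy_pool versions, so check the response of your deployment first. `pool_url`, `to_proxies` and `fetch_proxy` are hypothetical helper names, not part of proxy_pool.

```python
from urllib.request import urlopen


def pool_url(host: str, endpoint: str = "get", port: int = 5010) -> str:
    """Build the URL of a proxy pool endpoint, e.g. http://host:5010/get."""
    return f"http://{host}:{port}/{endpoint.lstrip('/')}"


def to_proxies(address: str) -> dict:
    """Turn an 'ip:port' string into a proxies mapping (assumed response format)."""
    address = address.strip()
    host, sep, port = address.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"not an ip:port pair: {address!r}")
    # The same HTTP proxy is used for both schemes.
    return {"http": f"http://{address}", "https": f"http://{address}"}


def fetch_proxy(host: str, timeout: float = 5.0) -> dict:
    """Ask the pool at `host` for one random proxy and return a proxies mapping."""
    with urlopen(pool_url(host), timeout=timeout) as resp:
        return to_proxies(resp.read().decode("utf-8"))
```

The returned mapping has the shape that `urllib.request.ProxyHandler` and the `proxies` argument of the requests library accept, so a crawler could call `fetch_proxy("ip")` before each request and pass the result along.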

For detailed usage, see https://github.com/jhao104/proxy_pool. This post covers the deployment details on CentOS 7. Many thanks to the contributors and to j_hao104 for their selfless work!