1. Log overview
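nginx writes two main kinds of logs: the access log and the error log. The access log records every client request that nginx handles, and its format can be customized. The error log records the problems nginx runs into while serving requests, and its format cannot be customized. Either log can be switched off selectively (access_log off; for the access log; the error log has no off value and is usually silenced by pointing it at /dev/null).
From the access log you can learn where users come from geographically, which pages referred them, which client devices they use, and how often a given URL is requested. From the error log you can spot performance bottlenecks in a particular service or server. Used well, the logs yield a lot of valuable information.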
2. Access log
[Access.log]
log_format main '$remote_addr $remote_user [$time_local] "$request" $http_host '
'$status $upstream_status $body_bytes_sent "$http_referer" '
'"$http_user_agent" $ssl_protocol $ssl_cipher $upstream_addr '
'$request_time $upstream_response_time';
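| Variable | Description | Example |
| --- | --- | --- |
| $remote_addr | Client address | 113.140.15.90 |
| $remote_user | Client user name (from HTTP authentication) | - |
| $time_local | Access time and time zone | 18/Jul/2012:17:00:01 +0800 |
| $request | Request line: method, URI and HTTP protocol version | "GET /pa/img/home/logo-alipay-t.png HTTP/1.1" |
| $http_host | Requested host, i.e. the address typed into the browser (IP or domain name) | img.alipay.com or 10.253.70.103 |
| $status | HTTP response status | 200 |
| $upstream_status | Status returned by the upstream | 200 |
| $body_bytes_sent | Size of the response body sent to the client, in bytes | 547 |
| $http_referer | Referring page | "https://cashier.alipay.com.../" |
| $http_user_agent | Client user agent | "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; SV1; GTB7.0; .NET4.0C; |
| $ssl_protocol | SSL protocol version | TLSv1 |
| $ssl_cipher | Cipher used for the exchanged data | RC4-SHA |
| $upstream_addr | Address of the backend upstream, i.e. the host that actually served the request | 10.228.35.247:80 |
| $request_time | Total time spent on the whole request, in seconds | 0.205 |
| $upstream_response_time | Time the upstream took to respond during the request, in seconds | 0.002 |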
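As a rough illustration of how the main format maps onto fields, the sketch below parses one line with a regular expression that follows the log_format definition field by field. The names MAIN_LOG_RE and parse_main_line are made up for this example, and SAMPLE_LINE is assembled from the example values listed above (with a shortened user agent and an empty referer), not copied from a real log. It assumes a single upstream address per request; nginx may log several comma-separated addresses when it retries, which this pattern does not handle.

```python
import re

# Regex mirroring the `main` log_format, one named group per variable.
# Assumption: quoted fields contain no embedded double quotes and only
# one upstream address is logged per request.
MAIN_LOG_RE = re.compile(
    r'(?P<remote_addr>\S+) (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<http_host>\S+) '
    r'(?P<status>\d{3}) (?P<upstream_status>\S+) (?P<body_bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)" '
    r'(?P<ssl_protocol>\S+) (?P<ssl_cipher>\S+) (?P<upstream_addr>\S+) '
    r'(?P<request_time>\S+) (?P<upstream_response_time>\S+)'
)


def parse_main_line(line):
    """Return a dict of field name -> string, or None if the line does not match."""
    match = MAIN_LOG_RE.match(line.strip())
    return match.groupdict() if match else None


# Hypothetical sample built from the example values in the variable list.
SAMPLE_LINE = (
    '113.140.15.90 - [18/Jul/2012:17:00:01 +0800] '
    '"GET /pa/img/home/logo-alipay-t.png HTTP/1.1" img.alipay.com '
    '200 200 547 "-" "Mozilla/4.0 (compatible; MSIE 8.0)" '
    'TLSv1 RC4-SHA 10.228.35.247:80 0.205 0.002'
)

if __name__ == "__main__":
    record = parse_main_line(SAMPLE_LINE)
    print(record["remote_addr"], record["status"], record["request_time"])
```

Once lines are parsed into dictionaries like this, counting hits per URL, per referer or per user agent is a matter of feeding the relevant field into collections.Counter.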
Production example:
116.9.137.90 - [02/Aug/2012:14:47:12 +0800] "GET /images/XX/20100324752729.png HTTP/1.1" img.alipay.com 200 200 2038 https://cashier.alipay.com/XX/PaymentResult.htm?payNo=XX&outBizNo=2012XX "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; Tablet PC 2.0; 360SE)" TLSv1 AES128-SHA 10.228.21.237:80 0.198 0.001
Offline test ($http_referer):
10.14.21.197 - - [14/Aug/2012:17:28:22 +0800] "GET /spanner/watch/v1?--db=ztg-1&--mode=compare&--index=status&--option=&--cluster=whole&-F=2012%2F8%2F12-00%3A00%3A00&-T=%2B2880&-i=1&-n=0&_=1344936501292 HTTP/1.1" 200 94193 "http://spanner.alipay.net/optionFrame/history.html" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.60 Safari/537.1"
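Note: $http_referer depends on how the request was reached. It holds the page the browser was sent from (the Referer request header) and is logged as "-" when that header is absent.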
Offline test ($http_host):
Note: the value of $http_host depends on what you typed into the browser's address bar.
3. Error log
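| Error message | Explanation |
| --- | --- |
| "upstream prematurely closed connection" | The upstream closed the connection before nginx had finished reading its response to the requested URI. Isolated occurrences, for example when the request is aborted while the upstream has not yet replied, have no effect on the system and can be ignored; frequent ones point to a problem on the backend. |
| "recv() failed (104: Connection reset by peer)" | (1) The number of concurrent connections exceeded what the server can carry, so the server dropped some of them; (2) the client closed the browser while the server was still sending data; (3) the user pressed Stop in the browser. |
| "(111: Connection refused) while connecting to upstream" | The backend upstream was down or unreachable when the connection was attempted. |
| "(111: Connection refused) while reading response header from upstream" | The connection succeeded, but the backend upstream went down or became unreachable while data was being read. |
| "(111: Connection refused) while sending request to upstream" | nginx connected to the upstream, but the upstream went down or became unreachable while the request was being sent. |
| "(110: Connection timed out) while connecting to upstream" | nginx timed out while connecting to the upstream. |
| "(110: Connection timed out) while reading upstream" | nginx timed out while reading the response from the upstream. |
| "(110: Connection timed out) while reading response header from upstream" | nginx timed out while reading the response header from the upstream. |
| "(104: Connection reset by peer) while connecting to upstream" | The upstream sent an RST and reset the connection. |
| "upstream sent invalid header while reading response header from upstream" | The response header sent by the upstream is invalid. |
| "upstream sent no valid HTTP/1.0 header while reading response header from upstream" | The response header sent by the upstream is invalid. |
| "client intended to send too large body" | The request body sent by the client exceeds client_max_body_size, the directive that sets the largest request body nginx accepts (default 1M). |
| "reopening logs" | The user sent kill -USR1. |
| "gracefully shutting down" | The user sent kill -WINCH. |
| "no servers are inside upstream" | No server is configured in the upstream block. |
| "no live upstreams while connecting to upstream" | All servers in the upstream block are down. |
| "SSL_do_handshake() failed" | The SSL handshake failed. |
| "SSL_write() failed (SSL:) while sending to client" | (no explanation given) |
| "(13: Permission denied) while reading upstream" | (no explanation given) |
| "(98: Address already in use) while connecting to upstream" | (no explanation given) |
| "(99: Cannot assign requested address) while connecting to upstream" | (no explanation given) |
| "ngx_slab_alloc() failed: no memory in SSL session shared cache" | Caused by, among other things, an ssl_session_cache that is too small. |
| "could not add new SSL session to the session cache while SSL handshaking" | Caused by, among other things, an ssl_session_cache that is too small. |
| "send() failed (111: Connection refused)" | (no explanation given) |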
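The problem: Apache on port 80 failed when forwarding requests to Apache on port 8080:
[error] (99)Cannot assign requested address: proxy: HTTP: attempt to connect to 127.0.0.1:8080 (*) failed
netstat showed a very large number of connections in TIME_WAIT, and that is what caused [error] (99)Cannot assign requested address.
sysctl -w net.ipv4.tcp_tw_recycle=1 enables fast recycling of TIME-WAIT sockets for TCP connections.
After changing this parameter and watching the system, Apache stopped reporting the error and the number of TIME_WAIT connections dropped.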
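Many of the error messages in this section share the shape "(errno: reason) while phase". When triaging a large error log it can help to group messages by those three parts. The sketch below is one way to do that; the function name classify_error and the regular expression are assumptions made for this example and are not part of nginx. It reads the phase up to the next comma, since real error-log lines typically continue with ", client: ..." after the phase.

```python
import re

# Matches "(NNN: reason)" optionally followed by " while <phase>".
# The phase stops at the first comma so trailing ", client: ..." is ignored.
ERRNO_RE = re.compile(
    r'\((?P<errno>\d+): (?P<reason>[^)]+)\)(?: while (?P<phase>[^,]+))?'
)


def classify_error(message):
    """Return (errno, reason, phase) or None when no errno part is present.

    phase is None for messages without a "while ..." suffix.
    """
    match = ERRNO_RE.search(message.strip().strip('"'))
    if match is None:
        return None
    return int(match.group("errno")), match.group("reason"), match.group("phase")


if __name__ == "__main__":
    for msg in (
        "(111: Connection refused) while connecting to upstream",
        "recv() failed (104: Connection reset by peer)",
        "no live upstreams while connecting to upstream",
    ):
        print(msg, "->", classify_error(msg))
```

Messages without an errno part, such as "no live upstreams while connecting to upstream", fall through as None and have to be matched by their text instead.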
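net.ipv4.tcp_syncookies = 1
Enables SYN cookies. When the SYN wait queue overflows, cookies are used to handle new connections, which protects against small-scale SYN flood attacks. The default given here is 0 (disabled); many current kernels already default to 1.
net.ipv4.tcp_tw_reuse = 1
Enables reuse: TIME-WAIT sockets may be reused for new TCP connections. The default is 0 (disabled).
net.ipv4.tcp_tw_recycle = 1
Enables fast recycling of TIME-WAIT sockets for TCP connections. The default is 0 (disabled). Note that this option breaks connections from clients behind NAT and was removed in Linux 4.12.
net.ipv4.tcp_fin_timeout = 30
When the socket is closed by the local end, this parameter sets how long (in seconds) it stays in the FIN-WAIT-2 state.
net.ipv4.tcp_keepalive_time = 1200
With keepalive enabled, how long (in seconds) a connection must stay idle before TCP starts sending keepalive probes. The default is 2 hours; here it is lowered to 20 minutes.
net.ipv4.ip_local_port_range = 1024 65000
The range of ports used for outgoing connections. The default range is small (32768 to 61000); here it is widened to 1024 to 65000.
net.ipv4.tcp_max_syn_backlog = 8192
The length of the SYN queue. The default is 1024; raising it to 8192 lets the queue hold more connections that are waiting to be established.
net.ipv4.tcp_max_tw_buckets = 5000
The maximum number of TIME_WAIT sockets the system keeps at the same time. Beyond this number, TIME_WAIT sockets are cleared immediately and a warning is printed. The default is 180000; here it is lowered to 5000. For servers such as Apache and Nginx, the parameters in the preceding lines already reduce the number of TIME_WAIT sockets considerably, but they have little effect for Squid. This parameter caps the number of TIME_WAIT sockets and keeps a Squid server from being dragged down by them.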
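To check whether these settings actually bring the number of TIME_WAIT sockets down, one simple approach is to count connections per TCP state in the output of netstat -ant before and after the change. The sketch below assumes the usual Linux column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State). The function name count_tcp_states and the contents of SAMPLE_OUTPUT are made up for illustration and are not taken from the system described here.

```python
from collections import Counter


def count_tcp_states(netstat_output):
    """Count TCP connections per state in the text output of `netstat -ant`."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # TCP rows start with tcp/tcp6 and carry the state in the sixth column.
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            counts[fields[5]] += 1
    return counts


# Hypothetical netstat output, used only to demonstrate the function.
SAMPLE_OUTPUT = """\
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN
tcp        0      0 127.0.0.1:51234         127.0.0.1:8080          TIME_WAIT
tcp        0      0 127.0.0.1:51235         127.0.0.1:8080          TIME_WAIT
tcp        0      0 10.0.0.5:80             10.0.0.9:40112          ESTABLISHED
"""

if __name__ == "__main__":
    for state, number in count_tcp_states(SAMPLE_OUTPUT).most_common():
        print(state, number)
```

On a live host the same function can be fed the captured output of netstat -ant; comparing the TIME_WAIT count against net.ipv4.tcp_max_tw_buckets shows how close the system is to the cap.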