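All of our company's servers are Dell machines (models R710 and R720) running CentOS 6.x, Ubuntu 12.04.4 and other releases. For hardware monitoring I tested IPMI, MegaCLI, SMART and similar tools, but each of them covers only a few components and none of them works as a general-purpose solution. I finally settled on Dell's own OMSA (OpenManage Server Administrator), which met my needs. This post describes how to use OMSA to monitor the hardware of Dell servers.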
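I currently monitor the following hardware information:
1. CPU (processor) status
2. CPU power-saving mode status (if power saving is enabled, the server becomes very sluggish under heavy load)
3. RAID status (e.g. which RAID level is configured and whether the array is healthy)
4. Memory status (the maximum memory the server supports, the memory currently installed and, if a module fails, which slot it is in)
5. Temperature status (whether the machine's temperature exceeds the threshold)
6. Physical disk status (whether any physical disk has failed)
7. Power supply status (single or dual power supplies, and whether any of them has failed)
8. System board CMOS battery (whether the CMOS battery has failed)
9. NIC status (the current number of NICs and whether any of them has a problem)
10. Fans (the current number of fans and whether any of them has failed)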
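By default no alert is raised for the CPU power-saving check (the template defines no trigger for it). All other items are polled periodically (every 15 minutes in my environment; the exported template below sets <delay>300</delay>, i.e. 5 minutes, so change it to 900 if you want 15 minutes), and an alert is sent only if two consecutive checks report a problem.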
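Each trigger in the exported template uses an expression of the form {Template Hardware Monitor:<key>.count(#2,0,"eq")}=2, which fires only when the last two values of the item are both 0. The function below is a minimal sketch of that rule for illustration only; should_alert is a hypothetical helper, not part of Zabbix, and it assumes the collected values are fed in one per line, oldest first.

```shell
#!/usr/bin/env bash
# should_alert [WINDOW]: hypothetical helper that mimics the trigger rule
#   {...:key.count(#2,0,"eq")}=2
# It reads item values (1 = OK, 0 = fault) one per line, oldest first,
# and prints 1 if the last WINDOW values (default 2) are all 0, otherwise 0.
should_alert() {
    local window="${1:-2}"
    tail -n "$window" | awk -v window="$window" '
        { n++; if ($1 == 0) zeros++ }
        END { if (n == window && zeros == window) print 1; else print 0 }
    '
}
```

For example, printf '1\n0\n0\n' | should_alert prints 1 because the two most recent checks both failed, while a single failed check followed by a good one prints 0. Requiring two consecutive failures trades a few minutes of alert latency for fewer false alarms from one-off omreport glitches.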
Installation steps:
I. Client
A. Installing on Red Hat or CentOS
1. Install Dell's yum repository
2. Install OMSA
3. Create symbolic links
4. Disable the web interface (allow only the CLI to run)
5. Start OMSA
6. Enable OMSA at boot
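Zabbix agent configuration
Add the following to zabbix_agentd.conf. Note that the hardware_cpu_model item runs omreport through sudo, so the zabbix user needs a passwordless sudoers rule for omreport (on CentOS 6, also make sure the default "Defaults requiretty" setting does not apply to it):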
UserParameter=hardware_battery,omreport chassis batteries|awk '/^Status/{if($NF=="Ok") {print 1} else {print 0}}'
UserParameter=hardware_cpu_model,awk -v hardware_cpu_control=`sudo omreport chassis biossetup|awk '/C State/{if($NF=="Enabled") {print 0} else {print 1}}'` -v hardware_cpu_c1=`sudo omreport chassis biossetup|awk '/C1-?E/{if($NF=="Enabled") {print 0} else {print 1}}'` 'BEGIN{if(hardware_cpu_control==0 && hardware_cpu_c1==0) {print 0} else {print 1}}'
UserParameter=hardware_fan_health,awk -v hardware_fan_number=`omreport chassis fans|grep -c "^Index"` -v hardware_fan=`omreport chassis fans|awk '/^Status/{if($NF=="Ok") count+=1}END{print count}'` 'BEGIN{if(hardware_fan_number==hardware_fan) {print 1} else {print 0}}'
UserParameter=hardware_memory_health,awk -v hardware_memory=`omreport chassis memory|awk '/^Health/{print $NF}'` 'BEGIN{if(hardware_memory=="Ok") {print 1} else {print 0}}'
UserParameter=hardware_nic_health,awk -v hardware_nic_number=`omreport chassis nics |grep -c "Interface Name"` -v hardware_nic=`omreport chassis nics |awk '/^Connection Status/{print $NF}'|wc -l` 'BEGIN{if(hardware_nic_number==hardware_nic) {print 1} else {print 0}}'
UserParameter=hardware_cpu,omreport chassis processors|awk '/^Health/{if($NF=="Ok") {print 1} else {print 0}}'
UserParameter=hardware_power_health,awk -v hardware_power_number=`omreport chassis pwrsupplies|grep -c "Index"` -v hardware_power=`omreport chassis pwrsupplies|awk '/^Status/{if($NF=="Ok") count+=1}END{print count}'` 'BEGIN{if(hardware_power_number==hardware_power) {print 1} else {print 0}}'
UserParameter=hardware_temp,omreport chassis temps|awk '/^Status/{n++; if($NF=="Ok") ok++} END{if(n>0 && n==ok) {print 1} else {print 0}}'
UserParameter=hardware_physics_health,awk -v hardware_physics_disk_number=`omreport storage pdisk controller=0|grep -c "^ID"` -v hardware_physics_disk=`omreport storage pdisk controller=0|awk '/^Status/{if($NF=="Ok") count+=1}END{print count}'` 'BEGIN{if(hardware_physics_disk_number==hardware_physics_disk) {print 1} else {print 0}}'
UserParameter=hardware_virtual_health,awk -v hardware_virtual_disk_number=`omreport storage vdisk controller=0|grep -c "^ID"` -v hardware_virtual_disk=`omreport storage vdisk controller=0|awk '/^Status/{if($NF=="Ok") count+=1}END{print count}'` 'BEGIN{if(hardware_virtual_disk_number==hardware_virtual_disk) {print 1} else {print 0}}'
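The fan, power-supply and disk items above all follow one pattern: count the components, count the Status lines that end in Ok, and report 1 only if the two numbers match. A minimal reusable sketch of that check follows; omsa_all_ok is a hypothetical helper name, and the sketch assumes omreport prints one "Field : Value" pair per line, which is what the awk commands above rely on. It counts the chosen field directly, so any status other than Ok (for example Critical) turns the result into 0.

```shell
#!/usr/bin/env bash
# omsa_all_ok FIELD: hypothetical helper (not an OMSA command).
# Reads omreport-style text on stdin, looks at every line that starts with
# FIELD followed by optional blanks and a colon (e.g. "Status   : Ok"),
# and prints 1 if there is at least one such line and all of them end in
# "Ok"; otherwise it prints 0 (including when no such line is found).
omsa_all_ok() {
    local field="${1:-Status}"
    awk -v field="$field" '
        $0 ~ ("^" field "[ \t]*:") {
            total++
            if ($NF == "Ok") ok++
        }
        END { if (total > 0 && ok == total) print 1; else print 0 }
    '
}
```

Usage on the client: omreport chassis fans | omsa_all_ok Status, or omreport chassis memory | omsa_all_ok Health. To call it from zabbix_agentd.conf you would first have to save it in a script file, because each UserParameter command runs in a fresh shell that does not know functions defined interactively.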
Start the Zabbix agent service:
/etc/init.d/zabbix-agent start
Enable it at boot:
chkconfig zabbix-agent on
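Before moving on to the server side, it can help to evaluate every hardware item once on the client. The sketch below loops over the ten keys defined above and runs zabbix_agentd -t (test a single item and exit) for each one. The config path /etc/zabbix/zabbix_agentd.conf is an assumption that depends on how the agent was packaged, and test_hw_items is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Keys of the hardware items configured in zabbix_agentd.conf above.
HW_KEYS=(hardware_cpu hardware_cpu_model hardware_virtual_health
         hardware_memory_health hardware_temp hardware_physics_health
         hardware_power_health hardware_battery hardware_nic_health
         hardware_fan_health)

# test_hw_items [CONF]: hypothetical helper. Runs "zabbix_agentd -t KEY"
# for every hardware key so each value (1 = OK, 0 = fault) can be checked
# by eye. CONF defaults to an assumed path; adjust it for your package.
test_hw_items() {
    local conf="${1:-/etc/zabbix/zabbix_agentd.conf}"
    local key
    for key in "${HW_KEYS[@]}"; do
        zabbix_agentd -c "$conf" -t "$key"
    done
}
```

Run it from a shell started as the zabbix user (for example sudo -u zabbix bash) so that permission problems, such as a missing sudoers rule for the hardware_cpu_model item, show up here rather than later as unsupported items on the server.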
II. Server
1. Import the template
Import Template Hardware Monitor into Zabbix (the import procedure itself is not covered here):
<?xml version="1.0" encoding="UTF-8"?>
<zabbix_export>
    <version>2.0</version>
    <date>2014-04-28T03:26:48Z</date>
    <groups>
        <group>
            <name>Templates</name>
        </group>
    </groups>
    <templates>
        <template>
            <template>Template Hardware Monitor</template>
            <name>Template Hardware Monitor</name>
            <groups>
                <group>
                    <name>Templates</name>
                </group>
            </groups>
            <applications>
                <application>
                    <name>Hardware Monitor</name>
                </application>
            </applications>
            <items>
                <item>
                    <name>CPU processor status</name>
                    <type>0</type>
                    <snmp_community/>
                    <multiplier>0</multiplier>
                    <snmp_oid/>
                    <key>hardware_cpu</key>
                    <delay>300</delay>
                    <history>7</history>
                    <trends>365</trends>
                    <status>0</status>
                    <value_type>3</value_type>
                    <allowed_hosts/>
                    <units/>
                    <delta>0</delta>
                    <snmpv3_securityname/>
                    <snmpv3_securitylevel>0</snmpv3_securitylevel>
                    <snmpv3_authpassphrase/>
                    <snmpv3_privpassphrase/>
                    <formula>1</formula>
                    <delay_flex/>
                    <params/>
                    <ipmi_sensor/>
                    <data_type>0</data_type>
                    <authtype>0</authtype>
                    <username/>
                    <password/>
                    <publickey/>
                    <privatekey/>
                    <port/>
                    <description/>
                    <inventory_link>0</inventory_link>
                    <applications>
                        <application>
                            <name>Hardware Monitor</name>
                        </application>
                    </applications>
                    <valuemap/>
                </item>
                <item>
                    <name>CPU power-saving mode status</name>
                    <type>0</type>
                    <snmp_community/>
                    <multiplier>0</multiplier>
                    <snmp_oid/>
                    <key>hardware_cpu_model</key>
                    <delay>300</delay>
                    <history>7</history>
                    <trends>365</trends>
                    <status>0</status>
                    <value_type>3</value_type>
                    <allowed_hosts/>
                    <units/>
                    <delta>0</delta>
                    <snmpv3_securityname/>
                    <snmpv3_securitylevel>0</snmpv3_securitylevel>
                    <snmpv3_authpassphrase/>
                    <snmpv3_privpassphrase/>
                    <formula>1</formula>
                    <delay_flex/>
                    <params/>
                    <ipmi_sensor/>
                    <data_type>0</data_type>
                    <authtype>0</authtype>
                    <username/>
                    <password/>
                    <publickey/>
                    <privatekey/>
                    <port/>
                    <description/>
                    <inventory_link>0</inventory_link>
                    <applications>
                        <application>
                            <name>Hardware Monitor</name>
                        </application>
                    </applications>
                    <valuemap/>
                </item>
                <item>
                    <name>RAID status</name>
                    <type>0</type>
                    <snmp_community/>
                    <multiplier>0</multiplier>
                    <snmp_oid/>
                    <key>hardware_virtual_health</key>
                    <delay>300</delay>
                    <history>7</history>
                    <trends>365</trends>
                    <status>0</status>
                    <value_type>3</value_type>
                    <allowed_hosts/>
                    <units/>
                    <delta>0</delta>
                    <snmpv3_securityname/>
                    <snmpv3_securitylevel>0</snmpv3_securitylevel>
                    <snmpv3_authpassphrase/>
                    <snmpv3_privpassphrase/>
                    <formula>1</formula>
                    <delay_flex/>
                    <params/>
                    <ipmi_sensor/>
                    <data_type>0</data_type>
                    <authtype>0</authtype>
                    <username/>
                    <password/>
                    <publickey/>
                    <privatekey/>
                    <port/>
                    <description/>
                    <inventory_link>0</inventory_link>
                    <applications>
                        <application>
                            <name>Hardware Monitor</name>
                        </application>
                    </applications>
                    <valuemap/>
                </item>
                <item>
                    <name>Memory status</name>
                    <type>0</type>
                    <snmp_community/>
                    <multiplier>0</multiplier>
                    <snmp_oid/>
                    <key>hardware_memory_health</key>
                    <delay>300</delay>
                    <history>7</history>
                    <trends>365</trends>
                    <status>0</status>
                    <value_type>3</value_type>
                    <allowed_hosts/>
                    <units/>
                    <delta>0</delta>
                    <snmpv3_securityname/>
                    <snmpv3_securitylevel>0</snmpv3_securitylevel>
                    <snmpv3_authpassphrase/>
                    <snmpv3_privpassphrase/>
                    <formula>1</formula>
                    <delay_flex/>
                    <params/>
                    <ipmi_sensor/>
                    <data_type>0</data_type>
                    <authtype>0</authtype>
                    <username/>
                    <password/>
                    <publickey/>
                    <privatekey/>
                    <port/>
                    <description/>
                    <inventory_link>0</inventory_link>
                    <applications>
                        <application>
                            <name>Hardware Monitor</name>
                        </application>
                    </applications>
                    <valuemap/>
                </item>
                <item>
                    <name>Temperature status</name>
                    <type>0</type>
                    <snmp_community/>
                    <multiplier>0</multiplier>
                    <snmp_oid/>
                    <key>hardware_temp</key>
                    <delay>300</delay>
                    <history>7</history>
                    <trends>365</trends>
                    <status>0</status>
                    <value_type>3</value_type>
                    <allowed_hosts/>
                    <units/>
                    <delta>0</delta>
                    <snmpv3_securityname/>
                    <snmpv3_securitylevel>0</snmpv3_securitylevel>
                    <snmpv3_authpassphrase/>
                    <snmpv3_privpassphrase/>
                    <formula>1</formula>
                    <delay_flex/>
                    <params/>
                    <ipmi_sensor/>
                    <data_type>0</data_type>
                    <authtype>0</authtype>
                    <username/>
                    <password/>
                    <publickey/>
                    <privatekey/>
                    <port/>
                    <description/>
                    <inventory_link>0</inventory_link>
                    <applications>
                        <application>
                            <name>Hardware Monitor</name>
                        </application>
                    </applications>
                    <valuemap/>
                </item>
                <item>
                    <name>Physical disk status</name>
                    <type>0</type>
                    <snmp_community/>
                    <multiplier>0</multiplier>
                    <snmp_oid/>
                    <key>hardware_physics_health</key>
                    <delay>300</delay>
                    <history>7</history>
                    <trends>365</trends>
                    <status>0</status>
                    <value_type>3</value_type>
                    <allowed_hosts/>
                    <units/>
                    <delta>0</delta>
                    <snmpv3_securityname/>
                    <snmpv3_securitylevel>0</snmpv3_securitylevel>
                    <snmpv3_authpassphrase/>
                    <snmpv3_privpassphrase/>
                    <formula>1</formula>
                    <delay_flex/>
                    <params/>
                    <ipmi_sensor/>
                    <data_type>0</data_type>
                    <authtype>0</authtype>
                    <username/>
                    <password/>
                    <publickey/>
                    <privatekey/>
                    <port/>
                    <description/>
                    <inventory_link>0</inventory_link>
                    <applications>
                        <application>
                            <name>Hardware Monitor</name>
                        </application>
                    </applications>
                    <valuemap/>
                </item>
                <item>
                    <name>Power supply status</name>
                    <type>0</type>
                    <snmp_community/>
                    <multiplier>0</multiplier>
                    <snmp_oid/>
                    <key>hardware_power_health</key>
                    <delay>300</delay>
                    <history>7</history>
                    <trends>365</trends>
                    <status>0</status>
                    <value_type>3</value_type>
                    <allowed_hosts/>
                    <units/>
                    <delta>0</delta>
                    <snmpv3_securityname/>
                    <snmpv3_securitylevel>0</snmpv3_securitylevel>
                    <snmpv3_authpassphrase/>
                    <snmpv3_privpassphrase/>
                    <formula>1</formula>
                    <delay_flex/>
                    <params/>
                    <ipmi_sensor/>
                    <data_type>0</data_type>
                    <authtype>0</authtype>
                    <username/>
                    <password/>
                    <publickey/>
                    <privatekey/>
                    <port/>
                    <description/>
                    <inventory_link>0</inventory_link>
                    <applications>
                        <application>
                            <name>Hardware Monitor</name>
                        </application>
                    </applications>
                    <valuemap/>
                </item>
                <item>
                    <name>System board CMOS battery</name>
                    <type>0</type>
                    <snmp_community/>
                    <multiplier>0</multiplier>
                    <snmp_oid/>
                    <key>hardware_battery</key>
                    <delay>300</delay>
                    <history>7</history>
                    <trends>365</trends>
                    <status>0</status>
                    <value_type>3</value_type>
                    <allowed_hosts/>
                    <units/>
                    <delta>0</delta>
                    <snmpv3_securityname/>
                    <snmpv3_securitylevel>0</snmpv3_securitylevel>
                    <snmpv3_authpassphrase/>
                    <snmpv3_privpassphrase/>
                    <formula>1</formula>
                    <delay_flex/>
                    <params/>
                    <ipmi_sensor/>
                    <data_type>0</data_type>
                    <authtype>0</authtype>
                    <username/>
                    <password/>
                    <publickey/>
                    <privatekey/>
                    <port/>
                    <description/>
                    <inventory_link>0</inventory_link>
                    <applications>
                        <application>
                            <name>Hardware Monitor</name>
                        </application>
                    </applications>
                    <valuemap/>
                </item>
                <item>
                    <name>NIC status</name>
                    <type>0</type>
                    <snmp_community/>
                    <multiplier>0</multiplier>
                    <snmp_oid/>
                    <key>hardware_nic_health</key>
                    <delay>300</delay>
                    <history>7</history>
                    <trends>365</trends>
                    <status>0</status>
                    <value_type>3</value_type>
                    <allowed_hosts/>
                    <units/>
                    <delta>0</delta>
                    <snmpv3_securityname/>
                    <snmpv3_securitylevel>0</snmpv3_securitylevel>
                    <snmpv3_authpassphrase/>
                    <snmpv3_privpassphrase/>
                    <formula>1</formula>
                    <delay_flex/>
                    <params/>
                    <ipmi_sensor/>
                    <data_type>0</data_type>
                    <authtype>0</authtype>
                    <username/>
                    <password/>
                    <publickey/>
                    <privatekey/>
                    <port/>
                    <description/>
                    <inventory_link>0</inventory_link>
                    <applications>
                        <application>
                            <name>Hardware Monitor</name>
                        </application>
                    </applications>
                    <valuemap/>
                </item>
                <item>
                    <name>Fan status</name>
                    <type>0</type>
                    <snmp_community/>
                    <multiplier>0</multiplier>
                    <snmp_oid/>
                    <key>hardware_fan_health</key>
                    <delay>300</delay>
                    <history>7</history>
                    <trends>365</trends>
                    <status>0</status>
                    <value_type>3</value_type>
                    <allowed_hosts/>
                    <units/>
                    <delta>0</delta>
                    <snmpv3_securityname/>
                    <snmpv3_securitylevel>0</snmpv3_securitylevel>
                    <snmpv3_authpassphrase/>
                    <snmpv3_privpassphrase/>
                    <formula>1</formula>
                    <delay_flex/>
                    <params/>
                    <ipmi_sensor/>
                    <data_type>0</data_type>
                    <authtype>0</authtype>
                    <username/>
                    <password/>
                    <publickey/>
                    <privatekey/>
                    <port/>
                    <description/>
                    <inventory_link>0</inventory_link>
                    <applications>
                        <application>
                            <name>Hardware Monitor</name>
                        </application>
                    </applications>
                    <valuemap/>
                </item>
            </items>
            <discovery_rules/>
            <macros/>
            <templates/>
            <screens/>
        </template>
    </templates>
    <triggers>
        <trigger>
            <expression>{Template Hardware Monitor:hardware_cpu.count(#2,0,"eq")}=2</expression>
            <name>{}CPU processor hardware fault!</name>
            <url/>
            <status>0</status>
            <priority>4</priority>
            <description/>
            <type>0</type>
            <dependencies/>
        </trigger>
        <trigger>
            <expression>{Template Hardware Monitor:hardware_virtual_health.count(#2,0,"eq")}=2</expression>
            <name>{}RAID hardware fault!</name>
            <url/>
            <status>0</status>
            <priority>4</priority>
            <description/>
            <type>0</type>
            <dependencies/>
        </trigger>
        <trigger>
            <expression>{Template Hardware Monitor:hardware_memory_health.count(#2,0,"eq")}=2</expression>
            <name>{}Memory hardware fault!</name>
            <url/>
            <status>0</status>
            <priority>4</priority>
            <description/>
            <type>0</type>
            <dependencies/>
        </trigger>
        <trigger>
            <expression>{Template Hardware Monitor:hardware_temp.count(#2,0,"eq")}=2</expression>
            <name>{}Temperature exceeds the threshold (hardware fault)!</name>
            <url/>
            <status>0</status>
            <priority>4</priority>
            <description/>
            <type>0</type>
            <dependencies/>
        </trigger>
        <trigger>
            <expression>{Template Hardware Monitor:hardware_physics_health.count(#2,0,"eq")}=2</expression>
            <name>{}Physical disk hardware fault!</name>
            <url/>
            <status>0</status>
            <priority>4</priority>
            <description/>
            <type>0</type>
            <dependencies/>
        </trigger>
        <trigger>
            <expression>{Template Hardware Monitor:hardware_power_health.count(#2,0,"eq")}=2</expression>
            <name>{}Power supply hardware fault!</name>
            <url/>
            <status>0</status>
            <priority>4</priority>
            <description/>
            <type>0</type>
            <dependencies/>
        </trigger>
        <trigger>
            <expression>{Template Hardware Monitor:hardware_battery.count(#2,0,"eq")}=2</expression>
            <name>{}System board CMOS battery hardware fault!</name>
            <url/>
            <status>0</status>
            <priority>4</priority>
            <description/>
            <type>0</type>
            <dependencies/>
        </trigger>
        <trigger>
            <expression>{Template Hardware Monitor:hardware_nic_health.count(#2,0,"eq")}=2</expression>
            <name>{}NIC hardware fault!</name>
            <url/>
            <status>0</status>
            <priority>4</priority>
            <description/>
            <type>0</type>
            <dependencies/>
        </trigger>
        <trigger>
            <expression>{Template Hardware Monitor:hardware_fan_health.count(#2,0,"eq")}=2</expression>
            <name>{}Fan hardware fault!</name>
            <url/>
            <status>0</status>
            <priority>4</priority>
            <description/>
            <type>0</type>
            <dependencies/>
        </trigger>
    </triggers>
    <graphs>
        <graph>
            <name>Server hardware status (0 = fault, 1 = OK)</name>
            <width>900</width>
            <height>200</height>
            <yaxismin>0.0000</yaxismin>
            <yaxismax>100.0000</yaxismax>
            <show_work_period>1</show_work_period>
            <show_triggers>1</show_triggers>
            <type>0</type>
            <show_legend>1</show_legend>
            <show_3d>0</show_3d>
            <percent_left>0.0000</percent_left>
            <percent_right>0.0000</percent_right>
            <ymin_type_1>0</ymin_type_1>
            <ymax_type_1>0</ymax_type_1>
            <ymin_item_1>0</ymin_item_1>
            <ymax_item_1>0</ymax_item_1>
            <graph_items>
                <graph_item>
                    <sortorder>0</sortorder>
                    <drawtype>0</drawtype>
                    <color>C80000</color>
                    <yaxisside>0</yaxisside>
                    <calc_fnc>2</calc_fnc>
                    <type>0</type>
                    <item>
                        <host>Template Hardware Monitor</host>
                        <key>hardware_cpu</key>
                    </item>
                </graph_item>
                <graph_item>
                    <sortorder>1</sortorder>
                    <drawtype>0</drawtype>
                    <color>00C800</color>
                    <yaxisside>0</yaxisside>
                    <calc_fnc>2</calc_fnc>
                    <type>0</type>
                    <item>
                        <host>Template Hardware Monitor</host>
                        <key>hardware_cpu_model</key>
                    </item>
                </graph_item>
                <graph_item>
                    <sortorder>2</sortorder>
                    <drawtype>0</drawtype>
                    <color>0000C8</color>
                    <yaxisside>0</yaxisside>
                    <calc_fnc>2</calc_fnc>
                    <type>0</type>
                    <item>
                        <host>Template Hardware Monitor</host>
                        <key>hardware_virtual_health</key>
                    </item>
                </graph_item>
                <graph_item>
                    <sortorder>3</sortorder>
                    <drawtype>0</drawtype>
                    <color>C800C8</color>
                    <yaxisside>0</yaxisside>
                    <calc_fnc>2</calc_fnc>
                    <type>0</type>
                    <item>
                        <host>Template Hardware Monitor</host>
                        <key>hardware_memory_health</key>
                    </item>
                </graph_item>
                <graph_item>
                    <sortorder>4</sortorder>
                    <drawtype>0</drawtype>
                    <color>00C8C8</color>
                    <yaxisside>0</yaxisside>
                    <calc_fnc>2</calc_fnc>
                    <type>0</type>
                    <item>
                        <host>Template Hardware Monitor</host>
                        <key>hardware_temp</key>
                    </item>
                </graph_item>
                <graph_item>
                    <sortorder>5</sortorder>
                    <drawtype>0</drawtype>
                    <color>C8C800</color>
                    <yaxisside>0</yaxisside>
                    <calc_fnc>2</calc_fnc>
                    <type>0</type>
                    <item>
                        <host>Template Hardware Monitor</host>
                        <key>hardware_physics_health</key>
                    </item>
                </graph_item>
                <graph_item>
                    <sortorder>6</sortorder>
                    <drawtype>0</drawtype>
                    <color>C8C8C8</color>
                    <yaxisside>0</yaxisside>
                    <calc_fnc>2</calc_fnc>
                    <type>0</type>
                    <item>
                        <host>Template Hardware Monitor</host>
                        <key>hardware_power_health</key>
                    </item>
                </graph_item>
                <graph_item>
                    <sortorder>7</sortorder>
                    <drawtype>0</drawtype>
                    <color>960000</color>
                    <yaxisside>0</yaxisside>
                    <calc_fnc>2</calc_fnc>
                    <type>0</type>
                    <item>
                        <host>Template Hardware Monitor</host>
                        <key>hardware_battery</key>
                    </item>
                </graph_item>
                <graph_item>
                    <sortorder>8</sortorder>
                    <drawtype>0</drawtype>
                    <color>009600</color>
                    <yaxisside>0</yaxisside>
                    <calc_fnc>2</calc_fnc>
                    <type>0</type>
                    <item>
                        <host>Template Hardware Monitor</host>
                        <key>hardware_nic_health</key>
                    </item>
                </graph_item>
                <graph_item>
                    <sortorder>9</sortorder>
                    <drawtype>0</drawtype>
                    <color>000096</color>
                    <yaxisside>0</yaxisside>
                    <calc_fnc>2</calc_fnc>
                    <type>0</type>
                    <item>
                        <host>Template Hardware Monitor</host>
                        <key>hardware_fan_health</key>
                    </item>
                </graph_item>
            </graph_items>
        </graph>
    </graphs>
</zabbix_export>
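Zabbix matches an agent item to a UserParameter by its key, so every <key> in the template must have an identically named UserParameter on the client, otherwise the item becomes unsupported. A minimal cross-check sketch, assuming you have the exported XML and a copy of zabbix_agentd.conf as local files; missing_userparams is a hypothetical helper, and its sed expression only handles one <key> element per line, as in this export.

```shell
#!/usr/bin/env bash
# missing_userparams TEMPLATE_XML AGENT_CONF: hypothetical helper.
# Prints every item key found in the template XML that has no
# "UserParameter=<key>," line in the agent config.
missing_userparams() {
    local template_xml="$1" agent_conf="$2" key
    # <key> appears both under <items> and under <graphs>;
    # sort -u removes the duplicates.
    sed -n 's:.*<key>\(.*\)</key>.*:\1:p' "$template_xml" | sort -u |
    while read -r key; do
        # The trailing comma keeps hardware_cpu from matching hardware_cpu_model.
        grep -q "^UserParameter=${key}," "$agent_conf" || echo "$key"
    done
}
```

Empty output means every key in the template is backed by a UserParameter; any key that is printed still needs a matching line in zabbix_agentd.conf.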