Tag Archives: monitoring

HP MicroServer G7 RAC and Linux

How to flash new firmware and change the network settings remotely, without rebooting.

Download the new firmware (version 1.4 at the time of writing) from the HP website.

The flashing utility SOCFLASH is available on the Aspeed website (direct link). It is also available for Linux, in both 32-bit and 64-bit versions.

Flash the new RAC firmware

$ ./socflash.sh all.1.4.bin old.1.3.bin
ASPEED SOC Flash Utility v.1.09.04
Find ASPEED Device 1a03:2000 on 3:0.0
Relacate IO Base: e800
MMIO Virtual Address: 3e18000
Static Memory Controller Information:
CS0 Flash Type is SPI
CS1 Flash Type is NOR
CS2 Flash Type is SPI
Boot CS is 2
Option Information:
CS: 2
Flash Type: SPI
[Warning] Don't AC OFF or Reboot System During BMC Firmware Update!!
[SOCFLASH] Flash ID : 180101
Find Flash Chip #1: SpansionS25FL128 SE64KB
Backup Flash Chip O.K.
Check Flash Chip #1 at: 440000

Setup IPMI in the OS

$ modprobe ipmi_si type=kcs ports=0xca2
$ echo "options ipmi_si type=kcs ports=0xca2" > /etc/modprobe.d/ipmi.conf

Change the network settings

$ ipmitool shell
ipmitool> lan set 1 ipsrc static
ipmitool> lan set 1 ipaddr 172.16.12.238
Setting LAN IP Address to 172.16.12.238
ipmitool> lan set 1 netmask 255.255.255.0
Setting LAN Subnet Mask to 255.255.255.0
ipmitool> lan set 1 defgw ipaddr 172.16.12.10
Setting LAN Default Gateway IP to 172.16.12.10
ipmitool> mc reset warm
Sent warm reset command to MC
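
The same settings can also be applied non-interactively, which is handy when several machines need the same treatment. The following is only a minimal sketch: the function name bmc_lan_commands is made up for illustration, channel 1 is assumed as in the session above, and the function just prints the ipmitool commands so they can be reviewed before being run.

```shell
#!/bin/sh
# Hypothetical helper: print the ipmitool commands for a static BMC address.
# Usage: bmc_lan_commands IPADDR NETMASK GATEWAY [CHANNEL]
bmc_lan_commands() {
    ip=$1
    mask=$2
    gw=$3
    chan=${4:-1}   # channel 1 is an assumption taken from the session above
    echo "ipmitool lan set $chan ipsrc static"
    echo "ipmitool lan set $chan ipaddr $ip"
    echo "ipmitool lan set $chan netmask $mask"
    echo "ipmitool lan set $chan defgw ipaddr $gw"
    # Warm-reset the management controller so it picks up the new settings.
    echo "ipmitool mc reset warm"
}

# Print the commands for the addresses used in this post.
bmc_lan_commands 172.16.12.238 255.255.255.0 172.16.12.10
```

Once the printed commands look right, the output can be piped to sh as root.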

SNMP hell (part 1): proxy to multiple devices

Last week I started working on the @GEMwrld HPC cluster at the Eidgenössische Technische Hochschule Zürich (ETH). The first task was to set up system monitoring and resource profiling. We already have a centralized graphical web console, built on Observium, that collects resource statistics from all of our servers; we usually use SNMP v2c and, in a few cases, the munin-node daemon (more on this in part 2).

On the ETH cluster we have full access to every node, but the only incoming connections allowed by the ETH firewall are the SSH sessions (on port 22), so there’s a first problem: how can we transport SNMP data from the agent to the poller?

The answer is easy: an SSH tunnel! But SNMP usually runs over UDP, and tunnelling UDP through SSH is a bit painful. To work around this issue we simply used TCP for SNMP.

First of all you need an SSH tunnel. In this case the tunnel is opened from the monitoring server:

ssh -N monitoring@serverip -L 16000:localhost:161

Here 161 is the remote TCP port to be forwarded and 16000 is the local port.

To use SNMP over TCP you have to modify the command-line parameters of the net-snmp daemon (snmpd), not the snmpd config file; just edit /etc/default/snmpd (on Ubuntu/Debian; on RHEL the path is /etc/sysconfig/snmpd) as follows:

SNMPDOPTS='-LF 6 /var/log/snmpd.log -u snmp -g snmp -I -smux -p /var/run/snmpd.pid TCP:161'

The important part is ‘TCP:161’. This binds snmpd to TCP port 161 (the default SNMP port) on every interface.
On the poller side you need to configure your monitoring system to poll localhost:16000 over TCP. For Observium you can add the host with:

./addhost.php localhost  community v2c 16000 tcp
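
Before adding the host it can help to check by hand that the agent answers through the tunnel. This is a minimal sketch, assuming the net-snmp command-line tools are installed on the poller and the tunnel is already up; snmp_tcp_target and check_agent are hypothetical helper names, and the numeric OID is sysDescr.0.

```shell
#!/bin/sh
# Hypothetical helper: build a net-snmp transport address for SNMP over TCP.
# Usage: snmp_tcp_target [HOST] [PORT]
snmp_tcp_target() {
    echo "tcp:${1:-localhost}:${2:-16000}"
}

# Hypothetical helper: ask the agent for sysDescr.0 through the tunnel.
# Usage: check_agent COMMUNITY [PORT]
check_agent() {
    # -t 5 -r 0: wait up to five seconds and do not retry
    snmpget -v2c -c "$1" -t 5 -r 0 \
        "$(snmp_tcp_target localhost "${2:-16000}")" 1.3.6.1.2.1.1.1.0
}

# Show the address that will be polled.
snmp_tcp_target localhost 16000
```

Calling check_agent with the real community string should return the system description of the remote node; a timeout usually means the tunnel is down or snmpd is not listening on TCP.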

Now there’s another problem. The Zurich GEM cluster consists of eight nodes and I do not want to start an SSH tunnel to every node. So my idea was to use a kind of proxy on the control node:

[Figure: eth_snmp]

net-snmp helped me with the snmpd proxy capabilities: http://www.net-snmp.org/wiki/index.php/Snmpd_proxy
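
As a rough idea of what such a proxy setup can look like (not necessarily the configuration used on the ETH cluster): with v2c, snmpd can map one community per node to a context with com2sec -Cn and forward each context with a proxy -Cn line. The sketch below generates those lines; the node names node1 to node8, the gem_ community prefix, the public community on the nodes and the security name proxyuser are all made up for illustration, and the access-control lines that the control node's snmpd.conf also needs are left out.

```shell
#!/bin/sh
# Hypothetical generator for the proxy part of snmpd.conf on the control node.
# Node names, communities and the security name are illustrative assumptions.
proxy_lines() {
    for node in "$@"; do
        # Requests arriving with community "gem_<node>" get the context "<node>" ...
        echo "com2sec -Cn $node proxyuser default gem_$node"
        # ... and that context is forwarded to the agent running on the node.
        echo "proxy -Cn $node -v 2c -c public $node .1.3"
    done
}

proxy_lines node1 node2 node3 node4 node5 node6 node7 node8
```

With a setup along these lines the poller would reach every node through the single tunnel to the control node, selecting the node by community string.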


CentOS 5 on KVM: reduce host CPU load

To reduce host CPU usage when running a CentOS 5 VM on KVM, it is important to add

divider=10

to grub.conf as a kernel parameter:

kernel /vmlinuz-2.6.18-348.1.1.el5 ro root=LABEL=/ console=ttyS0,115200 divider=10

This reduces the guest kernel's timer frequency from 1000 Hz to 100 Hz.
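
The arithmetic is simply the base frequency divided by the divider. A small sketch that works it out from a kernel command line; effective_hz is a hypothetical helper name, and the base frequency of 1000 Hz is passed in rather than detected.

```shell
#!/bin/sh
# Hypothetical helper: effective timer frequency for a base HZ value and a
# kernel command line; a missing divider= parameter counts as 1.
# Usage: effective_hz BASE_HZ "KERNEL COMMAND LINE"
effective_hz() {
    base_hz=$1
    divider=$(printf '%s\n' "$2" | tr ' ' '\n' \
        | sed -n 's/^divider=\([0-9][0-9]*\)$/\1/p' | tail -n 1)
    # Fall back to 1 when the parameter is absent or zero.
    [ "${divider:-0}" -gt 0 ] || divider=1
    echo $(( base_hz / divider ))
}

# The command line from this post; on a running guest pass "$(cat /proc/cmdline)".
effective_hz 1000 "ro root=LABEL=/ console=ttyS0,115200 divider=10"
```

For the command line above this prints 100.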

Although additional parameters are not required, the divider=10 parameter can still be used. Guests with this parameter will produce less CPU load in the host, but will use more coarse-grained timer expiration. (http://s19n.net/articles/2011/kvm_clock.html)

On the MicroServer the reduction in CPU load is clearly visible:

MicroServer CPU usage (made with http://www.observium.org/)

For more info read http://s19n.net/articles/2011/kvm_clock.html.

 

APCUPSD USB bug

Warning! This post is no longer up to date, since the bug was fixed in apcupsd release 3.14.10.

3.14.10 -- 13 September 2011         (Maintenance Release)
 
BUG FIXES
 
  * Fix missing status and spurrious incorrect status on newer BackUPS CS
    models using USB interface.
  [...]

APCUPSD is a very handy piece of software for anyone who owns an APC UPS.
It lets you monitor the state of the UPS, send alerts and emails when a blackout occurs, shut down the connected servers after a prolonged blackout, and so on.

The latest version, 3.14.8, introduced a bug that shows up when the software is used with an APC BackUPS CS connected via USB: every time the daemon polls the unit, a control error is generated in the USB subsystem:

[...]
USB disconnect, address 6
usb 3-2: ctrl urb status -62 received
[...]

Among other things, this causes the HID device to disconnect and reconnect (the kernel sees the UPS as a HID device).
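
To get a feel for how often this happens, the error lines in the kernel log can be counted. A minimal sketch, assuming the message text matches the excerpt above; count_urb_errors is a hypothetical helper name and the sample input is copied from that excerpt.

```shell
#!/bin/sh
# Hypothetical helper: count the USB control errors in kernel log text on stdin.
count_urb_errors() {
    # grep -c exits non-zero when there are no matches; the count is still printed.
    grep -c 'ctrl urb status -62' || true
}

# Sample input from the excerpt above; on a live system use: dmesg | count_urb_errors
printf '%s\n' 'USB disconnect, address 6' \
    'usb 3-2: ctrl urb status -62 received' | count_urb_errors
```

For the two sample lines this prints 1.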