When your OS updates break your CAM

I’m sure many people have run into this before, and in my experience there is nothing roaming the Interwebs on how to fix it. Since I just fixed it, I’ll write up a short how-to post.

The setup:

Two Sun Fire X2100 M2 servers are connected to a StorageTek 2530 array via iSCSI. Both nodes run CentOS 6.2 with the Red Hat cluster software. A separate server runs Nagios for monitoring; it checks for failed disks on the StorageTek by running a script on either node that returns the number of “optimal” and “failed” disks.

Two nodes connected to an array via iSCSI (Ethernet)
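The post doesn’t show the Nagios disk check itself. As a rough idea of what such a check might look like, here is a hypothetical shell function (check_disk_status is an invented name): it reads the array’s disk listing on stdin, counts lines mentioning “Optimal” and “Failed”, and maps the result to the standard Nagios plugin exit codes (0 OK, 2 CRITICAL, 3 UNKNOWN). The sscs output format it parses is an assumption, not taken from the CAM documentation. It also treats CLI errors like the ones in the next section as UNKNOWN rather than as a failed disk.

```shell
#!/bin/bash
# Hypothetical Nagios-style disk check. ASSUMPTION: the array CLI's disk
# listing contains the words "Optimal" / "Failed", one disk per line.
# Adjust the patterns to whatever your sscs disk listing actually prints.

# Standard Nagios plugin exit codes.
STATE_OK=0
STATE_CRITICAL=2
STATE_UNKNOWN=3

check_disk_status() {
    local output optimal failed
    output=$(cat)    # the disk listing, read from stdin

    # No output, or a CLI error such as "exception" / "not found", means CAM
    # itself is broken, not that a disk failed: report UNKNOWN, not CRITICAL.
    if [ -z "$output" ] ||
        printf '%s\n' "$output" | grep -qiE 'exception|not found'; then
        echo "UNKNOWN - array CLI error: $(printf '%s\n' "$output" | head -n 1)"
        return "$STATE_UNKNOWN"
    fi

    # Count matching lines (case-insensitive); grep -c prints 0 on no match.
    optimal=$(printf '%s\n' "$output" | grep -ci 'optimal')
    failed=$(printf '%s\n' "$output" | grep -ci 'failed')

    if [ "$failed" -gt 0 ]; then
        echo "CRITICAL - optimal=$optimal failed=$failed"
        return "$STATE_CRITICAL"
    fi
    echo "OK - optimal=$optimal failed=$failed"
    return "$STATE_OK"
}
```

Pipe your sscs disk listing into check_disk_status and use its return value as the plugin’s exit code. Separating “CLI broken” from “disk failed” would have made the alert in this post read as a CAM problem instead of a hardware one.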

The problem:

Updating system software is important: keeping packages up to date protects against security vulnerabilities. Unfortunately, sometimes it breaks things. In this case, installing the suggested updates broke my setup, and the Sun Storage CAM (Common Array Manager) software stopped working.

I was first alerted to this when Nagios sent me errors from the script that checks the StorageTek disks. I ran the command the script uses to see what was going on, and it returned several errors. Here they are for future googlers:

sscs list -i 192.168.128.101 device

returned “Command failed due to an exception. null” and

sscs list -a arrayname host

returned “arrayname : The resource was not found.”

Not particularly helpful messages.

Fortunately, the nodes could still mount the array partitions, so they kept running as web and MySQL servers. I just couldn’t run management commands against the array.

Since some of the software CAM requires had been updated, I suspected that was causing the issue. The required packages are listed below:

  • libXtst-1.0.99.2-3.el6.i686.rpm and its dependent RPM (InstallShield requirement)
  • libselinux-2.0.94-2.el6.i686.rpm
  • audit-libs-2.0.4-1.el6.i686.rpm
  • cracklib-2.8.16-2.el6.i686.rpm
  • db4-4.7.25-16.el6.i686.rpm
  • pam-1.1.1-4.el6.i686.rpm
  • libstdc++-4.4.4-13.el6.i686.rpm
  • zlib-1.2.3-25.el6.i686.rpm
  • ksh-20100621-2.el6.x86_64.rpm
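Since the suspicion is that updated prerequisites broke CAM, it helps to see which versions of them are actually installed now. The sketch below copies the package list above into a shell variable; the helper names rpm_query_name and check_cam_prereqs are hypothetical. It only reports installed versions; it does not prove which update caused the breakage.

```shell
#!/bin/bash
# Sketch: report installed versions of the CAM prerequisites listed above.
# The package list is copied from the post; helper names are hypothetical.

CAM_PREREQS="
libXtst-1.0.99.2-3.el6.i686.rpm
libselinux-2.0.94-2.el6.i686.rpm
audit-libs-2.0.4-1.el6.i686.rpm
cracklib-2.8.16-2.el6.i686.rpm
db4-4.7.25-16.el6.i686.rpm
pam-1.1.1-4.el6.i686.rpm
libstdc++-4.4.4-13.el6.i686.rpm
zlib-1.2.3-25.el6.i686.rpm
ksh-20100621-2.el6.x86_64.rpm
"

# Turn "name-version-release.arch.rpm" into "name.arch", which rpm -q accepts.
rpm_query_name() {
    local base arch nvr
    base=${1%.rpm}               # drop the .rpm suffix
    arch=${base##*.}             # text after the last dot, e.g. i686
    nvr=${base%.*}               # name-version-release
    echo "${nvr%-*-*}.${arch}"   # strip the version and release fields
}

# Print each installed prerequisite, or flag it as missing.
check_cam_prereqs() {
    local f pkg
    for f in $CAM_PREREQS; do
        pkg=$(rpm_query_name "$f")
        if ! rpm -q "$pkg"; then
            echo "MISSING: $pkg (CAM lists $f)"
        fi
    done
}
```

Running check_cam_prereqs before and after a yum update and diffing the two outputs would show which CAM prerequisites the update changed.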

 

The solution:

I couldn’t figure out on my own exactly what was wrong, so I contacted Oracle support, and they eventually tipped me off to the solution: completely remove the CAM software and reinstall it. The steps are outlined below:

  • Go to the CAM software folder, /var/opt/CommonArrayManager/Host_Software_6.9.0.16/bin/, and run
./uninstall -f
  • Here is a good spot to run yum update and restart the server if needed.
  • Change to the directory where you have the CAM software install CD. There should be a folder called components in there. Change into that directory and install the JDK provided there:
rpm -Uvh jdk-6u20-linux-i586.rpm
  • Next, run the RunMe.bin file in the CAM software CD folder:
./RunMe.bin -c
  • Install the RAID Proxy Agent packages, located in the Add_On/RaidArrayProxy directory of the latest CAM software distribution:
rpm -ivh SMruntime.xx.xx.xx.xx-xxxx.rpm
rpm -ivh SMagent-LINUX-xx.xx.xx.xx-xxxx.rpm
  • Register the array with the host/node. This process can take several minutes.
sscs register -d storage-system
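If you have to repeat this on both cluster nodes, the steps above could be wrapped in a script. This is a convenience sketch, not an official Oracle script, and it rests on assumptions: CAM_CD is a hypothetical mount point for the install media, the Add_On/RaidArrayProxy directory is assumed to live on that media, and the SMruntime/SMagent RPMs are matched with globs because their version numbers aren’t given here. The yum update and any reboot stay manual, between remove_cam and install_cam. By default (DRY_RUN=1) it only prints the commands.

```shell
#!/bin/bash
# Sketch of the CAM remove/reinstall steps. ASSUMPTIONS: CAM_CD is a
# hypothetical mount point for the CAM media; Add_On/RaidArrayProxy is assumed
# to be on it; RPM versions are matched by glob. DRY_RUN=1 only prints commands.

CAM_BIN=/var/opt/CommonArrayManager/Host_Software_6.9.0.16/bin
CAM_CD=${CAM_CD:-/mnt/cam}
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Remove the existing CAM install (subshell keeps the cd local).
remove_cam() {
    ( run cd "$CAM_BIN" && run ./uninstall -f )
}

# Reinstall; run only after yum update and any needed reboot.
install_cam() {
    ( run cd "$CAM_CD/components" && run rpm -Uvh jdk-6u20-linux-i586.rpm ) || return 1
    ( run cd "$CAM_CD" && run ./RunMe.bin -c ) || return 1
    ( run cd "$CAM_CD/Add_On/RaidArrayProxy" &&
        run rpm -ivh SMruntime.*.rpm &&
        run rpm -ivh SMagent-LINUX-*.rpm ) || return 1
    # Registration can take several minutes; the array-facing NIC must be up.
    run sscs register -d storage-system
}
```

Review the dry-run output first, then source the file and run, for example, DRY_RUN=0 CAM_CD=/media/cdrom install_cam as root (the mount point is only an example).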

One additional issue I ran into was that some update or other process had shut down the NIC connecting the node to the array. I had to make sure that interface was up before running the sscs register -d storage-system command above.
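A minimal sketch of that check, reading the link state the kernel exposes in /sys/class/net/<iface>/operstate. The function name wait_for_link is hypothetical, and the interface name and timeout are up to you; SYSFS_NET is overridable only so the function can be tested.

```shell
#!/bin/bash
# Sketch: wait until a network interface reports link "up" before running
# sscs register. Interface name and timeout are caller-supplied assumptions.

SYSFS_NET=${SYSFS_NET:-/sys/class/net}

# wait_for_link IFACE TIMEOUT_SECONDS
# Polls once per second; returns 0 as soon as operstate is "up", or 1 after
# TIMEOUT_SECONDS. Some drivers report "unknown" instead of "up"; this sketch
# treats that as not ready.
wait_for_link() {
    local iface=$1 timeout=$2 waited=0 state
    while :; do
        state=$(cat "$SYSFS_NET/$iface/operstate" 2>/dev/null)
        if [ "$state" = "up" ]; then
            return 0
        fi
        if [ "$waited" -ge "$timeout" ]; then
            echo "interface $iface not up after ${timeout}s (state: ${state:-missing})" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
}
```

For example, with eth1 standing in for whichever interface faces the array, ifup eth1 && wait_for_link eth1 30 && sscs register -d storage-system only starts the slow registration once the link is actually up.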
