
oxygen report

At 14.31 our Nagios monitoring system alerted us that oxygen.3dpixelnet.com was not responding to HTTP or email requests.

We were unable to gain remote access because SSH was down, so our engineer was dispatched to the datacentre and arrived at 14.58.

By 15.45 we had diagnosed a failed memory module. We replaced all the memory modules, and by 16.00 the server was back online. However, its services were still failing, which denied customers access to their sites.

We then had to find out why the services were not coming back online. We eventually traced the problem to some obscure but critical libraries in both the /usr/lib and /lib directories that had bloated in size due to the memory corruption.

We restored individual library files from backups taken a day earlier (19/08/08) as part of our standard backup procedure. We then chased down several further problems, because library files are heavily symlinked across multiple directories on the server. The restored files had to be copied to CD and carried over to oxygen.3dpixelnet.com by hand, because the corruption had also taken down the SSH and rsync services.

At 20.23 we rebooted the server for a final check, and all services were back online a few minutes later.

As a side note, this was our only server that did not use Corsair memory. So folks, buy Corsair; it’s not let us down yet :)
