...And we're back! My apologies for the extended downtime. The Curse network servers suffered a catastrophic hardware failure that took us out. Our team has been working tirelessly over the last couple days and nights to get us back online. I'm sorry for the inconvenience this has caused, especially in the last week to push for Gladiator! Thanks for your patience on this one.
Also, we may experience some additional downtime in the next day as hardware is replaced.
But, what actually happened?
This is the current update from Curse on what happened to cause this major network-wide crash and what we're doing to make sure it doesn't happen again.
Around 7:30 AM PST on Wednesday, June 22nd, one of the storage area network (SAN) controller nodes in our Atlanta datacenter failed, causing all the sites on the Curse network to go offline. This is a highly redundant system with a backup controller which should have taken over automatically. However, it did not, despite reporting as healthy. After we replaced the failed controller, the new one booted and began copying its configuration from its peer. Unfortunately, as soon as the configuration was copied, the secondary controller also died.
After replacing the second failed controller, we began powering on the servers that relied on the SAN for their data - all of the database servers and the network-attached storage (NAS) file server storing media, static content, and most web files. When the NAS server booted and reconnected to its volume on the SAN, it began to run a checkdisk command to make sure there were no errors on the drive. This proved to be a drawn-out process, and was the primary reason for the length of the downtime.
In addition, we hit yet another roadblock with our Linux servers. Both of the new controller nodes for the SAN had a newer firmware version, which prevented the Linux-based database servers from reading their storage. The vendor acknowledged this as a known issue and recommended a firmware upgrade to fix it. To ensure our data integrity, we are conducting a full backup of this storage before implementing the firmware upgrade. Once complete, the upgrade will bring the Linux-based databases online.
After the fix is applied, we will be able to restore the database to its pre-crash state and restore full functionality to all of our sites.
Your Personal Information
We can assure you that at no time during the hardware failure was any of your personal information compromised. We take the sacred trust you put in us with your information VERY seriously.
We are currently working hard to get all of our sites restored and functioning normally. We hope to have everything up and working again by tomorrow.
We realize that you depend on Curse for the information and add-ons that enhance your gameplay experience. As a way of thanking you for your patience and loyalty through this downtime, we are giving premium access to the Curse Client to all users starting on July 1st and running through July 5th. In addition, all guilds on WowStead.com will receive free premium access - stay tuned on WowStead for more information.
Once again, we sincerely apologize for the downtime and hope you'll continue to enjoy all the great services Curse has to offer.
The Curse Team
To celebrate coming back online and the last weekend of Season 9, we'll be giving away some mounts and pets.
There are 2 easy ways to enter: comment on this thread and/or Like our Welcome Back message on Facebook. You can enter once on AJ and once on Facebook. I will select 4 winners from this thread and 3 from Facebook. Entries will stop in 12 hours.
We used our Facebook page and Twitter account to communicate during this downtime - so if you aren't already on the AJ bandwagon, please Like/Follow to make sure you always have the most recent updates from us.
Let's get back to business. Good luck in these last few days of Season 9!