Site outage overnight
Moderator: Moderators
5 posts
• Page 1 of 1
Site outage overnight
For those who didn't notice, there was a site outage overnight.
It started around 10:30 last night. I noticed an issue within the hour but couldn't connect to the server, so I power cycled it. Everything basically seemed to come back, but I wasn't comfy. My discomfort was well founded: at about 4am it went down again, with indications of a failing HDD. After chatting with Corenetworks support, we were able to clone the dying HDD and replace it, and everything was back up as normal just a little after 8 this morning.
Please let me know if you see anything new and unusual.
g.
Gary Stark Nikon, Canon, Bronica .... stuff The people who want English to be the official language of the United States are uncomfortable with their leaders being fluent in it - US Pres. Bartlet
Re: Site outage overnight
We, the dedicated followers of the forum, are rarely aware of the sweat, tears and dedicated skill it takes to keep this forum functioning. Bravo Gary. Now get some well-earned sleep, and thank you from all of us.
Regards
Matt. K
Re: Site outage overnight
Good job Gary... I did indeed notice that it went down, and I was receiving database connection type errors.
I was more than confident that you would have it in hand... these things happen of course. The trick is always to isolate the cause ASAP.
cheers, Michael.
Photography is not a crime, but perhaps my abuse of artistic license is?
Re: Site outage overnight
And therein lay the real challenge: there was no indication of why the site went down. There was nothing obvious in the logs, and I couldn't actually access the site after it had died: ssh was out, webmin and phpadmin both out. Nothing. I needed to power cycle the box, which I can do from my control panel, and then see what was happening.
Nothing too major showed up after the first outage, but I wasn't really comfortable; it certainly was feeling like the HDD was cactus. Then, after restarting from the second outage, I had a couple of failures when trying to rsync the web sources, and then I could see a SMART error on /dev/sda in the log. By this time it was also starting to fall over after maybe 5 minutes of uptime, so by the time I'd logged the fault, with my descriptions of the symptoms, it had died several more times.
The last time confused me a little, though: webmin wasn't responding correctly and the websites not at all, but I had fairly good access to the filesystem while I remained logged in through ssh. My thoughts then moved towards the PSU as the possible problem, and I relayed that to Corenetworks. They listened to what I was saying, agreed that the SMART messages pointed to HDD failure, but checked the PSU as well. They then tried to clone the old drive onto a new one (and succeeded), and away we went.
In terms of support, I'd say it was about three hours from fault lodgement to being back online, with prompt and meaningful responses from people who were helpful and knowledgeable. Yes, I know that this is what the standard should be, but the reality is that this is so rare that it becomes notable when it's actually achieved.
Had the cloning failed, I would have needed an IP-KVM, and to install a new OS onto the new HDD. Probably another hour or two of work for me in terms of managing that process.
g.
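For anyone wanting to spot a SMART problem like the one on /dev/sda before the box starts falling over, here is a minimal sketch of a health check. It is not what was run on this server; it assumes smartmontools is installed, that the script has enough privileges to query the drive, and that the drive reports the ATA-style "overall-health" line (SCSI/SAS drives word their result differently, and this sketch does not handle them). The function names `parse_health` and `check_drive` are made up for illustration.

```python
import re
import subprocess
from typing import Optional

# `smartctl -H` prints a line like this for ATA drives:
#   SMART overall-health self-assessment test result: PASSED
_HEALTH_RE = re.compile(r"overall-health self-assessment test result:\s*(\S+)")


def parse_health(output: str) -> Optional[bool]:
    """Interpret smartctl -H output (hypothetical helper).

    Returns True if the drive reports PASSED, False for any other
    result, and None if no health line was found at all.
    """
    match = _HEALTH_RE.search(output)
    if match is None:
        return None
    return match.group(1).upper() == "PASSED"


def check_drive(device: str = "/dev/sda") -> Optional[bool]:
    """Run `smartctl -H` on the device and interpret the result.

    Assumes smartctl is on the PATH; usually needs root.
    """
    result = subprocess.run(
        ["smartctl", "-H", device],
        capture_output=True,
        text=True,
        check=False,  # smartctl uses non-zero exit codes to flag problems
    )
    return parse_health(result.stdout)
```

Calling `check_drive("/dev/sda")` from a cron job and sending an alert on anything other than `True` would be one way to use it. A passing result is no guarantee the drive is healthy, so the log entries and the rsync failures described above are still worth watching.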
Gary Stark Nikon, Canon, Bronica .... stuff The people who want English to be the official language of the United States are uncomfortable with their leaders being fluent in it - US Pres. Bartlet
Site outage overnight
Well done Gary. I did notice the outage, and I also noticed that at one point after the fix was done a post from Rooz was missing from a thread I started, but it is now back.
Fuji X-Pro1 | X-E1 | X-T1 | XF14 | XF23 | XF27 | XF35 | XF56 | XF60 | XF10-24 | XF18-55 | XF55-200 | MCEX-11
http://gmarshall.zenfolio.com http://xtographer.weebly.com