Saturday, March 5, 2016

SCCM 2007 Backup Failing - Hangs Indefinitely at SCOM Maintenance Mode

For the last 2 days, I've been on the phone with MS engineers, and they haven't been able to figure out how to get the backup unstuck from the MaintenancemodeSCOM step.  We don't use SCOM in our environment, and on working servers the backup goes right past this step in a matter of seconds.  Here it didn't continue even after the 1 hour timeout, and we shouldn't have to wait that long anyway.  We are working with a partner on a project to upgrade our server OS, and they were able to escalate the case.  On the 2nd day we got the best of the best SCCM engineers.  They tried their best, but even with deeper dives and traces they weren't able to get any further than on the first day.

Still working on day 2, I tried a different approach that we hadn't tried earlier with MS.  We had already tried putting the services into a stopped state beforehand, or stopping them once the backup got stuck.  That didn't work this time, although it had before.  What I always noticed, though, is that after VSS was stopped, SMS_SITE_BACKUP was able to continue stopping the services.  At that point the backup would be logged as failed.  The logs, however, don't indicate any errors or anything in an unstable state, so that part appears to be fine.

So what I tried differently was putting those 2 services in a paused state first and stopping VSS.  I then let the backup or the writer start the VSS service, and then started the backup.  This worked correctly 4 out of 5 times, with the logs showing both writers in a stable state on a successful completion with the correct Event ID of 6833.  I think the 4th time VSS was already running, so it should be stopped before a backup is started.

Writers are Stable
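
For anyone who wants to script the sequence above, here is a rough sketch of how it could look.  This is not what was run on the call with MS; it only builds the sc.exe commands in the order described and prints them.  The two paused services are passed in as parameters (SERVICE_A and SERVICE_B below are placeholders, not real service names), and starting the backup through the SMS_SITE_BACKUP service is an assumption on my part.

```python
# Sketch only: builds the command sequence and runs nothing unless dry_run=False.
import subprocess

VSS_SERVICE = "VSS"                 # Windows service name for Volume Shadow Copy
BACKUP_SERVICE = "SMS_SITE_BACKUP"  # assumption: the backup is started via this service


def build_backup_prep_commands(services_to_pause, backup_service=BACKUP_SERVICE):
    """Return sc.exe commands in the order used in the post:
    pause the given services, stop VSS, then start the backup."""
    commands = [["sc", "pause", name] for name in services_to_pause]
    commands.append(["sc", "stop", VSS_SERVICE])
    commands.append(["sc", "start", backup_service])
    return commands


def run_commands(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            # check=False: 'sc stop' returns non-zero if the service is already stopped
            subprocess.run(command, check=False)


if __name__ == "__main__":
    # SERVICE_A / SERVICE_B are hypothetical placeholders for the two paused services.
    run_commands(build_backup_prep_commands(["SERVICE_A", "SERVICE_B"]))
```

Leaving dry_run at True just prints the commands, which is a safe way to review the order before touching a site server.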

It was a repeatable process, and hopefully it will allow us to progress with upgrading these servers from 2003 R2 to 2008 R2, which I hope has more stable backups.  I'm not quite sure whether smsbkup.exe or VSS is the issue, but I believe it is somewhere between the 2 services.

On Monday, we'll have MS validate that this is a workable solution for getting working backups.

Event ID 6833 indicates successful backup


Update 3/7/2016: The Microsoft engineer validated that the method used to produce the backups is good, in the sense that the manual method is essentially the same as what SMS backup performs itself.  Going by the official documentation, as long as the Event ID is produced, the backup is successful.  To really confirm that the backups are valid, however, we need to restore one on the new server.
