How can we help you today? How can we help you today?
mgjernes

Activity overview

Latest activity by mgjernes

After the 30GB full restore failure above with 6.4.0.1014, I reverted this test server to 6.4.0.56 and then re-ran the same restore command. It completed successfully in 00:46:37. / comments
After the 30GB full restore failure above with 6.4.0.1014, I reverted this test server to 6.4.0.56 and then re-ran the same restore command. It completed successfully in 00:46:37.
0 votes
OK, I managed to force error output from these "hung" restores - it matters in which order one kills the backup processes from SQL Activity Monitor (vs just restarting the SQL Backup service). There is no data in the _bugreport log file for these events, probably because I was just killing the process - it was not "crashing" per se. I found that if I use SQL Activity Monitor to kill the SQL process with the restore command, then the "lead" SQL process in the multi-thread restore chain (the one that shows the restore commands per virtual disk device in the details), I will get logged errors in the SQL Backup Output. I re-attempted my test server (6.4.0.1014) restore of the 30GB database using a fresh, production backup (6.4.0.56, 3 threads, level 4 compression, 256-bit, single file). I killed the restore after 6.5 hours. This time I did not remember to disable a regular log restore job before I initiated the test restore. The first stage of the log restore (different database) hung for the entire time the full restore was "working" and did not resume once I finally killed the full restore (these log restores typically complete in 30 to 50 seconds). As soon as I killed the hung log restore, the job that initiated it went on to its next DB and resumed restoring logs correctly. Below is the output from the error logs of both killed jobs. While these restores were hung up, the SQL activity monitor showed an additional SQBCoreService process per thread of the restore. The properties for these threads (details) were blank, but the CPU use column kept continuously cycling to 5 seconds for each process, while the top level processes - for the initial restore command executed in SSMS and the next level down that detailed the restore per virtual disk - had continuously increasing CPU counts. The top level process for the restore showed as Application="SQL Backup", Status="Runnable", Command="Execute", Wait Type= "MSQL_XP" The virtual disk detailed process listed as "SQBCoreService.exe", "Suspended", "Restore Database", wait type = "BACKUPTHREAD" The "cycling" processes showed as "SQBCoreService.exe", "Suspended", "Restore Database", wait type = "BACKUPIO". In the error output, each restore thread has an API timeout 1125 seconds after the start of the restore process. The log restore that was killed shows the same API timeout errors but only 650 seconds after the start entry for the restore process. The VDI timeout setting on this server is 900. Killed, Full restore log output: -SQL "RESTORE DATABASE [TestDB] FROM DISK = '\\[BackupShare]\ServerName_TestDB_20100810_021022_FULL.sqb' WITH MAILTO = '[alert email]', PASSWORD = 'XXXXXXXXXX', NORECOVERY, DISCONNECT_EXISTING, REPLACE " ERRORS AND WARNINGS 10/08/2010 07:26:03: Restoring TestDB (database) from: 10/08/2010 07:26:03: \\[BackupShare]\ServerName_TestDB_20100810_021022_FULL.sqb 10/08/2010 07:26:03: ALTER DATABASE [TestDB] SET ONLINE RESTORE DATABASE [TestDB] FROM VIRTUAL_DEVICE = 'SQLBACKUP_CBAA31FF-C010-441D-A629-B6129EE428BF', VIRTUAL_DEVICE = 'SQLBACKUP_CBAA31FF-C010-441D-A629-B6129EE428BF01', VIRTUAL_DEVICE = 'SQLBACKUP_CBAA31FF-C010-441D-A629-B6129EE428BF02' WITH BUFFERCOUNT = 18, BLOCKSIZE = 65536, MAXTRANSFERSIZE = 1048576 , NORECOVERY, REPLACE 10/08/2010 07:44:48: Thread 0 error: Process terminated unexpectedly. Error code: -2139684861 (The api was waiting and the timeout interval had elapsed.) 10/08/2010 07:44:48: 10/08/2010 07:44:48: Thread 1 error: Process terminated unexpectedly. Error code: -2139684861 (The api was waiting and the timeout interval had elapsed.) 10/08/2010 07:44:48: 10/08/2010 07:44:48: Thread 2 error: Process terminated unexpectedly. Error code: -2139684861 (The api was waiting and the timeout interval had elapsed.) 10/08/2010 07:44:48: 10/08/2010 13:56:56: SQL error 3013: SQL error 3013: RESTORE DATABASE is terminating abnormally. 10/08/2010 13:56:56: SQL error 3203: SQL error 3203: Read on "SQLBACKUP_CBAA31FF-C010-441D-A629-B6129EE428BF01" failed: 995(The I/O operation has been aborted because of either a thread exit or an application request.) Killed Log restore output: -SQL "RESTORE LOG [TestDB2] FROM DISK = '\\[BackupShare]\ServerName_TestDB2_*_LOG.sqb' WITH ERASEFILES = 14, FILEOPTIONS = 1, MAILTO_ONERRORONLY = '[Alert Email]', PASSWORD = 'XXXXXXXXXX', STANDBY = 'C:\[LocalWorkDir]\UNDO_TestDB2.dat', DISCONNECT_EXISTING, MOVETO = '\\[BackupShare]\LogShipped' " ERRORS AND WARNINGS 10/08/2010 07:30:01: Restoring TestDB2 (transaction logs) from: 10/08/2010 07:30:01: \\VirtualCenter\SQLBackup$\ServerName\ServerName_TestDB2_20100810_072500_LOG.sqb 10/08/2010 07:30:01: ALTER DATABASE [TestDB2] SET ONLINE RESTORE LOG [TestDB2] FROM VIRTUAL_DEVICE = 'SQLBACKUP_906A5990-1291-4D39-9327-5EBACB5BFF7A', VIRTUAL_DEVICE = 'SQLBACKUP_906A5990-1291-4D39-9327-5EBACB5BFF7A01', VIRTUAL_DEVICE = 'SQLBACKUP_906A5990-1291-4D39-9327-5EBACB5BFF7A02' WITH BUFFERCOUNT = 18, BLOCKSIZE = 65536, MAXTRANSFERSIZE = 1048576 , STANDBY = N'C:[LocalWorkDir]\UNDO_TestDB2.dat' 10/08/2010 07:40:51: Thread 0 error: Process terminated unexpectedly. Error code: -2139684861 (The api was waiting and the timeout interval had elapsed.) 10/08/2010 07:40:51: 10/08/2010 07:40:51: Thread 1 error: Process terminated unexpectedly. Error code: -2139684861 (The api was waiting and the timeout interval had elapsed.) 10/08/2010 07:40:51: 10/08/2010 07:40:51: Thread 2 error: Process terminated unexpectedly. Error code: -2139684861 (The api was waiting and the timeout interval had elapsed.) 10/08/2010 07:40:51: 10/08/2010 14:00:17: SQL error 3013: SQL error 3013: RESTORE LOG is terminating abnormally. / comments
OK, I managed to force error output from these "hung" restores - it matters in which order one kills the backup processes from SQL Activity Monitor (vs just restarting the SQL Backup service). The...
0 votes
Well, it appears that my testing was not sufficiently thorough. Release .1014 did resolve the slow restore speed on the smaller databases I used for the testing. However, on a larger database, it hangs up just as did .1001 and .1013. The database in question is about 30GB for the data file. Release 6.4.0.56 can restore this database on my test server in about 45 minutes. In the "In Progress" view of the UI, it reaches 27.3GB restored after about 18 minutes and stays there until the restore completes. With release .1013, the progress halted at 9.1GB restored and stayed there until I killed the job after several hours. With .1014, the reported progress reaches 27.3GB in about the same time as release .56, but then it never finishes either - at least not within the limits of my tolerance [image] I saw this same behaviour with .1014 on this same DB when I tried to restore a very large log backup (after a rebuild of all indexes). The log backup size was 61GB compressed to 9GB (full DB backup was ~90GB). Note to self - enable frequent log backups during reindex next time! The log restore attempt with 6.4.0.1014 ran for over 24 hours before I canned it. I did not attempt that log restore with .56 / comments
Well, it appears that my testing was not sufficiently thorough. Release .1014 did resolve the slow restore speed on the smaller databases I used for the testing. However, on a larger database, it...
0 votes