WCCILdata - REDU/WARNING - DataManager, recoveryTimeoutExpired, Recovery timeout expired, aborting recovery and restarting data manager
This entry explains a log message that can occur during startup in a redundant system when the recovery of the database fails. The message is written to the PVSS_II.log file.
WCCILdata (0), 2014.09.24 10:31:14.121, REDU, WARNING, 54, Unexpected state, DataManager, recoveryTimeoutExpired, Recovery timeout expired, aborting recovery and restarting data manager
Log-message with symbolic names:
WCCILdata (0), <TIMESTAMP>, REDU, WARNING, 54, Unexpected state, DataManager, recoveryTimeoutExpired, Recovery timeout expired, aborting recovery and restarting data manager
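The field layout of such a line can be sketched in a few lines of Python, which may help when searching a large PVSS_II.log. This is only an illustration: the function name `parse_log_line` and the field names are assumptions, and the layout (six comma-separated fields followed by free text) is inferred from the example lines in this entry, not from an official format description.

```python
from datetime import datetime


def parse_log_line(line):
    """Split one log line into named fields (hypothetical helper).

    The layout is inferred from the examples in this entry:
    manager, timestamp, category, severity, code, code text, message.
    The message itself may contain commas, so only the first six
    separators are used for splitting.
    """
    parts = line.strip().split(", ", 6)
    if len(parts) < 7:
        raise ValueError("unexpected log line layout: %r" % line)
    manager, timestamp, category, severity, code, code_text, message = parts
    return {
        "manager": manager,
        "timestamp": datetime.strptime(timestamp, "%Y.%m.%d %H:%M:%S.%f"),
        "category": category,
        "severity": severity,
        "code": int(code),
        "code_text": code_text,
        "message": message,
    }


example = (
    "WCCILdata (0), 2014.09.24 10:31:14.121, REDU, WARNING, 54, "
    "Unexpected state, DataManager, recoveryTimeoutExpired, "
    "Recovery timeout expired, aborting recovery and restarting data manager"
)
entry = parse_log_line(example)
print(entry["severity"], entry["code"], entry["timestamp"])
```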
The log message is written on the system that is starting up, and is therefore performing the passive recovery, when the allowed recovery time is exceeded. The maximum time for the recovery of the database is defined by the following config entry in the [data] section of the config.redu file (the value is given in seconds):
passiveRecoveryTimeout = 1800
If the timeout is reached, you have to investigate why this happened. Possible causes are a slow network, a hard disk with insufficient read/write performance, or a large amount of data that needs to be copied.
If you want to change the timeout, do so in the config.redu file stored in your project.
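To check which value is currently set in a project, the entry can be read with a small script. The following is a minimal sketch, assuming the file uses the INI-like layout shown above ([section] headers and key = value lines) and that '#' starts a comment. The function name `read_passive_recovery_timeout` is hypothetical, and the script works on the file content as text, so locating the file in the project is left to the reader.

```python
def read_passive_recovery_timeout(config_text):
    """Return passiveRecoveryTimeout from the [data] section, or None.

    Assumptions (not confirmed by this entry): sections are written as
    [name], entries as key = value, and '#' starts a comment.
    If the entry appears more than once, the last one wins.
    """
    section = None
    value = None
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip()
            continue
        if section == "data" and "=" in line:
            key, val = line.split("=", 1)
            if key.strip() == "passiveRecoveryTimeout":
                value = int(val.strip())
    return value


sample = """
[data]
passiveRecoveryTimeout = 1800
"""
print(read_passive_recovery_timeout(sample))
```

The function returns None when the entry is not present, so the caller can distinguish "not configured" from an explicit value.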
When the recovery is started, you will normally see the following block of log messages; in this example the timeout message is included as well:
WCCILdata (0), <TIMESTAMP>, REDU, INFO, 0, , Sending recovery request to other replica
WCCILdata (0), <TIMESTAMP>, REDU, INFO, 0, , Recovery request accepted, sending file list request
WCCILdata (0), <TIMESTAMP>, REDU, INFO, 0, , File transfer request sent
WCCILdata (0), <TIMESTAMP>, REDU, WARNING, 54, Unexpected state, DataManager, recoveryTimeoutExpired, Recovery timeout expired, aborting recovery and restarting data manager
In rare cases the recovery request is not answered correctly by the running project on the other server of the redundant system. You will then see the following block of log messages. The time between the two messages is 2 minutes.
WCCILdata (0), <TIMESTAMP>, REDU, INFO, 0, , Sending recovery request to other replica
WCCILdata (0), <TIMESTAMP>, REDU, WARNING, 54, Unexpected state, DataManager, recoveryTimeoutExpired, Recovery timeout expired, aborting recovery and restarting data manager
In that case changing the config-entry has no effect. This timeout of 2 minutes is hardcoded in the source code.
If this situation occurs, try the startup and recovery again; it normally works on the next attempt.
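The two cases described above can be told apart by checking whether the recovery request was accepted before the timeout message appeared. The sketch below does this by simple substring matching on the message texts quoted in this entry. The function name `classify_recovery_timeout` and the returned labels are hypothetical and only serve as an illustration.

```python
def classify_recovery_timeout(lines):
    """Classify the first recovery timeout found in a list of log lines.

    Returns one of these hypothetical labels:
      "passiveRecoveryTimeout" - request was accepted, so the
                                 configurable timeout expired
      "hardcoded-2min"         - request was never accepted, so the
                                 hardcoded 2-minute timeout expired
      "unknown"                - timeout without a preceding request
      "no-timeout"             - no timeout message in the given lines
    """
    request_seen = False
    accepted = False
    for line in lines:
        if "Sending recovery request to other replica" in line:
            # A new recovery attempt starts here.
            request_seen = True
            accepted = False
        elif "Recovery request accepted" in line:
            accepted = True
        elif "recoveryTimeoutExpired" in line:
            if not request_seen:
                return "unknown"
            return "passiveRecoveryTimeout" if accepted else "hardcoded-2min"
    return "no-timeout"


timeout_line = (
    "WCCILdata (0), <TIMESTAMP>, REDU, WARNING, 54, Unexpected state, "
    "DataManager, recoveryTimeoutExpired, Recovery timeout expired, "
    "aborting recovery and restarting data manager"
)
unanswered = [
    "WCCILdata (0), <TIMESTAMP>, REDU, INFO, 0, , Sending recovery request to other replica",
    timeout_line,
]
print(classify_recovery_timeout(unanswered))
```

If the result is "hardcoded-2min", changing passiveRecoveryTimeout will not help and the startup should simply be repeated.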
The following FAQ entry describes how to check the hardware performance for the recovery:
portal.etm.at/index.php