Webspace or database backup tasks get stuck and fail with timeout in 24 hours
Modified on: Fri, 17 Nov 2023 12:08 PM
2020-01-22
Symptoms
Backup tasks stay stuck in Task Manager for 24 hours without showing any progress, and then fail with a timeout:
36596021 Backup 'MySQL Database "db1000010_content_hub_he"' into group 'more updates' Mar-23-2015 13:27:07 sc02databases.db010 1075196 Enabled Running
36596020 Backup 'Webspace "www.NG1075196-2445.ccccloud.com"' into group 'more updates' Mar-23-2015 13:27:07 sc03apache.ds21 1075196 Enabled Running
The tasks get rescheduled with the following output:
Request has been timed out, details:
system exception, ID 'IDL:omg.org/CORBA/TIMEOUT:1.0'
TAO exception, minor code = 3e (timeout during recv; low 7 bits of errno: 62 Timer expired), completed = MAYBE
Task error upon failure:
Message:'Destination host 'winbackup.hosting.local' (#12), IP '10.10.10.14' : Operation execution is aborted by timeout. Please increase timeout for this operation and try again.
Backups are done from Linux hosting nodes to a Windows backup node.
The following command, run from the Management Node for the Windows backup node host_id, hangs indefinitely:
# /usr/local/pem/bin/pleskd_ctl ping 12
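Because the ping above never returns on an affected node, it can be convenient to wrap it in a time limit so the check ends on its own. The following is only a minimal sketch: the helper name ping_node, the PLESKD_CTL variable, and the 60-second default limit are assumptions made for illustration, not part of the product. It relies on the GNU coreutils `timeout` utility, which exits with status 124 when the limit is reached.

```shell
#!/bin/sh
# Hypothetical helper (not part of the product): run the pleskd_ctl ping
# with a time limit so that a hung HCL connection is reported instead of
# blocking the terminal.
# The binary path is taken from the article; the PLESKD_CTL override and
# the 60-second default limit are assumptions.
PLESKD_CTL="${PLESKD_CTL:-/usr/local/pem/bin/pleskd_ctl}"

ping_node() {
    host_id="$1"
    limit="${2:-60}"
    # GNU coreutils `timeout` exits with status 124 when the limit is reached.
    timeout "$limit" "$PLESKD_CTL" ping "$host_id"
    rc=$?
    if [ "$rc" -eq 124 ]; then
        echo "host $host_id: ping did not return within ${limit}s (possible HCL hang)"
    elif [ "$rc" -eq 0 ]; then
        echo "host $host_id: ping OK"
    else
        echo "host $host_id: ping failed with exit code $rc"
    fi
    return "$rc"
}
```

For example, after loading the function on the Management Node, `ping_node 12 30` would check the Windows backup node from this article (host_id 12) with a 30-second limit.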
Cause
HCL operations get stuck when several backup operations run simultaneously from one or more Linux nodes to a single Windows node. This behavior is recognized as a product issue with internal ID POA-92544.
Resolution
There are two possible workarounds:
- Regularly (once a day) restart the "pem" service on the slave backup node:
net stop pem && net start pem
- Configure a Linux backup node for backing up Linux webspaces and databases.
Note: this issue is triggered by a high number of simultaneous backup tasks. If a large number of tasks have failed, restart the pem service on the backup node and then run the failed tasks one by one to avoid the same hang again.