Webspace or database backup tasks get stuck and fail with a timeout after 24 hours

Original Publishing Date:
2020-01-22

Symptoms

Backup tasks stay stuck in Task Manager for 24 hours without showing any progress and then fail with a timeout:

    36596021    Backup 'MySQL Database "db1000010_content_hub_he"' into group 'more updates'       Mar-23-2015 13:27:07    sc02databases.db010    1075196    Enabled    Running
    36596020    Backup 'Webspace "www.NG1075196-2445.ccccloud.com"' into group 'more updates'      Mar-23-2015 13:27:07    sc03apache.ds21        1075196    Enabled    Running

Tasks get rescheduled with the output:

Request has been timed out, details:
 system exception, ID 'IDL:omg.org/CORBA/TIMEOUT:1.0'
 TAO exception, minor code = 3e (timeout during recv; low 7 bits of errno: 62 Timer expired), completed = MAYBE

Task error upon failure:

Message:'Destination host 'winbackup.hosting.local' (#12), IP '10.10.10.14' : Operation execution is aborted by timeout. Please increase timeout for this operation and try again.

Backups are done from Linux hosting nodes to a Windows backup node.

The following command, run on the Management Node against the host_id of the Windows backup node (12 in this example), hangs indefinitely:

# /usr/local/pem/bin/pleskd_ctl ping 12
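
Because the ping hangs instead of returning, it can be convenient to wrap it in a time limit so the check itself does not block a terminal or a monitoring script. The following is a minimal sketch, not part of the product: the function name `ping_with_limit` is hypothetical, and it assumes GNU coreutils `timeout` is available on the Management Node. It also assumes the hung command terminates on SIGTERM.

```shell
# Run a caller-supplied check command, giving up after a time limit.
# Prints OK, HUNG (time limit reached), or FAILED (command exited non-zero).
# Assumption: GNU coreutils "timeout" is installed; it exits with 124 on timeout.
ping_with_limit() {
    limit_seconds="$1"
    shift
    timeout "$limit_seconds" "$@" >/dev/null 2>&1
    status=$?
    if [ "$status" -eq 0 ]; then
        echo "OK"
    elif [ "$status" -eq 124 ]; then
        echo "HUNG"
    else
        echo "FAILED"
    fi
}

# Example (host_id 12 is the Windows backup node from the task error above):
# ping_with_limit 60 /usr/local/pem/bin/pleskd_ctl ping 12
```

A result of HUNG for the Windows backup node matches the symptom described in this article.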

Cause

HCL operations get stuck when several backup operations run simultaneously from one or more Linux nodes to a single Windows node. This behavior is recognized as a product issue with internal ID POA-92544.

Resolution

There are two possible workarounds:

Restart the "pem" service on the Windows backup node regularly (once a day):
```
net stop pem && net start pem
```
Alternatively, configure a Linux backup node for backing up Linux webspaces and databases.

Note: this issue is triggered by a high number of simultaneous backup tasks. If a large number of tasks have failed, restart the pem service on the backup node first, and then run the failed tasks one by one to avoid the same hang.
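
The one-by-one approach can be sketched as a simple sequential loop. This is only an illustration: the article does not name a command for rerunning a task, so the rerun command is passed in by the caller as a placeholder, and `run_one_by_one` is a hypothetical helper name. Each task starts only after the previous one has finished, and the loop stops at the first failure so that a new hang is noticed before more tasks are queued.

```shell
# Run a caller-supplied command once per task ID, strictly one at a time.
# $1 is a placeholder for whatever reruns a single task (not defined in this article);
# the remaining arguments are task IDs.
run_one_by_one() {
    rerun_cmd="$1"
    shift
    for task_id in "$@"; do
        echo "Rerunning task ${task_id}"
        # Wait for this task to finish before starting the next one.
        if ! "$rerun_cmd" "$task_id"; then
            echo "Task ${task_id} failed, stopping" >&2
            return 1
        fi
    done
}
```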