Backup failure on community.liferay.com

What happened?

Today at 12:13pm, WeDeploy, our hosting solution, started a maintenance process. As part of the normal procedure, some services were restarted.

After the restart, all community.liferay.com services were up and running, but we noticed that some blog and forum posts were missing. We checked our list of backups and found that the most recent one was from July 4th.

We continued to investigate and discovered that the backup service, which was supposed to run every 3 hours, had stopped on July 4th at 7:50am. The backup service had run out of memory, and no backups had been performed since then.

This was an isolated problem with the backup service inside the community site project; no other WeDeploy clients were affected.

 

What are we doing to avoid this in the future?

On the Community side, we're going to add another layer of backup, increase the memory and CPU allocated to that service, and proactively monitor its health. We will create another environment with scheduled recoveries to help us identify these situations faster. We will also improve the backup script to notify us when a backup doesn't finish.
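To make the last two points concrete, here is a minimal sketch of what such checks could look like. It is an illustration, not our actual backup script: the function names, the 30-minute grace period, the one-hour timeout, and the notify callback are all assumptions. Only the 3-hour interval comes from the schedule described above.

```python
import subprocess
from datetime import datetime, timedelta, timezone

# The 3-hour schedule is from this post; everything else here is hypothetical.
BACKUP_INTERVAL = timedelta(hours=3)


def backup_is_stale(last_backup, now, interval=BACKUP_INTERVAL,
                    grace=timedelta(minutes=30)):
    """Return True when the newest backup is older than one interval plus a grace period."""
    return now - last_backup > interval + grace


def run_backup(command, notify, timeout_seconds=3600):
    """Run the backup command and call notify(message) if it fails or hangs.

    Returns True on success, False otherwise. A process killed for running
    out of memory exits with a non-zero code, so it is reported as well.
    """
    try:
        result = subprocess.run(command, capture_output=True, text=True,
                                timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        notify(f"Backup timed out after {timeout_seconds}s")
        return False
    except OSError as error:
        notify(f"Backup could not start: {error}")
        return False
    if result.returncode != 0:
        notify(f"Backup exited with code {result.returncode}: {result.stderr.strip()}")
        return False
    return True
```

The two checks are complementary. The wrapper reports a backup that fails while it runs, and the staleness check, run from a separate process, catches the case where the backup service itself has stopped and can no longer report anything.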

On the WeDeploy side, we're developing a built-in Backup/Restore functionality specifically designed for Liferay services. It will include scheduled backups of the database and doclib, custom options for backup frequency and retention, and controls for manual backups and recoveries.

In addition, we're implementing a feature called Alerts, which will notify users when they are close to reaching their CPU and memory limits, among other things.
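As a rough illustration of the idea behind such alerts, the sketch below flags any resource whose usage is at or above a fraction of its limit. It is not the WeDeploy Alerts API: the function names, the metrics layout, and the 90% threshold are assumptions made for the example.

```python
def near_limit(usage, limit, threshold=0.9):
    """Return True when usage has reached the given fraction of the limit."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return usage / limit >= threshold


def build_alerts(metrics, threshold=0.9):
    """Build alert messages from a dict of resource name -> (usage, limit).

    The dict layout is hypothetical; usage and limit must share a unit.
    """
    alerts = []
    for name, (usage, limit) in metrics.items():
        if near_limit(usage, limit, threshold):
            alerts.append(f"{name} at {usage / limit:.0%} of its limit")
    return alerts
```

For example, `build_alerts({"memory": (950, 1000), "cpu": (0.2, 1.0)})` would report memory only, since CPU is well below the threshold.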

 

Conclusion

We appreciate your feedback and will continue to invest our best resources in making our services better each day. Thank you again for your patience.