Pruning / Archiving Database Tables Clean Up

Been running suite in a 5-person business for about 4 years now. One thing that is missing is any sort of maintenance utility to archive old data! Even if you deleted records from the interface it just marks a field deleted=1. So the DB table grow indefinitely! Businesses don’t need to look up cases usually from more than 5 or 7 years ago (beyond a period the could be audited, etc). So what do people do, especially larger businesses? Has no one written an archiving utility?

In the meantime I want to at least hit the low hanging fruit and clean up the DB a bit. Here’s my thoughts doing a sort on the tables by size:

securitygroups_records - We just started using security groups about 6 months ago and the table is already 171K records! We have like 100 cases a month and each case has about 20 documents and 20 tasks, a few notes, emails, etc. It’s the largest table in size, 132MB, so if it only takes 6 months to get that big I cringe what it will be like in 2 years and the performance of the DB queries on this table!! Storage is cheap, we have a pretty powerful dedicated server so I guess I can let it go.

aow_processed_aow_actions - All of our workflows are triggered off saving a case and they all create tasks. None are repeat runs. A case remains open for about a month. So it seems to me it’s safe to say no one is going to be saving any cases older than 6 months, let alone a year. So any reason I can’t delete any records in this table over a year old? (This would be part of an archive utility if I ever develop one.)

job_queue - This just looks like a log of the cron jobs Suite runs. They are all status = Done. What’s the use of keeping them other than to examine what’s been executed (or failed) for troubleshooting? If everything is running good, then why not turn this off or delete all the status - Done records every week or so?

Hi

do you have the Prune job turned on? That should take care of the “deleted=1” records.

This Issue on Github also has a helpful script for aow_processed:

https://github.com/salesagility/SuiteCRM/issues/3328

About job_queue - that is the safest and easiest to clean up. I once heard a recommendation to clear everything that has status=Done and is older than a certain amount of days (like a week or a month).

Prune job is turned on. Users don’t delete a lot of records though so that’s not going to be a significant reduction in database size.

I noticed in the scheduler there’s a ‘clean jobs queue’ task that is active and scheduled to run once a day. What does it do? It obviously doesn’t delete the status=done records in the jobs_queue table.

The Github script doesn’t help me because it says:

retain logs in both tables that belong to workflows that are set up to run only once per record.

All my workflows are once per record, but they are also triggered “On Save” only so as I said before I think it’s safe considering the case records all my workflows are associated with are in use for about a month and then no modifications are ever made to them again. So maybe this Github script WILL work for me?

The Jobs clean up job does this:

https://github.com/salesagility/SuiteCRM/blob/master/modules/Schedulers/_AddJobsHere.php#L538-L556

I wonder if that is working well, I always thought the only way to clean up those jobs was manual intervention. Maybe you can add some logging to that function and see if it’s working well?

About the Workflows - yes, you basically have to make a decision on how important it is for you that those workflows absolutely don’t ever run for the same record. In order to enforce that, ALL those database rows need to stay where they are. But if you decide you can get away with ignoring old cases, or any other condition that is acceptable to you, you can start removing some rows.