Does anyone know if the Prune Database job is trying to handle this or not?
What I mean is: are all of you with this problem seeing errors in the Prune job, or is it running for some of you but just not doing anything to these tables?
It would also be nice to know why the Prune job is failing, if it’s failing.
I had a quick look at the pruning script; it seemed to me that all it does is delete rows marked as “deleted”.
I also let my pruning script run over the weekend. It worked fine (I did a few random checks and could confirm that “deleted”-marked records were removed from the tables) => no effect on the aow_processed tables.
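For reference, the effective operation per table boils down to something like this (a sketch of what I saw, not the literal code; the actual job loops over the module tables in PHP):

# roughly what the Prune Database job does to each table:
# physically remove rows that are only soft-deleted
DELETE FROM aow_processed WHERE deleted = 1;

Since normal workflow runs leave deleted = 0, that explains why the job never touches these rows.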
It has now been closed and moved to Trello, but I wanted to add this after Matt explained the relevance of the aow_processed records: workflows that are supposed to run only once rely on these aow_processed records to decide whether to fire or not. If you delete the records, some workflows that are set up to fire only once per record may fire again.
One (imperfect) approach would be to change my query (from older posts here) to only delete records related to workflows that are set up to run repeatedly. That way, the history would be preserved for the workflows that are currently set up to fire only once.
Or only delete aow_processed records for workflows that are set up to run via the scheduler. That would get rid of the main portion of the bloat (in my case probably 99.99% of the records) without destroying any needed workflow log.
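A sketch of the repeated-runs variant (this assumes the multiple_runs flag on aow_workflow marks workflows allowed to fire repeatedly, which is what my full script further down relies on; test on a backup first):

# delete processed rows only for workflows allowed to run repeatedly;
# rows for run-once workflows stay, so they keep blocking re-fires
DELETE wp
FROM aow_processed AS wp
JOIN aow_workflow AS w ON wp.aow_workflow_id = w.id
WHERE w.multiple_runs = 1;

# then remove the now-orphaned action rows
DELETE wpa
FROM aow_processed_aow_actions AS wpa
LEFT JOIN aow_processed AS wp ON wpa.aow_processed_id = wp.id
WHERE wp.id IS NULL;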
I’m having the same problems: there are approximately 71 million records in the aow_processed table alone, 20+ GB in size. The workflows I have created run on modified records, and only when somebody hits the save button in the CRM.
I was wondering how I can purge or trim the data in this table, how to create a scheduler to do so automatically, and how safe that is with respect to old data that has already been processed. Would I end up with duplicate records, or repeated runs for old records, after doing that?
I have read that, but I’m looking for the way to create that scheduled task under Admin → Scheduler in the CRM. What would the steps be to create a custom scheduled task? Should I copy the MySQL commands into a text file somewhere in the CRM and then create a scheduler entry for it? I’ve never done that before with the CRM.
Another problem is that I have created a custom workflow that converts Accounts to Contacts with all of their records, and this workflow is indeed set to transfer an Account to a Contact only once per record. So I would probably need a script that excludes all of the workflows that are set to run only once per record.
I don’t have a clear idea of what the “clearing up” steps should be, because of that warning Matt gives about old items and unpredictable results.
That’s a prior problem: if you don’t know how to delete safely, what exactly are you going to put into a Scheduler Job?
Another preliminary concern (and another thing I don’t fully understand) is what exactly the inappropriate uses of “repeated runs” are. It seems that if this is used incorrectly, the huge database sizes are aggravated…
But if you have all that solved, and you know your configuration is ok, and know how to purge the database safely, then all you have to do is follow some tutorial to create a custom Scheduler Job.
Possibly the best thing to do, if you’re going to spend time developing for this, would be to change the code in SuiteCRM to increment a count in the database instead of inserting a new row every time, as Matt suggests…
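Just to illustrate the idea (purely hypothetical, this is not how SuiteCRM stores it today; the PHP that writes these rows would need to change accordingly, and any column name besides aow_workflow_id is my assumption): with a unique key per workflow/record pair and a run_count column, each run becomes an upsert instead of a new row.

# hypothetical schema change: one row per workflow/record pair
# (the unique key can only be added to a table already deduplicated)
ALTER TABLE aow_processed ADD COLUMN run_count INT NOT NULL DEFAULT 1;
ALTER TABLE aow_processed ADD UNIQUE KEY uk_workflow_parent (aow_workflow_id, parent_id);

# each run then bumps the counter instead of inserting another row
INSERT INTO aow_processed (id, aow_workflow_id, parent_id, date_entered, date_modified)
VALUES (UUID(), '<workflow-id>', '<record-id>', NOW(), NOW())
ON DUPLICATE KEY UPDATE run_count = run_count + 1, date_modified = NOW();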
Thanks pgr, for the guide on how to create a custom scheduler task. Yes, counting only works for repeated runs, not for single ones. So I guess the only thing you can do safely is delete records for clients that have been deleted, or for workflow processes that are no longer active or have been deleted.
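For the deleted-workflows part, something along these lines should do it (a sketch; workflows removed in the UI are soft-deleted, so I check both the deleted flag and genuinely missing rows; try it on a backup first):

# remove processed entries whose workflow is soft-deleted or gone entirely
DELETE wp
FROM aow_processed AS wp
LEFT JOIN aow_workflow AS w ON wp.aow_workflow_id = w.id
WHERE w.id IS NULL OR w.deleted = 1;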
Ok, that sounds reasonable. When you’re done, please come back here and post some of the queries you developed for the deletions, and tell us how much data you were able to trim. I think this will be useful for more people. Thanks.
Here are my updated pruning queries, taking into consideration that the log entries should never be deleted for workflows that are supposed to run only once.
USE crm;
CREATE TABLE aow_processed_new LIKE aow_processed;
# retain all aow_processed rows that belong to WF that run only once per record
INSERT INTO aow_processed_new
SELECT wp.* FROM aow_processed AS wp
JOIN aow_workflow AS w ON wp.aow_workflow_id = w.id AND w.multiple_runs = 0;
# prune aow_processed rows that belong to WF that run multiple times, based on date
INSERT INTO aow_processed_new
SELECT wp.* FROM aow_processed AS wp
JOIN aow_workflow AS w ON wp.aow_workflow_id = w.id AND w.multiple_runs = 1
WHERE wp.date_entered >= '2017-10-15 00:00:00';
# note: REPAIR TABLE only does something for MyISAM tables; on InnoDB it just
# reports that the storage engine doesn't support repair and can be skipped
REPAIR TABLE aow_processed_new;
CREATE TABLE aow_processed_aow_actions_new LIKE aow_processed_aow_actions;
# retain all aow_processed_aow_actions rows that belong to WF that run only once per record
INSERT INTO aow_processed_aow_actions_new
SELECT wpa.* FROM aow_processed_aow_actions AS wpa
JOIN (SELECT wp.id FROM aow_processed AS wp
      JOIN aow_workflow AS w ON wp.aow_workflow_id = w.id AND w.multiple_runs = 0) AS wp
  ON wpa.aow_processed_id = wp.id;
# prune aow_processed_aow_actions rows that belong to WF that run multiple times, based on date
INSERT INTO aow_processed_aow_actions_new
SELECT wpa.* FROM aow_processed_aow_actions AS wpa
JOIN (SELECT wp.id FROM aow_processed AS wp
      JOIN aow_workflow AS w ON wp.aow_workflow_id = w.id AND w.multiple_runs = 1) AS wp
  ON wpa.aow_processed_id = wp.id
WHERE wpa.date_modified >= '2017-10-15 00:00:00';
REPAIR TABLE aow_processed_aow_actions_new;
# swap in the pruned tables atomically so the CRM never sees a missing table
RENAME TABLE aow_processed TO aow_processed_old,
             aow_processed_new TO aow_processed,
             aow_processed_aow_actions TO aow_processed_aow_actions_old,
             aow_processed_aow_actions_new TO aow_processed_aow_actions;
# once everything checks out, drop the old copies
DROP TABLE aow_processed_old;
DROP TABLE aow_processed_aow_actions_old;
This does not prune the log entries for Workflows that have been deleted, but that is not necessary for my use case; deleting old logs by date takes care of that for me…
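And if anyone wants to post numbers, here is an easy before/after check (adjust the schema name to match your database; table_rows is only an estimate on InnoDB):

# row counts and on-disk size of the two tables
SELECT table_name,
       table_rows,
       ROUND((data_length + index_length) / 1024 / 1024) AS size_mb
FROM information_schema.tables
WHERE table_schema = 'crm'
  AND table_name IN ('aow_processed', 'aow_processed_aow_actions');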
John, you never got around to doing what I asked, but since this issue keeps coming up, I encourage you to post your SQL on the GitHub issue, and I’ll try to get it reopened. Thanks!
… it seems it always compares the records’ date_modified with their date_entered values, regardless of whether it runs from “on save” or “on scheduler”. So I would say it still applies as a restriction.
But I am not sure what date_entered means when running “on scheduler” :huh:
If you have some time to test or run it with a debugger, I’d love to get to the bottom of this.
Hello, I’m running into the problem that both of my tables are bigger than 10 GB, making 20 GB together. I tried the SQL query to delete some rows, but the execution always fails (perhaps the tables are too big). Do you have any advice on what to do now?
Did you change the date filters from 2017-10-15 00:00:00 to something more recent, like maybe 2018-09-01 00:00:00?
Even if your tables are huge, I would think the SELECT parts should still work fine?
Also, at least for debugging, don’t run the whole script in one go. Instead, run the individual SQL queries one at a time; then at least you’ll know which queries aren’t working.
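If even single statements fail on tables this size, try deleting in batches, repeating the statement until it affects zero rows (the date and batch size here are just examples):

# small batches keep each transaction manageable on very large tables;
# re-run until 0 rows are affected
DELETE FROM aow_processed
WHERE aow_workflow_id IN (SELECT id FROM aow_workflow WHERE multiple_runs = 1)
  AND date_entered < '2018-09-01 00:00:00'
LIMIT 10000;

The same pattern works for aow_processed_aow_actions.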