Very Large aow_processed and aow_processed_aow_actions tables.

Does anyone know if the Prune Database job is trying to handle this or not?

What I mean is: are all of you with this problem seeing errors in the Prune job, or is it running for some of you but just not doing anything to these tables?

It would also be nice to know why the Prune job is failing, if it’s failing.

I took a quick look at the pruning script; it seemed to me that all it does is delete rows marked as “deleted”.

I also let my pruning script run over the weekend. It worked fine (I did a few random checks and confirmed that records marked as “deleted” were removed from the tables), but it had no effect on the aow_processed tables.
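For reference, from my quick look the Prune job boils down to something like the following per-table statement. This is a simplified sketch, not the actual code; the table name is just an example, and the real job iterates over the module tables:

```sql
# Simplified sketch of what the Prune Database job appears to do:
# hard-delete rows that are only soft-deleted (deleted = 1).
DELETE FROM accounts WHERE deleted = 1;

# aow_processed rows never get deleted = 1 set during normal workflow
# runs, which would explain why the job leaves these tables untouched.
```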

Just for the sake of completeness: I created an issue (suggestion) for this on github.
https://github.com/salesagility/SuiteCRM/issues/3328

It has now been closed and moved to Trello. But I wanted to add this after Matt explained the relevance of the aow_processed records: Workflows that are supposed to run only once rely on these aow_processed records to decide whether to fire or not. If you delete the records, some workflows that are set up to fire only once per record may fire again.

One (imperfect) approach would be to change my query (from older posts here) to only delete records that are related to workflows that are set up to run repeatedly. That way, the history would be preserved for the workflows that are currently set up to fire only once.

Or only delete aow_processed records for workflows that are set up to run via the scheduler. That would get rid of the main portion of the bloat, in my case probably 99.99% of the records, without destroying any needed workflow log.
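A sketch of that scheduler-only variant might look like this. Note that the `run_when` column and its `'In_Scheduler'` value are assumptions on my part; check `SHOW COLUMNS FROM aow_workflow` and the actual stored values before running anything like this, and test on a backup first:

```sql
# Sketch only: assumes aow_workflow has a run_when column whose value
# 'In_Scheduler' marks workflows that run via the scheduler. Verify the
# column name and values on your install before relying on this.
DELETE wp
FROM aow_processed AS wp
JOIN aow_workflow AS w ON wp.aow_workflow_id = w.id
WHERE w.run_when = 'In_Scheduler'
  AND w.multiple_runs = 1;
```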


Hello,

I’m having the same problem: there are approx. 71 million records in the aow_processed table alone, 20+ GB in size. The workflows I have created run on modified records, and only when somebody hits the Save button in the CRM.

I was wondering how I can purge or trim the data in this table, how to create a scheduler to do so automatically, and how safe this is with regard to old data that has already been processed. Would I get duplicate records, or repeated entries for old records, after doing that?

I’m running 7.8.2 with MySQL.

Best Regards,

Slaven.

You can check this issue where Matt Lorimer comments. He developed this, if I’m not mistaken.

Thank you pgr,

I have read that, but I’m looking for the way to create that scheduled task under Admin → Scheduler in the CRM. What would be the steps to create a custom scheduled task? Should I copy the MySQL commands into a text file somewhere in the CRM and then create a scheduler entry for that? I’ve never done that before with the CRM.

Another problem is that I have created a custom workflow that converts Accounts to Contacts with all of their records, and this workflow is indeed set to transfer an Account to a Contact only once per record. So I would probably need a script that rules out all of the workflows that are triggered to run only once per record.

Best Regards,

Slaven.

I don’t have a clear idea of what the “clearing up” steps should be, because of that warning Matt gives about old items and unpredictable results.

That’s a prior problem: if you don’t know how to delete safely, what exactly are you going to put into a Scheduler job?

Another preliminary concern (and another thing I don’t fully understand) is: what exactly are the inappropriate uses of “Repeated Runs”? It seems that if this is used incorrectly, the huge database sizes are aggravated…

But if you have all that solved, and you know your configuration is ok, and know how to purge the database safely, then all you have to do is follow some tutorial to create a custom Scheduler Job.

Possibly the best thing to do, if you’re going to spend time developing for this, would be to change the code in SuiteCRM to increment a count in the database, instead of repeating the row, as Matt suggests…
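To make that suggestion concrete, here is a hypothetical schema sketch. This is not how SuiteCRM works today; the table and column names are invented purely for illustration:

```sql
# Hypothetical sketch of the "increment a count" idea. Table and
# column names are invented; SuiteCRM does not ship anything like this.
CREATE TABLE aow_processed_counts (
    aow_workflow_id CHAR(36) NOT NULL,
    parent_id       CHAR(36) NOT NULL,
    run_count       INT UNSIGNED NOT NULL DEFAULT 1,
    last_run        DATETIME NULL,
    PRIMARY KEY (aow_workflow_id, parent_id)
);

# One row per (workflow, record) pair; repeated runs bump the counter
# instead of inserting a new row each time.
INSERT INTO aow_processed_counts (aow_workflow_id, parent_id, last_run)
VALUES ('workflow-uuid', 'record-uuid', NOW())
ON DUPLICATE KEY UPDATE run_count = run_count + 1, last_run = NOW();
```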

Thanks pgr, for the guide on how to create a custom scheduler task. Yes, counting only works for repeated runs, not for single ones. So I guess the only thing you can do safely is to delete records for clients that have been deleted, or for workflow processes that are no longer active or have been deleted.
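A sketch of that kind of cleanup, limited to rows whose parent workflow is gone (test against a backup copy of the database first):

```sql
# Sketch: remove aow_processed rows whose parent workflow was
# soft-deleted, or whose workflow row no longer exists at all.
# Always test this against a backup first.
DELETE wp
FROM aow_processed AS wp
LEFT JOIN aow_workflow AS w ON wp.aow_workflow_id = w.id
WHERE w.id IS NULL
   OR w.deleted = 1;
```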

Best Regards,

Slaven.

Ok, that sounds reasonable. When you’re done, please come back here and post some of the queries you developed for the deletions, and tell us how much data you were able to trim. I think this will be useful for more people. Thanks.

Hey all.

Here are my updated pruning queries, taking into account that the log entries should never be deleted for workflows that are supposed to run only once.

USE crm;

CREATE TABLE aow_processed_new LIKE aow_processed;
# retain all aow_processed rows that belong to workflows that run only once per record
INSERT INTO aow_processed_new
SELECT wp.*
FROM aow_processed AS wp
JOIN aow_workflow AS w
  ON wp.aow_workflow_id = w.id AND w.multiple_runs = 0;
# prune aow_processed rows that belong to workflows that run multiple times, based on date
INSERT INTO aow_processed_new
SELECT wp.*
FROM aow_processed AS wp
JOIN aow_workflow AS w
  ON wp.aow_workflow_id = w.id AND w.multiple_runs = 1
  AND wp.date_entered >= '2017-10-15 00:00:00';
REPAIR TABLE aow_processed_new;

CREATE TABLE aow_processed_aow_actions_new LIKE aow_processed_aow_actions;
# retain all aow_processed_aow_actions rows that belong to workflows that run only once per record
INSERT INTO aow_processed_aow_actions_new
SELECT wpa.*
FROM aow_processed_aow_actions AS wpa
JOIN (SELECT wp.*
      FROM aow_processed AS wp
      JOIN aow_workflow AS w
        ON wp.aow_workflow_id = w.id AND w.multiple_runs = 0) AS wp
  ON wpa.aow_processed_id = wp.id;
# prune aow_processed_aow_actions rows that belong to workflows that run multiple times, based on date
INSERT INTO aow_processed_aow_actions_new
SELECT wpa.*
FROM aow_processed_aow_actions AS wpa
JOIN (SELECT wp.*
      FROM aow_processed AS wp
      JOIN aow_workflow AS w
        ON wp.aow_workflow_id = w.id AND w.multiple_runs = 1) AS wp
  ON wpa.aow_processed_id = wp.id
  AND wpa.date_modified >= '2017-10-15 00:00:00';
REPAIR TABLE aow_processed_aow_actions_new;

# swap the pruned tables into place
ALTER TABLE aow_processed RENAME aow_processed_old;
ALTER TABLE aow_processed_new RENAME aow_processed;
ALTER TABLE aow_processed_aow_actions RENAME aow_processed_aow_actions_old;
ALTER TABLE aow_processed_aow_actions_new RENAME aow_processed_aow_actions;

DROP TABLE aow_processed_old;
DROP TABLE aow_processed_aow_actions_old;

This does not prune the aow_log for Workflows that have been deleted. But that is not necessary for my use case. Deleting old logs by date takes care of that for me…


Hi John,

thanks for these scripts!

Maybe it would be a good idea to post them also on that github issue:

https://github.com/salesagility/SuiteCRM/issues/3328

I have some hope that they might be integrated in the “Prune database” Scheduler job…

EDIT: maybe that will help get the Issue reopened and labeled as a bug 🙂

John you never got around to doing what I asked, but since this issue keeps coming up, I encourage you to post your SQL up on the GitHub Issue, and I’ll try to get it reopened. Thanks!

done


Thanks a lot, let’s hope that can be added soon!

BTW, I recently wrote some Documentation for Workflows. Specifically, the beginning of this page

https://docs.suitecrm.com/user/advanced-modules/workflow/

where it explains the very important fields “Run”, “Run on” and “Repeated Runs”, which didn’t have any proper explanations yet.

If you can, take a look and tell me if it makes sense to you, and whether you feel there is something that could be added for clarity or completeness. Thanks!

Hey pgr. Looks like real progress! Thanks.

Typo at Run “On Scheduler”:
“resticted by the Run on property”

Question:
In the “Run On” section you write:

How exactly does this behave when the workflow runs only “on scheduler”? I had always assumed that these two options only apply to “on save”.

Fixed the typo, thanks.

From what I see in the code (just a quick look):

https://github.com/salesagility/SuiteCRM/blob/hotfix/modules/AOW_WorkFlow/AOW_WorkFlow.php (search for “modified”)

… it seems it always compares each record’s date_modified with its date_entered value, regardless of whether it runs from “on save” or “on scheduler”. So I would say it still applies as a restriction.

But I am not sure what date_entered means when running “on scheduler”…

If you have some time to test or run it with a debugger, I’d love to get to the bottom of this.

Hello, I’m running into the problem that both of my tables are bigger than 10 GB, 20 GB together. I tried the SQL queries to delete some rows, but the execution always fails (the tables might be too big). Do you have any advice on what to do now?

Best regards

Simon

Maybe try making the query more restrictive, doing it in chunks (year by year, or even month by month).

Did you change the date filters from 2017-10-15 00:00:00 to something more recent, like maybe 2018-09-01 00:00:00?

Even if your tables are huge, I would think the SELECT statements should still work fine?

Also, at least for debugging, don’t run the whole script in one go. Instead, run the individual SQL queries one at a time. Then at least you’ll know which queries aren’t working.
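If even single statements time out, another option is to delete in small batches (a sketch only; adjust the date and batch size to your data). MySQL does not allow LIMIT on a multi-table DELETE, so the batch is selected through a derived table. Re-run the statement until it affects 0 rows:

```sql
# Sketch: delete prunable rows in batches of 10,000 to keep each
# statement small. Repeat until it reports 0 rows affected.
# MySQL rejects LIMIT on multi-table DELETEs, hence the derived table.
DELETE FROM aow_processed
WHERE id IN (
    SELECT id FROM (
        SELECT wp.id
        FROM aow_processed AS wp
        JOIN aow_workflow AS w ON wp.aow_workflow_id = w.id
        WHERE w.multiple_runs = 1
          AND wp.date_entered < '2018-09-01 00:00:00'
        LIMIT 10000
    ) AS batch
);
```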

I’m going to try that. Thank you for the advice!