Workflow causes scheduled job to fail regardless of conditions

Version Info:

  • PHP – 7.3.13 *
  • IIS – 10.0.17763.1
  • SQL – MS SQL Server 2016 13.0.5103.6 (X64)
  • SuiteCRM – 7.10.30, Sugar Version 6.5.25 (Build 344)

I’ve been experiencing a problem where the Process Workflow job will fail. Suite log shows:
Fri Jul 30 16:39:56 2021 [6980][1 ][FATAL] Job 77812e77-5e52-8e64-09aa-61041deb8915 (Process Workflow Tasks) failed in CRON run
Fri Jul 30 16:39:56 2021 [6980][1 ][INFO] Resolving job 77812e77-5e52-8e64-09aa-61041deb8915 as failure: Unexpected failure, please check PHP logs and suitecrm.log

PHP log doesn’t show any errors.

I have a Workflow isolated right now which when activated causes the the above failure.
I’ve found that creating a copy of this workflow with all the same conditions/actions does not cause any failure.
I’ve also found that regardless of the conditions and actions within the original workflow (and even with no conditions/actions), just having this workflow activated causes the Process Workflow job to fail.

I’ve looked at Debug level logging and the only thing I see before the fail is the workflow query followed by all the leads matching the query being processed:
Sat Jul 31 01:34:09 2021 [6764][1 ][DEBUG] Hook called: Leads::process_record
Sat Jul 31 01:34:09 2021 [6764][1 ][DEBUG] process_full_list: Lead(113ddbdf-80b0-a661-a849-5f19bf049567): id = 113ddbdf-80b0-a661-a849-5f19bf049567

I’m interested to know if anyone has experienced this kind of problem - I’m trying to understand what’s going on as this install has a large number of workflows and I think this is affecting multiple workflows. Obviously the easy fix is to recreate any ‘bad’ workflows. but I want to try and understand the root cause if possible. Creating new versions run the risk of re-processing records. Or is there something else I should be looking at?

As the message says, you need to check your PHP logs, which are not the same as suitecrm.log

I am checking both the Suite log and PHP log - there’s no errors in the PHP log at the time of the cron job failing.

have you checked your server lately for overgrown tables?

Run the first query in this post and examine results:

It’s possible that you have excessive let-over entries in process_record and the query runs too long.

Turning on “Log slow queries” in Admin / Settings would also give you this clue.

Thanks. I can’t check indexes currently, but aow_processed is large. I’ve been reading through this thread and will see if we can improve the situation.

The Prune Database job is not active - I assume it would be good practice to get that activated.

Is there any way to check left over entries in process_record? Does this relate to the job_queue table?

TableName RowCount TotalSpaceKB UsedSpaceKB
aow_processed 9,339,577 5,016,224 5,014,704
securitygroups_records 2,591,262 1,769,024 1,768,688
leads_cstm 868,806 812,816 812,552
leads 868,770 1,401,904 1,398,680
email_addr_bean_rel 480,772 170,696 169,808
email_addresses 447,125 173,512 172,632
leads_audit 195,709 63,344 62,864
aow_processed_aow_actions 161,703 156,848 156,656
job_queue 125,103 246,496 246,224
calls_leads 74,122 83,368 83,120

Sorry, forget about process_record, what I meant to write was aow_processed, which is exactly your problem.

You can also probably clear most of your securitygroups_records by deleting orphaned records (the ones that point at records which are no longer in the database, like deleted Leads, for example).