Workflow causes scheduled job to fail regardless of conditions

BTownBrit · 31 July 2021 00:50

Version Info:

PHP – 7.3.13 *
IIS – 10.0.17763.1
SQL – MS SQL Server 2016 13.0.5103.6 (X64)
SuiteCRM – 7.10.30, Sugar Version 6.5.25 (Build 344)

I’ve been experiencing a problem where the Process Workflow job will fail. Suite log shows:
Fri Jul 30 16:39:56 2021 [6980][1 ][FATAL] Job 77812e77-5e52-8e64-09aa-61041deb8915 (Process Workflow Tasks) failed in CRON run
Fri Jul 30 16:39:56 2021 [6980][1 ][INFO] Resolving job 77812e77-5e52-8e64-09aa-61041deb8915 as failure: Unexpected failure, please check PHP logs and suitecrm.log

PHP log doesn’t show any errors.

I have a Workflow isolated right now which when activated causes the the above failure.
I’ve found that creating a copy of this workflow with all the same conditions/actions does not cause any failure.
I’ve also found that regardless of the conditions and actions within the original workflow (and even with no conditions/actions), just having this workflow activated causes the Process Workflow job to fail.

I’ve looked at Debug level logging and the only thing I see before the fail is the workflow query followed by all the leads matching the query being processed:
Sat Jul 31 01:34:09 2021 [6764][1 ][DEBUG] Hook called: Leads::process_record
Sat Jul 31 01:34:09 2021 [6764][1 ][DEBUG] process_full_list: Lead(113ddbdf-80b0-a661-a849-5f19bf049567): id = 113ddbdf-80b0-a661-a849-5f19bf049567

I’m interested to know if anyone has experienced this kind of problem - I’m trying to understand what’s going on as this install has a large number of workflows and I think this is affecting multiple workflows. Obviously the easy fix is to recreate any ‘bad’ workflows. but I want to try and understand the root cause if possible. Creating new versions run the risk of re-processing records. Or is there something else I should be looking at?

pgr · 31 July 2021 09:35

As the message says, you need to check your PHP logs, which are not the same as suitecrm.log

BTownBrit · 31 July 2021 10:11

I am checking both the Suite log and PHP log - there’s no errors in the PHP log at the time of the cron job failing.

pgr · 31 July 2021 11:04

have you checked your server lately for overgrown tables?

Run the first query in this post and examine results:

It’s possible that you have excessive let-over entries in process_record and the query runs too long.

Turning on “Log slow queries” in Admin / Settings would also give you this clue.

BTownBrit · 31 July 2021 19:28

Thanks. I can’t check indexes currently, but aow_processed is large. I’ve been reading through this thread and will see if we can improve the situation.

The Prune Database job is not active - I assume it would be good practice to get that activated.

Is there any way to check left over entries in process_record? Does this relate to the job_queue table?

TableName	RowCount	TotalSpaceKB	UsedSpaceKB
aow_processed	9,339,577	5,016,224	5,014,704
securitygroups_records	2,591,262	1,769,024	1,768,688
leads_cstm	868,806	812,816	812,552
leads	868,770	1,401,904	1,398,680
email_addr_bean_rel	480,772	170,696	169,808
email_addresses	447,125	173,512	172,632
leads_audit	195,709	63,344	62,864
aow_processed_aow_actions	161,703	156,848	156,656
job_queue	125,103	246,496	246,224
calls_leads	74,122	83,368	83,120

pgr · 1 August 2021 11:35

Sorry, forget about process_record, what I meant to write was aow_processed, which is exactly your problem.

You can also probably clear most of your securitygroups_records by deleting orphaned records (the ones that point at records which are no longer in the database, like deleted Leads, for example).