Multiple issues - Audit not catching all changes, workflows not operating properly

Several issues being encountered and the fact that it’s working very inconsistently is what’s stumping me.

  • Workflow actions are not occurring
  • Change logs do not contain all data

First off, the overview:
The design relies heavily on automatic calculations so workflows flow into workflows multiple times. Some of this is due to lack of “OR” options on workflow triggers, and some is due to places where logic tests need to occur so multiple workflows happen, or multiple states flow into one other workflow.

There is also a scheduled workflow that is supposed to run against all records in a custom module once per day, and that is working just fine aside from the fact that it takes 17 cron runs to get through all the records.

Everything was working perfectly fine for a month, and then things started sometimes not working.

A workflow fires that updates “date checked” to now() when a checkbox is checked and unchecks the checkbox. There is also a recalculation workflow that should fire when the “date checked” changes to set an expiration date, but it is sometimes not firing when the first workflow makes the change. Two records changed within three minutes of each other. On one, the quick check workflow fired and changed the Date Checked value, and the recalc workflow did not fire. On the second, the recalc workflow did fire.

I set nearly all of the fields in the module to be Audited in order to try to diagnose the problem.
When workflows change multiple field values, only ONE field will be audited into the Change Log. It’s always the same one on a given record, but not the same one on all records.
When a user changes multiple fields, sometimes more than one field will make it into the change log. Sometimes only one will be recorded. Sometimes none will be recorded. Never all of them (for example, the Quick Check checkbox is set to be audited and this never audits).
There are some records that audits never work on at all. There are no rows in the audit table for those records whatsoever, despite changes being made for example yesterday and the oldest rows in the audit table being May.

So a combination of inconsistent workflow triggering and pseudo-inconsistent auditing is making it so that the data is getting corrupted and users do not trust the system at all now.

This is running on ubuntu 20.04, shared hosting system VM.
MariaDB 10.3
PHP 8.1 (I need to see about downgrading this to 8.0 for proper compatibility.)
SuiteCRM v 7.12.5 (And I need to see about upgrading this to .6)

No errors in the instance logs other than the failed workflow scheduler sixteen times every night at midnight (because it can’t complete everything).
No PHP/Apache errors associated with any of the issues.

If things were failing in a reliable manner, that would be one thing. The fact that they are failing differently each time leaves me with no good idea what the confounding factor is, and the audit/change log issue makes it much more difficult to diagnose.

Any insight on things to check or known issues would be appreciated.

Definitely downgrade PHP version!

Are you checking the php_errors.log? Any FATALs there?

I would suspect that your processes started getting slower and slower due to accumulation of data in the database (including aow_processed which tracks workflow “repeated runs”), and now they’re randomly breaking through lack of memory or excessive execution time (which you can tweak in php.ini).

Good news: 8.1 was just the CLI, so affected cron. 7.4 is used by the httpd. I’ve corrected that.
Bad news: The issue is occurring on save, not on scheduler, so 8.1 on scheduler shouldn’t be contributing.

No fatal errors related to the issue. Two in the middle of July and two earlier today related to the upgrade of the instance to 7.12.6.

Ton of PHP Notice-level items as well as some related to workflows.

Timing out doesn’t seem to be an issue, as the instance is rather small and there are no logs indicating timeouts or RAM-outs.

aow_processed is sitting at 21.5k rows which is not too full, and the cases where the workflow is not triggering is not taking a long time to say it’s saved.

If I could trigger the problem consistently, it would make things a lot easier as I could set logging to a deeper level.

The PHP notices are suspicious to me. Perhaps you can get some details from this oddity (arrays are heavily truncated due to size):
PHP Notice: Undefined property: bld_Builds::$date_entered in /var/www/html/modules/AOW_WorkFlow/AOW_WorkFlow.php on line 646
[08-Aug-2022 04:20:25 UTC] PHP Stack trace:
[08-Aug-2022 04:20:25 UTC] PHP 1. {main}() /var/www/html/index.php:0
[08-Aug-2022 04:20:25 UTC] PHP 2. SugarApplication->execute() /var/www/html/index.php:52
[08-Aug-2022 04:20:25 UTC] PHP 3. SugarController->execute() /var/www/html/include/MVC/SugarApplication.php:101
[08-Aug-2022 04:20:25 UTC] PHP 4. SugarController->process() /var/www/html/include/MVC/Controller/SugarController.php:361
[08-Aug-2022 04:20:25 UTC] PHP 5. SugarController->handle_action() /var/www/html/include/MVC/Controller/SugarController.php:465
[08-Aug-2022 04:20:25 UTC] PHP 6. SugarController->do_action() /var/www/html/include/MVC/Controller/SugarController.php:491
[08-Aug-2022 04:20:25 UTC] PHP 7. SugarController->action_save() /var/www/html/include/MVC/Controller/SugarController.php:522
[08-Aug-2022 04:20:25 UTC] PHP 8. SugarBean->save($check_notify = FALSE) /var/www/html/include/MVC/Controller/SugarController.php:681
[08-Aug-2022 04:20:25 UTC] PHP 9. SugarBean->call_custom_logic($event = ‘after_save’, $arguments = ‘’) /var/www/html/data/SugarBean.php:2426
[08-Aug-2022 04:20:25 UTC] PHP 10. LogicHook->call_custom_logic($module_dir = ‘bld_Builds’, $event = ‘after_save’, $arguments = ‘’) /var/www/html/da>
[08-Aug-2022 04:20:25 UTC] PHP 11. LogicHook->process_hooks($hook_array = [‘after_save’ => [0 => […], 1 => […], 2 => […]], ‘after_delete’ => [>
[08-Aug-2022 04:20:25 UTC] PHP 12. AOW_WorkFlow->run_bean_flows($bean = class bld_Builds { public $new_schema = TRUE; public $module_dir = 'bld_Buil>
[08-Aug-2022 04:20:25 UTC] PHP 13. AOW_WorkFlow->check_valid_bean($bean = class bld_Builds { public $new_schema = TRUE; public $module_dir = 'bld_Bu>

That’s part of the logic that determines whether the workflow runs based on New records or Modified records tagging. Based on the code further down, not having anything defined for date_entered will definitely cause the workflow to not trigger.

But how is that value being empty? Based on the context, it looks like it might be missing numerous parts of the bean.

I have those notices in my system too, I don’t think that is the cause of your problem.

I have to say, in terms of architecture/approach, your set up doesn’t sound too good to me. I would avoid relying so heavily on Workflows, especially for “all records” operations (which are waaaay more efficient to do in direct SQL commands or other custom PHP entry-points, and for anything involving too much reentry. The restrictions that led you to do that should be easily avoidable in logic hooks where you can be much more precise with the conditions you use.

That said, I understand your frustration; if this worked well before, and now fails only sometimes, that’s really a bummer.

But I can’t figure it out or suggest anything with the information you have given so far. Maybe if you focus on log messages immediately before the Workflows fail? Anything there?