Inbound Email Job Queue Getting Stuck

Hi everyone,
I’m not sure if anyone else has had an issue where the scheduler job for checking inbound email is getting stuck.

We currently have 5 different email addresses and inboxes which SuiteCRM checks the inbound email for and creates cases for (All inboxes are gmail). We set the job to run every 10 minutes. What seems to be happening is that when there’s a large volume of mail that needs to be imported into cases, the job doesn’t pick up all the mail and ends up being stuck as “Running, Queued”. The only way I’ve found to “fix” the issue is to go into the database and mark it as done. This really isn’t ideal and the departments using our case module are reasonably getting upset since we have no way to tell when it gets stuck unless someone reports it. We’ve wanted to move another 2-3 emails addresses in the future so that there would be 8 email accounts being read and making cases. Are we pulling too many emails or is it that we aren’t allowing it enough time to read all the inboxes? We are running 11.7 but this seems to have been an issue since we started using SuiteCRM roughly around 7.8

Have you checked your logs at the time when it hangs?

It might be as simple as adding PHP resources in php.ini, things like:

  • memory_limit
  • max_execution_time
  • max_post_size

Remember to restart your web server afterwards.

It seems to hang around 1 minute or so. The logs just say that there was an error. Even the job logs but there’s no explanation why.
My cli/php.ini I have:
memory_limit = 300M
max_execution_time = 0 (for CLI)
Should I be updating the apache2 php.ini?

Sometimes when it hangs there’s also no error.
To give a bit of perspective on the number of emails that are supposed to get pulled in. With all the emails combined, there’s an average of 150-200 emails per 10-20 minutes that need to become cases.
I noticed that with the upgrade, emails are no longer set to read which makes me wonder if that is adding to the load of emails it tries to process (we don’t delete after importing them).

Did you check the web server log? It is defined in your php.ini, and usually is called php_errors.log

If it is a resources FATAL, it will surely be logged there (not in suitecrm.log)

I know our logs are working but I don’t see any from the mail, which is what I find strange.
Are there any server recommendations for installs with a large amount of data like ours?

If the problem was a PHP fatal from lack of resources, it would be in that log; if it isn’t, I guess we should look for other possible causes.

Please run the first query in this post:

https://pgorod.github.io/Database-tables-size/

You might have some overgrown tables in the database, and queries could be running too long…

You can also turn on “log slow queries” in Admin / System settings and see if something appears in the suitecrm.log then.

Hello, These are the results:
(tablename/rows/DATA/idx/total_size/idxfrac)


folders_rel			6.623M	1.596G	2.285G	3.881G	1.432
emails_text			0.109M	1.329G	0.012G	1.341G	0.009
leads_audit			2.061M	0.461G	0.304G	0.765G	0.659
campaign_log			0.734M	0.224G	0.359G	0.583G	1.605
cases      			0.057M	0.390G	0.026G	0.416G	0.066
emails_beans			0.662M	0.202G	0.210G	0.412G	1.039
tracker	        		0.659M	0.142G	0.228G	0.370G	1.608
aop_case_updates		0.016M	0.327G	0.000G	0.327G	0.000
app_apps_1_c			0.675M	0.151G	0.175G	0.326G	1.155
prospect_lists_prospects	0.309M	0.055G	0.092G	0.147G	1.652

We are using it for student relationship management so it’s important for us to keep our data.

I would definitely focus on DBA work to solve this. You have a pretty big database, unless you have matching hardware you’re going to be in trouble with that.

Study which queries are actually running long. Indexes can be improved, try adding some judiciously.

But most importantly, try to clean up the database. That blog post I linked explains some of this.

Start with the easy stuff: tables that have grown huge, and have a LOT of rows with “deleted=1” (do you have the purging job enabled in Schedulers?)

It seems that ever since we’ve upgraded to 7.11 our folders_rel table went from 70k records to over 12 million records.

Ok, but that doesn’t seem to be happening to other people, so you need to try and figure out what is wrong.

Things like:

  • did you actually connect to huge mailboxes?

  • do you have any customization that could be breaking things? Some logic hook that is recursively saving and saving and saving something?

  • Is there something peculiar about your IMAP folder structure? Does any folder have the same name as a subfolder of it?

  • any “all records” Workflows that touch on emails?

etc.

  • We are connected to very large mailboxes. Most of them contain 8.000 to 10,000 emails. This wasn’t a problem before when it would mark the emails as read in the inbox. Deleting this mail is not an option. Is there a way to revert back to the old functionality where it would mark the emails in the inbox as read?
    Looking in the logs we do get some known warning messages such as:
From field is not set.
From address is not set.
From-name is not set.
From name is not set.
From address and name pair format is incorrect, use "from@emailaddress.org <Name of Person>" format.
From address and name pair is invalid.
codes:1, 4, 7, 10, 16, 15

Wed Sep 18 12:00:27 2019 [32025][1][ERROR] Unable to find relationship emails_email_templates
Wed Sep 18 12:00:27 2019 [32025][1][ERROR] Contact address is incorrect to Email: 37aed607-304e-3641-fbbc-5d556cf927bf
Wed Sep 18 12:00:27 2019 [32025][1][ERROR] Unable to find relationship emails_email_templates
Wed Sep 18 12:00:27 2019 [32025][1][ERROR] Saving Email with invalid From name and/or Address. Details:
From field is not set.
From address is not set.
From-name is not set.
From name is not set.
From address and name pair format is incorrect, use "from@emailaddress.org <Name of Person>" format.
From address and name pair is invalid.
codes:1, 4, 7, 10, 16, 15

I’ve seen this error in multiple forum threads with no solutions.

  • We don’t have any logic hooks or anything related to that mail module. We don’t use the actual workflow module.
  • We use gmail with the default “INBOX” folder.

Can you please run these SQL commands from phpMyAdmin, and report the results:

SELECT deleted, count(*) FROM `folders_rel` group by deleted
SELECT folder_id, count(*) FROM `folders_rel` group by folder_id

Query #1: 0,13117664

Query #2:
5fa27e9e-6e48-f36f-53b1-5c6ffa1944cb 3381106
6e3e0e06-e92e-52f4-4eb1-5c05a7f4a47d 371008
74b15f47-6152-5c37-2e90-5c05a486137d 2232527
940b551f-1aec-16c0-35b3-5c05af55ec82 2464161
a9742469-37b8-ec08-0b4e-5c0532469bea 31
aeed2935-4f58-5a2f-5e1f-5c05b0d9734c 1
b6f70000-c785-f8b5-147e-5799408693fd 2391199
d4ac5e82-c654-5d9a-58c9-5c73f3d4f84a 166333
edc6f9ac-919d-a382-d218-5c059ce96423 2112483

If we aren’t checking the actual inboxes from SuiteCRM and are only using it for Cases can we safely remove folders_rel entries from certain boxes after certain amount of days? Is it going to affect looking at the email history from a case.

At the moment I don’t know what folders_rel does, so I am waiting on a colleague to have a look at this, I am hoping he might shed some light.

You might want to have a look at the folders table, to see what those ids (5fa27e9e-6e48-f36f-53b1-5c6ffa1944cb, etc) refer to. Just to see if it makes sense to you.

Meanwhile you can work on simplifying some of the other overgrown tables you have. You should clean up your database. All of it looks like normal work to me (pruning “deleted=1” records, removing orphaned entries), except this folders_rel which is intriguing me and might be a bug.

Hey,

So taking a quick look at this, the folders_rel should contain all of the relationship data for which folder holds which email so I don’t think it would be something that you would be able to safely remove if you still needed to access these emails. If this is a core bug, I would agree that it is likely to be with the recent change to case emails now no longer being set to read.

Also worth noting that folders_rel is, I believe, one of the email tables that doesn’t have its records set to “deleted=1” when you delete an email so you wouldn’t be able to use a scheduler prune here.

To be honest, It’s not exactly clear how we are using the folders_rel stuff here. It seems to be a blank table by default and is only created when an email is sent to a user created folder. I’ll see if I can take a look at replicating this locally on Monday, definitely seems like something strange is going on here.

1 Like

After further investigation, I found the code which read the inbound emails/made cases from it and added logging. It turns out because our inboxes contained 1500-2000 emails per inbox it was taking about an hour to read. This made it look like it was stuck since the case “Date created” gets set to when it starts to run.

I don’t know why you would’ve changed the functionality so that it no longer set mail to read without an option to use the old functionality, I feel like this will just create problems in the future for other people using large inboxes who can’t delete old mail. I fixed the problem by creating a function in the AOPInboundEmail class which takes in the UIDs of the mail that got read and marks them as read. Now everything is running smoothly.