cron.php sessions keep piling up until computer crash

Hello,

I’ve been having this for some time, perhaps since upgrade to 7.10.4 but I hadn’t the time to chase it until now. This morning the machine didn’t crash but had dozens of:

apache    5418  0.0  0.1 120568  3044 ?        Ss   11:44   0:00 /bin/sh -c cd /var/www/example.com/htdocs/crm; php -f cron.php > /dev/null 2>&1

and, of course, all the cpu cores were at 100%

My only cron tables:

# crontab -l -u apache
* * * * * cd /var/www/mosteiro.mooo.com/htdocs/crm; php -f cron.php > /dev/null 2>&1

# crontab -l -u root
00 4 * * * /sbin/reboot

Nothing relevant on the logs, I think.

It seems that this isn’t working:

# cat /var/www/example.com/htdocs/crm/cron.php
...
// If we have a session left over, destroy it
if(session_id()) {
    session_destroy();
...

Thank you in advance for any help.
jss

Couldn’t find a way to edit my post so, before you ask:

# cat /var/log/messages
...
May  1 11:57:01 mypc crond[5739]: pam_unix(crond:session): session opened for user apache by (uid=0)
May  1 11:57:01 mypc CROND[5743]: (apache) CMD (cd /var/www/example.com/htdocs/crm; php -f cron.php > /dev/null 2>&1)
May  1 11:57:16 mypc CROND[5509]: pam_unix(crond:session): session closed for user apache
...

One set every minute, of course.

php_errors.log and suitecrm.log have no reference to the events.

Regards
jss

Strange, I’ve never seen this.

And your “/var/log/messages” seems to indicate that the sessions are getting closed…

In Admin / Scheduler, going into each Job record, do you see “Last run time” as a recent time? I mean: are your jobs running successfully?

1 Like

[quote=“pgr” post=64990]Strange, I’ve never seen this.

And your “/var/log/messages” seems to indicate that the sessions are getting closed…[/quote]
Thank you for helping.

So, this shouldn’t be here:

# ps aux
...
apache    6144  0.0  0.0 120568  1132 ?        Ss   12:13   0:00 /bin/sh -c cd /var/www/example.com/htdocs/crm; php -f cron.php > /dev/null 2>&1
apache    6145  0.0  1.5 206332 30816 ?        S    12:13   0:00 php -f cron.php
root      6196  0.0  0.1 154148  3404 ?        S    12:15   0:00 /usr/sbin/CROND
apache    6200  0.0  0.1 120568  2980 ?        Ss   12:15   0:00 /bin/sh -c cd /var/www/example.com/htdocs/crm; php -f cron.php > /dev/null 2>&1
apache    6201  0.0  1.5 206332 31200 ?        S    12:15   0:00 php -f cron.php
root      6211  0.0  0.1 154148  3404 ?        S    12:16   0:00 /usr/sbin/CROND
apache    6213  0.0  0.1 120568  2988 ?        Ss   12:16   0:00 /bin/sh -c cd /var/www/example.com/htdocs/crm; php -f cron.php > /dev/null 2>&1
apache    6217  0.0  1.5 206332 31172 ?        S    12:16   0:00 php -f cron.php
root      6234  0.0  0.1 154148  3404 ?        S    12:17   0:00 /usr/sbin/CROND
apache    6238  0.0  0.1 120568  3064 ?        Ss   12:17   0:00 /bin/sh -c cd /var/www/example.com/htdocs/crm; php -f cron.php > /dev/null 2>&1
apache    6239  0.0  1.5 206332 31108 ?        S    12:17   0:00 php -f cron.php
root      6251  0.0  0.1 154148  3404 ?        S    12:18   0:00 /usr/sbin/CROND
apache    6255  0.0  0.1 120568  2996 ?        Ss   12:18   0:00 /bin/sh -c cd /var/www/example.com/htdocs/crm; php -f cron.php > /dev/null 2>&1
apache    6256  0.0  1.5 206332 31012 ?        S    12:18   0:00 php -f cron.php
root      6309  0.0  0.1 154148  3404 ?        S    12:20   0:00 /usr/sbin/CROND
apache    6312  0.0  0.1 120568  3092 ?        Ss   12:20   0:00 /bin/sh -c cd /var/www/example.com/htdocs/crm; php -f cron.php > /dev/null 2>&1
apache    6317  0.0  1.5 206332 30708 ?        S    12:20   0:00 php -f cron.php
root      6334  0.0  0.0      0     0 ?        S    12:20   0:00 [kworker/0:1]
root      6339  0.0  0.1 154148  3404 ?        S    12:21   0:00 /usr/sbin/CROND
apache    6343  0.0  0.1 120568  3080 ?        Ss   12:21   0:00 /bin/sh -c cd /var/www/example.com/htdocs/crm; php -f cron.php > /dev/null 2>&1
apache    6344  0.0  1.5 206332 31316 ?        S    12:21   0:00 php -f cron.php
root      6354  0.0  0.1 154148  3404 ?        S    12:22   0:00 /usr/sbin/CROND
apache    6356  0.0  0.1 120568  3072 ?        Ss   12:22   0:00 /bin/sh -c cd /var/www/example.com/htdocs/crm; php -f cron.php > /dev/null 2>&1
apache    6359  0.0  1.5 206332 31004 ?        S    12:22   0:00 php -f cron.php
root      6370  0.0  0.1 154148  3404 ?        S    12:23   0:00 /usr/sbin/CROND
apache    6373  0.0  0.1 120568  3116 ?        Ss   12:23   0:00 /bin/sh -c cd /var/www/example.com/htdocs/crm; php -f cron.php > /dev/null 2>&1
apache    6376  0.0  1.5 206332 30816 ?        S    12:23   0:00 php -f cron.php
root      6410  0.0  0.1 154148  3404 ?        S    12:24   0:00 /usr/sbin/CROND
apache    6411  0.0  0.1 120568  3108 ?        Ss   12:24   0:00 /bin/sh -c cd /var/www/example.com/htdocs/crm; php -f cron.php > /dev/null 2>&1
apache    6412  0.0  1.5 206332 31288 ?        S    12:24   0:00 php -f cron.php
root      6425  0.0  0.1 154148  3404 ?        S    12:25   0:00 /usr/sbin/CROND
apache    6429  0.0  0.1 120568  3076 ?        Ss   12:25   0:00 /bin/sh -c cd /var/www/example.com/htdocs/crm; php -f cron.php > /dev/null 2>&1
apache    6430  0.0  1.5 206332 31008 ?        S    12:25   0:00 php -f cron.php
root      6441  0.0  0.1 154148  3404 ?        S    12:26   0:00 /usr/sbin/CROND
apache    6443  0.0  0.1 120568  3064 ?        Ss   12:26   0:00 /bin/sh -c cd /var/www/example.com/htdocs/crm; php -f cron.php > /dev/null 2>&1
apache    6445  0.0  1.5 206332 30976 ?        S    12:26   0:00 php -f cron.php
root      6456  0.0  0.1 154148  3404 ?        S    12:27   0:00 /usr/sbin/CROND
apache    6458  0.0  0.1 120568  3068 ?        Ss   12:27   0:00 /bin/sh -c cd /var/www/example.com/htdocs/crm; php -f cron.php > /dev/null 2>&1
# on and on and on, dozens...

The first six, didn’t check more:


Last Successful Run:  Never

Any leads?
TIA
jss

Maybe increase your log level to DEBUG in Admin / System Settings and at a time when nobody else is using the system, wait for the cron jobs to be launched and see what the logs say (suitecrm.log). Grep them for “cron” and for “job”.

If PHP is launching cron.php you should see messages there, and possibly some indication of which job is failing. If the suitecrm.log doesn’t have anything, then focus your attention on your crontab command, it would seem cron.php is not even starting.

1 Like

Thank you again.
I renamed suitecrm.log to old, just to start fresh, rebooted and, few minutes later:

# grep -i cron suitecrm.log
Tue May  1 13:31:01 2018 [4920][1][DEBUG] --------------------------------------------> at cron.php <--------------------------------------------
Tue May  1 13:31:01 2018 [4920][1][DEBUG] Using SugarCronJobs as CRON driver
Tue May  1 13:32:02 2018 [5055][1][DEBUG] --------------------------------------------> at cron.php <--------------------------------------------
Tue May  1 13:32:02 2018 [5055][1][DEBUG] Using SugarCronJobs as CRON driver
Tue May  1 13:33:02 2018 [5142][1][DEBUG] --------------------------------------------> at cron.php <--------------------------------------------
Tue May  1 13:33:02 2018 [5142][1][DEBUG] Using SugarCronJobs as CRON driver
Tue May  1 13:34:02 2018 [5164][1][DEBUG] --------------------------------------------> at cron.php <--------------------------------------------
Tue May  1 13:34:02 2018 [5164][1][DEBUG] Using SugarCronJobs as CRON driver
Tue May  1 13:34:52 2018 [4920][1][INFO] Query:UPDATE job_queue SET status='running', date_modified='2018-05-01 13:31:01', client='CRONac3a8b743cb7c44f50416d6d3772ce34:4920' WHERE id='eaa76474-9317-92e9-fd5d-5ae80896aa54' AND status='queued'
					SET `assigned_user_id`='a54c0e4b-7fb7-2014-e104-5a95b16e9b5f',`name`='Run Report Generation Scheduled Tasks',`date_modified`='2018-05-01 13:31:01',`scheduler_id`='7d075828-12c1-8a61-9836-5ad73f637e60',`execute_time`='2018-05-01 04:30:00',`status`='running',`resolution`='queued',`message`=NULL,`target`='function::aorRunScheduledReports',`data`=NULL,`requeue`=0,`retry_count`=NULL,`failure_count`=NULL,`job_delay`=0,`client`='CRONac3a8b743cb7c44f50416d6d3772ce34:4920',`percent_complete`=NULL
					SET `assigned_user_id`='a54c0e4b-7fb7-2014-e104-5a95b16e9b5f',`name`='Run Report Generation Scheduled Tasks',`date_modified`='2018-05-01 13:31:01',`scheduler_id`='7d075828-12c1-8a61-9836-5ad73f637e60',`execute_time`='2018-05-01 04:30:00',`status`='done',`resolution`='success',`message`='Warning [2]: Invalid argument supplied for foreach() in /var/www/example.com/htdocs/crm/modules/Schedulers/_AddJobsHere.php on line 727\n',`target`='function::aorRunScheduledReports',`data`=NULL,`requeue`=0,`retry_count`=NULL,`failure_count`=NULL,`job_delay`=0,`client`='CRONac3a8b743cb7c44f50416d6d3772ce34:4920',`percent_complete`=NULL
Tue May  1 13:35:02 2018 [5189][1][DEBUG] --------------------------------------------> at cron.php <--------------------------------------------
Tue May  1 13:35:02 2018 [5189][1][DEBUG] Using SugarCronJobs as CRON driver
Tue May  1 13:36:02 2018 [5232][1][DEBUG] --------------------------------------------> at cron.php <--------------------------------------------
Tue May  1 13:36:02 2018 [5232][1][DEBUG] Using SugarCronJobs as CRON driver
Tue May  1 13:36:20 2018 [5055][1][INFO] Query:UPDATE job_queue SET status='running', date_modified='2018-05-01 13:32:01', client='CRONac3a8b743cb7c44f50416d6d3772ce34:5055' WHERE id='d4928ca6-aa2b-c616-5da6-5ae80c366d42' AND status='queued'
					SET `assigned_user_id`='a54c0e4b-7fb7-2014-e104-5a95b16e9b5f',`name`='Run Report Generation Scheduled Tasks',`date_modified`='2018-05-01 13:32:01',`scheduler_id`='7d075828-12c1-8a61-9836-5ad73f637e60',`execute_time`='2018-05-01 04:31:00',`status`='running',`resolution`='queued',`message`=NULL,`target`='function::aorRunScheduledReports',`data`=NULL,`requeue`=0,`retry_count`=NULL,`failure_count`=NULL,`job_delay`=0,`client`='CRONac3a8b743cb7c44f50416d6d3772ce34:5055',`percent_complete`=NULL
					SET `assigned_user_id`='a54c0e4b-7fb7-2014-e104-5a95b16e9b5f',`name`='Run Report Generation Scheduled Tasks',`date_modified`='2018-05-01 13:32:01',`scheduler_id`='7d075828-12c1-8a61-9836-5ad73f637e60',`execute_time`='2018-05-01 04:31:00',`status`='done',`resolution`='success',`message`='Warning [2]: Invalid argument supplied for foreach() in /var/www/example.com/htdocs/crm/modules/Schedulers/_AddJobsHere.php on line 727\n',`target`='function::aorRunScheduledReports',`data`=NULL,`requeue`=0,`retry_count`=NULL,`failure_count`=NULL,`job_delay`=0,`client`='CRONac3a8b743cb7c44f50416d6d3772ce34:5055',`percent_complete`=NULL
Tue May  1 13:37:02 2018 [5289][1][DEBUG] --------------------------------------------> at cron.php <--------------------------------------------
Tue May  1 13:37:02 2018 [5289][1][DEBUG] Using SugarCronJobs as CRON driver
Tue May  1 13:37:25 2018 [5142][1][INFO] Query:UPDATE job_queue SET status='running', date_modified='2018-05-01 13:33:01', client='CRONac3a8b743cb7c44f50416d6d3772ce34:5142' WHERE id='ccec185a-d71e-6859-6052-5ae80b6d793d' AND status='queued'
					SET `assigned_user_id`='a54c0e4b-7fb7-2014-e104-5a95b16e9b5f',`name`='Run Report Generation Scheduled Tasks',`date_modified`='2018-05-01 13:33:01',`scheduler_id`='7d075828-12c1-8a61-9836-5ad73f637e60',`execute_time`='2018-05-01 04:32:00',`status`='running',`resolution`='queued',`message`=NULL,`target`='function::aorRunScheduledReports',`data`=NULL,`requeue`=0,`retry_count`=NULL,`failure_count`=NULL,`job_delay`=0,`client`='CRONac3a8b743cb7c44f50416d6d3772ce34:5142',`percent_complete`=NULL
					SET `assigned_user_id`='a54c0e4b-7fb7-2014-e104-5a95b16e9b5f',`name`='Run Report Generation Scheduled Tasks',`date_modified`='2018-05-01 13:33:01',`scheduler_id`='7d075828-12c1-8a61-9836-5ad73f637e60',`execute_time`='2018-05-01 04:32:00',`status`='done',`resolution`='success',`message`='Warning [2]: Invalid argument supplied for foreach() in /var/www/example.com/htdocs/crm/modules/Schedulers/_AddJobsHere.php on line 727\n',`target`='function::aorRunScheduledReports',`data`=NULL,`requeue`=0,`retry_count`=NULL,`failure_count`=NULL,`job_delay`=0,`client`='CRONac3a8b743cb7c44f50416d6d3772ce34:5142',`percent_complete`=NULL
Tue May  1 13:38:02 2018 [5313][1][DEBUG] --------------------------------------------> at cron.php <--------------------------------------------
Tue May  1 13:38:02 2018 [5313][1][DEBUG] Using SugarCronJobs as CRON driver
Tue May  1 13:38:26 2018 [5164][1][INFO] Query:UPDATE job_queue SET status='running', date_modified='2018-05-01 13:34:02', client='CRONac3a8b743cb7c44f50416d6d3772ce34:5164' WHERE id='ae68d22d-c350-d024-3c56-5ae80d1863d9' AND status='queued'
					SET `assigned_user_id`='a54c0e4b-7fb7-2014-e104-5a95b16e9b5f',`name`='Run Report Generation Scheduled Tasks',`date_modified`='2018-05-01 13:34:02',`scheduler_id`='7d075828-12c1-8a61-9836-5ad73f637e60',`execute_time`='2018-05-01 04:33:00',`status`='running',`resolution`='queued',`message`=NULL,`target`='function::aorRunScheduledReports',`data`=NULL,`requeue`=0,`retry_count`=NULL,`failure_count`=NULL,`job_delay`=0,`client`='CRONac3a8b743cb7c44f50416d6d3772ce34:5164',`percent_complete`=NULL
					SET `assigned_user_id`='a54c0e4b-7fb7-2014-e104-5a95b16e9b5f',`name`='Run Report Generation Scheduled Tasks',`date_modified`='2018-05-01 13:34:02',`scheduler_id`='7d075828-12c1-8a61-9836-5ad73f637e60',`execute_time`='2018-05-01 04:33:00',`status`='done',`resolution`='success',`message`='Warning [2]: Invalid argument supplied for foreach() in /var/www/example.com/htdocs/crm/modules/Schedulers/_AddJobsHere.php on line 727\n',`target`='function::aorRunScheduledReports',`data`=NULL,`requeue`=0,`retry_count`=NULL,`failure_count`=NULL,`job_delay`=0,`client`='CRONac3a8b743cb7c44f50416d6d3772ce34:5164',`percent_complete`=NULL
Tue May  1 13:39:02 2018 [5335][1][DEBUG] --------------------------------------------> at cron.php <--------------------------------------------
Tue May  1 13:39:02 2018 [5335][1][DEBUG] Using SugarCronJobs as CRON driver

I guess cron.php is being called but I don’t quite know where to start.

Any leads, please?
TIA,
jss

Yep it seems to be running.

“Info” and “Debug” messages are not really errors, they’re safe to ignore, so if you don’t have anything else in there, this won’t be helpful.

What about php_errors.log, or whatever your Apache log is called? Do you see anything relevant there?

You don’t happen to have any external service launching cron jobs for you? This is recommended in some tutorials (not by me), did you ever attempt that?

1 Like

[quote=“pgr” post=65003]Yep it seems to be running.

“Info” and “Debug” messages are not really errors, they’re safe to ignore, so if you don’t have anything else in there, this won’t be helpful.

What about php_errors.log, or whatever your Apache log is called? Do you see anything relevant there?[/quote]

No, but I think I got it, read on.

No.

But it seems stable now, for about 10 minutes, with just 4 to 6 tasks running.

I think I can correlate this with my changes to schedules that run “As often as possible” (1 minute on cron), where I unchecked “Execute if missed” (I don’t remember having touched it ever, it must have come with the original config).

My theory is that there were a lot of missed schedules all piling-up because they happen every minute, causing the system to miss more and more schedules every minute until the exhaustion and crash. I’m not sure of this, though. But the system is stable now, let’s wait for a full day to be sure.

Thank you for your extended support.

Best regards,
jss

Ok, so another thing you might want to investigate is whether some of your jobs, for some reason, are taking a long time (longer than one minute, or even more).

Sometimes database performance degradates heavily.

If you execute the query on top of this post

https://pgorod.github.io/Database-tables-size/

You can get a sense of overgrown tables, which I usually find are associated with lagging systems. Especially job_queue, reminders and aod_indexevent.

You can start by turning on the “Prune database job”.

If you’re going to mess with the database, make a good backup first, just in case.

1 Like

[EDIT] Sorry, my post just crossed yours, I’ll read it now and reply below.

It seems that I was not right, at least completely:

# top -b -n1 -u mysql
top - 17:44:40 up  1:11,  4 users,  load average: 12.52, 11.99, 10.43
Tasks: 201 total,   1 running, 200 sleeping,   0 stopped,   0 zombie
%Cpu(s): 89.4 us,  4.9 sy,  0.0 ni,  4.8 id,  0.1 wa,  0.0 hi,  0.9 si,  0.0 st
KiB Mem :  2046360 total,   575248 free,   722084 used,   749028 buff/cache
KiB Swap:  2047996 total,  2047996 free,        0 used.  1248220 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 4283 mysql     20   0 2429996 270496  16012 S 334.6 13.2 262:34.33 mysqld

mysql is occupying the 4 cpu’s with 334.6%. It has about 20 active threads plus some spleeping. I don’t have running processes that use mysql, at least on a relevant way.
This is the root process of mysql:

4283 ?        Ssl  291:17 /usr/sbin/mysqld --defaults-file=/etc/mysql/my.cnf

cron.php now has 16 processes running.

apache    6145  0.0  1.5 206332 30816 ?        S    12:13   0:00 php -f cron.php
root      6196  0.0  0.1 154148  3404 ?        S    12:15   0:00 /usr/sbin/CROND
apache    6200  0.0  0.1 120568  2980 ?        Ss   12:15   0:00 /bin/sh -c cd /var/www/example.com/htdocs/crm; php -f cron.php > /dev/null 2>&1

Any hunches?

TIA,
jss

Yeah, run that query I mentioned in my last post. My guess at this moment is that you have a performance problem around some query running from the schedulers.

Right:


Table					rows		Data		idx		size		idxfrac
suitecrm.job_queue			0.013M	0.153G	0.329G	0.482G	2.147
suitecrm.email_addresses	0.009M	0.002G	0.003G	0.005G	1.205
suitecrm.relationships		0.000M	0.003G	0.001G	0.005G	0.431
nextcloud.oc_authtoken		0.000M	0.001G	0.002G	0.003G	1.052
suitecrm.email_cache		0.002M	0.001G	0.001G	0.003G	0.866
nextcloud.oc_filecache		0.004M	0.001G	0.001G	0.003G	0.722
nextcloud.oc_activity		0.003M	0.001G	0.001G	0.002G	0.412
nextcloud.oc_calendarobjects	0.001M	0.001G	0.000G	0.002G	0.062
mysql.innodb_index_stats	0.003M	0.001G	0.000G	0.001G	0
mysql.time_zone_transition	0.040M	0.001G	0.001G	0.001G	1.113

Sorry, can’t get the format right, it’s not wysiwyg.

It seems I don’t have reminders and aod_indexevent and the only relevant size is job_queue with 153MB.
I remember once having to delete a large part of this table, everything but the current day, but it keeps growing.

rgds,
jss

Risking to look like a reality show, I think I got it now.

  • Deleted all rows of job_queue marked done and earlier than yesterday.
  • Optimized the table job_queue, recovering the space and re-ordering
  • Rescheduled the interval of “Every 1 minute” schedules to “Every 5 minutes”

The process list is now completely clean. Sometimes a cron.php pops off, takes 60% of cpu but quickly finishes and everything is normal again.

I really don’t need to be reminded of sending an email within one minute but I think I could change it to 2 minutes without problems.

Furthermore, it could eventually be the sync of imap mailboxes to cause the problem, with the time needed for that which could easily be longer than 1 minute.

Thank you for your support.

Best regards,
jss

As an afterthought about what happened here, I would like to say that I think the policy to launch a new scheduled job should include the condition that one job of the same kind is not running. This should be logged, obviously.

In fact, specifically if it is marked “As often as possible”, it is not possible.

Ignoring this, possibly leads to machine session suicide (or homicide) occupying 99% of the resources and blocking all other applications, as happened here. Eventually the machine crashes if this application blindly insists, as happened here.

Please note that the machine in question is not the weakest in the world; it’s a 2-core, 4-thread, 1.8GHz cpu, with SSD and 2GB RAM, which is more than enough for a headless server (no X).

Best regards,
jss

I believe that the sort of logic you ask for is already there. I have never seen a case like yours, with this impact on unclosed sessions, although I have seen about a dozen cases of servers with performance problems like the one you had in the database.

That logic, I think should be improved: there’s no reason why the job_queue table should create problems for people.

Anyway, I’m glad you got it working.

1 Like

Yes, with your unvaluable help, thank you once again.

Regards,
jss

1 Like