Server running out of resources

I have two servers running for my SuiteCRM setup that keep running out of memory and shutting down if I can’t restart them in time. Each server has 8 GB of RAM, and over time it looks like PHP processes are starting up and not releasing all the memory they use.

I haven’t been able to determine whether the problem is with my SuiteCRM setup or something else.

I also have a memcached server running on AWS where I store user sessions between the two servers. I’ve noticed that when activity picks up, the server creates tons of new processes, and the number of connections to my memcached server shoots up, which I am assuming is from all the processes spinning up.

In my config file I have these two settings so that SuiteCRM won’t use this external cache; it only stores user sessions:

'external_cache_disabled' => true,
'external_cache_disabled_memcached' => true,

When all this is occurring, my PHP error log is spammed with messages like this:

PHP Warning: session_start(): Unable to clear session lock record in /var/www/html/crm/include/MVC/SugarApplication.php on line 615, referer: https://thecrm.com/index.php?module=Home&action=index
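That warning comes from the php-memcached session handler’s locking. For reference, the extension’s session-lock behavior can be tuned in php.ini; the sketch below shows the relevant knobs with illustrative values (exact option names vary by php-memcached version, so verify against your install):

```ini
; Sketch: php-memcached session-lock tuning (illustrative values only)
memcached.sess_locking = 1          ; session locking on (the default)
memcached.sess_lock_wait_min = 150  ; ms to wait between lock retries
memcached.sess_lock_wait_max = 150
memcached.sess_lock_retries = 5     ; give up after this many attempts
memcached.sess_lock_expire = 0      ; lock TTL; 0 falls back to max_execution_time
```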

I’m wondering if anyone has run into this problem before? It almost seems like I’m running into a cache stampede problem, but SuiteCRM should not be caching anything in this memcached server and there isn’t a ton of traffic for my CRM. We only have around 70 active users total.

In my debug logs I see something like this, which to me indicates that SuiteCRM is not caching externally to memcached. Is this correct?

Fri May 29 05:09:33 2020 [10571][-none-][DEBUG] Found cache backend SugarCacheMemcached
Fri May 29 05:09:33 2020 [10571][-none-][DEBUG] Found cache backend SugarCacheFile
Fri May 29 05:09:33 2020 [10571][-none-][DEBUG] Found cache backend SugarCacheAPC
Fri May 29 05:09:33 2020 [10571][-none-][DEBUG] Found cache backend SugarCacheMemory
Fri May 29 05:09:33 2020 [10571][-none-][DEBUG] Using cache backend SugarCacheMemory, since 999 is less than 1000

Check your session.save_path value in PHP and see if that directory is filled with thousands of files. If it is, you will find pages online explaining what you can do.
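Counting the files under the session save path is a one-liner; in the sketch below a temporary directory with fake sess_* files stands in for the real path (which you can get with `php -r 'echo session_save_path();'`):

```shell
# Sketch: count files in a session directory; a temp dir with fake
# sess_* files stands in for the real session.save_path here
dir=$(mktemp -d)
touch "$dir/sess_a1" "$dir/sess_b2" "$dir/sess_c3"
find "$dir" -maxdepth 1 -type f | wc -l   # a huge count means GC isn't keeping up
rm -rf "$dir"
```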

Also run the first query in this post to check for overgrown tables:
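A sketch of that kind of check (not necessarily the exact query from the linked post, and note that table_rows is only an estimate for InnoDB):

```sql
-- Sketch: ten largest tables by estimated row count
-- (replace 'suitecrm' with your database name)
SELECT table_name, table_rows
FROM information_schema.tables
WHERE table_schema = 'suitecrm'
ORDER BY table_rows DESC
LIMIT 10;
```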

  1. My session.save_path is pointing to my AWS memcached server, and it doesn’t look like it’s being filled up with files.
  2. I ran the query and one table stands out, meetings_users, that has 35 million records in it.

I made some more progress on the issue this weekend, though, and found some interesting things. I did not catch any of this before because I thought it was just normal activity during the day, but it turns out it was also happening after hours, while no one was using the CRM.

  1. The access logs were logging constant pings to this endpoint:
    • “GET /index.php?action=Login&module=Users&login_module=Home&login_action=DynamicAction HTTP/1.1” 302 -
  2. At some point hundreds of processes would spawn on the server and start to take up resources. This same link would then be spammed several times per second.
  3. I noticed in the debug log that one user account had a lot of activity even while they were not using the CRM
    • I asked this user to log out of the CRM from all their devices over the weekend and my servers seem to have calmed down now. The access log is not being spammed with this link anymore.
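In case it helps anyone debugging something similar, a quick way to see which clients are generating those requests is to tally the endpoint hits per source IP. The snippet below runs against a small inline sample, since the real access-log path and format depend on your web server:

```shell
# Sketch: tally requests to the DynamicAction endpoint per client IP.
# An inline sample stands in for the real access log here.
log=$(mktemp)
cat > "$log" <<'EOF'
10.0.0.5 - - [29/May/2020:05:09:33] "GET /index.php?action=Login&module=Users&login_module=Home&login_action=DynamicAction HTTP/1.1" 302 -
10.0.0.5 - - [29/May/2020:05:09:34] "GET /index.php?action=Login&module=Users&login_module=Home&login_action=DynamicAction HTTP/1.1" 302 -
10.0.0.9 - - [29/May/2020:05:09:35] "GET /index.php?module=Home&action=index HTTP/1.1" 200 -
EOF
grep 'login_action=DynamicAction' "$log" | awk '{print $1}' | sort | uniq -c | sort -rn
rm -f "$log"
```

With the sample above, the tally shows 10.0.0.5 making two of the three requests; against a real log, one IP or user dominating the count points you at the culprit.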

This endpoint seems to have something to do with dashlets, but I haven’t been able to figure out what is going on yet. I still need to test with this user’s account/computer this week, but having them log out of the CRM seems to have fixed something for now.

Have you ever seen anything like this before?

Never seen that, no.

I would suspect a brute-force attack… having all those attempts at a login action doesn’t sound too legitimate to me. If you can check the POST parameters that come with each request and see whether they are trying out different usernames/passwords, that would be a clear demonstration.

I would also do something about the 35M-record table. Maybe you can delete old records.

I believe I found the issue: auto-refreshing dashlets were putting a heavy strain on my servers. The GET request above was sent by each dashlet on a user’s page every 30 seconds.

I was able to reproduce this in my dev environment: with multiple tabs open to a dashlet-heavy page, I got constant CPU utilization of 30%+.

I think with this going on, plus other users using the CRM regularly, plus some resource-intensive process kicking in (like a report or one of the cron jobs), the server resources could get used up pretty quickly.

My average CPU utilization has now gone from 15-30% down to 1-5%.

There is probably a lot I can do to improve the efficiency of some of my queries, but for now I just disabled auto-refresh and everything is running smoothly.
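If it helps anyone, one way to disable the refresh globally is via config_override.php; the key below is inherited from SugarCRM, so verify it against your SuiteCRM version:

```php
// Sketch for config_override.php — the dashlet_auto_refresh_min key
// comes from SugarCRM; -1 should disable dashlet auto-refresh entirely.
$sugar_config['dashlet_auto_refresh_min'] = -1;
```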


Dashlets can be a real pain, since their refresh seems to take a lot of resources. It isn’t a very smooth process from the user’s perspective either.

On the Meetings table issue: are any of your users using plugins to synchronize external calendars with SuiteCRM’s calendar? We ran into an issue once where a sync plugin created new records every time it supposedly synced and deleted the old ones, burdening the database. Combined with not purging deleted records very often, that can cause records to pile up in ridiculous numbers.

I’m not actually sure where all these meetings records are coming from. Next on my list is to figure that out and to trim down the number of records in this table. It has by far the largest record count of any table I have.

Out of all 35 million records, only 95 are deleted, but I think I can go back and delete old ones from a couple of years back.

Either you have a very lively organisation :wink: or there’s something odd going on.

In our case the sync just recreated the records and deleted old ones (odd way for a sync to work at all since it messed up timestamps and audit trails). You might be dealing with duplicate entries.

You could also run a scheduler to, e.g., delete meeting history after a set amount of time, and delete or trim history for now-lost accounts, so the records won’t just keep stacking up.
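Such a job could wrap something like the sketch below. The column names come from the stock meetings/meetings_users schema, but treat this as illustrative: test on a backup first, and note that soft-deleting only marks rows — a prune job (SuiteCRM ships a “Prune Database on 1st of Month” scheduler) is what physically removes them:

```sql
-- Sketch: soft-delete meetings, and their user links, older than
-- two years. Back up before running anything like this.
UPDATE meetings
SET deleted = 1
WHERE date_start < DATE_SUB(NOW(), INTERVAL 2 YEAR);

UPDATE meetings_users mu
JOIN meetings m ON m.id = mu.meeting_id
SET mu.deleted = 1
WHERE m.deleted = 1;
```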