Has anyone set up SuiteCRM to autoscale on AWS?

In the documentation for the Bitnami SuiteCRM Docker image, it says:

“If you remove the container all your data and configurations will be lost, and the next time you run the image the database will be reinitialized. To avoid this loss of data, you should mount a volume that will persist even after the container is removed.”

This seems to indicate that when SuiteCRM is installed, at the very least, configuration information is stored on disk. As end users work with SuiteCRM, does any data entered by the end users get stored on disk? If disk only contains configuration information, it seems that SuiteCRM could be installed on an EC2 instance and then an AMI could be created from that EC2 instance. Then, in theory, this AMI could be used to autoscale SuiteCRM.
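For illustration, baking the AMI itself would be something like this (a hedged sketch; the instance ID and names are placeholders):

```bash
# Hypothetical: create an AMI from a configured SuiteCRM EC2 instance,
# which an Auto Scaling launch configuration could then reference.
aws ec2 create-image \
  --instance-id i-0123456789abcdef0 \
  --name "suitecrm-base-$(date +%Y%m%d)" \
  --description "SuiteCRM baked after initial install and configuration"
```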

When a SuiteCRM install is done on a VM, does the local disk contain only configuration information? If end-user-created data is also stored on disk, setting up redundancy and autoscaling will involve some sort of NAS, like EFS, making things more complicated.

Does anyone have experience with redundancy and autoscaling on AWS?

SuiteCRM does write information to disk in files, not just in the database.

You can even go into Studio and add modules and custom fields, and SuiteCRM will produce PHP and rewrite itself. This is not ideal for the scenario you envision.

There are ways to scale SuiteCRM a bit, and to use load balancers, so that quite large installations are possible. But they’re not automatically deployed and scaled (to my knowledge).

But this is a nice discussion and I’d love to hear if someone has made progress on this kind of solution.

I was hoping to avoid any sort of NAS, but it looks like we are stuck using one.

We have looked at using a SuiteCRM Docker image. In this case we have to map a volume into the Docker container that lives on NAS. We can either map a volume that is on EFS or potentially build a Docker volume plugin for S3 (I have no idea how much work that is). EFS backups are somewhat ugly, but with S3, backup is either unnecessary (if you trust S3’s internal replication mechanisms) or easy (copy the S3 bucket to another location). Something I saw on the internet also indicated we could build a Docker plugin that would allow the volume to be on EBS, but we have not attempted this.
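To make the EFS option concrete, here is a minimal sketch, assuming the host mounts the EFS file system and the Bitnami image keeps its persistent state under /bitnami (the file system ID, region, and port mappings are placeholders and may differ across image versions):

```bash
# Mount the EFS file system on the Docker host (placeholder ID/region).
sudo mkdir -p /mnt/efs
sudo mount -t nfs4 -o nfsvers=4.1 \
  fs-12345678.efs.us-east-1.amazonaws.com:/ /mnt/efs

# Bind-mount a directory on EFS as the container's persistence volume.
docker run -d --name suitecrm \
  -p 80:80 -p 443:443 \
  -v /mnt/efs/suitecrm:/bitnami \
  bitnami/suitecrm:latest
```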

Another approach would be to run SuiteCRM directly on EC2 instances, with each EC2 instance mounting an EFS volume. Again, this leads us back into the uglier EFS backup. (There are other ways to mount a volume that is shared between EC2 instances - like mounting a volume from another EC2 instance - but these approaches have their downsides.)
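A sketch of that mount, assuming the stock NFS client and a placeholder file system ID (the /etc/fstab entry keeps it across reboots):

```bash
# Hypothetical: mount the shared EFS file system on each EC2 instance;
# the file system ID, region, and mount point are placeholders.
sudo mkdir -p /mnt/efs
echo 'fs-12345678.efs.us-east-1.amazonaws.com:/ /mnt/efs nfs4 nfsvers=4.1,_netdev 0 0' \
  | sudo tee -a /etc/fstab
sudo mount -a
```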

And if we try simultaneously running two instances of SuiteCRM that use the same database and NAS, do I have to worry about updates to the data on disk from one instance clobbering updates that the other instance is working on? It sounds like this could happen. At this point, having one EC2 instance with SuiteCRM active at a given time is probably okay. If that instance goes down, and NAS holds the SuiteCRM configuration and data, in theory we can simply spin up a new SuiteCRM instance.
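That single-active-instance pattern maps naturally onto an Auto Scaling group pinned to a size of one, so AWS replaces the instance if it fails health checks. A hedged sketch with placeholder names, assuming a launch template already exists:

```bash
# Hypothetical: an Auto Scaling group that always keeps exactly one
# SuiteCRM instance running; a failed instance is replaced automatically.
aws autoscaling create-auto-scaling-group \
  --auto-scaling-group-name suitecrm-asg \
  --launch-template LaunchTemplateName=suitecrm-lt,Version='$Latest' \
  --min-size 1 --max-size 1 --desired-capacity 1 \
  --vpc-zone-identifier "subnet-aaaa1111,subnet-bbbb2222"
```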

Am I missing anything? I would like to hear the ideas that others have on this topic.

I know SuiteCRM scales to two servers easily by simply splitting it: web server on one, database on the other. Further up, I think the practice is to have more than one web server, with load balancing, but still a single database.

I don’t know of ways to split the database, but I’m not experienced in this area; they may exist.

If you’re big enough to need this sort of architecture, you might prefer going to SalesAgility and hiring some consultancy from them to help design it. That way you would get things right from the beginning and leverage their expertise with big installations.

My intention on AWS is to use RDS, which is “serverless” and redundant across availability zones. With AWS you can have a database replica in each availability zone of a region (typically three availability zones). It sounds like the web server portion contains the PHP that gets rewritten as custom fields and modules are added. Am I correct?
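For reference, the multi-AZ flavor of that is a single flag on RDS; a minimal sketch (identifiers, instance class, and credentials are placeholders):

```bash
# Hypothetical: a Multi-AZ MariaDB instance on RDS. --multi-az keeps a
# synchronously replicated standby in another availability zone with
# automatic failover.
aws rds create-db-instance \
  --db-instance-identifier suitecrm-db \
  --engine mariadb \
  --db-instance-class db.t2.medium \
  --allocated-storage 20 \
  --master-username admin \
  --master-user-password 'change-me' \
  --multi-az
```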

Yes, the web server contains all the PHP code. If automatic deployment and scaling are a big issue for you (and it clearly is), you can consider simply limiting the use of Studio and enforcing a stable codebase by company policy.

The database is the easiest part to deploy automatically, since it’s entirely self-contained, but it’s harder to scale horizontally, due to the typical difficulties in scaling relational databases. It seems there are technologies for this; I’ve just never used one…

For the sake of completeness: I just remembered there is a table in the database called fields_meta_data which also reflects field customizations. So you’re probably better off stabilizing all these customizations by policy, so you can play with deployments as you intend.
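If it helps, that table makes drift easy to audit; a minimal sketch (connection details are placeholders):

```bash
# Hypothetical: list the field customizations recorded in fields_meta_data,
# e.g. to confirm that dev and production hold the same definitions.
mysql -h suitecrm-db.example.com -u suitecrm -p suitecrm \
  -e "SELECT custom_module, name, type FROM fields_meta_data WHERE deleted = 0;"
```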

Thank you for the information. This is very helpful. I will see if we can live with a stable configuration.

I am pushing fairly hard on autoscaling and recovery from a lost EC2 instance because it keeps me from getting called in to fix things. In addition, if the environment can scale up and down in terms of EC2 instances, I do not have to size a single EC2 instance for peak load. I can use a smaller EC2 instance, saving money, and then add EC2 instances to meet peak load. When the load drops, the additional EC2 instances can be shut down.

Currently our user base will be small, so scaling up to meet peak demand should not be needed, but I definitely want things to recover automatically if an EC2 instance dies. We have a number of things running in AWS, and we have built them to recover from failures. As a result, we have never been dragged into emergency on-call situations.

Hi @jvergin,

A few years back, I worked on some projects deployed to cluster environments, and I just wanted to share some of the problems we faced.

In these projects, we usually had a cluster of two or more web servers and a cluster for the DB (usually a master and a slave). For the web server cluster, we weren’t able to have a single folder with all the code shared across the nodes. Each node had a local copy of the code, and only the sessions_dir and the uploads_dir pointed to NFS shared folders. From what I recall, we did this because we were having problems with the cache being written by several nodes at the same time.
Anyway, this was a few years back, and maybe these problems don’t happen anymore.
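For what it’s worth, the layout was roughly this (a sketch from memory; server names and paths are placeholders):

```bash
# Hypothetical /etc/fstab entries on each web node: the code stays local,
# and only the shared, writable directories point at NFS.
nfs-server:/exports/suitecrm/sessions  /var/www/suitecrm/sessions  nfs  defaults,_netdev  0 0
nfs-server:/exports/suitecrm/upload    /var/www/suitecrm/upload    nfs  defaults,_netdev  0 0
```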

Another thing: we once tried to use OCFS instead of NFS, and the whole system’s performance degraded. I think it was related to OCFS’s syncing mechanism, but I’m not entirely sure. What I know is that we switched back to NFS and performance returned to normal.

Note that, as @pgr also said, this makes it mandatory to “disable” Studio / Module Builder in production, as the code isn’t shared across nodes. I don’t consider it a big problem, as it is always better to make changes in a dev environment, test them, and only then move them to production.

One last note about Docker. I believe this could be done using Docker, as you mentioned, and Docker could make deploying new versions much easier, by building a new web server image for each production release.
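A rough sketch of that idea, under the assumption that a tested codebase (with customizations) is baked into an image tagged per release (the base image, registry, and paths are placeholders, and a real image would need the PHP extensions SuiteCRM requires):

```bash
# Hypothetical: build an immutable web server image per production release.
cat > Dockerfile <<'EOF'
FROM php:7.2-apache
# Copy a tested SuiteCRM codebase, including Studio customizations.
COPY ./suitecrm/ /var/www/html/
EOF
docker build -t myregistry/suitecrm-web:release-1.2.3 .
docker push myregistry/suitecrm-web:release-1.2.3
```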

As you can see, I’m just a newcomer to this community, so I hope someone will correct me if I’m saying something wrong or has a strong opinion that this should be done differently.

Hope this helps.


Hi,

Have you ever tried putting the database on AWS and the rest of the CRM on another server?

Thanks.

Hi,

No, I haven’t used the DB on AWS. On the projects I’ve worked on, the DB was on a separate server, but it wasn’t hosted on Amazon.

However, what you’re asking for should work. If your MySQL / MariaDB is running on AWS and is accessible from your SuiteCRM instance/nodes, you should be able to connect to that DB. Just change these config.php settings (there’s a sketch right after the list):

  • db_host_name
  • db_user_name
  • db_password
  • db_name
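
For illustration, a hedged sketch of those overrides (host, credentials, and paths are placeholders; config_override.php already opens with `<?php`, so only the assignments are appended):

```bash
# Hypothetical: point an existing install at an external/RDS database by
# appending connection overrides to config_override.php.
cat >> /var/www/html/suitecrm/config_override.php <<'PHP'
$sugar_config['dbconfig']['db_host_name'] = 'suitecrm-db.abc123.us-east-1.rds.amazonaws.com';
$sugar_config['dbconfig']['db_user_name'] = 'suitecrm';
$sugar_config['dbconfig']['db_password']  = 'change-me';
$sugar_config['dbconfig']['db_name']      = 'suitecrm';
PHP
```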

Hope this helps.

I’m very interested in any progress made here. I’d like to either deploy as a Docker image to AWS Elastic Container Service or use Pivotal Cloud Foundry, and abstract the database to AWS RDS and the file attachments to AWS S3. The RDS part seems straightforward, although I haven’t seen documentation or examples for it. Not being able to abstract file attachments off the local file system seems like a huge issue in 2018, and it looks like it would require not just a new interface but a fair bit of refactoring of the GitHub codebase.

I’m not that familiar with AWS, but is it difficult to tell AWS to keep the “upload” directory on a separate file server, and to link that directory into the filesystem on the main SuiteCRM server?

I would be interested in seeing the local storage pieces abstracted away from direct folder/file manipulation, and the same for the cache. This would give the greatest freedom to minimize direct costs in a dynamically scaling solution. One of the challenges with a NAS/NFS-style solution is that there is no really cheap way (in cost or effort) to align datasets across regions. I agree: from the brief looks I have taken at the codebase, there appears to be a lot of rearchitecting needed to support a more distributed approach.

For what it is worth, the approach I have used to migrate codebases toward an optimized scaling stack is to provide interfaces for the storage of assets (code, graphics, compiled CSS, etc.) as well as for caching. I implement adapters for local storage first, to ensure nothing breaks before adding the complexity of off-server storage. Then I build an adapter that uploads/downloads against a storage service (AWS S3, or whatever) in place of direct reads/writes. Next I migrate sessions to either full DB storage or a separate cache service (AWS ElastiCache, or whatever). Finally, I migrate the cache mechanisms as appropriate (transient vs. “permanent” data can go to different places, such as Redis for transient and S3 for “permanent”).

One of the primary benefits of this approach is that there is generally low to no cost for same-region transfers between compute stacks and storage services. Maintaining a NAS/NFS on Elastic Block Store (EBS) is orders of magnitude more expensive than GET/HEAD/PUT against S3. Lastly, it is very simple to move batches of things stored in S3 (even at very high volumes) across regions quickly, which allows better DR and scaling/HA given a regional issue or disaster.
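On the last point, moving S3 data across regions really is trivial compared to NAS; for example (bucket names and regions are placeholders):

```bash
# Hypothetical: replicate a bucket's contents into a DR bucket in another
# region; only changed objects are copied on subsequent runs.
aws s3 sync s3://suitecrm-assets s3://suitecrm-assets-dr \
  --source-region us-east-1 --region eu-west-1
```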

We host our SuiteCRM on AWS, and configuration is done via a user data script. The database is on RDS. We mount an EFS file system and have cache, upload, and custom as symbolic links onto it. It does seem to work better, though, if kept to one instance + RDS.
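For anyone curious, a hedged sketch of what such a user data script could look like (the file system ID, paths, and web server user are placeholders, not our actual script):

```bash
#!/bin/bash
# Hypothetical user data: mount the shared EFS file system, then replace
# the writable SuiteCRM directories with symlinks onto it.
mkdir -p /mnt/efs
mount -t nfs4 -o nfsvers=4.1 fs-12345678.efs.us-east-1.amazonaws.com:/ /mnt/efs

cd /var/www/html/suitecrm
for dir in cache upload custom; do
  mkdir -p "/mnt/efs/suitecrm/$dir"
  rm -rf "$dir"
  ln -s "/mnt/efs/suitecrm/$dir" "$dir"
done
chown -R www-data:www-data /mnt/efs/suitecrm
```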