I’ve been a SuiteCRM user for years, and we store a lot of documents in Suite, which end up in the upload directory under the non-user-friendly names assigned by Suite.
My upload folder is now at 1.63 GB and it continues to grow. I’m wondering why these documents aren’t being converted and stored in the database (MySQL or MSSQL - I use the latter). It seems like this would be a more prudent solution and would reduce the likelihood of a file becoming corrupted. I’ve searched and don’t see any third-party add-ons that would accomplish this.
There doesn’t appear to be a way to purge old or unneeded documents either.
I would love to find a solution to this problem. Anyone else have any suggestions?
As a software engineer and developer, I don’t actually think Documents are better stored inside databases, when compared to a file system.
So don’t worry about where they are stored.
There is a valid case to be made that the “upload” folder should be split into more sub-directories, since eventually the file system’s files-per-directory limit will be hit. I’ve seen Issues on GitHub discussing this, and even code hacks to implement it.
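To illustrate what such a split could look like, here is a minimal Python sketch that derives nested sub-directories from the leading characters of a file's ID. This is not how SuiteCRM lays out its upload folder, and it is not taken from the GitHub hacks mentioned above; the function name `sharded_path` and its `levels` and `width` parameters are made up for the example.

```python
from pathlib import PurePosixPath


def sharded_path(upload_root, file_id, levels=2, width=2):
    """Build upload_root/<shard>/<shard>/<file_id> from the start of the ID."""
    # Drop dashes so GUID-style IDs contribute only hex characters to the shards.
    compact = file_id.replace("-", "")
    if len(compact) < levels * width:
        raise ValueError("file_id is too short to shard")
    shards = [compact[i * width:(i + 1) * width] for i in range(levels)]
    return PurePosixPath(upload_root, *shards, file_id)
```

With two levels of two hex characters each, files spread over up to 65,536 directories, so no single directory grows very large. As noted later in the thread, this only helps with per-directory limits; it does nothing about total size or retention.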
You can however work on clearing up “orphaned” Documents, that aren’t referenced by any SuiteCRM records. This might help:
https://pgorod.github.io/How-Documents-Stored/
https://pgorod.github.io/How-Attachments-Stored/
https://pgorod.github.io/How-Photos-Stored/ (if you use photo fields)
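As a rough sketch of the orphan check (in Python, purely for illustration), assume each file in the upload folder is named after the ID of the record that references it, and that you have already pulled the list of referenced IDs from the database with your own query. The function name `find_orphans` is hypothetical; the articles above describe which tables actually hold those IDs.

```python
from pathlib import Path


def find_orphans(upload_dir, referenced_ids):
    """Return the names of files in upload_dir that no record ID refers to."""
    referenced = set(referenced_ids)
    return sorted(
        entry.name
        for entry in Path(upload_dir).iterdir()
        # Only top-level files are checked; sub-directories are ignored.
        if entry.is_file() and entry.name not in referenced
    )
```

It only reports candidates. Review the list before deleting anything, since other field types may keep files in the same folder.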
Thanks for the feedback. I disagree. In my opinion, when you reach a point of approaching an enterprise application, the documents are better served from MSSQL. SaaS solutions and many other web applications do not store documents in the file system, as it is inefficient and becomes unwieldy.
The orphaned documents aren’t the problem - we have very few, if any. We store a significant amount of information in custom modules, some of which we do not need to retain longer than a year. There is currently no document storage solution that can purge old documents that are no longer needed, or enforce retention policies, so that you can control the size of the upload folder.
Splitting into directories doesn’t do anything to control the size, nor does it make purging possible, since you have no easy way to determine which documents can be eliminated. The assumption that you can just delete an older directory will result in folks deleting files they didn’t intend to, since in this scenario all modules will store their files in the same date-stamped directories.
So I’d like to find a better solution, and would like to see a better document storage solution in SuiteCRM. Any suggestions?
Have a look at plugins in the SuiteCRM Store…
File systems are designed to handle, well, files. They are extremely efficient, thoroughly tested at doing that. They can handle dozens of Terabytes of data. You have all sorts of tools to handle them there.
Relational databases are designed to handle tables with records. A generic file is a totally foreign object to them. The additional layer of complexity that an RDBMS adds to a group of Documents really doesn’t help in any way.
I will concede that sometimes a large organization can have a more controlled document system inside an RDBMS, but that is not because of efficiency or capacity; on the contrary, it is at the cost of efficiency and capacity. It is simply a matter of making sure nobody is messing with the file system, which is usually more open to different interactions than the RDBMS.
What exactly is your problem with your current system? Are you experiencing delays? Are you losing files? Are you worried about security?
Have you considered shrinking your data? You can use file system compression, but also if you have a bunch of scanned documents, a carefully staged image re-sampling campaign, using a specialized tool, can bring down your data size by large percentages (I got everything 60% smaller handling photos).
Anyway, this is an interesting discussion and I can try helping further if I understand what your actual problem with the oversized upload folder is.
Thanks. I thought I explained the issue pretty clearly - but perhaps not.
We store a significant amount of information in custom modules, some of which we do not need to retain longer than a year. There is currently no document storage solution that can purge old documents no longer needed in the CRM, or enforce retention policies, so that you can control the size of the upload folder.
Scanned documents aren’t the issue. File sizes aren’t terribly large; it’s the cumulative volume of files. Not to mention that the time to back up this server continues to increase because of the size of the upload folder. We are typically storing PDF, Word and Excel files. The PDFs are computer-created documents, not scans. We have a separate document imaging system where we archive scanned or historic documents.
Orphaned documents aren’t the problem - we have very few if any.
By the way, we’ve been using SugarCE since 2011, and were one of the first to migrate to SuiteCRM when it forked. So we have 8 years of documents in this folder - many of which aren’t needed anymore and would normally have been purged by retention policies…but that’s currently not an option.
So would something like this work for you:
- A custom scheduler runs periodically, say, once a month.
- It scans all your Documents records to detect those that match some criteria (older than one year, a certain type of file, etc.).
- It deletes those documents from disk and from SuiteCRM.
(To be safer you could do it in 2 stages: this month you just move the files to a “recycle bin” dir, and mark the SuiteCRM records as “deleted=1”; next month you really delete them.)
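The file-handling half of that two-stage approach could be sketched roughly like this. It is written in Python only to show the logic; a real SuiteCRM scheduler job would be PHP, and marking the records as “deleted=1” is left to the caller. The function names, the recycle-bin layout and the monthly batch labels such as "2024-05" are assumptions for the example.

```python
import shutil
from pathlib import Path


def stage_files(upload_dir, recycle_dir, file_ids, batch):
    """Stage 1: move files into recycle_dir/<batch>/ and return the IDs moved.

    The caller would then mark the matching records as deleted=1.
    """
    target = Path(recycle_dir) / batch
    target.mkdir(parents=True, exist_ok=True)
    moved = []
    for file_id in file_ids:
        source = Path(upload_dir) / file_id
        if source.is_file():  # skip IDs that have no file on disk
            shutil.move(str(source), str(target / file_id))
            moved.append(file_id)
    return moved


def purge_old_batches(recycle_dir, current_batch):
    """Stage 2: permanently delete every batch older than the current one."""
    removed = []
    root = Path(recycle_dir)
    if not root.is_dir():
        return removed
    for batch_dir in sorted(root.iterdir()):
        # Labels like "2024-05" sort chronologically as plain strings.
        if batch_dir.is_dir() and batch_dir.name < current_batch:
            shutil.rmtree(batch_dir)
            removed.append(batch_dir.name)
    return removed
```

Each monthly run would call `purge_old_batches` with the current month's label and then `stage_files` for the newly selected documents, so anything staged by mistake can still be moved back for about a month.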
That’s a simple custom development. It’s probably even possible with a Workflow, but I would do it with code just to have more control and be able to add more checks.
Potentially; however, it can’t be global - you would need to be able to choose the modules for the related documents so you don’t delete files you need to keep. For instance, I wouldn’t want to delete documents from Contracts or from Opportunities - but from some custom modules, we certainly would.
Yes, it makes perfect sense to go by Module, and if you use code you will certainly have a loop over the Beans of each kind of module.
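Just to show the shape of that per-module selection, here is a small Python sketch. The real thing would loop over SuiteCRM Beans in PHP; here each record is a plain dictionary, and the module names, the `RETENTION_DAYS` table and the `select_expired` function are all hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical policy: module name -> retention period in days.
# Modules that are not listed (e.g. Contracts, Opportunities) are never purged.
RETENTION_DAYS = {
    "custom_Inspections": 365,
    "custom_Worksheets": 365,
}


def select_expired(records, policy, now):
    """Return IDs of records whose module has a policy and whose age exceeds it."""
    expired = []
    for record in records:
        days = policy.get(record["module"])
        if days is None:
            continue  # no policy for this module: keep the document
        if now - record["date_entered"] > timedelta(days=days):
            expired.append(record["id"])
    return expired
```

Making the policy an explicit allow-list means a module is only ever purged if someone deliberately added it, which is the safer default for modules like Contracts.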
Do you have PHP developers over there? This shouldn’t be too hard to achieve if they are already familiar with SuiteCRM’s beans.