AOD - Lucene: First index

Hi there,
I activated the cron job that the schedulers need about 21h ago. I started the Lucene job by changing the scheduler parameters temporarily. It has been running ever since, i.e. for about 21h now.

When I load the index stats (/index.php?module=AOD_Index) I see:

Total records	122080
Indexed records	84116
Unindexed records	37964
Failed records	173
Index file count	2
Last Optimised	29.12.2015 16:14

These numbers didn’t change for about 30min (refreshing browser page). Now they seem to be moving along again.

Under Admin>Schedulers>Perform Lucene Index

It says Job Status: “still running”

Questions:

  1. How can I tell whether Lucene is running correctly or if it got stuck somehow?
  2. How can I manually stop the Lucene indexing process?
  3. How can I manually restart / reset the Lucene index?
  4. How long should I expect this first indexing procedure to take?
  5. Does AOD index PDFs? Word docs?
  6. Is SuiteCRM AOD / Lucene described anywhere in detail?

thanks!
John

Update:
Lucene Indexing ran for approx. 3 days indexing close to 100’000 records. Then the disk was full and everything stopped (naturally). It had accumulated approx 4.6GB of index files (under /modules/AOD_Index/Index/Index/).

Lesson 1: Make sure you have lots of extra space on your disk before using AOD!
Lesson 2: In our instance it seems AOD will require approx 6GB of disk space for the indexing files for approx. 120’000 records. First indexing will take approx. 4-5 days. (I suspect our /uploads folder with approx 8GB of files (mostly PDF) is partly responsible for the size, although I still don’t know if/how PDFs are indexed)
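
For anyone sizing a disk from the figures above, a rough linear extrapolation can be scripted. This is only a sketch: the function name estimate_index_gb is made up for illustration, and it assumes the index grows in proportion to the number of records, which may not hold if attachments vary a lot in size.

```shell
# Hypothetical helper: linear extrapolation of the final index size.
# Assumption: index size grows proportionally with the record count.
estimate_index_gb() {
    # $1 = index size so far (GB), $2 = records indexed so far, $3 = total records
    LC_ALL=C awk -v gb="$1" -v indexed="$2" -v total="$3" \
        'BEGIN { printf "%.1f\n", gb / indexed * total }'
}

# Figures from this thread: 4.6GB after roughly 100000 of 122080 records.
estimate_index_gb 4.6 100000 122080
```

With the numbers from this thread the estimate lands a little under the 6GB guessed above. The current index size can be read with `du -sh modules/AOD_Index/Index/Index` from the SuiteCRM root, and the free space with `df -h`.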

Still interested in answers to the questions from my original post, though!

And here’s some more:

  7. Is it ok that the index files mostly belong to user “root”?
  8. Does Lucene automatically purge the index when records and files are removed (deleted) from the CRM instance? (e.g. Emails, attachments, documents, or any database records like Contacts, Accounts, Invoices, etc.)

Hi John,

thanks for your insight into Lucene! I also see PHP processes running for days and days on first indexing and stumbled on your thread.

Concerning your question:
7. Is it ok that the index files mostly belong to user “root”?

I suppose your cron job runs as user “root”. That is a bad idea: it is a security risk, and files created by a root cron job (for example your suitecrm.log) are owned by root, so your Apache/NGINX web server cannot write to them afterwards.

I advise you to remove that cron job and install one under the user that Apache/NGINX runs as (normally www, www-data or httpd):

crontab -u www -e
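
Before moving the cron job it may help to see which files root has already created. The following is a minimal sketch, not an official SuiteCRM tool: the function name list_foreign_files is made up, and the user name www-data in the example call is an assumption that has to be replaced with whatever user your web server actually runs as.

```shell
# Hypothetical helper: print every file under a directory that is NOT
# owned by the expected user (given as a user name or a numeric uid).
list_foreign_files() {
    # $1 = directory to scan, $2 = expected owner
    find "$1" ! -user "$2" -print
}

# Example call, run from the SuiteCRM root (www-data is an assumption):
# list_foreign_files modules/AOD_Index/Index www-data
```

Anything it prints can then be handed back to the web server user as root, for example with `chown -R www-data:www-data modules/AOD_Index/Index` (again assuming www-data is the right user and group). The entry added through crontab -e is usually along the lines of `* * * * * cd /path/to/suitecrm; php -f cron.php > /dev/null 2>&1`, with the path adjusted to your installation.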
