7.1: Using Lucene Search with Custom Fields

klou · 2 April 2014 00:23

I really like the performance and look of the new search (Advanced OpenDiscovery, AOD).

However, I added a custom field to Accounts (stored in accounts_cstm) that holds a numeric ID for each account. The default search can find values in this field; how can I add it to the AOD/Lucene index?

Jim · 2 April 2014 14:15

Hi Klou,
Custom fields should be picked up by AOD.

If your account custom field is a text field containing numbers, it should behave fine.

However, if the field type is numeric (such as integer, decimal, etc.) it won’t be picked up. The only way to change this is to change the indexing code, but this is a small change.

In modules/AOD_Index/AOD_Index.php, around line 150, there is a switch statement. At the end of it is a group of field types that get ignored:

          
                case "address":
                case "bool":
                case "currency":
                case "date":
                case "datetimecombo":
                case "decimal":
                case "float":
                case "iframe":
                case "int":
                case "radioenum":
                case "relate":
                default:
                    break;

You can take out each case for a field type you want indexed and move it to just after the varchar case.

In your case, you would remove


                case "int":

and move it next to the varchar case, like this:


                case "name":
                case "phone":
                case "html":
                case "text":
                case "url":
                case "varchar":
                case "int":

This will cause all int fields (in all indexed modules) to be indexed.
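As a rough, self-contained illustration of this change, the sketch below mimics only the grouping of that switch: given field definitions keyed by field name, it returns the fields that would be indexed once "int" sits next to "varchar". The helper indexedFieldNames() and the sample field names are hypothetical; the real code in AOD_Index.php also builds the Lucene fields for each match.

```php
<?php
// Sketch only: mirrors how the AOD_Index.php switch groups field types
// after "int" is moved next to "varchar". indexedFieldNames() is a
// hypothetical helper, not part of AOD; the real code also builds the
// Lucene fields for each matching field.
function indexedFieldNames(array $fieldDefs)
{
    $indexed = array();
    foreach ($fieldDefs as $key => $def) {
        switch ($def['type']) {
            case "enum":      // indexed as a Keyword field in AOD
            case "multienum": // indexed as an unStored field in AOD
            case "name":
            case "phone":
            case "html":
            case "text":
            case "url":
            case "varchar":
            case "int":       // moved here so integer fields are indexed
                $indexed[] = $key;
                break;
            default:
                // address, bool, currency, date, decimal, float, ... are skipped
                break;
        }
    }
    return $indexed;
}

// Hypothetical field definitions for illustration.
$defs = array(
    'name'          => array('type' => 'name'),
    'employees_int' => array('type' => 'int'),
    'annual_rev'    => array('type' => 'decimal'),
);
print_r(indexedFieldNames($defs)); // lists name and employees_int
```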

Hope this helps,
Jim

klou · 2 April 2014 16:34

Thanks for the reply.

Unfortunately, it is already a text field, with the database column type nvarchar(255). So it’s not an int, yet the field is not being indexed on two separate installations/databases.

This is the only custom nvarchar field that holds numeric strings; all of the other ones seem to work.

Is there a function that’s parsing the (numeric) string incorrectly?

Jim · 3 April 2014 08:15

The string isn’t parsed; it’s just indexed as is. Are other fields in the module being indexed?

If no modules are being indexed, check that AOD is enabled in the settings and that the scheduler is running in Schedulers; in particular, check its last run date.

If no field in the module is being indexed, check that the module’s vardefs have an entry for ‘unified_search’ set to true (this can be found in modules/YOURMODULE/vardefs.php or in the custom folders).

If it’s just that one field then it’s possible that this is a bug. If that’s the case then could you please provide a copy of the vardef for the field that isn’t being indexed?
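To run through these checks against a vardef array, a small helper like the one below could be used. This is a sketch under assumptions: describeAodSettings() is hypothetical (not part of SuiteCRM or AOD), the sample $dictionary is made up, and in a real installation you would need to load the actual vardefs yourself (for example from the cached file under cache/modules/). It also reports the field’s ‘type’, since the AOD indexing switch dispatches on that value.

```php
<?php
// Hypothetical diagnostic helper (not part of SuiteCRM or AOD): given an
// array shaped like SuiteCRM's $dictionary, report the module-level
// 'unified_search' flag and the definition of one field.
function describeAodSettings(array $dictionary, $object, $field)
{
    $module = isset($dictionary[$object]) ? $dictionary[$object] : array();
    $def = isset($module['fields'][$field]) ? $module['fields'][$field] : null;

    return array(
        'module_unified_search' => !empty($module['unified_search']),
        'field_exists'          => $def !== null,
        // AOD's indexing switch dispatches on this value
        'field_type'            => isset($def['type']) ? $def['type'] : null,
    );
}

// Made-up sample; in a real installation, load the actual vardefs instead.
$dictionary = array(
    'Account' => array(
        'unified_search' => true,
        'fields' => array(
            'account_id_c' => array('type' => 'varchar'),
        ),
    ),
);

print_r(describeAodSettings($dictionary, 'Account', 'account_id_c'));
print_r(describeAodSettings($dictionary, 'Account', 'missing_c'));
```

A field with no ‘type’ entry would not match any of the indexed cases in the switch, so that is worth ruling out when only one field is missing from the index.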

Finally, please note that AOD searches are cached for about 5 minutes. Waiting 5 minutes or slightly changing the search may produce different results.

Thanks,
Jim

klou · 3 April 2014 22:24

Well, here’s the thing: it’s a custom field on the Accounts module, and the vardefs aren’t very helpful.

~/custom/modules/Accounts/Ext/Vardefs/vardefs.ext.php
~/custom/Extension/modules/Accounts/Ext/Vardefs/*.php


... <snip>

$dictionary['Account']['fields']['account_id_c']['unified_search'] = true;

...

$dictionary['Account']['fields']['account_id_c']['labelValue']='Account ID';
$dictionary['Account']['fields']['account_id_c']['type']='varchar';

I added the last line because there doesn’t seem to be a type definition for custom fields (this one or any other I’ve added); only ‘labelValue’ is set.

klou · 4 April 2014 19:01

I opened up the index in /modules/AOD_Index/Index/Index with Luke (v0.99; 4.0.0-ALPHA apparently didn’t like the Lucene version), and there are definitely some odd results.

The “name” field has a term count of 1750 (as it covers Accounts/Contacts/Targets), but 4 of my custom varchar fields on the Accounts module only recorded counts of 0, 4, 17 and 24. If I’m reading this right, these are unique terms (which would explain the low numbers for fields such as Customer Class and Payment Terms), but the big fat 0 on my Account_ID is troubling.

klou · 4 April 2014 20:28

Digging further, here’s the relevant entry in /cache/modules/Accounts/Accountvardefs.php


'account_id_c' =>
    array (
      'unified_search' => true,
      'labelValue' => 'Account ID',
      'required' => true,
      'source' => 'custom_fields',
      'name' => 'account_id_c',
      'vname' => 'LBL_ACCOUNT_ID',
      'type' => 'varchar',
      'massupdate' => 0,
      'default' => NULL,
      'no_default' => false,
      'comments' => '',
      'help' => '',
      'importable' => 'true',
      'duplicate_merge' => 'enabled',
      'duplicate_merge_dom_value' => 1,
      'audited' => true,
      'reportable' => true,
      'merge_filter' => 'disabled',
      'len' => 255,
      'size' => '20',
      'id' => 'Accounts_account_id_c',
      'custom_module' => 'Accounts',
    ),

Also, here’s the Accounts snippet from /cache/modules/unified_search_modules.php


  'Accounts' =>
  array (
    'fields' =>
    array (
      'name' =>
      array (
        'query_type' => 'default',
      ),
      'phone' =>
      array (
        'query_type' => 'default',
        'db_field' =>
        array (
          0 => 'phone_office',
        ),
        'vname' => 'LBL_ANY_PHONE',
      ),
      'email' =>
      array (
        'query_type' => 'default',
        'operator' => 'subquery',
        'subquery' => 'SELECT eabr.bean_id FROM email_addr_bean_rel eabr JOIN email_addresses ea ON (ea.id = eabr.email_address_id) WHERE eabr.deleted=0 AND ea.email_address LIKE',
        'db_field' =>
        array (
          0 => 'id',
        ),
        'vname' => 'LBL_ANY_EMAIL',
      ),
      'account_id_c' =>
      array (
        'query_type' => 'default',
      ),
    ),
    'default' => true,
  ),

klou · 4 April 2014 23:33

FWIW, I noticed that there aren’t any numbers in the index for any text field. I’ll try to chase this down further on Monday...

klou · 7 April 2014 23:23

FIXED.

The problem is that the default Lucene analyzer does not index numbers. I’ve changed this by setting the default analyzer to Common_TextNum in getDocumentForBean(), the function in AOD_Index.php that contains the switch statement mentioned above:


    public function getDocumentForBean(SugarBean $bean){
        // Changed default Analyzer to TextNum so that digits are indexed as well as letters
        Zend_Search_Lucene_Analysis_Analyzer::setDefault(new Zend_Search_Lucene_Analysis_Analyzer_Common_TextNum());

        if($bean->module_name == 'DocumentRevisions'){
            $document = $this->getDocumentForRevision($bean);
        }else{
            $document = array("error"=>false,"document"=>new Zend_Search_Lucene_Document());
        }
        if($document["error"]){
            return $document;
        }
        $document["document"]->addField(Zend_Search_Lucene_Field::UnIndexed("aod_id", $bean->module_name." ".$bean->id));
        $document["document"]->addField(Zend_Search_Lucene_Field::UnIndexed("record_id", $bean->id));
        $document["document"]->addField(Zend_Search_Lucene_Field::UnIndexed("record_module", $bean->module_name));
        foreach($GLOBALS['dictionary'][$bean->getObjectName()]['fields'] as $key => $field){
            switch($field['type']){
                case "enum":
                    $document["document"]->addField(Zend_Search_Lucene_Field::Keyword($key, strtolower($bean->$key)));
                    break;

                case "multienum":
                    // Read the value by field name ($key); $field is the vardef array
                    $vals = unencodeMultienum($bean->$key);
                    $document["document"]->addField(Zend_Search_Lucene_Field::unStored($key, strtolower(implode(" ",$vals))));
                    break;
                case "name":
                case "phone":
                case "html":
                case "text":
                case "url":
                case "varchar":
                    $val = strtolower($bean->$key);
                    // Use a separate variable so the vardef in $field is not overwritten
                    $luceneField = Zend_Search_Lucene_Field::unStored($key, $val);
                    if($key == "name"){
                        $luceneField->boost = 1.5;
                    }
                    $document["document"]->addField($luceneField);
                    break;
                case "address":
                case "bool":
                case "currency":
                case "date":
                case "datetimecombo":
                case "decimal":
                case "float":
                case "iframe":
                case "int":
                case "radioenum":
                case "relate":
                default:
                    break;
            }
        }

        return $document;
    }

http://framework.zend.com/manual/1.12/en/zend.search.lucene.extending.html

It looks like there’s a TextNum_CaseInsensitive analyzer as well, which could be used instead of the strtolower() calls within the function.
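To see why a field holding only digits ends up with a term count of 0, the sketch below approximates the two tokenizers with plain regular expressions: runs of ASCII letters for the Common_Text style analyzers, runs of letters and digits for Common_TextNum, and optional lower-casing as added by the _CaseInsensitive variants. It does not call Zend_Search_Lucene itself; approxTokens(), distinctTerms() and the sample account IDs are hypothetical.

```php
<?php
// Sketch only: approximates the token patterns of Zend_Search_Lucene's
// Common_Text analyzers (runs of ASCII letters) and Common_TextNum
// analyzers (runs of ASCII letters and digits), plus the lower-casing
// of the *_CaseInsensitive variants. Plain regexes, not the Zend classes.
function approxTokens($text, $withDigits, $lowerCase)
{
    $pattern = $withDigits ? '/[a-zA-Z0-9]+/' : '/[a-zA-Z]+/';
    preg_match_all($pattern, $text, $matches);
    return $lowerCase ? array_map('strtolower', $matches[0]) : $matches[0];
}

// Distinct terms a field would contribute to the index, roughly what
// Luke reports as the term count for that field.
function distinctTerms(array $values, $withDigits, $lowerCase)
{
    $terms = array();
    foreach ($values as $value) {
        $terms = array_merge($terms, approxTokens($value, $withDigits, $lowerCase));
    }
    return array_values(array_unique($terms));
}

// Hypothetical account_id_c values (numeric strings).
$accountIds = array('10042', '10043', '20001');

$textTerms    = distinctTerms($accountIds, false, true); // Text-style
$textNumTerms = distinctTerms($accountIds, true, true);  // TextNum-style

echo count($textTerms), "\n";    // 0: no letters, so no terms
echo count($textNumTerms), "\n"; // 3
```

With letters-only tokenization the numeric IDs produce no terms at all, which matches the 0 Luke reported for account_id_c; once digits are included, each ID becomes a term. Because the case-insensitive variants lower-case tokens themselves, the strtolower() calls would largely duplicate that work if TextNum_CaseInsensitive were used.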

Jim · 8 April 2014 08:11

Hi Klou,

Thanks for the time you’ve taken to track this down. We’ll make sure to add this fix to the next version of AOD.

Thanks,
Jim