How to handle duplicates on import for migration

I need to work out a way to handle duplicates when I import for the migration form our old system. How would you guys handle something like this?

I have contracts and contacts but with our old system we had to duplicate the contact if they had multiple contracts as we couldn’t assign multiple contracts to a contact.

Example:

[table]
[tr]
[th]Name[/th]
[th]Contact ID[/th]
[th]Contract ID[/th]
[/tr]
[tr]
[td]John Smith[/td]
[td]123[/td]
[td]456[/td]
[/tr]
[tr]
[td]John Smith[/td]
[td]234[/td]
[td]567[/td]
[/tr]
[/table]

I have an excel spreadsheet of all the data but I need to find a way to combine the contacts to one ID. So John Smith would have 2 rows in the spreadsheet with the ID 123 but keeping the contract IDs

Very generic notes on this:

  • You have to decide what is enough to consider two contacts equal: is comparing “Name + Last Name” enough? Do you need to check something else?

  • make a PHP importer script to go through the database and loop around similar Contacts, correctly creating the relationship to Contracts for each one

How many records do you have? The easiest algortihm is a loop inside a loop, you can get into performance problems if there are many. Of course, you can make a better algorithm, but it is more work :slight_smile: