Exporting special characters (e.g. Umlauts) produces control characters in CSV file

Hello,

I have had this problem since version 7 and it continuous to persist in version 8.8:

When exporting characters such as ‘ä, ö, û’ they show up as ‘√§, √∂, √º’ in the respective CSV export. Within SuiteCRM, the representation is correct.

The Import/Export Character Set is UTF-8 (I tried different sets and the control characters change but it is never correct).

I recently migrated from 7.14 to 8.8.0 and as part of the migration process ensured that the character sets and collation used throughout SuiteCRM are set utf8mb3 / utf8mb3_general_ci.

Any suggestions?

Many thanks!

Maybe you will find answer if you check topics on this forum.

https://community.suitecrm.com/search?q=character

None of them are really providing a workable solution.

Reading about it, it seems that MySQL’s UTF-8 implementation is different from the standard implementation and that could be the reason for the problems when interfacing with PHP’s implementation of UTF-8.

My MySQL DB is set to utf8mb3. I only found out later that this is apparently deprecated and one should use utf8mb4.

In order to test this. I have to migrate my installation from mb3 to mb4. My first steps didn’t take me far. Adjusting the MySQL DB to mb4 and changing the db connection to mb4 in config_override.php led to the unavailability of the CRM…

Any ideas to make the transition?

Cheers

Thanks rsp. But I am using the exact same collation (utf8mb3_general_ci). Read on till the end…

I have performed further tests:

  • Looking through a MySQL tool (I use Navicat) directly at my tables, the Umlauts show up correctly.
  • If I export them through Navicat into csv or txt, they show up with the same control characters.
  • This leads me to believe, that the problem may be more on the MySQL side than with SuiteCRM
  • From the SQL dump of my db, I noticed that the initial SET NAMES statement is using utf8mb4 (SET NAMES utf8mb4;) whereas all individual tables are using utf8mb3 (ENGINE=InnoDB DEFAULT CHARSET=utf8mb3;). I have no idea where the initial mb4 comes from. According to the MySQL documentation it ensures that all characters are recognised but has no further impact.

My next thought was: what about if I change the encoding of the DB to utf8mb4?
I created a new database on my server (AWS RDS) with Character Set utf8mb4 and Collation utf8mb4_general_ci. I added a table and one record containing umlauts. However, exporting that record resulted in exactly the same control characters!

The weird thing is that if I dump the data, the umlauts are there…

OK, so far, my workflow has been Export into csv file, open up csv file in MS Excel to further analyse, etc. In this post, I found a reference to the problem: Solved: UTF-8 problem in mySQL | Experts Exchange. It turns out, that Excel is simply using ANSI as the character set of csv files and it is Excel which garbles the input.

NOTHING TO DO WITH SUITECRM OR MYSQL. IT’S BLooDY MICROSOFT!

This post describes how to import utf8 encoded files into MS Excel: How to import a .csv file that uses UTF-8 character encoding - ITG Computing Support | Institute for Advanced Study

May it save time for the next poor soul…

2 Likes