Tomi Kosunen asked

UTF-8 with filereadline

Hi

Is it true that the command "filereadline()" cannot handle all UTF-8 characters? I have always filtered the characters ÄÖäöÅå out of the input data. I have now noticed that those characters can be used "inside" FlexSim, but when I try to read them from a text file, it does not work.

FlexSim 23.1.0
utf-8

1 Answer

Phil BoBo answered

Can you post an example of the issue you are having?

This seems to work fine for me:

1693925271045.png (25.6 KiB)

utf8.fsm (24.6 KiB)

utf8.txt (31 B)

Tomi Kosunen commented

Thanks @Phil BoBo! The problem is in the text file that I'm reading in. It has ANSI encoding, but it should be UTF-8. It is strange that Notepad shows the Scandinavian characters even though the file has ANSI encoding.

Now I have another problem: if the text file is saved with UTF-8 encoding, string/character handling in FlexSim is different. With your example file, text.length returns 31, but there are only 17 characters. Also, the command "text.substr(24,2)" returns only "ä" and "text.substr(24,2)" returns "äö".

I can strip out the Scandinavian characters as I have always done, but if there is a simple configuration trick, that would be nice.
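On the ANSI versus UTF-8 mismatch: the following is a minimal Python sketch (not FlexScript), assuming that "ANSI" in Notepad means the Windows-1252 code page, which is the usual case on Western European systems. It shows that the same letters are stored as different bytes in the two encodings, and that ANSI bytes are not valid UTF-8. The helper names `is_valid_utf8` and `ansi_to_utf8` are hypothetical, chosen for illustration only.

```python
# Assumption: "ANSI" means Windows-1252 (cp1252); other systems may use a
# different code page.
text = "ÄÖäöÅå"

ansi_bytes = text.encode("cp1252")  # one byte per letter
utf8_bytes = text.encode("utf-8")   # two bytes per letter for these characters


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes can be decoded as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def ansi_to_utf8(data: bytes) -> bytes:
    """Re-encode cp1252 bytes as UTF-8 (hypothetical helper)."""
    return data.decode("cp1252").encode("utf-8")


print(len(ansi_bytes), len(utf8_bytes))
print(is_valid_utf8(ansi_bytes), is_valid_utf8(utf8_bytes))
print(ansi_to_utf8(ansi_bytes) == utf8_bytes)
```

Notepad can display such a file correctly because it decodes it with the ANSI code page, whereas a reader that expects UTF-8 sees invalid byte sequences. Re-saving or converting the file as UTF-8 before reading it avoids the mismatch.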


0 Likes 0 ·
Jason Lightfoot commented

Can you post the example file that you're struggling with?

Yes, there are 17 characters, but .length returns the number of bytes, of which there are 31, since some of the characters use more than one byte. Here's the documentation. It recommends using the .split() function to create an array of characters, which accounts for these encoding inconsistencies and lets you access characters by index.

For example, in Phil's file the only single-byte characters are the space and those in "Test"; the rest are either 3 or 2 bytes:

是 - 3 bytes

ö - 2 bytes
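The difference between counting bytes and counting characters can be sketched in Python (not FlexScript). The sample string below is a hypothetical stand-in, not the actual contents of utf8.txt, and the Python calls only illustrate the UTF-8 arithmetic; they say nothing about how FlexSim implements its string methods.

```python
# Hypothetical sample text; NOT the contents of utf8.txt.
text = "Test 是 äö"
data = text.encode("utf-8")

char_count = len(text)  # number of characters
byte_count = len(data)  # number of UTF-8 bytes

# Bytes used by each character: ASCII takes 1, "ä" and "ö" take 2, "是" takes 3.
sizes = [(ch, len(ch.encode("utf-8"))) for ch in text]

# Byte-based slicing: "ä" starts at byte offset 9 and is 2 bytes long,
# so 2 bytes give one letter and 4 bytes give two letters.
one_letter = data[9:11].decode("utf-8")
two_letters = data[9:13].decode("utf-8")

# Character-based access: split into a list of characters first,
# then index by character position instead of byte offset.
chars = list(text)
last_two = "".join(chars[7:9])

print(char_count, byte_count)
print(sizes)
print(one_letter, two_letters, last_two)
```

Splitting the text into an array of characters first, as the documentation recommends, gives indices that match what a reader sees on screen rather than the underlying byte layout.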
