Thread: testing unicode and codepage compatibility



Replies: 8 - Last Post: Oct 19, 2017 1:02 AM Last Post By: Uffe Kousgaard
Uffe Kousgaard

Posts: 218
Registered: 2/7/00
testing unicode and codepage compatibility
  Posted: Oct 17, 2017 4:20 AM
With code like this I can test whether a certain string can be expressed in
a codepage or not. Is that the most efficient way to do it?

var
s1,s2: string;
a: ansistring;
begin
s1:= 'abcæøåABC';
SetAnsiString(@a,PWideChar(s1),Length(s1),874);
s2:= a;
if s1=s2 then showmessage('good') else showmessage('bad');
end;

I have a lot of strings for which I need to find a matching codepage.
I have a total of 36 codepages to choose between in this specific
context.
Adem Meda

Posts: 495
Registered: 12/28/98
Re: testing unicode and codepage compatibility
  Posted: Oct 17, 2017 4:45 AM   in response to: Uffe Kousgaard
Uffe Kousgaard wrote:

Is that the most efficient way to do it?

What happens if a string can be valid in more than one code page?

Is that not important for your purposes?
Uffe Kousgaard

Posts: 218
Registered: 2/7/00
Re: testing unicode and codepage compatibility
  Posted: Oct 17, 2017 4:54 AM   in response to: Adem Meda
Adem Meda wrote:


What happens if a string can be valid in more than one code page?

Is that not important for your purposes?

No, I just need a valid one. I am going to test from a list, starting
with no. 1.
Adem Meda

Posts: 495
Registered: 12/28/98
Re: testing unicode and codepage compatibility
  Posted: Oct 17, 2017 11:12 AM   in response to: Uffe Kousgaard
Uffe Kousgaard wrote:

Adem Meda wrote:


What happens if a string can be valid in more than one code page?

Is that not important for your purposes?

No, I just need a valid one. I am going to test from a list, starting
with no. 1.

I am assuming you're going to do this once (at least for each data set). In
that case, why bother optimizing it further?

I mean, if we had the data set, I am sure we could help you make it go
faster; but what's the point?
Lajos Juhasz

Posts: 801
Registered: 3/14/14
Re: testing unicode and codepage compatibility
  Posted: Oct 17, 2017 11:19 AM   in response to: Uffe Kousgaard
Uffe Kousgaard wrote:

With code like this I can test whether a certain string can be expressed
in a codepage or not. Is that the most efficient way to do it?

var
s1,s2: string;
a: ansistring;
begin
s1:= 'abcæøåABC';
SetAnsiString(@a,PWideChar(s1),Length(s1),874);
s2:= a;
if s1=s2 then showmessage('good') else showmessage('bad');
end;

I have a lot of strings for which I need to find a matching codepage.
I have a total of 36 codepages to choose between in this specific
context.

No, it's not; you should call WideCharToMultiByte directly. My test code
for this is:

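procedure TForm2.Button1Click(Sender: TObject);
var
  lCodePages: array of Integer;
  lStr: string;
  lCodeInd: Integer;
  lUsedDefault: BOOL; // set by the API when a character had no exact mapping
begin
  lStr := Edit1.Text; // some text that cannot be represented in every code page

  // Candidate code pages, in order of preference
  SetLength(lCodePages, 4);
  lCodePages[0] := 1251;
  lCodePages[1] := 1253;
  lCodePages[2] := 1252;
  lCodePages[3] := 1250;

  lCodeInd := 0;
  lUsedDefault := True;
  while lUsedDefault and (lCodeInd <= High(lCodePages)) do
  begin
    // With a nil output buffer the call only measures the result; the last
    // argument reports whether any character needed the default replacement.
    // PWideChar(lStr) is also safe for an empty string, unlike @lStr[1].
    WideCharToMultiByte(lCodePages[lCodeInd],
      WC_COMPOSITECHECK or WC_NO_BEST_FIT_CHARS,
      PWideChar(lStr), -1, nil, 0, nil, @lUsedDefault);

    if lUsedDefault then
      Inc(lCodeInd);
  end;

  if lCodeInd <= High(lCodePages) then
    ShowMessage('Possible codepage: ' + IntToStr(lCodePages[lCodeInd]))
  else
    ShowMessage('No matching codepage found');
end;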


Of course, this is not an optimised solution, just a quick and dirty test.
Remy Lebeau (Te...


Posts: 9,447
Registered: 12/23/01
Re: testing unicode and codepage compatibility
  Posted: Oct 17, 2017 2:09 PM   in response to: Lajos Juhasz
Lajos Juhasz wrote:

No, it's not; you should call WideCharToMultiByte directly.

Note that Delphi's RTL has a LocaleCharsFromUnicode() function in the
System unit, which wraps WideCharToMultiByte() on Windows.

--
Remy Lebeau (TeamB)
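
A minimal sketch of the same test written against LocaleCharsFromUnicode() instead of the Windows API call. FitsCodePage and FirstFittingCodePage are hypothetical helper names, not RTL functions, and the sketch assumes a Unicode Delphi version (XE2 or later for the Winapi.Windows unit name) where the last parameter reports whether a default character was substituted. Some code pages (UTF-8, for example) do not accept these flags, so treat this as a starting point for ANSI code pages only:

```pascal
program CodePageFit;
{$APPTYPE CONSOLE}

uses
  Winapi.Windows; // only for the WC_NO_BEST_FIT_CHARS constant

// Hypothetical helper: True if every character of S has an exact
// representation in CodePage (no default or best-fit substitution).
function FitsCodePage(const S: string; CodePage: Cardinal): Boolean;
var
  UsedDefault: LongBool;
  Needed: Integer;
begin
  if S = '' then
    Exit(True);
  UsedDefault := False;
  // A nil buffer with length 0 only asks for the required size, so no
  // AnsiString is allocated and no conversion back to Unicode is needed.
  Needed := LocaleCharsFromUnicode(CodePage, WC_NO_BEST_FIT_CHARS,
    PWideChar(S), Length(S), nil, 0, nil, @UsedDefault);
  // Needed = 0 signals failure, e.g. an unknown code page.
  Result := (Needed > 0) and not UsedDefault;
end;

// Hypothetical helper: first candidate that fits, or -1 if none does.
function FirstFittingCodePage(const S: string;
  const Candidates: array of Cardinal): Integer;
var
  CP: Cardinal;
begin
  for CP in Candidates do
    if FitsCodePage(S, CP) then
      Exit(Integer(CP));
  Result := -1;
end;

begin
  // Char($00E6) is the ae ligature, written as a code point so the result
  // does not depend on the encoding of the source file.
  Writeln(FirstFittingCodePage('abc' + Char($00E6), [1251, 1253, 1252, 1250]));
end.
```

Passing the candidates as an open array keeps the caller's list order as the order of preference, which matches the "test from a list, starting with no. 1" approach described earlier in the thread.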
Uffe Kousgaard

Posts: 218
Registered: 2/7/00
Re: testing unicode and codepage compatibility
  Posted: Oct 19, 2017 1:02 AM   in response to: Lajos Juhasz
Lajos Juhasz wrote:

Of course, this is not an optimised solution, just a quick and dirty test.

But it works nicely. Thanks.
Peter Below

Posts: 1,227
Registered: 12/16/99
Re: testing unicode and codepage compatibility
  Posted: Oct 17, 2017 11:19 PM   in response to: Uffe Kousgaard
Uffe Kousgaard wrote:

With code like this I can test whether a certain string can be expressed
in a codepage or not. Is that the most efficient way to do it?

var
s1,s2: string;
a: ansistring;
begin
s1:= 'abcæøåABC';
SetAnsiString(@a,PWideChar(s1),Length(s1),874);
s2:= a;
if s1=s2 then showmessage('good') else showmessage('bad');
end;

I have a lot of strings for which I need to find a matching codepage.
I have a total of 36 codepages to choose between in this specific
context.

You are basically doing a round-trip conversion of Unicode -> Ansi ->
Unicode and expecting the initial and final Unicode strings to be
identical. This usually works, but there may be pathological cases, e.g.
with composite characters. The conversion to ANSI and back may also not
work properly if the OS you run on does not have the ANSI codepages you
need installed. That can be a problem with far eastern code pages (for
double-byte character sets) or even with languages like Hebrew.
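
One way to guard against the missing-codepage case is to filter the candidate list before testing any strings. This is a rough sketch, assuming the Windows IsValidCodePage() function as declared in Winapi.Windows; InstalledOnly is a hypothetical helper name, and the list of code pages is only an illustration:

```pascal
program InstalledCodePages;
{$APPTYPE CONSOLE}

uses
  Winapi.Windows;

// Hypothetical helper: keeps only the candidates that the OS reports
// as valid, preserving the caller's order.
function InstalledOnly(const Candidates: array of Cardinal): TArray<Cardinal>;
var
  CP: Cardinal;
  Count: Integer;
begin
  SetLength(Result, Length(Candidates));
  Count := 0;
  for CP in Candidates do
    if IsValidCodePage(CP) then
    begin
      Result[Count] := CP;
      Inc(Count);
    end;
  SetLength(Result, Count); // trim to the number actually kept
end;

var
  CP: Cardinal;
begin
  // Illustrative candidates only; substitute the code pages of the target format.
  for CP in InstalledOnly([874, 1250, 1251, 1252, 1253, 1255]) do
    Writeln(CP);
end.
```

Running the filter once at startup means a conversion failure later can be attributed to the string rather than to the machine's configuration.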

If at all possible in your context, simply get rid of Ansi text
completely.

By the way: in your specific sample (string specified as part of the
code) there is in fact an additional conversion done by the compiler,
depending on the encoding you use for your source code in the editor...

--
Peter Below
TeamB
Uffe Kousgaard

Posts: 218
Registered: 2/7/00
Re: testing unicode and codepage compatibility
  Posted: Oct 18, 2017 12:27 AM   in response to: Peter Below
Peter Below wrote:

If at all possible in your context, simply get rid of Ansi text
completely.

The output files are in an industry-standard format that supports 36
codepages. I need to find a suitable one. I cannot just redefine them
as Unicode. It is actually a DBF-like file format with a fixed set of
codepages.