German Umlaute with Lazarus

Did some additional tests:

With adapted Code:
test.zip (341.6 KB)


Hope this helps
Shalom
Manfred

But you are right: it works if English is set as the language for non-Unicode programs, and it fails with other ones like Russian.

I did the first test with the German (Switzerland) setting, and it also worked.

Lazarus has no proper Unicode support :frowning:
It can show umlauts only if they are present in the code page of the language for non-Unicode programs; otherwise they are replaced with ?:

English is set as the language for non-Unicode programs:

Russian is set as the language for non-Unicode programs:

test.zip (176.7 KB)

Hi again…

OK, I tried your setting; Russian fails like you said.
Then I tried my app (with Russian) with Zeos components, and it works…

So what can I do?
As described at the beginning of the post:

It looks like the combination of Lazarus + RemObjects encodes to UTF8 twice.
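For readers who want to recognize this symptom, here is a minimal sketch of what a double UTF8 encoding does to a single umlaut. It is only an illustration of the suspicion above, not a statement about what Lazarus or RemObjects actually do internally; the helper names `BytesToHex` and `EncodeUtf8Again` are made up for this example.

```pascal
program DoubleUtf8Demo;
{$mode objfpc}{$H+}

uses
  SysUtils;

// Returns the bytes of S as space-separated hex pairs.
function BytesToHex(const S: AnsiString): AnsiString;
var
  i: Integer;
begin
  Result := '';
  for i := 1 to Length(S) do
  begin
    if i > 1 then
      Result := Result + ' ';
    Result := Result + IntToHex(Ord(S[i]), 2);
  end;
end;

// Simulates the mistake: every byte of an already UTF8 encoded string is
// treated as a separate Latin-1 character and encoded to UTF8 once more.
function EncodeUtf8Again(const Utf8: AnsiString): AnsiString;
var
  i: Integer;
  W: WideString;
begin
  SetLength(W, Length(Utf8));
  for i := 1 to Length(Utf8) do
    W[i] := WideChar(Word(Ord(Utf8[i])));
  Result := UTF8Encode(W);
end;

var
  Original: WideString;
  Once, Twice: AnsiString;
begin
  // 'ä' given as a code point, so the source file encoding does not matter
  Original := WideChar($00E4);
  Once := UTF8Encode(Original);
  Twice := EncodeUtf8Again(Once);
  WriteLn('encoded once:  ', BytesToHex(Once));
  WriteLn('encoded twice: ', BytesToHex(Twice));
end.
```

ä (U+00E4) is C3 A4 in UTF8. If those two bytes are encoded again as if they were two Latin-1 characters, the result is C3 83 C2 A4, which a UTF8-aware control shows as Ã¤ instead of ä. Seeing Ã followed by another symbol is therefore a hint of double encoding, while a plain ? points to a lossy code page conversion instead.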

Hi
Just asking.
Is the topic (Umlaute or special characters) now under investigation, or will it not be supported? I think Russian characters could also benefit from a positive solution. :wink:

Shalom
Manfred

I’ll check this case with a non-console Lazarus app, but I’m sure that we don’t have different UTF8 processing for Delphi and Lazarus.

Thank you for your feedback. If I can help or test something, please let me know.

I’ve found the problem.
As I said earlier, there is a problem in FPC itself.

Open %Lazarus%\fpc\2.6.4\source\packages\fcl-db\src\base\db.pas and look for TDataSet.DataConvert:

procedure TDataSet.DataConvert(aField: TField; aSource, aDest: Pointer;
  aToNative: Boolean);

 // There seems to be no WStrCopy defined, this is a copy of
 // the generic StrCopy function, adapted for WideChar.
 Function WStrCopy(Dest, Source:PWideChar): PWideChar;
 var
   counter : SizeInt;
 Begin
   counter := 0;
   while Source[counter] <> #0 do
   begin
     Dest[counter] := char(Source[counter]); // <<< this should be WideChar instead of char; the cast to char loses Unicode chars
     Inc(counter);
   end;
   { terminate the string }
   Dest[counter] := #0;
   WStrCopy := Dest;
 end;    
...
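To make the effect of that cast visible outside of fcl-db, here is a small standalone sketch. It assumes that the cast to `char` goes through the system's ANSI code page, which would match the behavior reported earlier in this thread (umlauts survive with an English or German setting but not with Russian); the program and variable names are made up for illustration, and the printed values depend on the platform and the active code page.

```pascal
program CharCastDemo;
{$mode objfpc}{$H+}

uses
  SysUtils;

var
  Source, ViaChar, ViaWideChar: WideChar;
begin
  // Cyrillic capital Zhe (U+0416), which is not part of Latin code pages
  Source := WideChar($0416);

  // Same cast as in the original WStrCopy: WideChar -> char -> WideChar
  ViaChar := Char(Source);

  // Cast used in the proposed fix: the value is copied unchanged
  ViaWideChar := WideChar(Source);

  WriteLn('source:       $', IntToHex(Ord(Source), 4));
  WriteLn('via Char:     $', IntToHex(Ord(ViaChar), 4));
  WriteLn('via WideChar: $', IntToHex(Ord(ViaWideChar), 4));
end.
```

If the second line prints a different value than the first (for example the code of ?), the character did not survive the detour through `char` on that system.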

Hi EvgenyK
Thank you for your Feedback.

In my %Lazarus%\fpc\2.6.4\source\packages\fcl-db\src\base\db.pas
I do not see a

> procedure TDataSet.DataConvert(aField: TField; aSource, aDest: Pointer;
>   aToNative: Boolean);

On line 1585 there is:

> procedure DataConvert(aField: TField; aSource, aDest: Pointer; aToNative: Boolean); virtual;

Sorry for the silly question :blush:
Does this belong to db.pas? I also did not find the generic StrCopy function :grimacing:

 Function WStrCopy(Dest, Source:PWideChar): PWideChar;
 var
   counter : SizeInt;
 Begin
   counter := 0;
   while Source[counter] <> #0 do
   begin
     Dest[counter] := char(Source[counter]); // <<< this should be WideChar instead of char; the cast to char loses Unicode chars
     Inc(counter);
   end;
   { terminate the string }
   Dest[counter] := #0;
   WStrCopy := Dest;
 end;     

Do I have to manually recompile FPC and/or Lazarus?

Actually, the implementation of this method is in %Lazarus%\fpc\2.6.4\source\packages\fcl-db\src\base\dataset.inc on line 532.
You need to recompile this package by running make in the fcl-db folder. After that, copy the content of fcl-db\units\%your platform% into %Lazarus%\fpc\2.6.4\units\%your platform%\fcl-db.
Of course, Lazarus should also be rebuilt.

Hi EvgenyK
I did exactly as you said.
I modified

%Lazarus%\fpc\2.6.4\source\packages\fcl-db\src\base\dataset.inc on line 532

so that it looks like this:

procedure TDataSet.DataConvert(aField: TField; aSource, aDest: Pointer;
  aToNative: Boolean);

 // There seems to be no WStrCopy defined, this is a copy of
 // the generic StrCopy function, adapted for WideChar.
 Function WStrCopy(Dest, Source:PWideChar): PWideChar;
 var
   counter : SizeInt;
 Begin
   counter := 0;
   while Source[counter] <> #0 do
   begin
     Dest[counter] := WideChar(Source[counter]);
     Inc(counter);
   end;
   { terminate the string }
   Dest[counter] := #0;
   WStrCopy := Dest;
 end;

I ran make on a clean system:

make in the fcl-db folder

copied the content of fcl-db\units\%your platform% into %Lazarus%\fpc\2.6.4\units\%your platform%\fcl-db

Then I rebuilt Lazarus.
I also recompiled and rebuilt my application,
but still the same result… :pensive:

Try to check the content of any WideString field, like this:

function tohex(aStr: WideString): Widestring;
var
  i: integer;
begin
  Result:='';
  for i := 1 to Length(aStr) do
    Result:=Result+ '$'+inttohex(ord(aStr[i]),4);
end;

procedure TForm1.Button2Click(Sender: TObject);
begin
  if tbl_test31144_utf8.Active then begin
    memo1.Lines.Add('=======');
    memo1.Lines.Add('WSTR hex = '+tohex(tbl_test31144_utf8.FieldByName('WStr').AsWideString));
  end;
end;

I have

This shows that the data in TDAMemDataTable is stored correctly, the same as in Delphi, but for some reason Lazarus controls can’t show UTF16 chars (= WideString) correctly.


You can change this behavior and store the data in TDAMemDataTable in UTF8 format via the OnReadFieldValue event of the Bin2 data streamer.

procedure TForm1.InternalDataStreamerReadFieldValue(const aField: TDAField;
  var Value: Variant);
begin
  // re-encode incoming string values as UTF8 before they are stored
  if VarIsStr(Value) then Value := UTF8Encode(VarToWideStr(Value));
end;

In this case, it will be displayed correctly, but a UTF8 encoded string takes more space and may no longer fit into the dataset field.
In the given example, WSTR is a widestring(20) field.

As you can see, it cuts the last 2 chars:

It can be fixed by increasing the field length.

AFAIK, UTF8 encoded data takes up to 3 bytes for one UTF16 code unit (a surrogate pair, i.e. two code units, takes 4 bytes), so storing 20 UTF16 chars in Lazarus can take up to 60 UTF8 bytes.

Note: you need to adjust every TDAMemDataTable in Lazarus manually.
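To judge how much longer a field has to be, it can help to compare the lengths directly. The following is a minimal sketch with a made-up sample value; it relies only on the standard `UTF8Encode` function. For characters in the Basic Multilingual Plane, UTF8 needs at most 3 bytes per UTF16 code unit, so three times the original field length is a safe upper bound.

```pascal
program Utf8LengthDemo;
{$mode objfpc}{$H+}

var
  W: WideString;
  U: AnsiString;
begin
  // 'Grüße' built from code points, so the source file encoding does not matter
  W := 'Gr';
  W := W + WideChar($00FC) + WideChar($00DF) + 'e';

  U := UTF8Encode(W);

  WriteLn('UTF16 code units: ', Length(W));
  WriteLn('UTF8 bytes:       ', Length(U));
end.
```

Here the 5 UTF16 code units become 7 UTF8 bytes, because ü and ß take 2 bytes each. Cyrillic letters also take 2 bytes each, and most CJK characters take 3.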

Just asking:
Did you do this test with the Lazarus 1.4.2 or 1.5?

Lazarus 1.4.2, revision 49524

Hi again…

I did some more testing with a Lazarus 1.4.2 standard installation and no compiler switches at all.

  • I just “updated” my application to use the standard MySQL55Connection and set CharSet to utf8.
    And it just works…

I understand your point:

> but I’m assured that we don’t have different UTF8 processing for Delphi and Lazarus

And I believe you, but I see that it works with the built-in MySQL components and also with the Zeos components.

I like RemObjects, and if this could work, I am quite sure other Lazarus developers (maybe Russian or Japanese ones) could also profit from these adjustments.

If I could help I would, but I am not a component developer…

The main difference between Delphi and Lazarus: Delphi uses UTF16 for Unicode support, whereas Lazarus uses UTF8…
RO/DA use UTF16 for Unicode support because our other platforms (.NET/Cocoa/etc.) also use UTF16.

As a workaround, you can use my suggestion from the previous post and manually convert UTF16 into UTF8.
Another solution is to convert UTF16 to UTF8 on request in the OnGetText event:

procedure TForm1.tbl_test31144_utf8WSTRGetText(Sender: TDACustomField;
  var Text: string; DisplayText: boolean);
begin
  if Sender.DataType = datWideString then Text := UTF8Encode(Sender.asWideString);
end;

In this case, the data in the grid will be shown correctly, but it still requires the fix in TDataSet.DataConvert. Manual encoding will also be required if you want to use UTF16 strings in Lazarus controls like edit/memo.
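The manual encoding for edit/memo controls boils down to one conversion in each direction. Here is a minimal console sketch of that round trip, using only the standard `UTF8Encode` and `UTF8Decode` functions; the variable names are hypothetical stand-ins for a field value and a control's text, not part of the DA API.

```pascal
program Utf8RoundTrip;
{$mode objfpc}{$H+}

var
  FieldValue, Back: WideString;
  ControlText: AnsiString;
begin
  // Stand-in for a WideString field value: 'Öl' built from code points
  FieldValue := WideChar($00D6);
  FieldValue := FieldValue + 'l';

  // Field -> control: what would be assigned to the text of an edit or memo
  ControlText := UTF8Encode(FieldValue);

  // Control -> field: what would be written back to the WideString field
  Back := UTF8Decode(ControlText);

  if Back = FieldValue then
    WriteLn('round trip OK')
  else
    WriteLn('round trip lost data');
end.
```

In a form, the same two calls would wrap every read from and write to the control, which is why this approach gets tedious once many edits, memos, and reports are involved.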

Just found out something…

    begin
      with DM.RemoteDataAdapter do
      begin
        DataServiceName := 'DataService';
        LoginServiceName := 'LoginService';  // optional
        LoginString := 'User Id="myuser";Password="mypassword";Domain="my_domain";Schema="my_shema";CharSet="utf8";';
        TargetURL := 'http://mydomain.ch:7099/bin';
      end;
      Setup_Dataset(Sender);
    end;

It looks like it does not matter whether I set CharSet="utf8"; in the LoginString or not…
Just thought I would mention this.

Relativity ignores all unknown extra switches.
You need to specify this parameter in the MySql.NET connection string inside DASM,
of course only if the driver supports this parameter…

Hi Evgeny

Thank you for the tip.
Since I use many edit, memo, and date fields, as well as a report generator and XLS export components, I would like a “clean” solution.

> the main difference between Delphi and Lazarus: Delphi uses UTF16 for unicode support in comparing Lazarus that uses UFT8 …
> RO/DA use UTF16 for unicode support because it other our platforms (.NET/Cocoa/etc) also use UTF16.

If I had known this in the beginning, I would have picked a different combination
(not Lazarus and DataAbstract). A little bit disappointed… :disappointed:

I still think both products are very good, but the combination brings problems.
Somebody who “just” wants to build a phone book would immediately struggle with this problem.
Many languages use öäüèéà or similar, and in Russia and Asia, as you know, there are many more different characters.

For me this is the point to think about a different approach…
Since I come from Delphi, Lazarus was my first thought. But of course there are others…

Shalom
Manfred

PS: These Words are meant in a friendly way.