Is there any problem with UTF-8 encoding in C# generated files in the latest versions of RO?

Hello,

I have noticed in versions 1597 and 1599 that the C# files generated by ServiceBuilder (the _Intf at least) apparently have the BOM set incorrectly. To begin with, I am not sure why the comment at the top is in Spanish, because before, IIRC, it was in English. My OS is in Spanish, but it has always been like that. I have no preference for the comment being in one language or the other, but the encoding is incorrect now. I don’t use special characters in the contents of the service (I’m not even sure whether that is allowed), so apart from the comment there shouldn’t be any problems, but trying to open the file in VS gives this error:

---------------------------
Microsoft Visual Studio
---------------------------
File Load

Some bytes have been replaced with the Unicode substitution character while loading file C:\dev\Sistemas\msaCI\1.0\msaCI1Kernel\msaCI_Intf.cs with Unicode (UTF-8) encoding. Saving the file will not preserve the original file contents.
---------------------------
Aceptar   
---------------------------

Is there a way to control this?

This is the starting section of the generated file:

//------------------------------------------------------------------------------
// <auto-generated>
//     Este código fue generado por una herramienta.
//     Versión de runtime:4.0.30319.42000
//
//     Los cambios en este archivo podrían causar un comportamiento incorrecto y se perderán si
//     se vuelve a generar el código.
// </auto-generated>
//------------------------------------------------------------------------------

This is the file opened with ANSI encoding, so it shows the BOM. With UTF-8 encoding it displays the special characters as ? (in this text editor at least).

Previous versions were generated in English. The runtime version mentioned is the same.

Thanks

Define incorrectly? Can you zip up a bad file and send it to me? thanx!

—marc

Incorrectly as in VS doesn’t like it (it gives the error mentioned before) and, in general, the accented characters in the comment are not displayed correctly in any text editor using UTF-8 encoding. They are displayed correctly if viewed as ANSI, but then the bytes at the beginning are shown, as in the example posted here.

I’ll send the files by email.

Thanks!

“incorrect” and “VS doesn’t like it” are not the same things :wink:

These files start with EF BB BF (the correct UTF-8 BOM), but contain badly encoded (non-UTF-8) characters (I’m assuming from your local locale; F3 corresponds to ó in Latin-1 and Latin-9). IOW, they are marked as UTF-8, but they don’t contain valid UTF-8.
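To check a generated file for this condition outside of VS, a small script can look for the BOM and then try to decode the rest strictly as UTF-8. The following is only a minimal sketch in Python; the function name `inspect_utf8` and the sample bytes are illustrative assumptions and are not part of Service Builder or the Remoting SDK.

```python
import codecs

def inspect_utf8(data: bytes):
    """Return (has_bom, error_offset).

    error_offset is the position (after the BOM, if any) of the first byte
    that is not valid UTF-8, or None when the payload decodes cleanly.
    """
    has_bom = data.startswith(codecs.BOM_UTF8)  # EF BB BF
    payload = data[len(codecs.BOM_UTF8):] if has_bom else data
    try:
        payload.decode("utf-8")  # strict decoding: raises on invalid bytes
    except UnicodeDecodeError as err:
        return has_bom, err.start
    return has_bom, None

# Hypothetical sample: a UTF-8 BOM followed by Latin-1 bytes for "Versión de"
sample = codecs.BOM_UTF8 + bytes.fromhex("56 65 72 73 69 F3 6E 20 64 65")
print(inspect_utf8(sample))  # (True, 5): BOM present, byte F3 at offset 5 is invalid
```

To run it against a real file, read it in binary mode (`open(path, "rb").read()`) and pass the bytes in.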

What tool or steps generated the files? Service Builder? rodl2code? The Remoting SDK tools in VS (and if so, by which steps)? At the .NET level all strings are UTF-16, so something must go wrong in the process of saving this file to disk or passing it to Visual Studio; the exact steps for this would be good.

If these files are from Service Builder or rodl2code, I assume the files you sent are as generated, before they have been touched by VS?

As for the Spanish text, this is odd. The CodeGen for .NET uses .NET’s CodeDom classes, which is what adds the comment at the top. AFAIK that has always been localized, but I’m not sure, as I run English everywhere. Nothing changed here on our end, but maybe something changed on the .NET side; if so, that part is out of our control for now (though we are considering migrating the .NET codegen away from CodeDom to use CodeGen4, like all the other platforms do).

56 65 72 73 69 F3 6E "Versi.n" 
64 65                "de"
72 75 6E 74 69 6D 65 "runtime"
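The dump above can be reproduced with a few lines of Python. This is just an illustrative sketch (the variable names are made up here): it decodes the same seven bytes under the encodings mentioned, and shows what a lenient UTF-8 reader does with the stray F3, which matches the "Unicode substitution character" wording in the VS dialog.

```python
raw = bytes.fromhex("56 65 72 73 69 F3 6E")  # the bytes from the dump

# In Latin-1 and Latin-9 (ISO 8859-15), F3 is the single-byte code for ó
as_latin1 = raw.decode("latin-1")
as_latin9 = raw.decode("iso8859-15")

# In UTF-8, ó must be the two-byte sequence C3 B3
utf8_hex = "\u00f3".encode("utf-8").hex()

# A lenient UTF-8 decoder swaps the invalid byte for U+FFFD
lenient = raw.decode("utf-8", errors="replace")

print(as_latin1, as_latin9, utf8_hex, lenient)
```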

thanx,
marc

Hello Marc,

Thanks for the insight.

The files are as generated by ServiceBuilder, manually saved (“Save code” button). They look fine on the view on ServiceBuilder. The steps are just: load the RODL, CodeGen, C# for .NET, Interface, then Save Code.

I guessed that the comment was being output (and, now that you mention it, even the code) by the .NET runtime, also because of the “runtime version” mentioned. I don’t know if anything changed here; if it did, it wasn’t something done manually. Maybe some update or something like that. I do install Windows updates regularly, and also VS updates and so on. I thought it was something that happened with 1597, as that was when I first noticed this, but it must also happen with previous versions, as from what you say it doesn’t depend directly on RO.

My Windows setup has “Español (México)” as the language.

In any case, it’s just a small issue easily fixable, just thought to mention it.
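For anyone wanting to repair an affected file by hand until a fix is available, one possible approach is to reinterpret the payload in the legacy code page and write it back as real UTF-8. The sketch below is only an assumption-laden workaround, not something from RemObjects: the `repair` function is hypothetical, and it assumes the stray bytes are Windows-1252 (the usual ANSI code page for a Spanish Windows install).

```python
import codecs

def repair(data: bytes, legacy: str = "cp1252") -> bytes:
    """Return data re-encoded as UTF-8 with BOM; valid input is returned unchanged."""
    has_bom = data.startswith(codecs.BOM_UTF8)
    payload = data[len(codecs.BOM_UTF8):] if has_bom else data
    try:
        payload.decode("utf-8")
        return data  # already valid UTF-8, nothing to do
    except UnicodeDecodeError:
        # Assumption: the invalid bytes come from the legacy ANSI code page
        text = payload.decode(legacy)
    return text.encode("utf-8-sig")  # "utf-8-sig" prepends the BOM on encode

# Hypothetical broken input: BOM plus "Versión" in Windows-1252
broken = codecs.BOM_UTF8 + b"Versi\xf3n"
fixed = repair(broken)
print(fixed.hex(" "))
```

Note that `cp1252` leaves a handful of byte values undefined, so decoding can still fail on files that contain them; in that case the legacy encoding guess is probably wrong.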

Thanks!

Perfect; if the file you sent is exactly how SB saves it, this should be easy to repro and fix, I hope, as the bug will be localized to SB.

It might also be that it was always Spanish, and you just didn’t notice, while the encoding was not broken…

Good to know, in case this is needed for reproducing the SB issue (probably not; it would most likely happen in any locale, one just might not notice when everything is English, as all English letters are in ASCII, <127). I assume any non-ASCII char (e.g. ó or ü) in a service or method name, or even just docs, gets broken too… We’ll have a look tomorrow.
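The point about English text hiding the problem can be seen directly: pure ASCII text produces identical bytes in Latin-1 and UTF-8, so a file saved in the wrong encoding is still valid UTF-8 by accident. A short illustrative sketch (the helper name `same_bytes` and the sample strings are chosen here for demonstration only):

```python
def same_bytes(text: str) -> bool:
    """True when text encodes to identical bytes in Latin-1 and UTF-8."""
    return text.encode("latin-1") == text.encode("utf-8")

english = "This code was generated by a tool."
spanish = "Este c\u00f3digo fue generado por una herramienta."  # contains ó

print(same_bytes(english))  # True: every character is below 128
print(same_bytes(spanish))  # False: ó is F3 in Latin-1 but C3 B3 in UTF-8
```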

thanx!

I have some previously generated _Intf files in my repo and they show the comment in English, but I don’t remember what version they were made with.

Now that you mention the documentation: I previously experienced an issue very similar to this one when having non-English characters in the documentation of services. It was this one: D19406, from December last year. I don’t know if it might be related.

Thanks!

Hi,

I think you have installed something like the .NET Framework Language Pack (Spanish). That would be the reason why this comment changed.

As marc already mentioned, we are migrating from Microsoft CodeDom to our CodeGen4, so this issue should go away.