Difference between Utf8String and WideString

rgomezc · October 28, 2025, 7:05pm

Hello,

As the title says: what’s the difference between those two string types?

I have a RODL where most of the strings use the WideString type, while in another app I mostly have Utf8String. I don’t know exactly what the implications of this are, especially since, from what I can see, both types end up as the UnicodeString type in the C++Builder files.

My server apps are made with C++Builder. Clients are a mix of C++Builder and .NET apps.

rgomezc · October 28, 2025, 7:16pm

Ok, so a little trip to the documentation (I always forget it exists) confirms more or less what I thought: Utf8String will use less space on the wire, even though it ends up as the same UnicodeString type on the client and server, and it is recommended when one of the sides is a .NET app. Other than that, it should be transparent?

When dealing with REST clients, is there any difference or recommendation regarding string types?

mh · October 28, 2025, 9:01pm

Correct. I recommend always using Utf8String. The only good reasons to still use WideString would be if you need to support clients or servers on a very old legacy version of Remoting SDK that doesn’t support Utf8String in the message format yet, or if you expect most of the text you transfer not to be representable as single 8-bit UTF-8 characters.

The latter could be the case if you’re sending, say, mostly Chinese text, or emojis.

In UTF-16 (WideString), most Unicode code points ("letters") fit within one 16-bit character, including accented characters, Chinese, Japanese, etc. Some (e.g. emoji) still need two 16-bit characters (a surrogate pair).
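To make the UTF-16 point concrete, here is a small standard C++ sketch (not Remoting SDK or C++Builder code; `utf16_units` is a hypothetical helper) that counts the 16-bit code units in `u""` literals at compile time. It assumes a C++11 or later compiler and uses `\u`/`\U` escapes so the result does not depend on the source file's encoding:

```cpp
#include <cstddef>

// Number of UTF-16 code units (char16_t) in a u"" literal, excluding the terminator.
template <std::size_t N>
constexpr std::size_t utf16_units(const char16_t (&)[N]) { return N - 1; }

// U+00E9 (e with acute) and U+4E2D (a CJK ideograph) are in the
// Basic Multilingual Plane: one 16-bit code unit each.
static_assert(utf16_units(u"\u00E9") == 1, "accented Latin letter");
static_assert(utf16_units(u"\u4E2D") == 1, "CJK ideograph");

// U+1F600 (grinning face emoji) lies outside the BMP and needs a surrogate pair.
static_assert(utf16_units(u"\U0001F600") == 2, "emoji");
```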

In UTF-8, most Latin characters (e.g. English text) fit within a single 8-bit character. Accented letters and the like (which are usually few and far between, even in e.g. French) take two 8-bit characters (16 bits), but Chinese and Japanese characters take three, and emoji or other characters beyond U+FFFF take four.
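The same kind of compile-time check works for UTF-8. Again this is a plain C++ sketch with a hypothetical `utf8_bytes` helper, counting the bytes in `u8""` literals; the template accepts both the `char` (C++17) and `char8_t` (C++20) element types that such literals have:

```cpp
#include <cstddef>

// Number of UTF-8 bytes in a u8"" literal, excluding the terminator.
// CharT is char before C++20 and char8_t from C++20 on.
template <typename CharT, std::size_t N>
constexpr std::size_t utf8_bytes(const CharT (&)[N]) { return N - 1; }

static_assert(utf8_bytes(u8"A") == 1, "ASCII letter: one byte");
static_assert(utf8_bytes(u8"\u00F1") == 2, "n with tilde: two bytes");
static_assert(utf8_bytes(u8"\u4E2D") == 3, "CJK ideograph: three bytes");
static_assert(utf8_bytes(u8"\U0001F600") == 4, "emoji: four bytes");
```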

TL;DR: UTF-8 is far more efficient (close to half the size) for Latin-letter text, but can be less efficient for text that is heavy on non-Latin characters such as Chinese or Japanese.
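Putting the two rules together, the sketch below estimates the encoded size of a piece of text in both encodings from its code points, along with a hypothetical `prefer_utf8` helper that expresses the rule of thumb above. It counts only the text bytes, not any length prefixes or other framing the Remoting SDK message format may add, and it assumes the input holds valid Unicode scalar values (C++17, for `std::u32string_view`):

```cpp
#include <cstddef>
#include <string_view>

// Bytes needed for one Unicode scalar value in UTF-8 (1 to 4).
constexpr std::size_t utf8_size(char32_t cp) {
    return cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
}

// Bytes needed for one Unicode scalar value in UTF-16: one 16-bit unit,
// or a surrogate pair (two units) above U+FFFF.
constexpr std::size_t utf16_size(char32_t cp) {
    return cp < 0x10000 ? 2 : 4;
}

struct EncodedSizes {
    std::size_t utf8;
    std::size_t utf16;
};

// Total encoded text size in bytes for both encodings (no message framing).
constexpr EncodedSizes encoded_sizes(std::u32string_view text) {
    EncodedSizes s{0, 0};
    for (char32_t cp : text) {
        s.utf8 += utf8_size(cp);
        s.utf16 += utf16_size(cp);
    }
    return s;
}

// Hypothetical helper: prefer UTF-8 unless it would be strictly larger.
constexpr bool prefer_utf8(std::u32string_view text) {
    return encoded_sizes(text).utf8 <= encoded_sizes(text).utf16;
}

// Spanish "Señor": 6 bytes in UTF-8 vs 10 bytes in UTF-16.
static_assert(encoded_sizes(U"Se\u00F1or").utf8 == 6, "");
static_assert(encoded_sizes(U"Se\u00F1or").utf16 == 10, "");
// Chinese (two ideographs): 6 bytes in UTF-8 vs 4 bytes in UTF-16.
static_assert(!prefer_utf8(U"\u4E2D\u6587"), "");
```

For mostly Spanish text, UTF-8 comes out well ahead; for mostly Chinese text, UTF-16 is smaller.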

rgomezc · October 28, 2025, 10:02pm

Thanks for your detailed explanation, Marc!

In my case it’s only Latin characters (Spanish), so with that in mind I’ll change the WideString entries to Utf8String.