utf8 and Glib::ustring

John Emmas
Forgive my ignorance; this will probably be obvious to some of you...

Suppose I've got a simple character string, like this:

       const char* my_str = "Hello World";

I can assign it to a Glib::ustring very easily:

       Glib::ustring ustr = my_str;

BUT... instead of pointing to a 'normal' string (plain ASCII
characters), let's suppose that 'my_str' was already pointing to a
string in UTF-8 format. Will the same assignment still work, or is
there some better way of assigning a UTF-8 string to a Glib::ustring?

Thanks,

John
_______________________________________________
gtkmm-list mailing list
[hidden email]
https://mail.gnome.org/mailman/listinfo/gtkmm-list

Re: utf8 and Glib::ustring

Daniel Boles
On 22 March 2017 at 08:52, John Emmas <[hidden email]> wrote:
> Will the same assignment still work - or is there some better way of assigning a utf8 string to a Glib::ustring?

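UTF-8 is backwards compatible with ASCII. If bit 7 (the most significant bit) of a byte is 0, that byte is an ASCII character with exactly the meaning it has in ASCII. Only when bit 7 is 1 do UTF-8-aware tools treat the byte differently: it is then part of a multi-byte sequence of 2 to 4 bytes, in which a lead byte (bit pattern 11xxxxxx) is followed by one or more continuation bytes (10xxxxxx), and the remaining low bits of all those bytes together encode a single character.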

In the same way, to Glib::ustring any char* is just a block of bytes, which it interprets as UTF-8; plain ASCII is simply the subset of UTF-8 in which every character takes one byte. The difference shows up when getting the string length, indexing, and so on: once multi-byte characters are involved, there is no longer a 1:1 correspondence between size in bytes and length in characters, which is why Glib::ustring offers bytes() alongside size() and length().

In other words, the answer to the question is yes, the same assignment will work, and no, there is no better way: construct the Glib::ustring from the char* and let it handle the rest.

