FYI: better UTF8 decoder.

Butrus Damaskus
Hi!

This page: http://bjoern.hoehrmann.de/utf-8/decoder/dfa/ claims to
have a better (quicker and smaller?) UTF-8 decoder. Maybe it would be
worth looking at?

BBD
_______________________________________________
gtk-devel-list mailing list
[hidden email]
http://mail.gnome.org/mailman/listinfo/gtk-devel-list

Re: FYI: better UTF8 decoder.

Behdad Esfahbod-3
On 04/13/2009 05:00 AM, Butrus Damaskus wrote:
> Hi!
>
> This page: http://bjoern.hoehrmann.de/utf-8/decoder/dfa/ claims to
> have better (quicker and smaller?) utf8 decoder. Maybe it would be
> worth to look at it?

Funny how he claims "reduced complexity".  That's definitely the most complex
UTF-8 decoder I've seen.

Anyway, as I said in my own UTF-8 decoding post [1], it is not worth changing
glib unless someone shows a real profile of a real application with UTF-8
decoding taking a measurable part of the total run time.

behdad

[1] http://mces.blogspot.com/2008/04/utf-8-bit-manipulation.html

> BBD

Re: FYI: better UTF8 decoder.

Daniel Elstner
On Monday, 13.04.2009, at 21:26 -0400, Behdad Esfahbod wrote:
> On 04/13/2009 05:00 AM, Butrus Damaskus wrote:
> > Hi!
> >
> > This page: http://bjoern.hoehrmann.de/utf-8/decoder/dfa/ claims to
> > have better (quicker and smaller?) utf8 decoder. Maybe it would be
> > worth to look at it?
>
> Funny how he claims "reduced complexity".  That's definitely the most complex
> UTF-8 decoder I've seen.

I agree.  So, as we are now comparing each other's UTF-8 algorithms, I
thought I would show off mine too ;-)

http://git.gnome.org/cgit/glibmm/tree/glib/glibmm/ustring.cc#n270

This has been in use for years now.  Like g_utf8_get_char(), it is not
meant to cope with invalid UTF-8.  Its strong point is that it needs no
lookup table at all, thereby avoiding the hidden function call that
fetches the global offset table pointer when the code is part of a
shared library.
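
For illustration, here is a minimal sketch of what a table-free decoder
can look like.  It is not copied from the glibmm file linked above; the
name utf8_decode_unchecked is made up for this example, and the input is
assumed to be valid UTF-8.  The sequence length is never looked up: a
mask walks along the leading 1 bits of the lead byte while the
continuation bytes are shifted in.

```cpp
// Hypothetical helper for illustration only (not the glibmm implementation).
// Decodes the single code point starting at p.
// Assumption: p points at the first byte of a valid UTF-8 sequence.
inline char32_t utf8_decode_unchecked(const char* p)
{
  const unsigned char* s = reinterpret_cast<const unsigned char*>(p);
  char32_t result = *s;

  if ((result & 0x80) != 0)  // not plain ASCII: multi-byte sequence
  {
    // After each 6-bit shift, mask lines up with the next length-marker
    // bit of the original lead byte (bit 5, then bit 4, then bit 3).
    char32_t mask = 0x40;
    do
    {
      result <<= 6;
      result |= *++s & 0x3F;  // append the 6 payload bits of a continuation byte
      mask <<= 5;
    }
    while ((result & mask) != 0);  // marker bit set: one more byte follows

    result &= mask - 1;  // strip what is left of the lead byte's length marker
  }
  return result;
}
```

With this sketch, utf8_decode_unchecked("\xE2\x82\xAC") yields 0x20AC (the
euro sign).  Feeding it malformed input gives garbage or reads past the
buffer, which is the trade-off described above.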

> Anyway, as I said on my own UTF-8 decoding post [1], not worth changing glib
> unless someone shows a real profile of a real application with UTF-8 decoding
> taking a measurable part of the total run time.

Agreed.  We only have our own implementation in glibmm because we needed
it to work directly with std::string iterators instead of plain
pointers.
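
To show what working on iterators rather than pointers means in
practice, here is a small sketch.  The names utf8_next and utf8_length
are hypothetical and not taken from glibmm, and valid UTF-8 is assumed
again.  Because the functions are templates, the same code accepts
std::string::const_iterator as well as const char*.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: return the position of the next code point.
// Assumptions: pos != end, and [pos, end) holds valid UTF-8.
template <class Iter>
Iter utf8_next(Iter pos, Iter end)
{
  // Skip the lead byte, then every continuation byte (bit pattern 10xxxxxx).
  do
    ++pos;
  while (pos != end && (static_cast<unsigned char>(*pos) & 0xC0) == 0x80);
  return pos;
}

// Hypothetical helper: count the code points in [first, last).
template <class Iter>
std::size_t utf8_length(Iter first, Iter last)
{
  std::size_t n = 0;
  for (; first != last; first = utf8_next(first, last))
    ++n;
  return n;
}
```

For a std::string s holding "a", U+00E9 and U+20AC (6 bytes in total),
utf8_length(s.begin(), s.end()) returns 3.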

--Daniel

