Faster UTF-8 decoding in GLib


Faster UTF-8 decoding in GLib

Mikhail Zabaluev-3
Hello,

I've made a glib branch where I tried to optimize the UTF-8 decoding routines:
http://git.collabora.co.uk/?p=user/zabaluev/glib.git;a=shortlog;h=refs/heads/fast-utf8

The new code uses a table of unrolled functions to decode byte
sequences, dispatched on the first byte of the sequence.
g_utf8_get_char() got an inlined implementation.

I have added a performance test that can also be used against mainline.
Some performance observations on x86, with the code compiled by gcc 4.4.1
using optimization flags -O3 -march=core2 and run on a ThinkPad T61p:

- There is a 15-50% speedup on g_utf8_get_char(), depending on the text.
- g_utf8_to_ucs4_fast() got a similar boost for ASCII, but curiously,
performance has degraded for Chinese text.
- g_utf8_get_char_extended() and g_utf8_get_char_validated()
surprisingly perform better in the present branchy implementation,
compared to my attempted reimplementation using the function tables. I
left them untouched.

I can get measurements on ARM Cortex A9 with a Nokia N900, if there is
enough interest.

Feel free to play and improve.
  Mikhail
_______________________________________________
gtk-devel-list mailing list
[hidden email]
http://mail.gnome.org/mailman/listinfo/gtk-devel-list

Re: Faster UTF-8 decoding in GLib

Daniel Elstner
Hi,

On Tuesday, 16 March 2010 at 17:20 +0200, Mikhail Zabaluev wrote:

> I've made a glib branch where I tried to optimize the UTF-8 decoding routines:
> http://git.collabora.co.uk/?p=user/zabaluev/glib.git;a=shortlog;h=refs/heads/fast-utf8
>
> The new code uses a table of unrolled functions to decode byte
> sequences, dispatched by the first character. g_utf8_get_char() got an
> inlined implementation.

Ouch.  I'm not sure that's such a great idea -- indirect calls usually
kill branch prediction completely.  I would advise testing on
different CPU types.  Also, table lookups have their downsides -- more
cache pressure, the GOT needs to be fetched, etc.

If you are interested, you may check out the glibmm solution, which gets
by without using any tables whatsoever:

http://git.gnome.org/browse/glibmm/tree/glib/glibmm/ustring.cc#n267

I'd love to have numbers on how my implementation compares with
table-based solutions -- in particular, if you inline the function.  So,
if you have the time... :-)

--Daniel



Re: Faster UTF-8 decoding in GLib

Behdad Esfahbod-3
In reply to this post by Mikhail Zabaluev-3
On 03/16/2010 11:20 AM, Mikhail Zabaluev wrote:
> Hello,
>
> I've made a glib branch where I tried to optimize the UTF-8 decoding routines:
> http://git.collabora.co.uk/?p=user/zabaluev/glib.git;a=shortlog;h=refs/heads/fast-utf8

Before any changes are made, can you provide real-world performance profiles
suggesting that UTF-8 decoding takes a noticeable share of time in any
particular real-world use case?  If not, I don't see the point.

Cheers,
behdad

Re: Faster UTF-8 decoding in GLib

Daniel Elstner
In reply to this post by Daniel Elstner
Hi again,

On Tuesday, 16 March 2010 at 18:47 +0200, Daniel Elstner wrote:

> On Tuesday, 16 March 2010 at 17:20 +0200, Mikhail Zabaluev wrote:
>
> > The new code uses a table of unrolled functions to decode byte
> > sequences, dispatched by the first character. g_utf8_get_char() got an
> > inlined implementation.
>
> Ouch.  I'm not sure that's such a great idea -- indirect calls usually
> completely kill any branch prediction.  I would advise to test on
> different CPU types.  Also, table lookups have their downside -- more
> cache pressure, GOT needs to be fetched etc.

    static inline gunichar
    g_utf8_get_char_fast (const gchar *p)
    {
      const guchar *up = (const guchar *) p;
      return g_utf8_getter_table[up[0]] (up);
    }

Also, do you realize that you have just single-handedly introduced 256 (!)
address references that need to be resolved by the dynamic linker at
library load time? :-D

--Daniel



Re: Faster UTF-8 decoding in GLib

Daniel Elstner
In reply to this post by Behdad Esfahbod-3
Hi,

On Tuesday, 16 March 2010 at 13:01 -0400, Behdad Esfahbod wrote:
> >
> > I've made a glib branch where I tried to optimize the UTF-8 decoding routines:
> > http://git.collabora.co.uk/?p=user/zabaluev/glib.git;a=shortlog;h=refs/heads/fast-utf8
>
> Before any changes are made, can you provide real-world performance profiles
> suggesting that UTF-8 decoding is taking a noticeable time of any particular
> real-world use case?  If no, I don't see the point.

Well, I would see a point since UTF-8 decoding is a fairly generic
operation.  It cannot hurt to be as fast as possible at this task.
Assuming, of course, that the optimization does not introduce other
costs elsewhere, which I think the proposal unfortunately does.

--Daniel



Re: Faster UTF-8 decoding in GLib

Daniel Elstner
In reply to this post by Daniel Elstner
Hi,

On Tuesday, 16 March 2010 at 19:38 +0200, Mikhail Zabaluev wrote:
> 2010/3/16 Daniel Elstner <[hidden email]>:
>
> > Also, do you realize that have just single-handedly introduced 256 (!)
> > address references that need to be resolved by the dynamic linker at
> > library load time? :-D
>
> I've just verified your theory with objdump -r, and it shows the
> references as R_386_32. Do you insist that they are dynamic?

Sorry for the slightly incorrect terminology.  Of course, they are not
dynamic in the sense that they can be overridden from the outside.  But the
table still has to be filled in by the dynamic loader at library load
time as far as I know, because the function pointers are absolute, not
relative, addresses.

> The only new dynamic symbol exposed is g_utf8_getter_table, and it's a
> pointer const variable.

Yes, I wasn't talking about 256 exported symbols, but about 256 symbol
references that need to be resolved.  Years ago, Ulrich Drepper wrote a
paper about optimizing shared libraries, which I highly recommend.

--Daniel



Re: Faster UTF-8 decoding in GLib

Daniel Elstner
In reply to this post by Daniel Elstner
Hi,

On Tuesday, 16 March 2010 at 19:49 +0200, Mikhail Zabaluev wrote:

> I'm wary of inlining non-trivial code which has some branching in it,
> for the same reasons of cache pressure, killing branch prediction, and
> so on.

Well yes.  That's why I would have liked numbers. :-)  In any case, I
expect my solution to actually compete best if *not* inline, because it
avoids the GOT lookup, which would be another nested function call on
i386.  Without the GOT lookup, it becomes a true leaf function.

--Daniel



Re: Faster UTF-8 decoding in GLib

Behdad Esfahbod-3
In reply to this post by Daniel Elstner
On 03/16/2010 01:18 PM, Daniel Elstner wrote:

> Hi,
>
> On Tuesday, 16 March 2010 at 13:01 -0400, Behdad Esfahbod wrote:
>>>
>>> I've made a glib branch where I tried to optimize the UTF-8 decoding routines:
>>> http://git.collabora.co.uk/?p=user/zabaluev/glib.git;a=shortlog;h=refs/heads/fast-utf8
>>
>> Before any changes are made, can you provide real-world performance profiles
>> suggesting that UTF-8 decoding is taking a noticeable time of any particular
>> real-world use case?  If no, I don't see the point.
>
> Well, I would see a point since UTF-8 decoding is a fairly generic
> operation.  It cannot hurt to be as fast as possible at this task.
> Assuming, of course, that the optimization does not introduce other
> costs elsewhere, which I think the proposal unfortunately does.

That's one of the worst ideas as far as software goes.  If an operation takes
1% of your application time and you make it 1000 times faster, do you know
how much faster your application would run overall?  1.01x faster...

That developer time can be put somewhere more useful instead... like
optimizing something that is taking 20% of the time, or 50%, or 70%.

behdad

> --Daniel

Re: Faster UTF-8 decoding in GLib

Daniel Elstner
Hi,

On Tuesday, 16 March 2010 at 14:09 -0400, Behdad Esfahbod wrote:

> That's one of the worst ideas as far as software goes.  If an operation takes
> 1% of your application time and you make it 1000 times faster, you know how
> much total faster your application would run? 1.01x faster...

Yes, but if it is a thousand applications all running 1% faster due to a
single change in a shared library, it becomes a different picture. :-)

> That developer time can be put somewhere more useful instead....  Like
> optimizing something that is taking 20% time, or 50%, or 70%.

But what if the solution is already there because some crazy hackers
held a competition to find the fastest algorithm?  And you are presented
with it on a silver platter?

Seriously, there is a risk of bike-shedding perfectly good contributions
if one takes the economics argument too far.  I'd just wait and see how
it plays out, and if something fruitful comes out of the discussion at
the end, just take it.

--Daniel



Re: Faster UTF-8 decoding in GLib

Philip Van Hoof
In reply to this post by Behdad Esfahbod-3
On Tue, 2010-03-16 at 14:09 -0400, Behdad Esfahbod wrote:

> On 03/16/2010 01:18 PM, Daniel Elstner wrote:
> > Hi,
> >
> > On Tuesday, 16 March 2010 at 13:01 -0400, Behdad Esfahbod wrote:
> >>>
> >>> I've made a glib branch where I tried to optimize the UTF-8 decoding routines:
> >>> http://git.collabora.co.uk/?p=user/zabaluev/glib.git;a=shortlog;h=refs/heads/fast-utf8
> >>
> >> Before any changes are made, can you provide real-world performance profiles
> >> suggesting that UTF-8 decoding is taking a noticeable time of any particular
> >> real-world use case?  If no, I don't see the point.
> >
> > Well, I would see a point since UTF-8 decoding is a fairly generic
> > operation.  It cannot hurt to be as fast as possible at this task.
> > Assuming, of course, that the optimization does not introduce other
> > costs elsewhere, which I think the proposal unfortunately does.
>
> That's one of the worst ideas as far as software goes.  If an operation takes
> 1% of your application time and you make it 1000 times faster, you know how
> much total faster your application would run? 1.01x faster...

Because in Tracker we need to find word boundaries for indexing free-text
searchable fields, I think UTF-8 performance enhancements would be a
significant improvement for our project.

> That developer time can be put somewhere more useful instead....  Like
> optimizing something that is taking 20% time, or 50%, or 70%.

The developer time has in this case already been committed, though.


Cheers,

Philip


Re: Faster UTF-8 decoding in GLib

Morten Welinder-2
In reply to this post by Behdad Esfahbod-3
> That's one of the worst ideas as far as software goes.  If an operation takes
> 1% of your application time and you make it 1000 times faster, you know how
> much total faster your application would run? 1.01x faster...

Behdad, that line of argument is only valid for code that is used
by one application only -- in the same way that saving $1000 in a $1M
project is pointless, but saving $1000 in a thousand projects year
after year does make sense.

This kind of proposal should be debated on the merits.

M.

Re: Faster UTF-8 decoding in GLib

Kalle Vahlman
In reply to this post by Behdad Esfahbod-3
2010/3/16 Behdad Esfahbod <[hidden email]>:

> On 03/16/2010 01:18 PM, Daniel Elstner wrote:
>> Well, I would see a point since UTF-8 decoding is a fairly generic
>> operation.  It cannot hurt to be as fast as possible at this task.
>> Assuming, of course, that the optimization does not introduce other
>> costs elsewhere, which I think the proposal unfortunately does.
>
> That's one of the worst ideas as far as software goes.  If an operation takes
> 1% of your application time and you make it 1000 times faster, you know how
> much total faster your application would run? 1.01x faster...
>
> That developer time can be put somewhere more useful instead....  Like
> optimizing something that is taking 20% time, or 50%, or 70%.

This logic has always bothered me: why would it automatically mean
that someone interested in tinkering with UTF-8 operations would
instead do something else useful for GLib? Maybe he'll rather just
tinker away with UTF-8 in some other library?

I can understand if micro-optimizations are rejected with arguments
about not touching working code, validity concerns, and possible
intrusiveness into internal structure, but not with "I don't think this
is worthwhile, so please don't do it even if you want to".

Just rambling here, please don't take it personally! :)

--
Kalle Vahlman, [hidden email]
Powered by http://movial.com
Interesting stuff at http://sandbox.movial.com
See also http://syslog.movial.fi

Re: Faster UTF-8 decoding in GLib

Mikhail Zabaluev-3
In reply to this post by Daniel Elstner
Hi,

2010/3/16 Daniel Elstner <[hidden email]>:
>
> In any case, I
> expect my solution to actually compete best if *not* inline, because it
> avoids the GOT lookup, which would be another nested function call on
> i386.  Without the GOT lookup, it becomes a true leaf function.

I have tested your solution as applied to mainline g_utf8_get_char(),
not inlined apart from any intra-file optimizations. The results are
for ARM this time.

Mainline:

GTest: run: /utf8/perf/get_char
(MAXPERF:ASCII:      25.7 MB/s)
(MAXPERF:Latin-1:    26.3 MB/s)
(MAXPERF:Cyrillic:   30.8 MB/s)
(MAXPERF:Chinese:    38.2 MB/s)
GTest: run: /utf8/perf/utf8_to_ucs4
(MAXPERF:ASCII:      10.3 MB/s)
(MAXPERF:Latin-1:    11.4 MB/s)
(MAXPERF:Cyrillic:   13.6 MB/s)
(MAXPERF:Chinese:    16.9 MB/s)

Yours:

GTest: run: /utf8/perf/get_char
(MAXPERF:ASCII:      25.6 MB/s)
(MAXPERF:Latin-1:    28.2 MB/s)
(MAXPERF:Cyrillic:   40.1 MB/s)
(MAXPERF:Chinese:    53.9 MB/s)
GTest: run: /utf8/perf/utf8_to_ucs4
(MAXPERF:ASCII:      13.6 MB/s)
(MAXPERF:Latin-1:    15.3 MB/s)
(MAXPERF:Cyrillic:   17.8 MB/s)
(MAXPERF:Chinese:    22.2 MB/s)

My tableware (g_utf8_get_char() inlined):

GTest: run: /utf8/perf/get_char
(MAXPERF:ASCII:      32.1 MB/s)
(MAXPERF:Latin-1:    31.5 MB/s)
(MAXPERF:Cyrillic:   47.0 MB/s)
(MAXPERF:Chinese:    74.9 MB/s)

From the looks of it, there is somewhat less oomph from the non-inlined
version, but it is consistently better when inlined (I assume, without
looking, that the optimizer did that in g_utf8_to_ucs4). I think it makes
total sense to use your implementation internally at least.

--
  Mikhail

Re: Faster UTF-8 decoding in GLib

Daniel Elstner
Hi,

On Tuesday, 16 March 2010 at 21:05 +0200, Mikhail Zabaluev wrote:

> I have tested your solution as applied to mainline g_utf8_get_char(),
> and not inlined except any intra-file optimizations. The results are
> for ARM this time.
[...]
> From the looks of it, some lesser oomph from the non-inlined version,
> but consistently better for inlining (I assume without looking that
> the optimizer did that in g_utf8_to_ucs4). I think it makes total
> sense to use your implementation internally at least.

Rock!
/me jumps up and down, full of joy

I'm a bit surprised that my version was marginally slower than mainline
for the ASCII case.  Given that the GOT lookup is avoided, I would not
have expected that.  Maybe it would be different on i386.

If you have the time, it would also be very cool if we had a direct
comparison between my implementation and yours, with both functions
inline and both functions non-inline.

In any case, thanks a lot for taking the time to profile this.  I never
got around to doing that myself; my own profiling was limited to looking
at the assembler output and fine-tuning the code accordingly.

Oh, one thing is important to note, if this is indeed considered for
adoption in GLib:  The new implementation will behave differently for
invalid input.  Not that it matters much, since in that case the result
is undefined anyway.  But it's better to be aware of that.  As far as
stability goes, the code has been in place in glibmm for years now --
indeed, it was added before the first stable release of the 2.0 series.

Cheers,
--Daniel



Re: Faster UTF-8 decoding in GLib

Mikhail Zabaluev-3
Hi,

2010/3/16 Daniel Elstner <[hidden email]>:

> Rock!
> /me jumps up and down, full of joy
>
> I'm a bit surprised that my version was minimally slower than mainline
> for the ASCII case.  Given that the GOT lookup is avoided, I would not
> have expected that.  Maybe it would be different on i386.
>
> If you have the time, it would also be very cool if we had a direct
> comparison between my implementation and yours, with both functions
> inline and both functions non-inline.

I could try that, after I put yours to good internal use, where it
already shows more effect. But my current tests do not account for any
hidden costs of inlining longish, branchy code.

> In any case, thanks a lot for taking the time to profile this.  I never
> got around to doing that myself.

Yes, I pretty much assumed that nobody would look at this stuff unless
there were some numbers attached to it.
Even apart from any other changes, the performance test is a worthy
thing to pick up.

> Oh, one thing is important to note, if this is indeed considered for
> adoption in GLib:  The new implementation will behave differently for
> invalid input.  Not that it matters much, since in that case the result
> is undefined anyway.

I already made some minor changes to restrict what it produces (like,
c & 0x3f is safer than c - 0x80), and it should pass the test suite
which has a lot of cases for invalid input with some expected output.
My understanding is that unvalidated decoding should also accept
various software's misconstructions of UTF-8 and produce some
meaningful output.

I have a hypothetical explanation of the testing results (disclaimer:
I'm far from a processor guru with a valid Lauterbach license). The
inlined table-dispatched call trades a PLT call for a GOT data lookup
plus a call to an address in a register, which is essentially the same
thing. Branch prediction actually works quite well for runs of text
where all characters have the same byte length; this also explains the
less stellar results for German and Russian texts, where ASCII and
two-byte sequences are mixed. But the optimizer and the CPU do a better
job with the loops and branches of more traditional implementations
when they are free to use them inline.

--
  Mikhail

Re: Faster UTF-8 decoding in GLib

Daniel Elstner
Hi,

On Tuesday, 16 March 2010 at 22:52 +0200, Mikhail Zabaluev wrote:

> I already made some minor changes to restrict what it produces (like,
> c & 0x3f is safer than c - 0x80),

No -- this was on purpose!  Using addition and subtraction here instead
of bitwise-and and bitwise-or allows the two operations to be fused into
one LEA instruction on AMD64 and i386.  It is a bit faster if I remember
correctly.

Note that I fine-tuned the code to produce the optimum assembler output.
Please verify any changes to my little baby in the disassembler dump. :)

>  and it should pass the test suite
> which has a lot of cases for invalid input with some expected output.

The test suite tests undefined behavior?  I think it is perfectly fine
to basically return anything here, as long as it does not end up in an
infinite loop or something.  There are dedicated functions for parsing
possibly invalid UTF-8.

> My understanding is that unvalidated decoding should also accept
> various software's misconstructions of UTF-8 and produce some
> meaningful output.

Meaningful in what sense?  And what kind of misconstructions would that
be, for example?  Actually, the Unicode specification demands that UTF-8
decoders not accept any invalid UTF-8 sequences, including even overlong
sequences which can be decoded just fine (due to security concerns).
This is achieved by validating all input first, or by using one of the
safe extraction functions.

> But the optimizer and
> the CPU make a better job at loops and branches of more traditional
> implementations when they have freedom to use them inline.

There may also be an opportunity for some constant-folding and
elimination of dead branches if the code is inlined.  Also, since the
function was not explicitly declared as inline, it could be that it was
inlined in one case but not the other.

--Daniel



Re: Faster UTF-8 decoding in GLib

Mikhail Zabaluev-3
In reply to this post by Daniel Elstner
Hi,

2010/3/16 Daniel Elstner <[hidden email]>:

>> I've just verified your theory with objdump -r, and it shows the
>> references as R_386_32. Do you insist that they are dynamic?
>
> Sorry for the slightly incorrect terminology.  Of course, they are not
> dynamic in the sense that they can be loaded from the outside.  But the
> table still has to be filled in by the dynamic loader at library load
> time as far as I know, because the function pointers are absolute, not
> relative addresses.
>
>> The only new dynamic symbol exposed is g_utf8_getter_table, and it's a
>> pointer const variable.
>
> Yes, I wasn't talking about 256 exported symbols, but about 256 symbol
> references that need to be resolved.  Years ago, Ulrich Drepper wrote a
> paper about optimizing shared libraries, which I highly recommend.

Umm. I was under the impression that a DSO is one position-independent
blob with all references made relative, even if basic ELF allows
different segments to be loaded independently. I don't assume the
dynamic loader has to relocate internal function pointers, which are
quite commonly used. A brief read-up on the venerable Drepper does not
contradict that. But even if you are right, any GObject class writes
function pointers at initialization, and GLib proper uses function
tables quite a lot too, so 256 more should not make an extraordinary
impact.

And oddly enough, the ARM compiler even does away with the object-file
relocation entries for the table.

--
  Mikhail

Re: Faster UTF-8 decoding in GLib

Daniel Elstner
In reply to this post by Mikhail Zabaluev-3
Hi,

On Tuesday, 16 March 2010 at 22:52 +0200, Mikhail Zabaluev wrote:

> I could try that, after I take your one to good internal use where it
> already shows more effect. But my current tests do not account for any
> hidden costs of inlining longish and branched code.

Addendum: It's actually not longish at all, even though it may look like
that in the C code.  There are exactly two branches.  I bet that many
macros in GTK+ expand to more than that.

--Daniel



Re: Faster UTF-8 decoding in GLib

Daniel Elstner
In reply to this post by Mikhail Zabaluev-3
Hi,

On Tuesday, 16 March 2010 at 23:18 +0200, Mikhail Zabaluev wrote:

> Umm. I had the conception of a DSO being one position-independent blob
> with all references made relative, even if basic ELF allows different
> segments loaded independently.

Impossible.  There are no relative function pointers.  Well, I suppose
you could construct such a beast at the assembler level, but plain old C
function pointers are definitely absolute within their code segment.

> I don't assume the dynamic loader
> should relocate internal function pointers, which are quite commonly
> used. A brief readup on venerable Drepper does not contradict that.
> But even if you are right, any GObject class writes function pointers
> at initialization, and glib proper uses function table stuff quite a
> lot too, so 256 more should not make extraordinary impact.

I remember this being a point of concern for the GRegex
implementation.

https://bugzilla.gnome.org/show_bug.cgi?id=50075#c67

The problem should be the same as with the original _pcre_utt table.

> And oddly enough, the ARM compiler does away even with the object file
> relocation entries for the table.

At some point, it needs to fill the table.  And if loading offsets are
randomized for security, there is hardly any room for optimization left.

--Daniel



Re: Faster UTF-8 decoding in GLib

Mikhail Zabaluev-3
In reply to this post by Daniel Elstner
Hi,

2010/3/16 Daniel Elstner <[hidden email]>:

>> I already made some minor changes to restrict what it produces (like,
>> c & 0x3f is safer than c - 0x80),
>
> No -- this was on purpose!  Using addition and subtraction here instead
> of bitwise-and and bitwise-or allows the two operations to be fused into
> one LEA instruction on AMD64 and i386.  It is a bit faster if I remember
> correctly.
>
> Note that I fine-tuned the code to produce the optimum assembler output.
> Please verify any changes to my little baby in the disassembler dump. :)

I'm afraid "optimum" assembler output != optimal microcode. Only
measurement will decide.

>>  and it should pass the test suite
>> which has a lot of cases for invalid input with some expected output.
>
> The test suite tests undefined behavior?

The API documentation says it's undefined, but the test suite makes it
clear what the actual expectations are. See tests/utf8.txt.

> I think it is perfectly fine
> to basically return anything here, as long as it does not end up in an
> infinite loop or something.

Yes, though we are already in buffer-overflow territory with all the
implementations of g_utf8_get_char() considered so far.

>> My understanding is that unvalidated decoding should also accept
>> various software's misconstructions of UTF-8 and produce some
>> meaningful output.
>
> Meaningful in what sense?  And what kind of misconstructions would that
> be, for example?

Wikipedia describes a couple:
http://en.wikipedia.org/wiki/UTF-8#UTF-8_derivations

I think it's useful to have functions loose enough to interoperate
with these too, as long as one uses the validating routines for any
untrusted input.

--
  Mikhail