Speeding up thumbnail generation (like multi threaded). Thoughts please.

Speeding up thumbnail generation (like multi threaded). Thoughts please.

Bugzilla from markg85@gmail.com
Hi,

Before you read on: I wasn't sure whether this should go to the gtk-devel
or the nautilus mailing list. I picked gtk for now; I hope that's the
right list for things like this.

A few years ago I started a thread about this (on the nautilus list)
with a title that was something like "nautilus thumbnail generation
slow". Back then I didn't know much about this stuff and I thought
Nautilus used GD to generate thumbnails. Oh man, was I wrong there.

Now, a few years later, I have some C and C++ knowledge, and I
benchmarked GLib's thumbnail generation against GraphicsMagick and
FreeImage (no Qt yet).
The results are from a folder of 1927 images, all wallpaper sized or
double wallpaper sized, with file sizes ranging from 200 KB up to a few
megabytes per image. I scaled them all down to a max width or height
of 200 px using this:

static GdkPixbuf *
scale_pixbuf_preserve_aspect_ratio (GdkPixbuf *pixbuf,
                                    gint size,
                                    GdkInterpType interp)
{
    GdkPixbuf *new_icon;
    gdouble w, h, new_width, new_height, max_edge;

    w = gdk_pixbuf_get_width (pixbuf);
    h = gdk_pixbuf_get_height (pixbuf);
    max_edge = MAX (w, h);
    new_width = size * (w / max_edge);
    new_height = size * (h / max_edge);

    /* Scale the image down, preserving the aspect ratio */
    new_icon = gdk_pixbuf_scale_simple (pixbuf,
                                        (gint) new_width,
                                        (gint) new_height,
                                        interp);

    return new_icon;
}

That code comes from the Ubuntu wiki, just so you know before you attack
me over it ^_^
All thumbnails were generated (in every benchmark, for every image) with
the bilinear interpolation flag for best quality.

Now for the results:

GLib
----------------------
1927 images thumbnailed in 2.29 minutes; roughly 0.07 seconds per thumbnail.

GraphicsMagick
----------------------
1927 images thumbnailed in 3.08 minutes; roughly 0.09 seconds per thumbnail.

FreeImage
----------------------
1927 images thumbnailed in 5.45 minutes; roughly 0.17 seconds per thumbnail.

First, an apology for the message I sent to this list a few years ago
about Nautilus being slow. Nautilus isn't slow: it uses GLib, and GLib
is the clear winner here. Nautilus might have some strange things in
the code that displays the thumbnails (I didn't look), but the
generating of them is far from slow.

Now to speed it up even more.
I don't quite know how the bilinear algorithm in GLib works. I've
looked at it, but it's kinda messy to just dive in. If it is something
that can be done in a multi-threaded way, that would be the ideal place
to do it: all programs using the code (like Nautilus) would get the
performance boost without changing a single line. That might be a
little too hard, or perhaps not even possible with the current
algorithm (thoughts on that, please). Another way to speed it up would
be in the applications that use the code (again, like Nautilus): where
there is a list of known files to thumbnail, it can't be that hard to
split that list in 4 (depending on your cores) and let each core render
its share of the thumbnails. I don't know the ins and outs of the most
logical way to do this, but I do know that on my PC just 1 of the 4
cores is used, which is a waste of the other 3.

I personally haven't done any multithreading yet, so don't count on
quality code if I do this ^_^

So, is there anyone here who has experience with this? Anyone I can
work with on this, or who could help me through it? I'm brand new to
GLib and new to programming in general. I would like to give this a
try, but I'm afraid I can't do it alone. Any information, help, and
thoughts would be appreciated.

Mark.

Re: Speeding up thumbnail generation (like multi threaded). Thoughts please.

Christian Hergert
Have you profiled to see if the bottleneck is CPU bound? If it's IO
bound, you will only cause more contention by adding threading.

At minimum, using a thread (or async GIO) to load files and another
thread that just thumbnails might be a good idea.
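
Something along the lines of this minimal sketch (a GAsyncQueue as the
hand-off between the two threads; the fixed 200x150 thumbnail size and
the thread setup are assumptions for illustration, not Nautilus code):

#include <gdk-pixbuf/gdk-pixbuf.h>
#include <glib.h>

static GAsyncQueue *queue;        /* GdkPixbuf* travels loader -> scaler */
#define DONE GINT_TO_POINTER (1)  /* sentinel marking the end of work */

static gpointer
loader_thread (gpointer data)
{
    gchar **files = data;         /* NULL-terminated list of paths */

    for (; *files != NULL; files++) {
        GdkPixbuf *pixbuf = gdk_pixbuf_new_from_file (*files, NULL);
        if (pixbuf != NULL)
            g_async_queue_push (queue, pixbuf);
    }
    g_async_queue_push (queue, DONE);
    return NULL;
}

static gpointer
scaler_thread (gpointer data)
{
    gpointer item;

    while ((item = g_async_queue_pop (queue)) != DONE) {
        GdkPixbuf *thumb = gdk_pixbuf_scale_simple (item, 200, 150,
                                                    GDK_INTERP_BILINEAR);
        /* gdk_pixbuf_save (thumb, ..., "png", NULL, NULL); */
        g_object_unref (thumb);
        g_object_unref (item);
    }
    return NULL;
}

Create the queue with g_async_queue_new(), spawn both threads with
g_thread_create(), and join them. Since g_async_queue_pop() blocks when
the queue is empty, the scaler simply waits whenever loading falls
behind.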

Cheers,

-- Christian

Mark wrote:
> So, is there anyone here who has experience with this? Anyone I can
> work with on this, or who could help me through it? I'm brand new to
> GLib and new to programming in general. I would like to give this a
> try, but I'm afraid I can't do it alone. Any information, help, and
> thoughts would be appreciated.

Re: Speeding up thumbnail generation (like multi threaded). Thoughts please.

Bugzilla from markg85@gmail.com
On Fri, Aug 28, 2009 at 10:45 PM, Christian Hergert<[hidden email]> wrote:

> Have you profiled to see if the bottleneck is CPU bound? If it's IO bound,
> you will only cause more contention by adding threading.
>
> At minimum, using a thread (or async gio) to load files and another thread
> that just thumbnails might be a good idea.
>
> Cheers,
>
> -- Christian

Hi,

I haven't done IO profiling, but I did calculate the disk usage for
those 1927 files, and every benchmark stayed WAY below what my HDD can
handle (a Spinpoint F1 1 TB, which does roughly 100 MB/sec).

As for the CPU, I did do some "profiling" there: I opened GNOME's
System Monitor and watched it during the benchmark. It showed 1 core
running at 100% and the other 3 cores idle, so there is a CPU
bottleneck in thumbnail generation.

About your suggestion of loading in one thread and thumbnailing in
another: don't you get into big problems if the PC thumbnails faster
than it loads? Or is that an impossible scenario?

Mark.

Re: Speeding up thumbnail generation (like multi threaded). Thoughts please.

David Zeuthen
On Fri, 2009-08-28 at 23:15 +0200, Mark wrote:
> I haven't done IO profiling, but I did calculate the disk usage for
> those 1927 files, and every benchmark stayed WAY below what my HDD can
> handle (a Spinpoint F1 1 TB, which does roughly 100 MB/sec).

Uhm, wait, that's only true for sequential reads. For random IO (which
is what thumbnailing does) on rotational media, the throughput is much,
much lower (probably on the order of 1 MB/s) due to seek times. It's
much better on an SSD, but not everyone has one.

You also want to remember to drop all caches before profiling this - the
relevant file here is /proc/sys/vm/drop_caches - see proc(5) for more
details.
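
(Concretely, as root: echo 3 > /proc/sys/vm/drop_caches frees the page
cache plus dentries and inodes, giving you a cold-cache run.)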

It might be useful to sort the queue of files waiting to be thumbnailed
according to the block position of the inode (see the readahead sources
for how to do that). But since a file is not necessarily laid out in
order on the disk (ext4 helps a bit here though) [1], there might be a
lot of seeking involved anyway. So it might not help at all.
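
(A rough sketch of the block-lookup part, for illustration only: the
FIBMAP ioctl below maps logical block 0 of a file to its physical block
number, which can serve as a crude sort key. It needs root, and error
handling is trimmed.)

#include <fcntl.h>
#include <linux/fs.h>      /* FIBMAP */
#include <sys/ioctl.h>
#include <unistd.h>

/* Physical block number of the file's first block, or -1 on error.
 * Sorting the thumbnail queue by this value approximates disk order. */
static long
first_block (const char *path)
{
    int fd = open (path, O_RDONLY);
    int block = 0;         /* in: logical block; out: physical block */

    if (fd < 0)
        return -1;
    if (ioctl (fd, FIBMAP, &block) < 0)
        block = -1;
    close (fd);
    return block;
}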

(And IIRC, Nautilus already reorders the thumbnail list to give priority
to files currently visible in the file manager. It would probably look
weird if this happened out of order - so the visibility sort would win
compared to the block position sort.)

     David

[1] : And this is most definitely not true for files downloaded via
BitTorrent - which is a very common use case for video files (which we
thumbnail).


Re: Speeding up thumbnail generation (like multi threaded). Thoughts please.

Christian Hergert
In reply to this post by Bugzilla from markg85@gmail.com
Hi,

What you mentioned is good information to start hunting.  Was the CPU
time related to IO wait at all?  Always get accurate numbers before
performance tuning.  "Measure, measure, measure" or so the mantra goes.

Unfortunately, the symptom you see regarding IO will very likely change
under a different processing model. If the problem is truly CPU bound,
then you only start IO requests after you are done processing. This
means valuable time is wasted waiting for the pages to be loaded into
the buffers; the code just blocks while this is going on.

What could be done easily: every time an item starts processing, it
could asynchronously begin loading the next image using GIO. This means
the kernel can start paging that file into the VFS cache while you are
processing the current image. This would still mean you are limited to
a single processor doing the scaling, but if the problem is in fact CPU
bound, the next image will almost always be loaded by the time you
finish the scale, meaning you've maximized the processing potential per
core.
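
A minimal sketch of that overlap (g_file_load_contents_async() pulls
the next file in while the current one is scaled; the callback names
are made up and the queue-advance logic is left out):

#include <gio/gio.h>

/* Fires once the bytes of the *next* image are in memory; by then the
 * CPU has usually finished scaling the current one. */
static void
next_loaded (GObject *source, GAsyncResult *res, gpointer user_data)
{
    gchar *contents = NULL;
    gsize  length = 0;

    if (g_file_load_contents_finish (G_FILE (source), res, &contents,
                                     &length, NULL, NULL)) {
        /* Feed contents to a GdkPixbufLoader, scale, save, then kick
         * off the prefetch for the file after this one. */
        g_free (contents);
    }
}

static void
prefetch (GFile *next)
{
    /* The kernel starts paging the file in while we keep scaling. */
    g_file_load_contents_async (next, NULL, next_loaded, NULL);
}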

To support multi-core, like it sounds like you want, a queue could be
used to store the upcoming work items; one worker per core then pulls
its next file from that queue. FWIW, I wrote a library, iris[1], built
specifically for doing work like this while using threads efficiently
with minimal lock contention. It allows scaling the number of threads
up to the number of cores and back down when they are no longer
needed.
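
(A GLib-only sketch of that worker-per-core queue, with GThreadPool
standing in for iris; the sysconf() core count and the fixed 200x150
size are assumptions:)

#include <gdk-pixbuf/gdk-pixbuf.h>
#include <glib.h>
#include <unistd.h>

/* Each worker takes one file name from the pool's internal queue and
 * thumbnails it; GThreadPool keeps one worker per core busy. */
static void
thumbnail_one (gpointer data, gpointer user_data)
{
    gchar *path = data;
    GdkPixbuf *pixbuf = gdk_pixbuf_new_from_file (path, NULL);

    if (pixbuf != NULL) {
        GdkPixbuf *thumb = gdk_pixbuf_scale_simple (pixbuf, 200, 150,
                                                    GDK_INTERP_BILINEAR);
        /* gdk_pixbuf_save (thumb, ..., "png", NULL, NULL); */
        g_object_unref (thumb);
        g_object_unref (pixbuf);
    }
    g_free (path);
}

int
main (int argc, char **argv)
{
    long cores = sysconf (_SC_NPROCESSORS_ONLN); /* one worker per core */
    GThreadPool *pool;
    int i;

    g_thread_init (NULL);   /* required on the GLib of that era */
    g_type_init ();
    pool = g_thread_pool_new (thumbnail_one, NULL, (gint) cores,
                              FALSE, NULL);
    for (i = 1; i < argc; i++)
        g_thread_pool_push (pool, g_strdup (argv[i]), NULL);
    g_thread_pool_free (pool, FALSE, TRUE); /* wait for the queue to drain */
    return 0;
}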

Cheers,

[1] http://git.dronelabs.com/iris

-- Christian


Re: Speeding up thumbnail generation (like multi threaded). Thoughts please.

Bugzilla from markg85@gmail.com
In reply to this post by David Zeuthen
On Fri, Aug 28, 2009 at 11:38 PM, David Zeuthen<[hidden email]> wrote:
> On Fri, 2009-08-28 at 23:15 +0200, Mark wrote:
>> I haven't done IO profiling, but I did calculate the disk usage for
>> those 1927 files, and every benchmark stayed WAY below what my HDD can
>> handle (a Spinpoint F1 1 TB, which does roughly 100 MB/sec).
>
> Uhm, wait, that's only true for sequential reads. For random IO (which
> is what thumbnailing does) on rotational media, the throughput is much,
> much lower (probably on the order of 1 MB/s) due to seek times. It's
> much better on an SSD, but not everyone has one.

I know, but thanks for pointing it out.

> [snip]


On Fri, Aug 28, 2009 at 11:49 PM, Christian Hergert<[hidden email]> wrote:
> Hi,
>
> What you mentioned is good information to start hunting.  Was the CPU time
> related to IO wait at all?  Always get accurate numbers before performance
> tuning.  "Measure, measure, measure" or so the mantra goes.

Perhaps a stupid question, but what is a good way of profiling IO? CPU
is easy, but I've never profiled IO.
In this case my HDD is certainly able to keep up with more than 10
thumbnails per second, but I can see a potential issue when someone
with a slower HDD and a faster CPU than mine thumbnails a lot of
images. There the HDD will likely be the bottleneck.
>
> Unfortunately, the symptom you see regarding IO will very likely change
> under a different processing model. If the problem is truly CPU bound,
> then you only start IO requests after you are done processing. This
> means valuable time is wasted waiting for the pages to be loaded into
> the buffers; the code just blocks while this is going on.

And how can I test that?
>
> What could be done easily: every time an item starts processing, it
> could asynchronously begin loading the next image using GIO. This means
> the kernel can start paging that file into the VFS cache while you are
> processing the current image. This would still mean you are limited to
> a single processor doing the scaling, but if the problem is in fact CPU
> bound, the next image will almost always be loaded by the time you
> finish the scale, meaning you've maximized the processing potential per
> core.

That sounds like a nice way to optimize it for one core. But is there
any optimization possible in my case, since I already have 100% CPU
usage on one core with just the benchmark?
>
> To support multi-core, like it sounds like you want, a queue could be
> used to store the upcoming work items; one worker per core then pulls
> its next file from that queue. FWIW, I wrote a library, iris[1], built
> specifically for doing work like this while using threads efficiently
> with minimal lock contention. It allows scaling the number of threads
> up to the number of cores and back down when they are no longer needed.
>
That sounds very interesting.
Just one question about the queue: would it be better to thread the
application (Nautilus) or the library (GLib)? If the library, then the
queue has to be passed from Nautilus to GLib. I would say GLib, because
then all applications benefit from it without adjusting their code.

> Cheers,
>
> [1] http://git.dronelabs.com/iris
>
> -- Christian

Re: Speeding up thumbnail generation (like multi threaded). Thoughts please.

Christian Hergert

> Perhaps a stupid question, but what is a good way of profiling IO? CPU
> is easy, but I've never profiled IO.
> In this case my HDD is certainly able to keep up with more than 10
> thumbnails per second, but I can see a potential issue when someone
> with a slower HDD and a faster CPU than mine thumbnails a lot of
> images. There the HDD will likely be the bottleneck.

You can do something really crude by reading from /proc/pid/* (man proc
for more info). Or you could try tools like sysstat, oprofile,
systemtap, etc. We really need a generic profiling tool that can do all
of this from a single interface; for now, I've been most successful
with one-off graphing for the specific problem. For example, put in
some g_print() lines, grep for those, and then graph them using your
favorite plotter or some cairo goodness.
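
(One crude option along those lines, assuming your kernel has task IO
accounting: /proc/self/io - see proc(5) - exposes read_bytes and
write_bytes counters you can dump around each work item. A tiny sketch:)

#include <stdio.h>

/* Print the process's cumulative IO counters with a tag, so the
 * output can be grepped per work item and plotted afterwards. */
static void
dump_io_counters (const char *tag)
{
    char line[128];
    FILE *fp = fopen ("/proc/self/io", "r");

    if (fp == NULL)
        return;
    while (fgets (line, sizeof line, fp) != NULL)
        printf ("%s: %s", tag, line);
    fclose (fp);
}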

>> Unfortunately, the symptom you see regarding IO will very likely change
>> under a different processing model. If the problem is truly CPU bound,
>> then you only start IO requests after you are done processing. This
>> means valuable time is wasted waiting for the pages to be loaded into
>> the buffers; the code just blocks while this is going on.
>
> And how can I test that?

ltrace works for simple non-threaded applications. Basically, you
should see in the profiling timings that each work item happens
sequentially after the previous one: (load, process, load, process, ...).

I would hate to provide conjecture about the proper design until we
have more measurements. It is a good idea to optimize the
single-threaded approach before the multi-core one, since that work has
to be done anyway and is a less complex problem than the additional
threads.

>> What could be done easily: every time an item starts processing, it
>> could asynchronously begin loading the next image using GIO. This means
>> the kernel can start paging that file into the VFS cache while you are
>> processing the current image. This would still mean you are limited to
>> a single processor doing the scaling, but if the problem is in fact CPU
>> bound, the next image will almost always be loaded by the time you
>> finish the scale, meaning you've maximized the processing potential per
>> core.
>
> That sounds like a nice way to optimize it for one core. But is there
> any optimization possible in my case, since I already have 100% CPU
> usage on one core with just the benchmark?

You can't properly optimize for the multi-core scenario until the
single-core scenario is fixed.

>> To support multi-core, like it sounds like you want, a queue could be
>> used to store the upcoming work items; one worker per core then pulls
>> its next file from that queue. FWIW, I wrote a library, iris[1], built
>> specifically for doing work like this while using threads efficiently
>> with minimal lock contention. It allows scaling the number of threads
>> up to the number of cores and back down when they are no longer needed.
>>
> That sounds very interesting.
> Just one question about the queue: would it be better to thread the
> application (Nautilus) or the library (GLib)? If the library, then the
> queue has to be passed from Nautilus to GLib. I would say GLib, because
> then all applications benefit from it without adjusting their code.

I haven't looked at this code in detail yet, so I cannot confirm or
deny. My initial assumption would be that the thumbnailing API (again,
I have no experience with it yet) should be restructured around an
asynchronous design (begin/end methods), with the synchronous
implementation built on top of that. And of course, nobody should use
the synchronous version unless they *really* have a reason to.

FWIW, I would be willing to help hack on this, but I'm swamped for at
least the next few weeks.

-- Christian

Re: Speeding up thumbnail generation (like multi threaded). Thoughts please.

Bugzilla from markg85@gmail.com
On Sat, Aug 29, 2009 at 1:04 AM, Christian Hergert<[hidden email]> wrote:

> [snip]

I guess the next step for me is to get more accurate benchmarks.
Right now I have the benchmarks as timings (how long making a pixbuf
from an image takes, how long the scaling takes (surprisingly short!),
and how long the saving takes), but I guess I need to expand that a bit
with IO timings as well. I will just give it a try.

Re: Speeding up thumbnail generation (like multi threaded). Thoughts please.

Bugzilla from markg85@gmail.com
In reply to this post by Christian Hergert
On Sat, Aug 29, 2009 at 1:04 AM, Christian Hergert<[hidden email]> wrote:

> [snip]

Okay, I did some ltracing now. Here is the ltrace output for 5 images
that get thumbnailed. I hope someone here can help me a bit in
explaining this output, since not all of it is obvious to me.
Before you read it: I put a 2-second sleep after each image is done and
its pixbufs are freed from memory. That makes it a lot easier to
identify where the image loading/scaling/saving happens.
The output is here: http://codepad.org/ydtCcJ5d

Now for the splitting up that I did to sort out IO and CPU.
This is all within the 1251553123 timestamp, so it's just one image.

1251553123.625556
_ZStlsIcSt11char_traitsIcESaIcEERSt13basic_ostreamIT_T0_ES7_RKSbIS4_S5_T1_E(0x613de0,
0x701ac0, 16, 15416, 0x7ffffa4b2a90) = 0x613de0
1251553123.625635 _ZNSsC1Ev(0x7ffffa4b2f90, 0x6ed100, 8, 8,
0x5f7974696e616d75) = 0x7fcaa64de158
1251553123.625683
_ZNSt18basic_stringstreamIcSt11char_traitsIcESaIcEE3strERKSs(0x7ffffa4b2de0,
0x7ffffa4b2f90, 0x7ffffa4b2f90, 8, 0x5f7974696e616d75) = 0x781578
1251553123.625733 _ZNSsD1Ev(0x7ffffa4b2f90, 0x781578, 0x781578, 0,
0x781778) = 0x7fcaa64de158
1251553123.625781
_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc(0x7ffffa4b2df0,
0x40c950, 16, 0, 0x781778) = 0x7ffffa4b2df0
1251553123.625830
_ZStlsIcSt11char_traitsIcESaIcEERSt13basic_ostreamIT_T0_ES7_RKSbIS4_S5_T1_E(0x7ffffa4b2df0,
0x701ac0, 88, 88, 0x2f6e6f6974616d69) = 0x7ffffa4b2df0
1251553123.625878
_ZNKSt18basic_stringstreamIcSt11char_traitsIcESaIcEE3strEv(0x7ffffa4b2fa0,
0x7ffffa4b2de0, 0x7ffffa4b2de0, 88, 0x5f7974696e616d75) =
0x7ffffa4b2fa0
1251553123.625929 _ZNSsaSERKSs(0x7ffffa4b2f60, 0x7ffffa4b2fa0,
0x7ffffa4b2fa0, 0x7fcaa64de140, 0x69525f7974696e61) = 0x7ffffa4b2f60
1251553123.625977 _ZNSsD1Ev(0x7ffffa4b2fa0, 0, 0x786c50, 0x5cf210,
0x69525f7974696e61) = 0x782018

I have no clue what is going on in those _Z* functions... what are those?

1251553123.626120 gdk_pixbuf_new_from_file(0x782018, 0x7ffffa4b2f88,
0x7ffffa4b2f88, 0x4a992f63, 0x69525f7974696e61 <unfinished ...>
1251553123.626161
SYS_open("/home/mark/ThumbnailingBenchmarks/2000_Wallpapers-Reanimation/Sea_of_Humanity_Rio_de_Janeiro_Brazil."...,
0, 0666) = 4
1251553123.626216 SYS_fstat(4, 0x7ffffa4b1b20)                             = 0
1251553123.626238 SYS_mmap(0, 4096, 3, 34, 0xffffffff)
    = 0x7fcaa928a000
1251553123.626273 SYS_read(4, "\377\330\377\340", 4096)
    = 4096
1251553123.626355 SYS_lseek(4, 0, 0)                                       = 0
1251553123.626379 SYS_rt_sigprocmask(0, 0, 0x7ffffa4a1af0, 8, 0)           = 0
1251553123.626417 SYS_read(4, "\377\330\377\340", 65536)
    = 65536
1251553123.636544 SYS_read(4,
"Ob\316Z\330\271\003\030\007\004TJ\t-\312N\347f>i\001\344\236\347\332\271/\023\370b\373Z\324\214\366\332\2040\256\320\241Y7c\025\240\276"\201\320\223\003\257\004",
65536) = 65536
1251553123.643681 SYS_read(4,
"\221\301\256\332)u9gwt\212_\331r\\[[\264eS1\202\304\234\222ME\007\207\334\027\3350\371rO<Tz\307\332-\322\3365\221\227l8
\022\006k*\326[\247.\032W#\034\214\340\032\350\216\333\034\365%\016v\254n\256\213\003\002~\320\243\007\200Mmj7\247G\360\235\265\265\276\331U\345b\\"...,
65536) = 65536
1251553123.650134 SYS_read(4,
"\3749w\241\276\254/\2368\334\022eq\202\207\270\375j\343Bo\2315\251\234\261t\242\242\357\243\330\304\360\307\207\216\257i`\333\205\241\215\216\340\374\345Go\326\275.\337\341\235\265\314\022\006\324\345X\336M\376@\031A\\\356\203\033\266\224#\323\360#\212\365\241R\314\001\233\216\325\351\220=\325\207\207#i%\020\334L"...,
65536) = 65536
1251553123.656189 SYS_read(4,
"\357\255\254\214w\321\233\231Y\367o\017\234\n\332p@\004>7s\323\004R\205<",
65536) = 65536
1251553123.662128 SYS_read(4,
"\272\206\250\222i\236'm\303\316{\333wU'\223\217\363\372S\224\264\320\342\204/d\374\277\366\323\242\370e\243\330H\251w\250Zl\235'\016\222\311\300UA\226\030\374\253WA:w\210<W\255\335\315\022\\K\347f\022\343;Tq\376\025\3473k\232\234\227q\013\206o(n\350pM%\216\275\253\370f\376\333R"...,
65536) = 65536
1251553123.668090 SYS_read(4, "\016p", 65536)
    = 65536
1251553123.673774 SYS_read(4, "s\00426A\367\377", 65536)
    = 23360
1251553123.673819 SYS_read(4, "", 40960)                                   = 0
1251553123.676090 SYS_close(4)                                             = 0
1251553123.676119 SYS_munmap(0x7fcaa928a000, 4096)                         = 0
1251553123.676199 <... gdk_pixbuf_new_from_file resumed> )
    = 0x7818f0

There is an obvious file read going on here, but what I don't
understand is why there are multiple SYS_read calls. I thought it read
the contents of one file in a single read call.

1251553123.676269 printf("\n - %f", ... <unfinished ...>
1251553123.676301 SYS_write(1,
"Sea_of_Humanity_Rio_de_Janeiro_Brazil.jpg\n", 42) = 42
1251553123.676364 <... printf resumed> )

So, printf is writing to a file as well? What's happening here?
Then right after 1251553123.676364 there are 9 more _Z* functions...
what do they do?

1251553123.676818 gdk_pixbuf_get_width(0x7818f0, 200, 2, 200,
0x7974696e616d7548) = 1600
1251553123.676866 gdk_pixbuf_get_height(0x7818f0, 200, 0x75a310, 200,
0x7974696e616d7548) = 1200
1251553123.676915 gdk_pixbuf_scale_simple(0x7818f0, 200, 150, 2,
0x7974696e616d7548) = 0x781940
1251553123.682973

The actual downscaling of the image to a thumbnail happens here.
Obviously CPU work, and quite fast.

1251553123.683186 gdk_pixbuf_save(0x781940, 0x782d18, 0x40c9f7,
0x7ffffa4b2f88, 0 <unfinished ...>
....
a range of memcpy calls
1251553123.688447 memcpy(0x0079cc02,
"D\274Y\257\255\313u\0356\306\234U\337\267\326n\3169\373\364\315mx\311\313{\331\211b\003Q\246%'\024lZ\nd\305\266\002\004Nl
H\203", 8190) = 0x0079cc02
1251553123.688507 SYS_write(4, "\211PNG\r\n\032\n", 4096)
    = 4096
1251553123.688547 SYS_write(4,
"\234\335d\304\332D\262)\213`i\222\212a\273", 4096) = 4096
....
And that repeats roughly 4 more times; I guess ~100 memcpy's in total.

I have a few questions here:
- Why are there so many memcpy calls? Could it be one memcpy per
horizontal pixel line? (That would roughly match the number of memcpy
calls.)
- Why does it memcpy (~20 times), then write to the file, then memcpy
again, then write again until the image is done? Is that logical? I
might be missing the point, but I don't see the logic in it. It seems
to write chunks of data instead of writing everything in one go.

1251553123.706736 SYS_write(4,
"\245\335\375*\300P\024\2067wQI>\250.\036t;"0,\014\006G\243\370\231\363\307'O>\363\351\373\357;=\256\\\255\356\bDP\004\203Q\230\240\210X"j\001\0042,\233\303\006\001S\024\005W<\256\313r\334\311d\377\367\022m\243\235\241\310\251\013_\335*T\364\203\365\007\237/\375\366Z&\031\333z"...,
3091) = 3091
1251553123.706790 SYS_close(4)                                             = 0
1251553123.707040 SYS_munmap(0x7fcaa928a000, 4096)                         = 0
1251553123.707090 <... gdk_pixbuf_save resumed> )                          = 1

The image is saved to the drive here.
After that come some more _Z* functions and some GDK unref calls for
memory that was reserved; nothing strange there.

So, if I'm correct (please say so if I'm not), every single *_read*
function is an IO call, and the same goes for write and close. But how
do I get the numbers you wanted out of this, Christian? It does show
the read -- cpu -- write pattern (read as in reading the original
image, cpu as in downscaling it, write as in saving it).

Sorry for asking so many questions, but this is my first time with
this level of profiling, and I really would like to learn this kind of
stuff.

Thanks for your patience,
Mark.

Re: Speeding up thumbnail generation (like multi threaded). Thoughts please.

Bugzilla from markg85@gmail.com
> [snip]

One more minor thing I noticed while doing an strace (it doesn't show
up in an ltrace): quite a few MIME cache misses:
1251560811.396896 stat("/home/mark/.local/share/mime/mime.cache",
0x7fff59450a40) = -1 ENOENT (No such file or directory) <0.000009>
1251560811.396935 stat("/home/mark/.local/share/mime/globs",
0x7fff59450a40) = -1 ENOENT (No such file or directory) <0.000007>
1251560811.396968 stat("/home/mark/.local/share/mime/magic",
0x7fff59450a40) = -1 ENOENT (No such file or directory) <0.000006>
1251560811.397001 stat("/usr/share/mime/mime.cache",
{st_mode=S_IFREG|0644, st_size=102200, ...}) = 0 <0.000009>
1251560811.397049 stat("/usr/local/share/mime/mime.cache",
0x7fff59450a40) = -1 ENOENT (No such file or directory) <0.000007>
1251560811.397083 stat("/usr/local/share/mime/globs", 0x7fff59450a40)
= -1 ENOENT (No such file or directory) <0.000006>
1251560811.397114 stat("/usr/local/share/mime/magic", 0x7fff59450a40)
= -1 ENOENT (No such file or directory) <0.000007>
1251560811.397145 stat("/opt/kde/share/mime/mime.cache",
0x7fff59450a40) = -1 ENOENT (No such file or directory) <0.000008>
1251560811.397178 stat("/opt/kde/share/mime/globs", 0x7fff59450a40) =
-1 ENOENT (No such file or directory) <0.000006>
1251560811.397209 stat("/opt/kde/share/mime/magic", 0x7fff59450a40) =
-1 ENOENT (No such file or directory) <0.000006>

And that happens for every single file!
Now, why isn't GLib doing this just once, storing which MIME databases
exist, and only opening those the next time? Even better would be to
open the database once and keep it open while an operation is going on.
On the other hand, on my side it's just a for loop, and GLib probably
has no way of knowing that (right?).

By the way, I'm just assuming GLib does this, but don't take my word
for it; I haven't looked further yet. It might as well be GDK.

Re: Speeding up thumbnail generation (like multi threaded). Thoughts please.

jcupitt
In reply to this post by Bugzilla from markg85@gmail.com
2009/8/28 Mark <[hidden email]>:
> static GdkPixbuf *
> scale_pixbuf_preserve_aspect_ratio (GdkPixbuf *pixbuf,
>                                    gint size,
>                                    GdkInterpType interp)

One more idea: this will be very slow for JPEGs (your use case, I think).

It will decode the whole file, then shrink. libjpeg supports
shrink-on-load, where it only decompresses enough of the file to supply
pixels at a certain size. In particular, libjpeg can do a very quick
load-at-1/8th-size read, where it just decompresses enough to get the
DC component of each 8x8 block. If you use libjpeg like this you can
expect around a 100x speedup of the decompress step.
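
A minimal sketch of that shrink-on-load path with plain libjpeg (the
function name is made up, and error handling plus the actual scanline
reading are omitted):

#include <stdio.h>
#include <jpeglib.h>

/* Ask libjpeg for 1/8-size output: it then only recovers the DC
 * coefficient of each 8x8 block instead of doing a full decode. */
static void
load_jpeg_at_eighth (FILE *fp)
{
    struct jpeg_decompress_struct cinfo;
    struct jpeg_error_mgr jerr;

    cinfo.err = jpeg_std_error (&jerr);
    jpeg_create_decompress (&cinfo);
    jpeg_stdio_src (&cinfo, fp);
    jpeg_read_header (&cinfo, TRUE);

    cinfo.scale_num = 1;     /* set before jpeg_start_decompress() */
    cinfo.scale_denom = 8;

    jpeg_start_decompress (&cinfo);
    /* cinfo.output_width/output_height are now ~1/8 of the original;
     * read rows with jpeg_read_scanlines() and do the final bilinear
     * scale down to 200 px from there. */
    jpeg_abort_decompress (&cinfo);
    jpeg_destroy_decompress (&cinfo);
}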

I have some code that does this lying around; I'll try to clean it up
and post it next week so you can test it. Or maybe glib does this
already? I know ImageMagick uses this trick for its thumbnailing (if
it didn't, it'd be far slower than glib, heh).

John

Re: Speeding up thumbnail generation (like multi threaded). Thoughts please.

Matthias Clasen
On Sun, Aug 30, 2009 at 9:51 AM, <[hidden email]> wrote:

> [snip]
>
> I have some code that does this lying around; I'll try to clean it up
> and post it next week so you can test it. Or maybe glib does this
> already? I know ImageMagick uses this trick for its thumbnailing (if
> it didn't, it'd be far slower than glib, heh).

glib has nothing to do with it; gdk-pixbuf already supports loading
JPEGs at reduced size.
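
For illustration (the file name here is hypothetical), the call in
question is:

GError *error = NULL;
GdkPixbuf *thumb = gdk_pixbuf_new_from_file_at_size ("wallpaper.jpg",
                                                     200, 200, &error);

It fits the image within 200x200 preserving the aspect ratio, and for
JPEGs the loader can apply libjpeg's shrink-on-load under the hood.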

Re: Speeding up thumbnail generation (like multi threaded). Thoughts please.

Bugzilla from markg85@gmail.com
In reply to this post by jcupitt
On Sun, Aug 30, 2009 at 3:51 PM, <[hidden email]> wrote:

> [snip]
>
> I have some code that does this lying around; I'll try to clean it up
> and post it next week so you can test it.
>
> John

Feel free to supply the code needed for that.
However, in my benchmarks GraphicsMagick was the clear loser compared
to GLib, and to my understanding GraphicsMagick is several times
__faster__ than ImageMagick. I could be wrong.

And could anyone try to answer the questions in my previous (3!) mails?
Am I doing things right, or am I messing things up big time?

Re: Speeding up thumbnail generation (like multi threaded). Thoughts please.

muppet
In reply to this post by Bugzilla from markg85@gmail.com

On Aug 29, 2009, at 8:08 AM, Mark wrote:

> Okay, I did some ltracing now. Here is the ltrace output for 5 images
> that get thumbnailed. I hope someone here can help me a bit in
> explaining this output, since not all of it is obvious to me.
> Before you read it: I put a 2-second sleep after each image is done and
> its pixbufs are freed from memory. That makes it a lot easier to
> identify where the image loading/scaling/saving happens.
> The output is here: http://codepad.org/ydtCcJ5d

I find rows of timestamps hard to read for a sense of how long each
thing took, so I loaded this up in a home-grown trace visualizer I
created a while back (inspired by Federico's time charts -- this is an
interactive version). It's missing the ability to zoom the timeline,
so the left side is a bit squished, but you can get the idea. Here's
a screenshot of the first image's data:

http://asofyet.org/muppet/tmp/thumbnail-lprof-viz.png

You can see that the blue lines with gaps in between represent
SYS_read() (reading the input file), then another, smaller gap for the
scaling (doing the work), followed by a chunk of orange for the
memcpy()s and writes (outputting the thumbnail). As you'd expect, it
takes longer to read the full file than to write the thumbnail.



> Now for the splitting up that i did to sort out io and cpu.
> This is all in the 1251553123 timeframe so it's just one image.

> 1251553123.625556
> _ZStlsIcSt11char_traitsIcESaIcEERSt13basic_ostreamIT_T0_ES7_RKSbIS4_S5_T1_E
> (0x613de0,
> 0x701ac0, 16, 15416, 0x7ffffa4b2a90) = 0x613de0
> 1251553123.625635 _ZNSsC1Ev(0x7ffffa4b2f90, 0x6ed100, 8, 8,
> 0x5f7974696e616d75) = 0x7fcaa64de158
> 1251553123.625683
> _ZNSt18basic_stringstreamIcSt11char_traitsIcESaIcEE3strERKSs
> (0x7ffffa4b2de0,
> 0x7ffffa4b2f90, 0x7ffffa4b2f90, 8, 0x5f7974696e616d75) = 0x781578
> 1251553123.625733 _ZNSsD1Ev(0x7ffffa4b2f90, 0x781578, 0x781578, 0,
> 0x781778) = 0x7fcaa64de158
> 1251553123.625781
> _ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
> (0x7ffffa4b2df0,
> 0x40c950, 16, 0, 0x781778) = 0x7ffffa4b2df0
> 1251553123.625830
> _ZStlsIcSt11char_traitsIcESaIcEERSt13basic_ostreamIT_T0_ES7_RKSbIS4_S5_T1_E
> (0x7ffffa4b2df0,
> 0x701ac0, 88, 88, 0x2f6e6f6974616d69) = 0x7ffffa4b2df0
> 1251553123.625878
> _ZNKSt18basic_stringstreamIcSt11char_traitsIcESaIcEE3strEv
> (0x7ffffa4b2fa0,
> 0x7ffffa4b2de0, 0x7ffffa4b2de0, 88, 0x5f7974696e616d75) =
> 0x7ffffa4b2fa0
> 1251553123.625929 _ZNSsaSERKSs(0x7ffffa4b2f60, 0x7ffffa4b2fa0,
> 0x7ffffa4b2fa0, 0x7fcaa64de140, 0x69525f7974696e61) = 0x7ffffa4b2f60
> 1251553123.625977 _ZNSsD1Ev(0x7ffffa4b2fa0, 0, 0x786c50, 0x5cf210,
> 0x69525f7974696e61) = 0x782018
>
> I have no clue what is going on in those _Z* functions... what are
> those?

Those are the real symbol names of C++ functions and methods. The C++
compiler encodes the parameter types (and, for template functions, the
return type) into the symbol names, as described by the C++ ABI. If
you pass the -C or --demangle option to ltrace, it'll turn those into
something readable (piping symbols through c++filt works too).
Anyway, above you can see stuff like basic_ostream and
basic_stringstream char_traits, so this is some app code doing C++
standard library I/O.



> 1251553123.626120 gdk_pixbuf_new_from_file(0x782018, 0x7ffffa4b2f88,
> 0x7ffffa4b2f88, 0x4a992f63, 0x69525f7974696e61 <unfinished ...>
> 1251553123.626161
> SYS_open("/home/mark/ThumbnailingBenchmarks/2000_Wallpapers-
> Reanimation/Sea_of_Humanity_Rio_de_Janeiro_Brazil."...,
> 0, 0666) = 4
> 1251553123.626216 SYS_fstat(4,  
> 0x7ffffa4b1b20)                             = 0
> 1251553123.626238 SYS_mmap(0, 4096, 3, 34, 0xffffffff)
>    = 0x7fcaa928a000
> 1251553123.626273 SYS_read(4, "\377\330\377\340", 4096)
>    = 4096
> 1251553123.626355 SYS_lseek(4, 0,  
> 0)                                       = 0
> 1251553123.626379 SYS_rt_sigprocmask(0, 0, 0x7ffffa4a1af0, 8,  
> 0)           = 0
> 1251553123.626417 SYS_read(4, "\377\330\377\340", 65536)
>    = 65536
> 1251553123.636544 SYS_read(4,
> "Ob\316Z\330\271\003\030\007\004TJ\t-\312N\347f>i
> \001\344\236\347\332\271/\023\370b\373Z
> \324\214\366\332\2040\256\320\241Y7c
> \025\240\276"\201\320\223\003\257\004",
> 65536) = 65536
> 1251553123.643681 SYS_read(4,
> "\221\301\256\332)u9gwt\212_\331r\\[[\264eS1\202\304\234\222ME
> \007\207\334\027\3350\371rO<Tz\307\332-\322\3365\221\227l8
> \022\006k*\326[\247.\032W#\034\214\340\032\350\216\333\034\365%\016v
> \254n\256\213\003\002~\320\243\007\200Mmj7\247G
> \360\235\265\265\276\331U\345b\\"...,
> 65536) = 65536
> 1251553123.650134 SYS_read(4,
> "\3749w\241\276\254/\2368\334\022eq\202\207\270\375j\343Bo
> \2315\251\234\261t\242\242\357\243\330\304\360\307\207\216\257i`
> \333\205\241\215\216\340\374\345Go
> \326\275.\337\341\235\265\314\022\006\324\345X\336M\376@\031A\\
> \356\203\033\266\224#\323\360#\212\365\241R
> \314\001\233\216\325\351\220=\325\207\207#i%\020\334L"...,
> 65536) = 65536
> 1251553123.656189 SYS_read(4,
> "\357\255\254\214w\321\233\231Y\367o\017\234\n\332p@\004>7s\323\004R
> \205<",
> 65536) = 65536
> 1251553123.662128 SYS_read(4,
> "\272\206\250\222i\236'm\303\316{\333wU'\223\217\363\372S
> \224\264\320\342\204/d\374\277\366\323\242\370e\243\330H\251w\250Zl
> \235'\016\222\311\300UA\226\030\374\253WA:w\210<W\255\335\315\022\\K
> \347f\022\343;Tq\376\025\3473k\232\234\227q\013\206o(n\350pM%
> \216\275\253\370f\376\333R"...,
> 65536) = 65536
> 1251553123.668090 SYS_read(4, "\016p", 65536)
>    = 65536
> 1251553123.673774 SYS_read(4, "s\00426A\367\377", 65536)
>    = 23360
> 1251553123.673819 SYS_read(4, "",  
> 40960)                                   = 0
> 1251553123.676090 SYS_close
> (4)                                             = 0
> 1251553123.676119 SYS_munmap(0x7fcaa928a000,  
> 4096)                         = 0
> 1251553123.676199 <... gdk_pixbuf_new_from_file resumed> )
>    = 0x7818f0
>
> Here an obvious file read is going on, but what I don't understand
> is why there are multiple SYS_read calls. I thought it would just
> read the contents of the file in one read call.


The reads correspond to a loop in the internal function  
_gdk_pixbuf_generic_image_load():
http://git.gnome.org/cgit/gtk+/tree/gdk-pixbuf/gdk-pixbuf-io.c#n905

Each loader module specifies how big a chunk of the file it wants to  
read at a time, and this code reads chunks of that size until the load  
is finished.  This loader says it wants to read the compressed data in  
64K chunks.
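
Schematically, the loop has this shape (names are illustrative, not
the real gdk-pixbuf internals):

#include <glib.h>
#include <stdio.h>

typedef gboolean (*LoadIncrementFunc) (gpointer context,
                                       const guchar *buf, guint size,
                                       GError **error);

/* Read a chunk, hand it to the loader's incremental parser, repeat
 * until EOF -- hence one SYS_read per 64K of compressed data. */
static gboolean
generic_image_load (FILE *f, LoadIncrementFunc load_increment,
                    gpointer context, GError **error)
{
    guchar buffer[65536];   /* the chunk size this loader asked for */
    gsize length;

    while ((length = fread (buffer, 1, sizeof buffer, f)) > 0)
        if (!load_increment (context, buffer, (guint) length, error))
            return FALSE;
    return TRUE;
}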





> 1251553123.676269 printf("\n - %f", ... <unfinished ...>
> 1251553123.676301 SYS_write(1,
> "Sea_of_Humanity_Rio_de_Janeiro_Brazil.jpg\n", 42) = 42
> 1251553123.676364 <... printf resumed> )
>
> So, printf is writing to a file as well? What's happening here?

printf() is defined to write to standard output.  File descriptor 1 is  
standard output.  (Remember, in unix, Everything Is A File.)
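
A tiny illustration of that equivalence:

#include <stdio.h>
#include <unistd.h>

int
main (void)
{
    printf ("hello\n");        /* buffered stdio, ends up on fd 1 */
    write (1, "hello\n", 6);   /* the same bytes, sent directly to fd 1 */
    return 0;
}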



> Then right after 1251553123.676364 there are 9 more _Z* functions..
> what do they do?

More C++ iostreams magic. Building an output filename?


> 1251553123.676818 gdk_pixbuf_get_width(0x7818f0, 200, 2, 200,
> 0x7974696e616d7548) = 1600
> 1251553123.676866 gdk_pixbuf_get_height(0x7818f0, 200, 0x75a310, 200,
> 0x7974696e616d7548) = 1200
> 1251553123.676915 gdk_pixbuf_scale_simple(0x7818f0, 200, 150, 2,
> 0x7974696e616d7548) = 0x781940
> 1251553123.682973
>
> The actual image downscaling to a thumbnail happens here; obviously
> CPU work, and quite fast.
>
> 1251553123.683186 gdk_pixbuf_save(0x781940, 0x782d18, 0x40c9f7,
> 0x7ffffa4b2f88, 0 <unfinished ...>
> ....
> range of memcopy functions
> 1251553123.688447 memcpy(0x0079cc02,
> "D\274Y\257\255\313u\0356\306\234U\337\267\326n\3169\373\364\315mx
> \311\313{\331\211b\003Q\246%'\024lZ\nd\305\266\002\004Nl
> H\203", 8190) = 0x0079cc02
> 1251553123.688507 SYS_write(4, "\211PNG\r\n\032\n", 4096)
>    = 4096
> 1251553123.688547 SYS_write(4,
> "\234\335d\304\332D\262)\213`i\222\212a\273", 4096) = 4096
> ....
> and that roughly 4 more times. I guess a total of ~100 memcpy's.
>
> I have a few questions here:
> - Why are there so many memcpy calls? Could it be that there is one
> memcpy per horizontal pixel line (since that would roughly match the
> number of memcpy calls)?
> - Why does it memcpy (~20 times), then write to the file, then memcpy
> again, then write again until the image is done? Is that logical? I
> might be missing the point, but I don't see the logic in that. It
> seems to write chunks of data instead of writing it all in one go.

The filenames are truncated in the trace output, so I can only guess
what the output format actually is. Is it compressed? What sort of
compression? Since it's a thumbnail, and all the memcpy()s are the
same length (601 bytes), I'll guess it's probably uncompressed, e.g.
PPM or XV's P7. The repeated short memcpy()s look like writing out
scanlines. The mmap() of a 4K buffer, followed by a series of
memcpy()s and then a write(), follows the pattern of buffered I/O that
you'd get with

     FILE *fp = fopen (outname, "wb");   /* buffered stdio stream */
     unsigned char scanline[601];        /* one row of the thumbnail */
     size_t length;
     int i;

     for (i = 0; i < scanlines; i++) {
             next_scanline (scanline, sizeof (scanline), &length);  /* hypothetical row producer */
             fwrite (scanline, 1, length, fp);  /* stdio batches these into 4K write()s */
     }
     fclose (fp);

Again, I haven't seen the code, so I'm just saying what the trace
looks like.



> 1251553123.706736 SYS_write(4,
> "\245\335\375*\300P\024\2067wQI>\250.\036t;"0,\014\006G
> \243\370\231\363\307'O>\363\351\373\357;=\256\\\255\356\bDP\004\203Q
> \230\240\210X"j\001\0042,\233\303\006\001S\024\005W<\256\313r
> \334\311d\377\367\022m\243\235\241\310\251\013_\335*T
> \364\203\365\007\237/\375\366Z&\031\333z"...,
> 3091) = 3091
> 1251553123.706790 SYS_close
> (4)                                             = 0
> 1251553123.707040 SYS_munmap(0x7fcaa928a000,  
> 4096)                         = 0
> 1251553123.707090 <... gdk_pixbuf_save  
> resumed> )                          = 1
>
> The image is saved to the drive here.

Actually, I'll bet it was that write *and* the several writes
preceding it. Each one was only 4K, and the last one was only 3091
bytes.


> After that some more _Z* functions and some gdk unref function for
> memory that was reserved. not much strange there.
>
> So, if I'm correct (please say so if I'm not), every single *_read*
> function is an I/O call, same with write and close. But how do I get
> the numbers from this that you wanted, Cristian? It is showing the
> read -- cpu -- write parts (read as in reading the original image,
> cpu as in downscaling it, write as in saving it).
>
> Sorry for asking so many questions about this, but it's my first time
> with this level of profiling, and I really would like to learn and
> just know this kind of stuff.
>
> Thanx for your patience,
> Mark.



--
One, two, free, four, five, six, sebben, eight, nine, ten, elebben,  
twull, fourteen, sickteen, sebbenteen, eightteen, elebbenteen,  
fiffeen, elebbenteen!
   -- Zella, aged three, counting to twenty.



Re: Speeding up thumbnail generation (like multi threaded). Thoughts please.

Michael Chudobiak
In reply to this post by jcupitt
On 08/30/2009 09:51 AM, [hidden email] wrote:
> able to supply pixels at a certain size. In particular, libjpeg can do
> a very quick load-at-1/8th-size read where it just decompresses enough
> to be able to get the DC component of each 8x8 block. If you use
> libjpeg like this you can expect around a 100x speedup of the
> decompress step.

The gdk-pixbuf jpeg loader does this already.

The key is to improve the generic down-scaling routines in gdk-pixbuf,
as noted here:

http://bugzilla.gnome.org/show_bug.cgi?id=80925

The scaling routines are currently quite bad at downscaling by large
factors.

For example, last year I used a test folder that had four 15000x400 tif
images and four 15000x400 png images (solid colors). Nautilus took 4
minutes and 30 seconds to thumbnail the folder.

I then patched gdk-pixbuf to do scaling in two steps (scale by root-N
twice, instead of scale by N). With this approach, Nautilus took 8
seconds. See http://bugzilla.gnome.org/show_bug.cgi?id=522803 for details.

I'm not saying that is a good approach (the quality would be poor for
line drawings, for example, but OK for photos). But it shows that there
is lots of room for improvement!
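
A minimal sketch of the two-step idea (not the actual patch from the
bug report; picking the geometric mean as the intermediate size is
just one reasonable choice):

#include <gdk-pixbuf/gdk-pixbuf.h>
#include <math.h>

/* Shrink in two steps: each step scales by roughly sqrt(N) instead of
 * one brutal scale by N, avoiding gdk-pixbuf's pathological cost when
 * shrinking by very large factors in one go. */
static GdkPixbuf *
scale_down_two_step (GdkPixbuf *src, gint dest_width, gint dest_height)
{
    gint w = gdk_pixbuf_get_width (src);
    gint h = gdk_pixbuf_get_height (src);
    gint mid_w = MAX ((gint) sqrt ((gdouble) w * dest_width), 1);
    gint mid_h = MAX ((gint) sqrt ((gdouble) h * dest_height), 1);
    GdkPixbuf *mid, *dest;

    mid = gdk_pixbuf_scale_simple (src, mid_w, mid_h, GDK_INTERP_BILINEAR);
    dest = gdk_pixbuf_scale_simple (mid, dest_width, dest_height,
                                    GDK_INTERP_BILINEAR);
    g_object_unref (mid);
    return dest;
}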

- Mike


Re: Speeding up thumbnail generation (like multi threaded). Thoughts please.

jcupitt
2009/8/31 Dr. Michael J. Chudobiak <[hidden email]>:
> On 08/30/2009 09:51 AM, [hidden email] wrote:
>> able to supply pixels at a certain size. In particular, libjpeg can do
>> a very quick load-at-1/8th-size read where it just decompresses enough
>> to be able to get the DC component of each 8x8 block. If you use
>> libjpeg like this you can expect around a 100x speedup of the
>> decompress step.
>
> The gdk-pixbuf jpeg loader does this already.

That's good, but I wonder if this feature is being used? I tried this
tiny program:

------------
#!/usr/bin/python

import sys
from vipsCC import *

thumb = 0
for name in sys.argv[1:]:
        # load at 1/8th size
        im = VImage.VImage (name + ':8')
        scale = 200.0 / im.Xsize()
        # bilinear shrink to 200 px across
        im = im.affine (scale, 0, 0, scale, 0, 0, 0, 0,
                int (im.Xsize() * scale), int (im.Ysize() * scale))
        # write as uncompressed bitmap
        im.write ('thumb%d.v' % thumb)
        thumb += 1
----------

then, in a directory with 1,000 1920x1200 180KB JPEGs (after flushing the cache):

$ time ~/try/thumb.py *.jpg
real 1m0.495s
user 0m33.442s
sys 0m8.109s

This is on a tiny netbook with a 1.6 GHz Atom CPU; a desktop machine
should be a lot quicker, though it will vary a lot with the details
of the test, I guess (Mark's gdk-pixbuf version took 2m 30s for 1,900
files).

John

Re: Speeding up thumbnail generation (like multi threaded). Thoughts please.

Michael Chudobiak
On 08/31/2009 09:03 AM, [hidden email] wrote:
>>> a very quick load-at-1/8th-size read where it just decompresses enough
>>> to be able to get the DC component of each 8x8 block. If you use
>>> libjpeg like this you can expect around a 100x speedup of the
>>> decompress step.
>>
>> The gdk-pixbuf jpeg loader does this already.
>
> That's good, but I wonder if this feature is being used? I tried this
> tiny program:

I believe this is where it happens:

http://git.gnome.org/cgit/gtk+/tree/gdk-pixbuf/io-jpeg.c#n924


- Mike

Re: Speeding up thumbnail generation (like multi threaded). Thoughts please.

Bugzilla from markg85@gmail.com
On Mon, Aug 31, 2009 at 4:06 PM, Dr. Michael J. Chudobiak
<[hidden email]> wrote:

> On 08/31/2009 09:03 AM, [hidden email] wrote:
>>>>
>>>> a very quick load-at-1/8th-size read where it just decompresses enough
>>>> to be able to get the DC component of each 8x8 block. If you use
>>>> libjpeg like this you can expect around a 100x speedup of the
>>>> decompress step.
>>>
>>> The gdk-pixbuf jpeg loader does this already.
>>
>> That's good, but I wonder if this feature is being used? I tried this
>> tiny program:
>
> I believe this is where it happens:
>
> http://git.gnome.org/cgit/gtk+/tree/gdk-pixbuf/io-jpeg.c#n924
>
>
> - Mike
>

I've done some more testing.
The following test results are still with the same 1927 images.

## Thumbnails generated the nautilus way, via
gnome_desktop_thumbnail_factory_generate_thumbnail(). That function
calls gnome_desktop_thumbnail_scale_down_pixbuf and not
gdk_pixbuf_scale_simple, which is odd to say the least:
real 2m43.595s
user 2m42.433s
sys 0m0.950s

Let's call this our 100% baseline; the rest of the benchmarks are
measured against it.

## With the function: gdk_pixbuf_scale_simple
real 2m19.266s
user 2m17.914s
sys 0m1.077s

Note: about 1.17x as fast as gnome_desktop_thumbnail_scale_down_pixbuf


## With the function: gdk_pixbuf_new_from_file_at_scale
real 1m14.422s
user 1m13.605s
sys 0m0.787s

Note: about 2.2x as fast as gnome_desktop_thumbnail_scale_down_pixbuf
Note 2: about 1.87x as fast as gdk_pixbuf_scale_simple
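
For clarity, the three call patterns being compared look roughly like
this (path, target sizes and error handling are placeholders, and the
gnome-desktop scaler comes from libgnome-desktop):

const char *path = "wallpaper.jpg";   /* placeholder */
GError *error = NULL;

/* 1. The nautilus way: load at full size, then shrink with the
 *    gnome-desktop scaler. */
GdkPixbuf *full = gdk_pixbuf_new_from_file (path, &error);
GdkPixbuf *a = gnome_desktop_thumbnail_scale_down_pixbuf (full, 200, 150);

/* 2. Load at full size, then shrink with gdk_pixbuf_scale_simple. */
GdkPixbuf *b = gdk_pixbuf_scale_simple (full, 200, 150,
                                        GDK_INTERP_BILINEAR);

/* 3. Ask the loader for the target size up front; for JPEGs this
 *    enables the reduced-size decode and skips most of the work. */
GdkPixbuf *c = gdk_pixbuf_new_from_file_at_scale (path, 200, 200,
                                                  TRUE, &error);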


So the current way, gnome_desktop_thumbnail_scale_down_pixbuf, is the
slowest of the available functions for scaling an image down.
Now I'm wondering why
gnome_desktop_thumbnail_scale_down_pixbuf was made in the first
place... the comments above it state:

/**
 * gnome_thumbnail_scale_down_pixbuf:
 * @pixbuf: a #GdkPixbuf
 * @dest_width: the desired new width
 * @dest_height: the desired new height
 *
 * Scales the pixbuf to the desired size. This function
 * is a lot faster than gdk-pixbuf when scaling down by
 * large amounts.
 *
 * Return value: a scaled pixbuf
 *
 * Since: 2.2
 **/

Well, the benchmarks above resize 1927 wallpaper-sized images to a
max width or height of 200, and that function clearly loses.
The solution to this is extremely simple. Grab this file:
http://git.gnome.org/cgit/gnome-desktop/tree/libgnome-desktop/gnome-desktop-thumbnail.c
and replace the call:
      scaled = gnome_desktop_thumbnail_scale_down_pixbuf (pixbuf,
                                                  floor (width * scale + 0.5),
                                                  floor (height * scale + 0.5));
with a function that is (a lot) faster.

I also already did more benchmarking with the last function
(gdk_pixbuf_new_from_file_at_scale), but threaded, and that adds a
major speed boost! For me, the time for thumbnailing 1927 images
dropped to 31 seconds (more than 5x the speed of the current nautilus
way)! More on that in a later post; a rough sketch follows below.
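
A rough sketch of that threaded variant, using a GLib thread pool (the
output naming is a placeholder, older GLib also needs g_thread_init
(NULL) at startup, and it assumes the pixbuf loaders in use are safe
to run concurrently):

#include <gdk-pixbuf/gdk-pixbuf.h>
#include <glib.h>

/* Worker: thumbnail one file, scaled to fit in 200x200. */
static void
thumbnail_one (gpointer data, gpointer user_data)
{
    const char *path = data;
    GError *error = NULL;
    GdkPixbuf *thumb = gdk_pixbuf_new_from_file_at_scale (path, 200, 200,
                                                          TRUE, &error);

    if (thumb) {
        char *out = g_strconcat (path, ".thumb.png", NULL); /* placeholder */
        gdk_pixbuf_save (thumb, out, "png", &error, NULL);
        g_free (out);
        g_object_unref (thumb);
    }
    g_clear_error (&error);
}

/* Push every file onto a pool with one worker per core, then wait. */
static void
thumbnail_all (GList *files)
{
    GThreadPool *pool = g_thread_pool_new (thumbnail_one, NULL,
                                           4 /* cores */, FALSE, NULL);
    GList *l;

    for (l = files; l != NULL; l = l->next)
        g_thread_pool_push (pool, l->data, NULL);
    g_thread_pool_free (pool, FALSE, TRUE); /* FALSE: drain queue; TRUE: wait */
}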

So, how do we proceed from this point?

Re: Speeding up thumbnail generation (like multi threaded). Thoughts please.

Bugzilla from markg85@gmail.com
On Tue, Sep 29, 2009 at 6:00 PM, Mark <[hidden email]> wrote:

> [snip: full quote of the benchmark mail above]
>
> So, how do we proceed from this point?

Here is the benchmarking script:
http://codepad.org/hDNLjx4p

What you need to do before you compile and run it: there are various
paths in this script, like
/home/mark/ThumbnailingBenchmarks/2000_Wallpapers-Reanimation/, so you
obviously need to change them to your image source path. Note: there
are more paths than this one example!

You can run three benchmarks:
    // GLib Thumbnailing Benchmark
    //GLibThumbnailingBenchmark();

    // Glib more rapid thumbnailing benchmark
    //GLibThumbnailingBenchmarkRapid();

    // GLib threaded thumbnailing (this is the GLibThumbnailingBenchmarkRapid
    // benchmark, only run in multiple threads)
    GlibThreadedThumbnailingBenchmark();

By default the threaded benchmark runs, but you can obviously comment
that one out and run one of the other two benchmarks. The difference
between GLibThumbnailingBenchmark and GLibThumbnailingBenchmarkRapid
is that the rapid one uses gdk_pixbuf_new_from_file_at_scale, whereas
the non-rapid one uses gdk_pixbuf_new_from_file, which takes
significantly longer, as noted in the benchmarks.

Re: Speeding up thumbnail generation (like multi threaded). Thoughts please.

Michael Chudobiak
In reply to this post by Bugzilla from markg85@gmail.com
On 09/29/2009 12:00 PM, Mark wrote:

> Well, the benchmarks above resize 1927 wallpaper-sized images to a
> max width or height of 200, and that function clearly loses.
> The solution to this is extremely simple. Grab this file:
> http://git.gnome.org/cgit/gnome-desktop/tree/libgnome-desktop/gnome-desktop-thumbnail.c
> and replace the call:
>        scaled = gnome_desktop_thumbnail_scale_down_pixbuf (pixbuf,
>                                            floor (width * scale + 0.5),
>                                            floor (height * scale + 0.5));
> with a function that is (a lot) faster.


Mark,

Have you tried benchmarking changes to Nautilus, rather than your own
benchmark program? (That was the original point of this thread, right? I
forget now...)

My understanding of gnome-desktop-thumbnail.c is that the "problem
function" gnome_desktop_thumbnail_scale_down_pixbuf is only called if an
external script returns an over-sized thumbnail (e.g., like a video file
returning the first frame). In most cases (such as jpegs and pngs)
_gdk_pixbuf_new_from_uri_at_scale will be used to return a
correctly-sized thumbnail instead, so the performance of
gnome_desktop_thumbnail_scale_down_pixbuf is a non-issue.

_gdk_pixbuf_new_from_uri_at_scale should use the jpeg scaling trick (in
the jpeg loader), followed by the gdk scaling routines (in the generic
pixbuf io file).

Also, if you're playing with benchmarks, you might test to see if
gnome_desktop_thumbnail_scale_down_pixbuf is superior when downscaling
by huge factors (say, from 16000x16000 to 16x16). The code comments
certainly imply that. The gdk down-scaling routines can blow up and
freeze the system under these conditions.
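
A hypothetical micro-test for that case (solid fill standing in for
real image data, sizes matching the tif/png experiment above; the
gnome-desktop header path may vary by version):

#include <gdk-pixbuf/gdk-pixbuf.h>
#include <glib.h>
#include <libgnome-desktop/gnome-desktop-thumbnail.h>

int
main (void)
{
    GdkPixbuf *big, *small;
    GTimer *timer;

    g_type_init ();   /* needed on GLib of this era */

    big = gdk_pixbuf_new (GDK_COLORSPACE_RGB, FALSE, 8, 15000, 400);
    gdk_pixbuf_fill (big, 0x336699ff);   /* solid color test image */
    timer = g_timer_new ();

    g_timer_start (timer);
    small = gdk_pixbuf_scale_simple (big, 16, 16, GDK_INTERP_BILINEAR);
    g_print ("gdk_pixbuf_scale_simple: %.2fs\n",
             g_timer_elapsed (timer, NULL));
    g_object_unref (small);

    g_timer_start (timer);
    small = gnome_desktop_thumbnail_scale_down_pixbuf (big, 16, 16);
    g_print ("scale_down_pixbuf:       %.2fs\n",
             g_timer_elapsed (timer, NULL));
    g_object_unref (small);

    g_object_unref (big);
    g_timer_destroy (timer);
    return 0;
}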

Anyway, great to see someone looking at the issues... these are just my
thoughts.


- Mike