[tim-janik/beast] BSE: bsemathsignal: add approximations: Bse::fast_log2 and Bse::fast_exp2 (#124)


[tim-janik/beast] BSE: bsemathsignal: add approximations: Bse::fast_log2 and Bse::fast_exp2 (#124)

Gnome - Beast mailing list

First of all, this adds a function that was missing in bsemathsignal: a fast_log2 implementation. It works by extracting the float exponent and approximating the remaining mantissa with a polynomial. I found a tool that computes optimal polynomial coefficients for a given order: https://github.com/samhocevar/lolremez

While I was at it, I also looked at our existing exp2 approximations. Compared to what lolremez produces, they are not ideal, so I added fast_exp2 as well; it has consistently better relative errors. As a next step I therefore suggest replacing bse_approxN_exp2 with fast_exp2<N> everywhere, which should be just as fast but more accurate. Please review the API and let me know whether it can become the canonical API to port the old functions to.
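To illustrate the shape of such an approximation, here is a minimal sketch (the function name and coefficients are mine, not the PR code): it evaluates a single polynomial for the whole argument; for the sketch the coefficients come from the Taylor series of 2^x = e^(x*ln 2), whereas the PR would use minimax coefficients generated by lolremez for [-1,1].

template<int ORDER, typename T = double>
inline T
fast_exp2_sketch (T x)
{
  // Taylor stand-in for 2^x; a lolremez-generated minimax polynomial of the
  // same order would be evaluated with the usual Horner scheme instead
  const T y = x * T (0.69314718055994530942);   // x * ln(2)
  T sum = 1, term = 1;
  for (int k = 1; k <= ORDER; k++)
    {
      term *= y / T (k);    // y^k / k!
      sum += term;
    }
  return sum;
}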

Remarks:

  • unlike bse_approxN_exp2, the new fast_exp2<N> no longer guarantees almost zero error for integer arguments; however, I don't think we really need to preserve this property
  • I like Bse::fast_log2 better than Bse::approx_log2 because, when reading code, "fast" tells me what the function does and why it was used instead of log2

Relative exp2(x) approximation errors in interval [-1:1]:

rxprec: bse_approx2_exp2: 0.009017387
rxprec: bse_approx3_exp2: 0.0007944462
rxprec: bse_approx4_exp2: 5.568432e-05
rxprec: bse_approx5_exp2: 3.24224e-06
rxprec: bse_approx6_exp2: 1.614916e-07
rxprec: bse_approx7_exp2: 7.028907e-09
rxprec: bse_approx8_exp2: 2.716875e-10
rxprec: bse_approx9_exp2: 9.445048e-12

rxprec: fast_exp2<2, double>: 0.001724763
rxprec: fast_exp2<3, double>: 7.478144e-05
rxprec: fast_exp2<4, double>: 2.593371e-06
rxprec: fast_exp2<5, double>: 7.493647e-08
rxprec: fast_exp2<6, double>: 1.8558e-09
rxprec: fast_exp2<7, double>: 4.021119e-11
rxprec: fast_exp2<8, double>: 7.746751e-13
rxprec: fast_exp2<9, double>: 1.375958e-14

You can view, comment on, or merge this pull request online at:

  https://github.com/tim-janik/beast/pull/124

Commit Summary

  • BSE: bsemathsignal: add approximations: Bse::fast_log2 and Bse::fast_exp2


Re: [tim-janik/beast] BSE: bsemathsignal: add approximations: Bse::fast_log2 and Bse::fast_exp2 (#124)

Gnome - Beast mailing list

> Relative exp2(x) approximation errors in interval [-1:1]:

That's a minuscule range to approximate, given the valid input range of this function. Here are the errors of
bse_approx3_exp2() at mostly integer steps, with some omissions for brevity:

   -2.0, +0.00000000000000000
   -1.5, -0.00016133294256113
   -1.0, +0.00000000000000000
   -0.5, -0.00032266588512226
   +0.0, +0.00000000000000000
   +0.5, -0.00112351661969536
   +1.0, +0.00000000000000000
   +1.5, -0.00224703323939071
   +2.0, +0.00000000000000000
   +2.5, -0.00449406647878142
   +7.0, +0.00000000000000000
  +11.0, +0.00000000000000000
  +16.0, +0.00000000000000000
  +32.0, +0.00000000000000000
  +40.0, +0.00000000000000000
  +48.0, +0.00000000000000000
  +54.0, +0.00000000000000000
  +64.0, +0.00000000000000000
 +127.0, +0.00000000000000000

Now the same for fast_exp2<2>():

   -2.0, +0.00011078548906696
   -1.5, -0.00060979588254370
   -1.0, +0.00022157097813391
   -0.5, -0.00121959176508741
   +0.0, +0.00044314195626782
   +0.5, +0.00243918353017435
   +1.0, +0.00088628391253565
   +1.5, +0.00487836706034869
   +2.0, +0.00177256782507129
   +2.5, +0.00975673412069738
   +7.0, +0.05672217040228134
  +11.0, +0.90755472643650137
  +16.0, +29.04175124596804380
  +32.0, +1903280.20965576171875000
  +40.0, +487239733.671875
  +48.0, +124733371820
  +54.0, +7982935796480
  +64.0, +8174526255595520
 +127.0, +75396696880394894980022595129180160

I.e. a pure Remez approximation works almost as well as approxX_exp2 while using one less addmul, but only within -1:+1. I discarded that approach when I wrote approxX_exp2, because outside of that range the error becomes gigantic. Above, I'm comparing approx3_exp2 with fast_exp2<2> to show that with just one extra addmul those giant errors can be avoided (which even fast_exp2<9> cannot accomplish), if the integer part and the fractional part are approximated separately.
Considering that fast_exp2<>() does not even minimize the error at exactly -1, +1, or at least at 0, I'd at most add it as a rough approximation to be used only when the input range is known to be -1..+1 (which is not the case for general audio signals in BSE). For that it might be named fast_narrow_exp2<>() or similar, but I think the name fast_exp2<>() is better used if we rename the current bse_approxX_exp2 functions.
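For reference, the split described above looks roughly like this (a minimal sketch with my own names; the fractional polynomial is a simple Taylor stand-in, not the actual bse_approxX_exp2 coefficients):

#include <cmath>

static inline double
split_exp2_sketch (double x)
{
  // split into integer part i and fractional part f in [0,1)
  const double i = std::floor (x);
  const double f = x - i;
  // approximate 2^f on [0,1) with a low order polynomial (Taylor stand-in here,
  // a constrained Remez fit in practice)
  const double y = f * 0.69314718055994530942;          // f * ln(2)
  const double u = 1 + y * (1 + y / 2 * (1 + y / 3));   // ~= 2^f
  // reattach the integer exponent, so the error stays bounded for large |x|
  return std::ldexp (u, int (i));                       // u * 2^i
}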



Re: [tim-janik/beast] BSE: bsemathsignal: add approximations: Bse::fast_log2 and Bse::fast_exp2 (#124)

Gnome - Beast mailing list
In reply to this post by Gnome - Beast mailing list

> First of all, this adds a function that was missing in bsemathsignal: a fast_log2 implementation. It works by extracting the float exponent and approximating the remaining mantissa with a polynomial. I found a tool that computes optimal polynomial coefficients for a given order: https://github.com/samhocevar/lolremez

Note that your code adds a needless cast that costs time and precision; the formulas all contain terms like:

  u = u * x + T (constant); // T ∈ { float, double, long double }

With e.g. T=float, this costs precision. X86 FPUs load the constants into internal FPU registers that are 80 bit wide, so casting the constant to float beforehand truncates some of the last digits without making the subsequent FPU-internal operations any faster.
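For illustration, the two variants side by side (a tiny sketch with my own names; the coefficient is just an example value):

static inline long double
horner_step_full (long double u, float x)
{
  // constant kept at full precision; on x86 it is loaded into an 80-bit FPU
  // register anyway
  return u * x + 2.7514877034856806734e-1l;
}

static inline long double
horner_step_cast (long double u, float x)
{
  // constant truncated to float first: digits are lost, nothing gets faster
  return u * x + float (2.7514877034856806734e-1l);
}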
A similar precision loss can be observed with AMD64, example:

fast_log2<6, double> (+1.5)

    0.00000054181458465

fast_log2<6, float> (+1.5)

    0.00000153622037214

I.e. I'd strongly recommend removing that cast.
Also, Wikipedia suggests that, similarly to exp2(), a log2 approximation can be split into a quick integer part and a severely constrained approximation of the fractional part that only covers the interval [1,2): https://en.wikipedia.org/wiki/Binary_logarithm#Iterative_approximation

So I'd be interested to see the Remez approximation optimized for [1,2) only, with an error of 0 at x = 1 (this can be achieved by subtracting a constant). Combined with adding the integer part, the approximated function then still matches log2() exactly at integer powers of two.



Re: [tim-janik/beast] BSE: bsemathsignal: add approximations: Bse::fast_log2 and Bse::fast_exp2 (#124)

Gnome - Beast mailing list
In reply to this post by Gnome - Beast mailing list

As mentioned on IRC, enabling optimizations with MODE=release and picking clang++ (6.0 here) vs g++ (7.4 here) makes a major difference when benchmarking exp2f and log2f from glibc against our approximations. On a modern AMD64 processor, glibc is often faster. Internally it also uses polynomials of around order 4, but picks the coefficients from a table depending on the input argument. With that it achieves errors < 1 ULP and is often speedier because it can also use hand-crafted SSE2 implementations.
I haven't had a chance to benchmark the approximations on musl, but so far, based on your submission, I'm inclined to integrate the following:

  1. Rename bse_approx6_exp2 to fast_exp2() and get rid of the other approximation variants.
  2. Add fast_log2() based on your 6th order version, but with error correction for integer logarithms.
  3. When building for AMD64, use exp2f to implement fast_exp2 and use log2f to implement fast_log2 (roughly as sketched below).
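
Point (3) might look roughly like this (a sketch under the assumption of a plain architecture check; the macro test and function names are mine, not existing BSE API):

#include <math.h>

#if defined (__x86_64__) || defined (_M_X64)
// on AMD64, libm's exp2f/log2f are fast and accurate to < 1 ULP
static inline float fast_exp2_amd64 (float x)  { return exp2f (x); }
static inline float fast_log2_amd64 (float x)  { return log2f (x); }
#else
// elsewhere, fall back to the polynomial approximations discussed in this thread
#endif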

Here's the error correction I'm talking about. Note that exchanging "long double" for "float" makes the code significantly slower, because it forces the compiler to emit extra code to reduce precision. On my machine, this version is roughly as fast as log2f when compiled with optimizations, with both compilers:

static inline long double G_GNUC_CONST
fast_log2f (float value)
{
  union {
    float f;
    int i;
  } float_u;
  float_u.f = value;
  // compute log_2 using float exponent
  const int log_2 = ((float_u.i >> 23) & 255) - 128;
  // replace float exponent
  float_u.i &= ~(255 << 23);
  float_u.i += BSE_FLOAT_BIAS << 23;
  long double u, x = float_u.f;
  // lolremez --long-double -d 6 -r 1:2 "log(x)/log(2)+1-0.00000184568668708"
  u =         -2.5691088815846393966e-2l;
  u = u * x +  2.7514877034856806734e-1l;
  u = u * x + -1.2669182593669424748l;
  u = u * x +  3.2865287704176774059l;
  u = u * x + -5.3419892025067624343l;
  u = u * x +  6.1129631283200211528l;
  x = u * x + -2.040042118396715321l;
  return x + log_2;
}

Error samples, compared to LOG2L(3):

   +0.0, -0.00000231613294631
   +0.5, +0.00000000000000000
   +1.0, +0.00000000000000000
   +1.1, -0.00000181973000285
   +1.5, -0.00000130387210186
   +1.8, -0.00000312228549678
   +2.0, +0.00000000000000000
   +2.2, -0.00000181973000285
   +2.5, -0.00000140048214306
   +3.0, -0.00000130387210186
   +4.0, +0.00000000000000000
   +5.0, -0.00000140048214306
   +6.0, -0.00000130387210186
   +7.0, -0.00000312228549678
   +8.0, +0.00000000000000000
   +9.0, -0.00000084878575295
  +10.0, -0.00000140048214306
  +11.0, -0.00000368176020430
  +16.0, +0.00000000000000000
  +32.0, +0.00000000000000000
  +40.0, -0.00000140048214306
  +48.0, -0.00000130387210186
  +54.0, -0.00000149844406951
  +64.0, +0.00000000000000000
 +127.0, -0.00000162654178981
 +128.0, +0.00000000000000000
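
For reference, a table like the one above could be produced with a small harness along these lines (hypothetical; the actual test code is not part of this thread, fast_log2f is the function from the snippet above):

#include <math.h>
#include <stdio.h>

int
main ()
{
  const float samples[] = { 0.5f, 1.0f, 1.5f, 2.0f, 3.0f, 8.0f, 64.0f, 128.0f };
  for (float v : samples)
    printf (" %+6.1f, %+.17Lf\n", v, fast_log2f (v) - log2l (v));
  return 0;
}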



Re: [tim-janik/beast] BSE: bsemathsignal: add approximations: Bse::fast_log2 and Bse::fast_exp2 (#124)

Tim Janik-6
On 10.09.19 23:54, Tim Janik via beast wrote:
> Error samples, compared to LOG2L(3):
>
> +0.0, -0.00000231613294631

Sorry, the first line was bogus, a "%+.1f" format string printed 0.01 as +0.0.
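
For illustration (my reconstruction of the offending call, not the original test code):

  printf ("%+.1f\n", 0.01);   // prints "+0.0", hence the bogus first row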

It should instead read:

Error samples, compared to LOG2L(3):

   +0.1, +0.00000000775829903
   +0.5, +0.00000000000000000
   +1.0, +0.00000000000000000
   +1.1, -0.00000181973000285
   +1.5, -0.00000130387210186
   +1.8, -0.00000312228549678
   +2.0, +0.00000000000000000
   +2.2, -0.00000181973000285
   +2.5, -0.00000140048214306
   +3.0, -0.00000130387210186
   +4.0, +0.00000000000000000
   +5.0, -0.00000140048214306
   +6.0, -0.00000130387210186
   +7.0, -0.00000312228549678
   +8.0, +0.00000000000000000
   +9.0, -0.00000084878575295
  +10.0, -0.00000140048214306
  +11.0, -0.00000368176020430
  +16.0, +0.00000000000000000
  +32.0, +0.00000000000000000
  +40.0, -0.00000140048214306
  +48.0, -0.00000130387210186
  +54.0, -0.00000149844406951
  +64.0, +0.00000000000000000
 +127.0, -0.00000162654178981
 +128.0, +0.00000000000000000



--
Yours sincerely,
Tim Janik

https://testbit.eu/timj
Free software author.

Re: [tim-janik/beast] BSE: bsemathsignal: add approximations: Bse::fast_log2 and Bse::fast_exp2 (#124)

Gnome - Beast mailing list
In reply to this post by Gnome - Beast mailing list

Inserting T=float casts makes the function perform better (at least here): it avoids the conversions between single-precision and double-precision values (i.e. cvtsd2ss) that would otherwise be emitted. So this version

static inline float G_GNUC_CONST
fast_log2ff (float value)
{
  union {
    float f;
    int i;
  } float_u;
  float_u.f = value;
  // compute log_2 using float exponent
  const int log_2 = ((float_u.i >> 23) & 255) - 128;
  // replace float exponent
  float_u.i &= ~(255 << 23);
  float_u.i += BSE_FLOAT_BIAS << 23;
  typedef float T;
  T u, x = float_u.f;
  // lolremez --long-double -d 6 -r 1:2 "log(x)/log(2)+1-0.00000184568668708"
  u =         T (-2.5691088815846393966e-2l);
  u = u * x + T (2.7514877034856806734e-1l);
  u = u * x + T (-1.2669182593669424748l);
  u = u * x + T (3.2865287704176774059l);
  u = u * x + T (-5.3419892025067624343l);
  u = u * x + T (6.1129631283200211528l);
  x = u * x + T (-2.040042118396715321l);
  return x + log_2;
}

is faster, because all operations stay in single precision. This costs a bit of accuracy, but the float version (fast_log2ff) is faster than the double (fast_log2fd) and long double (fast_log2fl) variants:

$ g++ -std=c++17 -Wall -g -O3 -o l2 l2.cc `pkg-config --cflags --libs spectmorph glib-2.0 bse`
$ l2
log2f: 5.369997 ns/call
fast_log2fl: 7.890487 ns/call
fast_log2fd: 4.662395 ns/call
fast_log2ff: 3.652096 ns/call

prec: fast_log2ff: 4.493532e-06
prec: fast_log2fd: 3.721012e-06
prec: fast_log2fl: 3.691373e-06


$ clang++ -std=c++17 -g -O3 -o l2 l2.cc `pkg-config --cflags --libs spectmorph glib-2.0 bse`
$ l2
log2f: 5.323792 ns/call
fast_log2fl: 7.597113 ns/call
fast_log2fd: 5.201006 ns/call
fast_log2ff: 4.071403 ns/call

prec: fast_log2ff: 4.493532e-06
prec: fast_log2fd: 3.721012e-06
prec: fast_log2fl: 3.691373e-06
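
The l2.cc benchmark itself is not part of this thread; the timing loop behind the ns/call numbers presumably looks something like the following (a hypothetical sketch, names are mine):

#include <chrono>

template<typename Fn>
static double
bench_ns_per_call (Fn fn, int n = 10000000)
{
  volatile float sink = 0;      // keep the calls from being optimized away
  const auto t0 = std::chrono::steady_clock::now ();
  for (int i = 0; i < n; i++)
    sink += fn (1.0f + i * 1e-7f);
  const auto t1 = std::chrono::steady_clock::now ();
  (void) sink;
  return std::chrono::duration<double, std::nano> (t1 - t0).count () / n;
}

// usage: printf ("fast_log2ff: %f ns/call\n", bench_ns_per_call (fast_log2ff));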

On the other points I mostly agree. If you have use cases in mind that need exp2(k) to be exactly 2^k for integer k (for key tracking or filter frequency modulation it doesn't matter) and you think that is worth one extra add-mul, ok. I think relative error is the most important goal here, though. For instance, if the key tracking algorithm returns 222 Hz instead of 220 Hz, from a musician's point of view that is as bad as returning 888 Hz instead of 880 Hz: both sound equally wrong, and both have the same relative error (not the same absolute error).
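
A quick check of that claim (my numbers, assuming the 220/880 values are frequencies in Hz):

#include <cmath>
#include <cstdio>

int
main ()
{
  // both detunings are the same relative error, i.e. the same pitch offset of
  // roughly 15.7 cents
  printf ("%.2f cents\n", 1200 * std::log2 (222.0 / 220.0));
  printf ("%.2f cents\n", 1200 * std::log2 (888.0 / 880.0));
  return 0;
}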

Correcting fast_log2 (2^k) to yield exactly k for integer k sounds ok to me. Note that it doesn't fix fast_log2 (7.999999) to be 3, because you only patched the case where the input is equal to or slightly greater than 2^k, not the case where it is slightly smaller:

fast_log2fl (7.999999) = 2.999996; log2f (7.999999) = 3.000000
fast_log2fl (8.000000) = 3.000000; log2f (8.000000) = 3.000000
fast_log2fl (8.000001) = 3.000000; log2f (8.000001) = 3.000000

This could be fixed by adjusting the linear coefficient of the Remez polynomial, but that would make our worst-case error larger, and since the result is already so close to the exact value, it is probably not worth it.

As for whether to approximate at all on AMD64: my impression from the benchmarks is that in many cases one of the approximations would deliver sufficient quality faster than exp2f or log2f, on AMD64 especially when using T=float internally.

However, the gain is not dramatic, and maybe we're trying to optimize something with approximations that isn't really a performance problem. For instance, the LadderFilter (where this started) typically only needs one log2 value per note-on; only portamento, which we do not support at the moment, would change that. What I'm trying to say is: if we use log2f/exp2f and one day run perf on beast and see that 10% of the CPU usage is spent in exp2f, we can still deal with it at that point.

