First of all, this adds a function that was missing in bsemathsignal. Since I was at it already, I also looked at our exp2 approximations: comparing them to what lolremez would produce, what we do is not ideal, so I also added the fast_exp2 variants discussed below. Remarks:
Relative
You can view, comment on, or merge this pull request online at: https://github.com/tim-janik/beast/pull/124

— _______________________________________________ beast mailing list [hidden email] https://mail.gnome.org/mailman/listinfo/beast
That's a minuscule range to approximate, given the valid range of this function. Here are the errors of
Now the same for fast_exp2<2>():
I.e. a pure Remez approximation works almost as well as approxX_exp2 while using one fewer addmul, but only within [-1, +1]. I had discarded that approach when I wrote approxX_exp2, because outside of that range the error becomes gigantic. Above, I'm comparing approx3_exp2 with fast_exp2<2> to show that with just one extra addmul, those giant errors can be avoided (which even fast_exp2<9> cannot accomplish) if the integer part and the fractional part are approximated separately.
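A minimal sketch of the split scheme described here, assuming a hypothetical function name and illustrative coefficients (not the actual PR code; real coefficients would come from lolremez):

```cpp
#include <cmath>

// Sketch: split x into integer and fractional parts, approximate 2^frac
// with a low-order polynomial on [0, 1), and apply the integer part
// exactly via the float exponent (std::ldexp here; production code could
// patch the exponent bits directly).  Coefficients are illustrative and
// chosen so that p (0) == 1 and p (1) == 2, keeping integers exact.
inline float
fast_exp2_sketch (float x)
{
  const float fl = std::floor (x);
  const float f = x - fl;                                   // f in [0, 1)
  const float p = 1.0f + f * (0.6958f + f * (0.2252f + f * 0.0790f));
  return std::ldexp (p, int (fl));                          // p * 2^floor(x)
}
```

Because the integer part is handled through the exponent, the approximation error stays bounded over the whole valid input range instead of exploding outside [-1, +1].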
In reply to this post by Gnome - Beast mailing list
Note that your code adds a needless cast that costs time and precision; the formulas all contain terms like:
For e.g. T=float, this costs precision. x86 FPUs load the constants into one of the internal FPU registers; those internal registers are 80 bits wide, so casting the constant to float beforehand truncates some of the last digits without making the subsequent FPU-internal operations any faster.
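To make the point concrete, here is a tiny illustration of the two patterns; `scaled_cast` and `scaled_nocast` are hypothetical names (not from the patch), and the constant is log2 (e):

```cpp
#include <cmath>

// The pattern under discussion: T (constant) truncates the long-double
// literal to T's precision before any arithmetic happens, while the
// uncast literal lets the computation run at higher precision internally.
template<typename T> T
scaled_cast (T x)
{
  return x * T (1.4426950408889634074L);   // constant truncated to T first
}

template<typename T> T
scaled_nocast (T x)
{
  return x * 1.4426950408889634074L;       // arithmetic in long double
}
```

For T=float, the cast drops the constant to a 24-bit mantissa before the multiply, i.e. `(long double) (float) 1.4426950408889634074L` no longer equals the original literal.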
I.e. I'd strongly recommend removing that cast. Separately, I'd be interested to see the Remez approximation optimized for only [1,2), with an error of 0 at x=1 (this can be achieved by subtracting a constant). Combined with adding the integer part, the approximated function then still matches log2() exactly at powers of two.
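A sketch of that scheme, assuming a hypothetical function name and illustrative coefficients (a real version would use proper Remez coefficients for log2 (1 + t) on [0, 1)):

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Sketch: split a positive normal float x into exponent e and mantissa
// m in [1, 2), approximate log2 (m) with a polynomial that is exactly 0
// at m == 1, then add e back.  Coefficients are illustrative; they sum
// to 1 so that p (1) == 1 and powers of two come out exact.
inline float
fast_log2_sketch (float x)
{
  uint32_t u;
  std::memcpy (&u, &x, sizeof u);
  const int e = int ((u >> 23) & 0xff) - 127;   // unbiased exponent
  u = (u & 0x007fffffu) | 0x3f800000u;          // force mantissa into [1, 2)
  float m;
  std::memcpy (&m, &u, sizeof m);
  const float t = m - 1.0f;                     // t in [0, 1), so p (0) == 0
  const float p = t * (1.4426950f + t * (-0.7200000f + t * 0.2773050f));
  return p + float (e);
}
```

At x = 2^k the mantissa is exactly 1, so t = 0 and the polynomial contributes nothing, which is what makes the result exact at powers of two.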
In reply to this post by Gnome - Beast mailing list
As mentioned on IRC, enabling optimizations with MODE=release and picking clang++ (6.0 here) vs. g++ (7.4 here) makes a major difference when benchmarking exp2f and log2f from glibc against our approximations. On a modern AMD64 processor, glibc is often faster. Internally it also uses polynomials around order 4, but picks its coefficients from a table depending on the input argument. With that it achieves errors < 1 ULP, and it is often speedier because it can also use hand-crafted SSE2 implementations.
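A minimal harness for this kind of comparison might look as follows; the name and loop shape are illustrative, not the actual beast benchmark code, and the volatile sink merely keeps the compiler from deleting the loop:

```cpp
#include <chrono>
#include <cmath>

// Time a callable over n inputs in [1, 2); returns nanoseconds per call.
// Accumulating into *sink prevents the loop from being optimized away.
template<typename Fn> double
bench_ns_per_call (Fn fn, int n, volatile float *sink)
{
  float sum = 0;
  const auto t0 = std::chrono::steady_clock::now ();
  for (int i = 0; i < n; i++)
    sum += fn (1.0f + (i & 1023) * (1.0f / 1024));
  const auto t1 = std::chrono::steady_clock::now ();
  *sink = sum;
  return std::chrono::duration<double, std::nano> (t1 - t0).count () / n;
}
```

Running this once with exp2f and once with an approximation, under both compilers and with optimizations enabled, is what exposes the differences described above.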
Here's the error correction I'm talking about. Note that exchanging "long double" for "float" makes the code significantly slower, because it forces the compiler to add code to reduce precision. On my machine, this version is roughly as fast as log2f when compiling with optimizations, with both compilers:
Error samples, compared to LOG2L(3):
On 10.09.19 23:54, Tim Janik via beast wrote:
> Error samples, compared to LOG2L(3):
>
> +0.0, -0.00000231613294631

Sorry, the first line was bogus: a "%+.1f" format string printed 0.01 as +0.0. It should instead read:

Error samples, compared to LOG2L(3):

+0.1, +0.00000000775829903
+0.5, +0.00000000000000000
+1.0, +0.00000000000000000
+1.1, -0.00000181973000285
+1.5, -0.00000130387210186
+1.8, -0.00000312228549678
+2.0, +0.00000000000000000
+2.2, -0.00000181973000285
+2.5, -0.00000140048214306
+3.0, -0.00000130387210186
+4.0, +0.00000000000000000
+5.0, -0.00000140048214306
+6.0, -0.00000130387210186
+7.0, -0.00000312228549678
+8.0, +0.00000000000000000
+9.0, -0.00000084878575295
+10.0, -0.00000140048214306
+11.0, -0.00000368176020430
+16.0, +0.00000000000000000
+32.0, +0.00000000000000000
+40.0, -0.00000140048214306
+48.0, -0.00000130387210186
+54.0, -0.00000149844406951
+64.0, +0.00000000000000000
+127.0, -0.00000162654178981
+128.0, +0.00000000000000000

--
Yours sincerely,
Tim Janik
https://testbit.eu/timj
Free software author.
In reply to this post by Gnome - Beast mailing list
Inserting T=float casts makes the function perform better (at least here). It avoids conversions between single-precision and double-precision values (i.e. cvtsd2ss) that would otherwise be used, so this version is faster, because all operations are on floats. This costs a bit of precision, but the float version (
On the other stuff I mostly agree. If you have use cases in mind that need exp2 (k) to be exactly 2^k for integer k (for key tracking or filter frequency modulation it doesn't matter) and you think you want to pay for it with one addmul, ok. I think relative error is the most important goal here, though. For instance, if the key tracking algorithm returns 222 instead of 220, from a musician's point of view it is as bad as returning 888 instead of 880. Both sound equally wrong, and both have the same relative error (not the same absolute error). Applying corrections for fast_log2 (2^k) to yield k for integer k sounds ok to me. Note that it doesn't fix fast_log2 (7.999999) to be 3, as you patched only the case where the input is equal to or slightly greater than 2^k, not the case where it is slightly smaller.
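The relative-error point can be checked with a couple of lines; `rel_error` is a hypothetical helper for this example, using the frequencies above (220 Hz and 880 Hz are the A3/A5 pitches implied by the key-tracking example):

```cpp
#include <cmath>

// Relative error of an approximation against the exact value.
inline double
rel_error (double approx, double exact)
{
  return std::fabs (approx - exact) / exact;
}
```

Both 222 vs. 220 and 888 vs. 880 come out at the same relative error of 2/220 = 8/880 ≈ 0.91%, even though the absolute errors (2 Hz vs. 8 Hz) differ by a factor of four; that is why both detunings sound equally wrong.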
This could be fixed by adjusting the linear coefficient of the Remez polynomial, but that would make our worst-case error larger, and since the result is already so close to the perfect value, it is probably not worth it. As for whether to approximate at all on AMD64: my impression from the benchmarks is that in many cases using one of the approximations would yield sufficient quality faster than exp2f or log2f, especially when using T=float internally. However, the gain is not dramatic, and maybe we're trying to optimize something with approximations that is not really a performance problem. For instance, the LadderFilter (the place where this started) typically only needs one log2 value per note-on; only portamento would affect this negatively, and we do not support that at the moment. What I'm trying to say is: if we use log2f/exp2f and one day we run perf on beast and see that 10% of the CPU usage is spent in exp2f, we can still deal with it at that point in time.