Faster radix conversion #499
base: develop
| Yepp, should have tested  I got  | 
| Cannot happen at that place, result carefully chosen to not exceed  | 
| Oh my, LTM's testing environment is really killing me! But it's all green now, so: Happy Holidays, y'all! | 
| Was able to get Helmström's N-R trick working for base 10. Benchmarked against div-only up to a string length of  The N-R trick is about as fast/slow as div-standalone, maybe a bit slower for small input, but it uses much more heap. I don't know if the large number of recursions (not only the D&C tree itself, which is up to 29 levels deep, but also the recursions from the Karatsuba and Toom-Cook multiplications; fast division is implemented recursively, too) is a problem beyond my testing. Can try to change the recursions in  | 
| Added the N-R method for testing. Add  | 
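For readers unfamiliar with the idea behind the N-R discussion above: the general technique is to replace a division by a multiplication with an approximate reciprocal obtained from a few Newton-Raphson rounds, followed by a correction step. The following is only a toy sketch on 32-bit words, not Helmström's method and not the code in this PR; the names `nr_reciprocal` and `nr_div` are hypothetical, and the fixed count of six rounds is an assumption that fits this word size.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: fixed-point reciprocal x with d * x <= 2^32,
 * refined by Newton-Raphson rounds x <- x + x * (2^32 - d * x) / 2^32. */
static uint64_t nr_reciprocal(uint32_t d)
{
   const uint64_t one = (uint64_t)1 << 32;
   uint32_t t = d;
   uint64_t x;
   int bits = 0;
   int i;

   assert(d != 0u);
   /* Bit length of d gives a first guess 2^(32 - bits) that is too small. */
   while (t != 0u) {
      bits++;
      t >>= 1;
   }
   x = one >> bits;
   /* Each round roughly squares the relative error; x never overshoots. */
   for (i = 0; i < 6; i++) {
      uint64_t err = one - ((uint64_t)d * x);
      x += (x * err) >> 32;
   }
   return x;
}

/* Hypothetical helper: n / d via multiplication with the reciprocal. */
static uint32_t nr_div(uint32_t n, uint32_t d, uint32_t *rem)
{
   uint64_t x = nr_reciprocal(d);
   /* The estimate is never larger than the true quotient ... */
   uint64_t q = ((uint64_t)n * x) >> 32;
   uint64_t r = (uint64_t)n - (q * d);

   /* ... so the correction only ever has to move upwards. */
   while (r >= d) {
      q++;
      r -= d;
   }
   *rem = (uint32_t)r;
   return (uint32_t)q;
}
```

Calling `nr_div(n, d, &r)` should give the same quotient and remainder as `n / d` and `n % d`. The correction loop at the end plays the role of the corrections discussed in this thread: without it, the truncated reciprocal can leave the quotient slightly too small.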
4840c90 to 43241b0
6188f05 to 24cfac8
452cc1d to 2f9a008
    | Needs a bit of a clean-up but otherwise good to go. | 
c5ccf14 to e60867d
    | I took the liberty to rebase, add some more changes and force-push the branch. Could we maybe squash this list of commits and fixups from 2020 and 2021 into 1-3 commits? Or does it make sense to retain the history? Maybe those N-R based tests could be of interest? | 
| It's always nice when you come to your bench, fresh coffee in your pot just to find out that all of the work has been done already! :-) Thanks a lot! 
 Yes, of course. 
 Oh, please don't! It was quite a painful experience: large numbers mean long waits for the tests to complete just to find out that something didn't quite work out at the end and so on. Nah, dust it with quicklime, bury the remains and let the horses run over the ground such that nobody can find it. 
 Yes, but it either needs a table with the corrections, for which I have only empirical methods to find the values at this point, which is a bit cumbersome to say the least, or another N-R round. The difference in speed is already negligible and a second Newton round would ruin that slight speed advantage completely. What I vaguely remember having planned was to pluck the Barrett division out as a distinct function. But we can still do that if there is an actual use-case. | 
e60867d to e5ce2d9
    | Removed all of the little commits bragging about doing chores (e.g. correcting typos). A backup of the old branch is in faster_radix_conversion_full_history, just in case. I had a bit of a hiccup while merging with local; I hope I have repaired it successfully. | 
| @MasterDuke17 You are marked as one of the reviewers. Do you want to add your 2 cents, too? I mean: more eyes, more better! to steal the term from A.v.E. | 
e5ce2d9 to eea3a5f
    | The timings here are quite different from the timings on my machine. The smaller the digit the smaller the relation because we use  | 
Signed-off-by: Steffen Jaeckel <s@jaeckel.eu>
instead of all those copies Signed-off-by: Steffen Jaeckel <s@jaeckel.eu>
fd23d2f to 28b373d
Implemented the Schönhage method (Divide&Conquer) for radix conversion. I used normal division because Lars Helmström's method was a bit unstable for larger input (or I was too stupid to implement it, but the values differ from those of the normal division).
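To illustrate the divide-and-conquer idea for readers of this PR: the number is split by a precomputed power of the base of the form 10^(2^k), the two halves are converted recursively, and the low half is zero-padded to its full width. The sketch below shows this on a single `uint64_t` with normal division; it is not the code of this PR, and the names `pow10_sq`, `dc_digits` and `to_decimal` are made up for the example. The real function works on big integers, where the split is what makes the fast multiplication and division pay off.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* pow10_sq[k] = 10^(2^k); 10^16 is the largest such power below 2^64. */
static const uint64_t pow10_sq[] = {
   10ULL, 100ULL, 10000ULL, 100000000ULL, 10000000000000000ULL
};

/* Write n as exactly 2^level decimal digits (zero padded) into out.
 * Precondition: n < 10^(2^level). */
static void dc_digits(uint64_t n, int level, char *out)
{
   size_t half;

   if (level == 0) {
      assert(n < 10u);
      out[0] = (char)('0' + (int)n);
      return;
   }
   half = (size_t)1 << (level - 1);
   /* Split once by 10^(2^(level-1)), then convert both halves. */
   dc_digits(n / pow10_sq[level - 1], level - 1, out);
   dc_digits(n % pow10_sq[level - 1], level - 1, out + half);
}

/* Convert n to a NUL-terminated decimal string; buf needs >= 33 bytes. */
static void to_decimal(uint64_t n, char *buf)
{
   char tmp[32];
   size_t i = 0;

   /* 2^64 < 10^32, so five levels (32 digits) always suffice. */
   dc_digits(n, 5, tmp);
   /* Strip the leading zeros but keep at least one digit. */
   while ((i < 31u) && (tmp[i] == '0')) {
      i++;
   }
   memcpy(buf, tmp + i, 32u - i);
   buf[32u - i] = '\0';
}
```

With a 33-byte buffer `buf`, `to_decimal(1234567890ULL, buf)` leaves the string `1234567890` in `buf`. At this size nothing is gained over the plain digit-by-digit loop; the sketch only shows the shape of the recursion.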
Speed enhancement is as expected:
Tested `s_mp_faster_to_radix` with the loop
old: 324 sec 119 ms 629 usec 727 nsec
new: 9 sec 149 ms 193 usec 858 nsec
Same loop with base 10 only (`i = 10` fixed) and `j = 17477` (log(17477 * 60)/log(10) ~ 6.0206, over a million decimal places):
old: 62 sec 920 ms 351 usec 67 nsec
new: 0 sec 458 ms 471 usec 969 nsec
Results have been checked against the old function.
Looks good enough to me.
I did not benchmark the new `s_mp_faster_read_radix` and testing is only done with a small round-about test in `test.c`.
Ah, yes: restricting the new method to base 10 only wouldn't have saved a lot. One additional small table and about half a dozen lines of code would have been saved, maybe 150 bytes or so, if at all.
I used some ideas from Lars Helmström and some more from @MasterDuke17 to write this code. Don't blame them for my mistakes, please.