strconv: consider speeding up formatting of small integers with cache #19445
I recently looked into doing something similar but lower level. When calling string(b) where b is a byte slice of length exactly one, construct a string using the recently introduced runtime.staticbytes, thus avoiding an allocation. This would eliminate the allocation for single-digit numbers. I gave it up because I wasn't convinced it was worth the impact on other []byte to string conversions, but in the meantime, I saw some other optimizations, leading to CL 37791. It is possible that the other optimizations would pay for a "single digit" code path. What do you think, @randall77 @mdempsky?
Here's a sample implementation (from the ORM project):
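(The snippet itself didn't survive the formatting here. As a rough sketch of the general approach discussed in this issue, with illustrative names such as smallInts and itoaCached that are not from the original, it might look something like this:)

package ormutil

import "strconv"

// smallInts holds pre-formatted strings for 0..99 so the hot path for
// small numbers never allocates.
var smallInts [100]string

func init() {
    for i := range smallInts {
        smallInts[i] = strconv.Itoa(i)
    }
}

// itoaCached is a drop-in replacement for strconv.Itoa that serves
// small non-negative values from the table.
func itoaCached(n int) string {
    if n >= 0 && n < len(smallInts) {
        return smallInts[n]
    }
    return strconv.Itoa(n)
}

Any value outside the cached range simply falls back to strconv.Itoa, so behavior is unchanged.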
@josharian I like the idea of single digits! I was a bit bummed when string(b) still allocated.
CL https://golang.org/cl/37963 mentions this issue.
Your SQL database and network are fast enough that you notice a 7μs to 3μs drop when formatting SQL queries? I'm not totally against this, and I usually like to help fight the allocation battles, but I'm a little surprised this comes up often enough to matter. Does it help on any of our existing benchmarks?
@bradfitz of course not. I wish it was, though. 😉 I just happened to run
pprof (as part of tests after making a change) and noticed strconv came up a
lot. That's all.
It doesn't really affect many existing strconv benchmarks because they mostly
deal with numbers > 99. And I haven't tested the entire stdlib.
Sorry if this sounds bad, but will we now start seeing and accepting patches that further add numbers to this? Why stop at 99 and not at 2147483648? That seems to cover a lot more of the frequent use cases where timestamps are converted from numbers to strings.
@dlsniper do you think that's worth the large amount of binary data and compiler work? 99 numbers, in comparison, is a tiny amount.
@mvdan, what's wrong with a 20 GB binary? Disk space is cheap. We could avoid allocations! :)
Actually it'd only be about 18 GB. Even better.
Since I'm not sure if I can comment on the CL: why a ~1.7 KB (total size, including string headers) array instead of a 190-byte const string? Just to save two subtractions?
@ericlagergren, you can comment on the CL. I don't see a 2.4k array. Let me turn your question around: why a 190-byte const string over the existing code?
@bradfitz cool, didn't know that. Thanks! I typoed that number. I meant ~1.7 KB (16 bytes for each string header * 100 elements + the strings' data). The const string is smaller.
@ericlagergren if you come up with a CL that doesn't complicate the code and keeps performance, I don't see why it wouldn't get merged.
@ericlagergren, it was originally a const string. Presumably the code review discussion explains why it was changed. I stopped following at some point.
This seems to be the reason given by gri:
It's possible @griesemer didn't consider the binary size bloat when making his decision if there was no discussion of it previously.
@bradfitz That was the concern that sparked my question. I know in the past y'all have made decisions with binary bloat in mind, so I was surprised the CL went with a larger array vs. a smaller string, which led me to wonder whether there were other reasons.
1) Choosing 100 numbers (0-99) has the advantage that it might be used to
further optimize conversions (by dividing in steps of 100 instead of 10);
see https://go-review.googlesource.com/#/c/38255/ as a starting point.
2) Choosing a slice of strings led to simpler code.
Re 2: I was aware of the extra bloat, but it's kind of hard to judge the
actual impact. If somebody wants to go back to a string constant
(the original code didn't have that; it used an initialized byte array
converted to a string), I'm OK with that. It should be trivial to have a
little function that extracts the right substring and use it instead of
accessing the slice directly.
And yes, I'm not too eager to add more tables or complexity to this code
without significant gains throughout (at least 5% or more for relevant
benchmarks).
- gri
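(For reference, here is a rough illustration of the dividing-in-steps-of-100 idea from point 1 above; this is only a sketch of the technique, not the code from CL 38255. With a 200-byte table of all two-digit pairs, each division by 100 peels off two digits at a time, halving the number of divisions:)

package main

import "fmt"

// digits holds "00", "01", ..., "99" back to back (200 bytes).
const digits = "00010203040506070809" +
    "10111213141516171819" +
    "20212223242526272829" +
    "30313233343536373839" +
    "40414243444546474849" +
    "50515253545556575859" +
    "60616263646566676869" +
    "70717273747576777879" +
    "80818283848586878889" +
    "90919293949596979899"

// appendUint formats u in decimal, emitting two digits per division.
func appendUint(dst []byte, u uint64) []byte {
    var buf [20]byte // max decimal length of a uint64
    i := len(buf)
    for u >= 100 {
        q := u / 100
        j := (u - q*100) * 2 // index of the two-digit pair
        i -= 2
        buf[i] = digits[j]
        buf[i+1] = digits[j+1]
        u = q
    }
    // Final one or two digits.
    if u >= 10 {
        i -= 2
        buf[i] = digits[u*2]
        buf[i+1] = digits[u*2+1]
    } else {
        i--
        buf[i] = byte('0' + u)
    }
    return append(dst, buf[i:]...)
}

func main() {
    fmt.Println(string(appendUint(nil, 1234567890))) // prints 1234567890
}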
@griesemer thanks for clarifying it for me. I'll submit a CL if you want, but if CL 38255 is gonna go through then it doesn't seem like there's any point in changing it to a const string since it'll just get changed back.
CL 38255 (which I'm not yet convinced about) can still work just fine if you have a
const string. Please send your CL and we'll take it from there.
CL https://golang.org/cl/38255 mentions this issue.
Benchmark results for GOARCH=amd64:

name  old time/op  new time/op  delta
FormatInt-4  2.51µs ± 2%  2.40µs ± 2%  -4.51%  (p=0.000 n=9+10)
AppendInt-4  1.67µs ± 2%  1.61µs ± 3%  -3.74%  (p=0.000 n=9+9)
FormatUint-4  698ns ± 2%  643ns ± 3%  -7.95%  (p=0.000 n=10+8)
AppendUint-4  478ns ± 1%  418ns ± 2%  -12.61%  (p=0.000 n=8+10)
AppendUintVarlen/1-4  9.30ns ± 6%  9.15ns ± 1%  ~  (p=0.199 n=9+10)
AppendUintVarlen/12-4  9.12ns ± 0%  9.16ns ± 2%  ~  (p=0.307 n=9+9)
AppendUintVarlen/123-4  18.6ns ± 2%  18.7ns ± 0%  ~  (p=0.091 n=10+6)
AppendUintVarlen/1234-4  19.1ns ± 4%  17.7ns ± 1%  -7.35%  (p=0.000 n=10+9)
AppendUintVarlen/12345-4  21.5ns ± 3%  20.7ns ± 3%  -3.78%  (p=0.002 n=9+10)
AppendUintVarlen/123456-4  23.5ns ± 3%  20.9ns ± 1%  -11.14%  (p=0.000 n=10+9)
AppendUintVarlen/1234567-4  25.0ns ± 2%  23.6ns ± 7%  -5.48%  (p=0.004 n=9+10)
AppendUintVarlen/12345678-4  26.8ns ± 2%  23.4ns ± 2%  -12.79%  (p=0.000 n=9+10)
AppendUintVarlen/123456789-4  29.8ns ± 3%  26.5ns ± 5%  -11.03%  (p=0.000 n=10+10)
AppendUintVarlen/1234567890-4  31.6ns ± 3%  26.9ns ± 3%  -14.95%  (p=0.000 n=10+9)
AppendUintVarlen/12345678901-4  33.8ns ± 3%  29.3ns ± 5%  -13.21%  (p=0.000 n=10+10)
AppendUintVarlen/123456789012-4  35.5ns ± 4%  29.2ns ± 4%  -17.82%  (p=0.000 n=10+10)
AppendUintVarlen/1234567890123-4  37.6ns ± 4%  31.4ns ± 3%  -16.48%  (p=0.000 n=10+10)
AppendUintVarlen/12345678901234-4  39.8ns ± 6%  32.0ns ± 7%  -19.60%  (p=0.000 n=10+10)
AppendUintVarlen/123456789012345-4  40.7ns ± 0%  34.4ns ± 4%  -15.55%  (p=0.000 n=6+10)
AppendUintVarlen/1234567890123456-4  45.4ns ± 6%  35.1ns ± 4%  -22.66%  (p=0.000 n=10+10)
AppendUintVarlen/12345678901234567-4  45.1ns ± 1%  36.7ns ± 4%  -18.77%  (p=0.000 n=9+10)
AppendUintVarlen/123456789012345678-4  46.9ns ± 0%  36.4ns ± 3%  -22.49%  (p=0.000 n=9+10)
AppendUintVarlen/1234567890123456789-4  50.6ns ± 6%  38.8ns ± 3%  -23.28%  (p=0.000 n=10+10)
AppendUintVarlen/12345678901234567890-4  51.3ns ± 2%  38.4ns ± 0%  -25.00%  (p=0.000 n=9+8)

Benchmark results for GOARCH=386:

name  old time/op  new time/op  delta
FormatInt-4  6.21µs ± 0%  6.14µs ± 0%  -1.11%  (p=0.008 n=5+5)
AppendInt-4  4.95µs ± 0%  4.85µs ± 0%  -1.99%  (p=0.016 n=5+4)
FormatUint-4  1.89µs ± 1%  1.83µs ± 1%  -2.94%  (p=0.008 n=5+5)
AppendUint-4  1.59µs ± 0%  1.57µs ± 2%  -1.72%  (p=0.040 n=5+5)
FormatIntSmall-4  8.48ns ± 0%  8.48ns ± 0%  ~  (p=0.905 n=5+5)
AppendIntSmall-4  12.2ns ± 0%  12.2ns ± 0%  ~  (all equal)
AppendUintVarlen/1-4  10.6ns ± 1%  10.7ns ± 0%  ~  (p=0.238 n=5+4)
AppendUintVarlen/12-4  10.7ns ± 0%  10.7ns ± 1%  ~  (p=0.333 n=4+5)
AppendUintVarlen/123-4  29.9ns ± 1%  30.2ns ± 0%  +1.07%  (p=0.016 n=5+4)
AppendUintVarlen/1234-4  32.4ns ± 1%  30.4ns ± 0%  -6.30%  (p=0.008 n=5+5)
AppendUintVarlen/12345-4  35.1ns ± 2%  34.9ns ± 0%  ~  (p=0.238 n=5+5)
AppendUintVarlen/123456-4  36.6ns ± 0%  35.3ns ± 0%  -3.55%  (p=0.029 n=4+4)
AppendUintVarlen/1234567-4  38.9ns ± 0%  39.6ns ± 0%  +1.80%  (p=0.029 n=4+4)
AppendUintVarlen/12345678-4  41.3ns ± 0%  40.1ns ± 0%  -2.91%  (p=0.000 n=5+4)
AppendUintVarlen/123456789-4  44.9ns ± 1%  44.8ns ± 0%  ~  (p=0.667 n=5+5)
AppendUintVarlen/1234567890-4  65.6ns ± 0%  66.2ns ± 1%  +0.88%  (p=0.016 n=4+5)
AppendUintVarlen/12345678901-4  77.9ns ± 0%  76.3ns ± 0%  -2.00%  (p=0.000 n=4+5)
AppendUintVarlen/123456789012-4  80.7ns ± 0%  79.1ns ± 1%  -2.01%  (p=0.008 n=5+5)
AppendUintVarlen/1234567890123-4  83.6ns ± 0%  80.2ns ± 1%  -4.07%  (p=0.008 n=5+5)
AppendUintVarlen/12345678901234-4  86.2ns ± 1%  83.3ns ± 0%  -3.39%  (p=0.008 n=5+5)
AppendUintVarlen/123456789012345-4  88.5ns ± 0%  83.7ns ± 0%  -5.42%  (p=0.008 n=5+5)
AppendUintVarlen/1234567890123456-4  90.6ns ± 0%  88.3ns ± 0%  -2.54%  (p=0.008 n=5+5)
AppendUintVarlen/12345678901234567-4  92.7ns ± 0%  89.0ns ± 1%  -4.01%  (p=0.008 n=5+5)
AppendUintVarlen/123456789012345678-4  95.6ns ± 1%  92.6ns ± 0%  -3.18%  (p=0.016 n=5+4)
AppendUintVarlen/1234567890123456789-4  118ns ± 0%  114ns ± 0%  ~  (p=0.079 n=4+5)
AppendUintVarlen/12345678901234567890-4  138ns ± 0%  136ns ± 0%  -1.45%  (p=0.008 n=5+5)

Updates #19445
Change-Id: Iafbe5c074898187c150dc3854e5b9fc19c10be05
Reviewed-on: https://go-review.googlesource.com/38255
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
Please answer these questions before submitting your issue. Thanks!
What version of Go are you using (go version)?
1.8
What operating system and processor architecture are you using (go env)?
What, why, etc.
While profiling an ORM-like program, I discovered that most of the 100+ allocs/op came from calls to strconv.Itoa. By putting a simple cache in front of strconv.Itoa, the number of allocs dropped by over half, and so did the runtime of the function (7μs to 3μs). I looked at some of our other projects and noticed that most of the Itoa-like calls were used to format small numbers (i.e., < 100). I'd imagine other projects that generate SQL or graph query code and need to format parameters ($1, $2, ...) would see decent performance improvements as well.
A small test I ran shows no appreciable difference when the majority of the values are > 99 or when they are pseudo-random numbers in [0, b.N), but it runs in 1/10 the time when the values are < 100 and doesn't incur any allocations.
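(The exact test isn't reproduced here. Assuming a cached wrapper like the hypothetical itoaCached sketched earlier in this issue, a pair of benchmarks along these lines would make the difference visible; the names are illustrative only:)

package ormutil

import "testing"

// BenchmarkItoaSmall stays on the cached path (n < 100); with the cache
// in place it should report 0 allocs/op.
func BenchmarkItoaSmall(b *testing.B) {
    b.ReportAllocs()
    for i := 0; i < b.N; i++ {
        _ = itoaCached(i % 100)
    }
}

// BenchmarkItoaLarge always uses values > 99, so every call falls
// through to strconv.Itoa and allocates as before.
func BenchmarkItoaLarge(b *testing.B) {
    b.ReportAllocs()
    for i := 0; i < b.N; i++ {
        _ = itoaCached(100 + i%1000000)
    }
}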
My issues with this idea:
FWIW the cache only needs to be a 190-byte long string.
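(One possible layout for such a 190-byte constant, purely as an illustration and not code from this thread, is the ten single digits followed by the ninety two-digit pairs, with a small helper slicing out the right substring; slicing a string constant in Go doesn't allocate, since the substring shares the constant's backing storage:)

package ormutil

// smalls is 190 bytes: "0" through "9" followed by "10" through "99".
const smalls = "0123456789" +
    "10111213141516171819" +
    "20212223242526272829" +
    "30313233343536373839" +
    "40414243444546474849" +
    "50515253545556575859" +
    "60616263646566676869" +
    "70717273747576777879" +
    "80818283848586878889" +
    "90919293949596979899"

// small returns the decimal string for 0 <= n <= 99 without allocating.
// Compared with indexing a [100]string table, it costs a couple of
// extra arithmetic operations to compute the substring offset.
func small(n int) string {
    if n < 10 {
        return smalls[n : n+1]
    }
    off := n*2 - 10 // offset of the two-digit pair within smalls
    return smalls[off : off+2]
}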