strconv: consider speeding up formatting of small integers with cache · Issue #19445 · golang/go · GitHub
Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

strconv: consider speeding up formatting of small integers with cache #19445

Closed
ericlagergren opened this issue Mar 7, 2017 · 22 comments
Closed

Comments

@ericlagergren
Copy link
Contributor

ericlagergren commented Mar 7, 2017

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

1.8

What operating system and processor architecture are you using (go env)?

GOOS=darwin
GOARCH=amd64

What, why, etc.

While profiling an ORM-like program, I discovered most of the 100+ allocs/op were from calls to strconv.Itoa. By implementing a simple cache in front of strconv.Itoa, the number of allocs dropped by over half and so did the runtime of the function (7 to 3 μs). I looked at some other of our projects and noticed most of the Itoa-like calls were used to format smaller numbers (i.e., < 100). I'd imagine other project that generate SQL or graph query code and need to format parameters ($1, $2, ...) would see decent performance improvements as well.

A small test I ran shows no appreciable difference when the majority of N is > 99 or N is a pseudo-random number in [0, b.N), but runs in 1/10 the time where N < 100 and doesn't incur any allocations.

My issues with this idea:

  • Small range of numbers (assumes that most calls are < 100)
  • Only works with base 10
  • Negative numbers need their own cache or an alloc needs to be made to contact the number with "-"

FWIW the cache only needs to be a 190-byte long string.

name             old time/op    new time/op    delta
SmallItoaAll-4     45.8ns ± 0%    47.4ns ± 0%   ~     (p=1.000 n=1+1)
SmallItoaRand-4    90.9ns ± 0%    90.5ns ± 0%   ~     (p=1.000 n=1+1)
SmallItoa-4        31.4ns ± 0%     3.4ns ± 0%   ~     (p=1.000 n=1+1)

name             old alloc/op   new alloc/op   delta
SmallItoaAll-4      8.00B ± 0%     8.00B ± 0%   ~     (all equal)
SmallItoaRand-4     7.00B ± 0%     7.00B ± 0%   ~     (all equal)
SmallItoa-4         1.00B ± 0%     0.00B        ~     (p=1.000 n=1+1)

name             old allocs/op  new allocs/op  delta
SmallItoaAll-4       1.00 ± 0%      1.00 ± 0%   ~     (all equal)
SmallItoaRand-4      1.00 ± 0%      0.00        ~     (p=1.000 n=1+1)
SmallItoa-4          1.00 ± 0%      0.00        ~     (p=1.000 n=1+1)
@josharian
Copy link
Contributor

I recently looked into doing something similar but lower level. When calling string(b) where b is a byte slice of length exactly one, construct a string using the recently introduced runtime.staticbytes, thus avoiding an allocation. This would eliminate the allocation for single-digit numbers.

I gave it up because I wasn't convinced it was worth the impact on other []byte to string conversions, but in the meantime, I saw some other optimizations, leading to CL 37791. It is possible that the other optimizations would pay for a "single digit" code path.

What do you think, @randall77 @mdempsky?

@ericlagergren
Copy link
Contributor Author

ericlagergren commented Mar 7, 2017

Here's a sample implementation (from the ORM-project):

// smallInts is all integers [0, 99] and only 190 bytes.
const smallInts = "0123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899"

func itoa(i int64) string {
	if i < 10 {
		return smallInts[i : i+1]
	}
	if i < 100 {
		return smallInts[2*i-10 : 2*i-8]
	}
	return strconv.FormatInt(i, 10)
}

@josharian I like the idea of single digits! I was a bit bummed when string(b) still allocated.

@gopherbot
Copy link
Contributor

CL https://golang.org/cl/37963 mentions this issue.

@bradfitz
Copy link
Contributor

bradfitz commented Mar 9, 2017

Your SQL database and network is fast enough that you notice a 7μs to 3μs drop when formatting SQL queries?

I'm not totally against this, and I usually like to help fight the allocation battles, but I'm a little surprised this comes up often enough to matter. Does it help on any of our existing benchmarks?

@ericlagergren
Copy link
Contributor Author

ericlagergren commented Mar 9, 2017 via email

@dlsniper
Copy link
Contributor

Sorry if this will sound bad, but will we now start seeing and accepting patches that further add numbers to this? Why stop at 99 and not at 2147483648? That seems to cover a lot more of the frequent use-cases where timestamps are converted from strings to numbers.

@mvdan
Copy link
Member

mvdan commented Mar 16, 2017

@dlsniper do you think that's worth the large amount of binary data and compiler work? 99 numbers, in comparison, is a tiny amount.

@bradfitz
Copy link
Contributor

@mvdan, what's wrong with a 20 GB binary? Disk space is cheap. We could avoid allocations! :)

@bradfitz
Copy link
Contributor

Actually it'd only be about 18 GB. Even better.

@ericlagergren
Copy link
Contributor Author

ericlagergren commented Mar 16, 2017

since I'm not sure if I can comment on the CL: why a ~1.7kb (total size, including string headers) array instead of a 190 byte const string? Just to save two subtractions?

@bradfitz
Copy link
Contributor

@ericlagergren, you can comment on the CL.

I don't see a 2.4k array.

Let me turn your question around: why a 190 byte const string over the existing code?

@ericlagergren
Copy link
Contributor Author

@bradfitz cool, didn't know that–thanks!

I typoed that number. I meant ~1.7k (16 bytes for each string header * 100 elements + the strings' data).

The const string is smaller.

@mvdan
Copy link
Member

mvdan commented Mar 16, 2017

@ericlagergren if you come up with a CL that doesn't complicate the code and keeps performance, I don't see why it wouldn't get merged.

@bradfitz
Copy link
Contributor

@ericlagergren, it was originally a const string. Presumably the code review discussion explains why it was changed. I stopped following at some point.

@mvdan
Copy link
Member

mvdan commented Mar 16, 2017

This seems to be the reason by gri:

For one, have you tried actually creating the slice of strings:
var smallInts = []string{"0", "1", "2", ...} ? Then you don't need an extra if and you can just index into it. You don't even need the formatSmallInt function and all the conversion stuff around it.

@ericlagergren
Copy link
Contributor Author

ericlagergren commented Mar 16, 2017

@bradfitz yeah, I read through it and it seems like the idea was trading 1.5kb for two subtractions (-10 and -8). Thus my question.

@mvdan right, I was just asking why the CL went that direction since I wasn't aware I could comment on the CL itself. :-)

@bradfitz
Copy link
Contributor

It's possible @griesemer didn't consider the binary size bloat when making his decision if there was no discussion of it previously.

@ericlagergren
Copy link
Contributor Author

@bradfitz That was my concern that sparked the question–I know in the past y'all have made decisions with binary bloat in mind, so I was surprised the CL went with a larger array vs a smaller string. Which led to me wondering if there were other reasons.

@griesemer
Copy link
Contributor

griesemer commented Mar 16, 2017 via email

@ericlagergren
Copy link
Contributor Author

@griesemer thanks for clarifying it for me.

I'll submit a CL if you want, but if CL 38255 is gonna go through then it doesn't seem like there's any point to change it to a const string since it'll just get changed back.

@griesemer
Copy link
Contributor

griesemer commented Mar 16, 2017 via email

@gopherbot
Copy link
Contributor

CL https://golang.org/cl/38255 mentions this issue.

gopherbot pushed a commit that referenced this issue Mar 22, 2017
Benchmark results for GOARCH=amd64:

name                                     old time/op  new time/op  delta
FormatInt-4                              2.51µs ± 2%  2.40µs ± 2%   -4.51%  (p=0.000 n=9+10)
AppendInt-4                              1.67µs ± 2%  1.61µs ± 3%   -3.74%  (p=0.000 n=9+9)
FormatUint-4                              698ns ± 2%   643ns ± 3%   -7.95%  (p=0.000 n=10+8)
AppendUint-4                              478ns ± 1%   418ns ± 2%  -12.61%  (p=0.000 n=8+10)
AppendUintVarlen/1-4                     9.30ns ± 6%  9.15ns ± 1%     ~     (p=0.199 n=9+10)
AppendUintVarlen/12-4                    9.12ns ± 0%  9.16ns ± 2%     ~     (p=0.307 n=9+9)
AppendUintVarlen/123-4                   18.6ns ± 2%  18.7ns ± 0%     ~     (p=0.091 n=10+6)
AppendUintVarlen/1234-4                  19.1ns ± 4%  17.7ns ± 1%   -7.35%  (p=0.000 n=10+9)
AppendUintVarlen/12345-4                 21.5ns ± 3%  20.7ns ± 3%   -3.78%  (p=0.002 n=9+10)
AppendUintVarlen/123456-4                23.5ns ± 3%  20.9ns ± 1%  -11.14%  (p=0.000 n=10+9)
AppendUintVarlen/1234567-4               25.0ns ± 2%  23.6ns ± 7%   -5.48%  (p=0.004 n=9+10)
AppendUintVarlen/12345678-4              26.8ns ± 2%  23.4ns ± 2%  -12.79%  (p=0.000 n=9+10)
AppendUintVarlen/123456789-4             29.8ns ± 3%  26.5ns ± 5%  -11.03%  (p=0.000 n=10+10)
AppendUintVarlen/1234567890-4            31.6ns ± 3%  26.9ns ± 3%  -14.95%  (p=0.000 n=10+9)
AppendUintVarlen/12345678901-4           33.8ns ± 3%  29.3ns ± 5%  -13.21%  (p=0.000 n=10+10)
AppendUintVarlen/123456789012-4          35.5ns ± 4%  29.2ns ± 4%  -17.82%  (p=0.000 n=10+10)
AppendUintVarlen/1234567890123-4         37.6ns ± 4%  31.4ns ± 3%  -16.48%  (p=0.000 n=10+10)
AppendUintVarlen/12345678901234-4        39.8ns ± 6%  32.0ns ± 7%  -19.60%  (p=0.000 n=10+10)
AppendUintVarlen/123456789012345-4       40.7ns ± 0%  34.4ns ± 4%  -15.55%  (p=0.000 n=6+10)
AppendUintVarlen/1234567890123456-4      45.4ns ± 6%  35.1ns ± 4%  -22.66%  (p=0.000 n=10+10)
AppendUintVarlen/12345678901234567-4     45.1ns ± 1%  36.7ns ± 4%  -18.77%  (p=0.000 n=9+10)
AppendUintVarlen/123456789012345678-4    46.9ns ± 0%  36.4ns ± 3%  -22.49%  (p=0.000 n=9+10)
AppendUintVarlen/1234567890123456789-4   50.6ns ± 6%  38.8ns ± 3%  -23.28%  (p=0.000 n=10+10)
AppendUintVarlen/12345678901234567890-4  51.3ns ± 2%  38.4ns ± 0%  -25.00%  (p=0.000 n=9+8)

Benchmark results for GOARCH=386:

name                                     old time/op  new time/op  delta
FormatInt-4                              6.21µs ± 0%  6.14µs ± 0%  -1.11%  (p=0.008 n=5+5)
AppendInt-4                              4.95µs ± 0%  4.85µs ± 0%  -1.99%  (p=0.016 n=5+4)
FormatUint-4                             1.89µs ± 1%  1.83µs ± 1%  -2.94%  (p=0.008 n=5+5)
AppendUint-4                             1.59µs ± 0%  1.57µs ± 2%  -1.72%  (p=0.040 n=5+5)
FormatIntSmall-4                         8.48ns ± 0%  8.48ns ± 0%    ~     (p=0.905 n=5+5)
AppendIntSmall-4                         12.2ns ± 0%  12.2ns ± 0%    ~     (all equal)
AppendUintVarlen/1-4                     10.6ns ± 1%  10.7ns ± 0%    ~     (p=0.238 n=5+4)
AppendUintVarlen/12-4                    10.7ns ± 0%  10.7ns ± 1%    ~     (p=0.333 n=4+5)
AppendUintVarlen/123-4                   29.9ns ± 1%  30.2ns ± 0%  +1.07%  (p=0.016 n=5+4)
AppendUintVarlen/1234-4                  32.4ns ± 1%  30.4ns ± 0%  -6.30%  (p=0.008 n=5+5)
AppendUintVarlen/12345-4                 35.1ns ± 2%  34.9ns ± 0%    ~     (p=0.238 n=5+5)
AppendUintVarlen/123456-4                36.6ns ± 0%  35.3ns ± 0%  -3.55%  (p=0.029 n=4+4)
AppendUintVarlen/1234567-4               38.9ns ± 0%  39.6ns ± 0%  +1.80%  (p=0.029 n=4+4)
AppendUintVarlen/12345678-4              41.3ns ± 0%  40.1ns ± 0%  -2.91%  (p=0.000 n=5+4)
AppendUintVarlen/123456789-4             44.9ns ± 1%  44.8ns ± 0%    ~     (p=0.667 n=5+5)
AppendUintVarlen/1234567890-4            65.6ns ± 0%  66.2ns ± 1%  +0.88%  (p=0.016 n=4+5)
AppendUintVarlen/12345678901-4           77.9ns ± 0%  76.3ns ± 0%  -2.00%  (p=0.000 n=4+5)
AppendUintVarlen/123456789012-4          80.7ns ± 0%  79.1ns ± 1%  -2.01%  (p=0.008 n=5+5)
AppendUintVarlen/1234567890123-4         83.6ns ± 0%  80.2ns ± 1%  -4.07%  (p=0.008 n=5+5)
AppendUintVarlen/12345678901234-4        86.2ns ± 1%  83.3ns ± 0%  -3.39%  (p=0.008 n=5+5)
AppendUintVarlen/123456789012345-4       88.5ns ± 0%  83.7ns ± 0%  -5.42%  (p=0.008 n=5+5)
AppendUintVarlen/1234567890123456-4      90.6ns ± 0%  88.3ns ± 0%  -2.54%  (p=0.008 n=5+5)
AppendUintVarlen/12345678901234567-4     92.7ns ± 0%  89.0ns ± 1%  -4.01%  (p=0.008 n=5+5)
AppendUintVarlen/123456789012345678-4    95.6ns ± 1%  92.6ns ± 0%  -3.18%  (p=0.016 n=5+4)
AppendUintVarlen/1234567890123456789-4    118ns ± 0%   114ns ± 0%    ~     (p=0.079 n=4+5)
AppendUintVarlen/12345678901234567890-4   138ns ± 0%   136ns ± 0%  -1.45%  (p=0.008 n=5+5)

Updates #19445

Change-Id: Iafbe5c074898187c150dc3854e5b9fc19c10be05
Reviewed-on: https://go-review.googlesource.com/38255
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
@golang golang locked and limited conversation to collaborators Mar 22, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

7 participants