var names []string
for _, person := range bus.People() {
	names = append(names, person.name)
}
This probably looks pretty familiar. If it doesn’t and you are like “OMG, how is that working?!?! Doesn’t that nil panic?!”, no worries: append will instantiate the slice for you. However, you are leaving it up to Golang to guess how many items will end up in the final slice, and it has zero context. Let’s dive into what a slice is under the hood.
Slice Definition
This is a topic a plethora of bloggers have gone through, so I will keep this as terse as possible. If what I say doesn’t make sense, jump over to the Go blog to get a better description.
type slice struct {
	array unsafe.Pointer
	len   int
	cap   int
}
Above is the definition of a slice. We have array, an unsafe pointer to the beginning of the underlying array in memory; AKA the actual data that we think of when we think of a slice. Then we have the len (length) of the slice. Note: that is the length of the slice, not the array; this distinction is important (you will see later). Finally you have the cap, which is the length of the underlying array. The Gunthers of the world might feel this is wasteful: “Give me the unsafe pointer, index overflow is a skill issue.” Well Gunther, the White House has something to say about that. (actual paper)
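To make the len/cap distinction concrete, here is a quick throwaway demo (mine, not part of the original example):
package main

import "fmt"

func main() {
	// len is 3, but the underlying array holds 10
	s := make([]int, 3, 10)
	fmt.Println(len(s), cap(s)) // 3 10

	// reslicing within cap exposes more of the same array
	s = s[:5]
	fmt.Println(len(s), cap(s)) // 5 10
}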
Anyway, when appending to an existing slice Go checks the cap to see if adding another value would write outside the bounds of the underlying array. If len+1 > cap we have to allocate a new array, copy the data over to it, and then add our new value. However, if there is room Go can just slap that bad boy (or girl (all genders can be bad (I am digging a deeper hole, aren’t I?))) onto the end of the array and increase the value of len.
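You can watch the copy happen. In this little sketch of mine the slice starts out full, so a single append forces a new array, and unsafe.SliceData (Go 1.20+) lets us see the data move:
package main

import (
	"fmt"
	"unsafe"
)

func main() {
	s := make([]int, 2) // full: len == cap == 2
	before := unsafe.SliceData(s)

	s = append(s, 3) // no room, so append allocates a new array and copies
	after := unsafe.SliceData(s)

	fmt.Println(before == after) // false: the data moved
	fmt.Println(len(s), cap(s))  // 3 4 (the exact cap may vary by Go version)
}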
When creating a slice you have the option of supplying a cap, which sets the initial length of the underlying array.
// make(type, len, cap)
names := make([]string, 0, 100)
Now you will have a slice which can be appended to 100 times before having to create a new underlying array.
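A quick way to convince yourself (again, a throwaway demo of mine): cap never changes across those 100 appends, so append never had to allocate a new array:
package main

import "fmt"

func main() {
	names := make([]string, 0, 100)
	for i := 0; i < 100; i++ {
		names = append(names, "rando")
	}
	// still 100 100: the original backing array was never outgrown
	fmt.Println(len(names), cap(names))
}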
That’s Neat, Who Cares
OK, yeah, onto the point. That whole “copy the data into a new array” thing? Yeah, it can be expensive. Often slices get tossed over to the heap too, adding to the performance hit. But who cares what I say, let’s see what the benchmarks say:
❯ go test . -test.run='^$' -test.bench='^Benchmark(Nil|NonEmpty)Slice'
goos: linux
goarch: amd64
pkg: github.com/d1ngd0/blog_work
cpu: AMD FX(tm)-8320 Eight-Core Processor
BenchmarkNonEmptySlice1000-8 31526 38780 ns/op 41088 B/op 4 allocs/op
BenchmarkNonEmptySlice500-8 56721 20679 ns/op 22656 B/op 3 allocs/op
BenchmarkNonEmptySlice250-8 107714 9528 ns/op 10368 B/op 2 allocs/op
BenchmarkNonEmptySlice100-8 5320218 235.5 ns/op 0 B/op 0 allocs/op
BenchmarkNonEmptySlice47-8 8044794 141.8 ns/op 0 B/op 0 allocs/op
BenchmarkNonEmptySlice10-8 18386499 65.56 ns/op 0 B/op 0 allocs/op
BenchmarkNonEmptySlice1-8 24260070 51.28 ns/op 0 B/op 0 allocs/op
BenchmarkNilSlice1000-8 35143 35947 ns/op 35184 B/op 11 allocs/op
BenchmarkNilSlice500-8 61804 19340 ns/op 18800 B/op 10 allocs/op
BenchmarkNilSlice250-8 107950 10615 ns/op 9328 B/op 9 allocs/op
BenchmarkNilSlice100-8 196062 5543 ns/op 4464 B/op 8 allocs/op
BenchmarkNilSlice47-8 368827 3275 ns/op 2160 B/op 7 allocs/op
BenchmarkNilSlice10-8 1000000 1164 ns/op 496 B/op 5 allocs/op
BenchmarkNilSlice1-8 10545158 154.0 ns/op 16 B/op 1 allocs/op
PASS
ok github.com/d1ngd0/blog_work 22.383s
You can see how these benchmarks implement our initial example at my GitHub. In short, the NonEmpty slices are created with an underlying array with a cap of 100, while the Nil slices have no initial definition. You can see early on that allocating with a decent guess performs better than no guess at all. When your guess is exactly right (100) you see big performance gains: 235ns vs 5543ns. However, as your guess drifts further from correct, Golang’s underlying algorithm starts outperforming your bad, frankly Gunther-like guess of the size.
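If you don’t feel like clicking through, the benchmarks are shaped roughly like this. This is a sketch from memory, not the exact code; the repo has the real thing, and names like "person" here are just stand-ins.
// Rough shape of the benchmarks; lives in a _test.go file.
package main

import "testing"

func BenchmarkNilSlice1000(b *testing.B) {
	for i := 0; i < b.N; i++ {
		var names []string
		for j := 0; j < 1000; j++ {
			names = append(names, "person")
		}
		_ = names
	}
}

func BenchmarkNonEmptySlice1000(b *testing.B) {
	for i := 0; i < b.N; i++ {
		names := make([]string, 0, 100) // the guess of 100
		for j := 0; j < 1000; j++ {
			names = append(names, "person")
		}
		_ = names
	}
}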
To confirm this hypothesis I adjusted the initial cap to 1000, and you can see the pre-allocated array outperforming nil again.
❯ go test . -test.run='^$' -test.bench='^Benchmark(Nil|NonEmptyAdjusted)Slice'
goos: linux
goarch: amd64
pkg: github.com/d1ngd0/blog_work
cpu: AMD FX(tm)-8320 Eight-Core Processor
BenchmarkNilSlice1000-8 33852 36093 ns/op 35184 B/op 11 allocs/op
BenchmarkNilSlice500-8 59600 19554 ns/op 18800 B/op 10 allocs/op
BenchmarkNilSlice250-8 112942 10621 ns/op 9328 B/op 9 allocs/op
BenchmarkNilSlice100-8 195759 5687 ns/op 4464 B/op 8 allocs/op
BenchmarkNilSlice47-8 403456 3311 ns/op 2160 B/op 7 allocs/op
BenchmarkNilSlice10-8 1019366 1200 ns/op 496 B/op 5 allocs/op
BenchmarkNilSlice1-8 7731250 155.2 ns/op 16 B/op 1 allocs/op
BenchmarkNonEmptyAdjustedSlice1000-8 447943 2833 ns/op 0 B/op 0 allocs/op
BenchmarkNonEmptyAdjustedSlice500-8 574732 1809 ns/op 0 B/op 0 allocs/op
BenchmarkNonEmptyAdjustedSlice250-8 943290 1159 ns/op 0 B/op 0 allocs/op
BenchmarkNonEmptyAdjustedSlice100-8 1242712 953.6 ns/op 0 B/op 0 allocs/op
BenchmarkNonEmptyAdjustedSlice47-8 1356798 880.9 ns/op 0 B/op 0 allocs/op
BenchmarkNonEmptyAdjustedSlice10-8 1583610 800.3 ns/op 0 B/op 0 allocs/op
BenchmarkNonEmptyAdjustedSlice1-8 1453552 748.2 ns/op 0 B/op 0 allocs/op
PASS
ok github.com/d1ngd0/blog_work 22.719s
How the Slice Grows
It was quite interesting that the nil slice began outperforming the “bad” guess of 100 once the final size grew well past it. I decided to look into the function that grows the underlying array to see if any hints could point out the reason.
Below is the function that does the magic. I am going to add a bunch of comments on top of what is already there; if you want the original, it is on GitHub.
// nextslicecap computes the next appropriate array length.
func nextslicecap(newLen, oldCap int) int {
	// start by assuming the cap is big enough; we test
	// later to see if that is actually true
	newcap := oldCap
	// calculate what double the cap would be
	doublecap := newcap + newcap
	// if the new length is greater than double the cap
	// we should just return that value. This is what would
	// happen if you ran something like
	// `append([]int{1}, []int{1, 2, 3}...)`: nextslicecap would be
	// called with (newLen = 4, oldCap = 1), so this function would
	// return 4.
	if newLen > doublecap {
		return newLen
	}

	// if our oldCap is less than 256 we just return double. Note the
	// check is on oldCap, not doublecap, so in our previous example
	// (starting at 100) the cap grows 100 -> 200 -> 400; the jump to
	// 400 doubles a 200 cap that is still under the threshold, and
	// only the growth after that falls into the for loop below.
	const threshold = 256
	if oldCap < threshold {
		// just keep doubling
		return doublecap
	}

	// now we switch to roughly 1.25x growth for anything with an
	// oldCap of 256 or more.
	for {
		// Transition from growing 2x for small slices
		// to growing 1.25x for large slices. This formula
		// gives a smooth-ish transition between the two.
		newcap += (newcap + 3*threshold) >> 2

		// We need to check `newcap >= newLen` and whether `newcap` overflowed.
		// newLen is guaranteed to be larger than zero, hence
		// when newcap overflows then `uint(newcap) > uint(newLen)`.
		// This allows to check for both with the same comparison.
		if uint(newcap) >= uint(newLen) {
			break
		}

		// Paul here: for some more context, when they say overflow they
		// mean integer overflow. Say you have an array that is nearly as
		// large as the max int on a 32-bit machine (where int is 32 bits);
		// growing that by 1.25x could overflow and wrap around to a
		// negative number. Because newcap and newLen are ints, casting
		// them to uint keeps the value the same when they are positive.
		// A negative newcap, however, becomes a huge uint (one that isn't
		// a valid int), so the comparison above catches the overflow case
		// with the same check.
		// https://go.dev/play/p/c6xjvE-sFlb
	}

	// Set newcap to the requested cap when
	// the newcap calculation overflowed.
	if newcap <= 0 {
		return newLen
	}
	return newcap
}
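You can actually watch the 2x-to-1.25x handoff from the outside. Here is another throwaway demo of mine; the exact caps you see depend on your Go version and the allocator’s size classes, so treat the numbers in the comment as a rough expectation:
package main

import "fmt"

func main() {
	var s []int
	last := cap(s)
	for i := 0; i < 3000; i++ {
		s = append(s, i)
		if c := cap(s); c != last {
			// prints something like: 1 2 4 8 ... 512 848 1280 1792 2560
			fmt.Print(c, " ")
			last = c
		}
	}
	fmt.Println()
}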
Knowing all this, the subtle differences between Go’s growth math and our doubling from a bad starting point likely lead to the difference in performance over time. I’m sure someone could write their thesis on what is going on here, but I am trying to get this blog done in a single morning. :)
Conclusion
Does any of this matter? Not really. I usually prefer to create nil slices since they look prettier, and performance isn’t a problem until it’s a problem. However, if you have a good guess on size within a hot path, it is a decent step to take to potentially speed things up. If you take anything away from this blog post, it should be this: even when you understand how something works, subtleties can cause big swings in performance. So benchmark your particular thing and follow the numbers.