In this article I investigate the computational performance of various string concatenation methods in the Go programming language.
To evaluate the options, I prepared some typical Go benchmarks using the Go testing
package. A benchmark looks something like this:
func BenchmarkBufferString(b *testing.B, numConcat int) {
var ns string
for i := 0; i < b.N; i++ {
next := nextString()
buffer := bytes.NewBufferString("")
for u := 0; u < numConcat; u++ {
buffer.WriteString(next())
}
ns = buffer.String()
}
global = ns
}
For the purposes of these benchmarks, I imagined having a process that returns string segments one by one, and these segments need to be concatenated to form one string. To represent such a process, I created a simple iterator that returns the string version of its current index on each call:
// nextString is an iterator we use to represent a process
// that returns strings that we want to concatenate in order.
func nextString() func() string {
n := 0
// closure captures variable n
return func() string {
n += 1
return strconv.Itoa(n)
}
}
Every method we evaluate will call this function in a loop so that the fully concatenated string looks something like 0123456789101112131415...
. There is function call overhead to take into account in this, but since all the benchmarks will have the same overhead, the comparison is still fair. Now that we have the basic setup for the benchmark, let’s see what the methods are we’ll be evaluating.
Method 1: Naive Appending with +=
This is the most obvious approach to the problem. Use the concatenate operator (+=) to append each segment to the string. The thing to know about this is that strings are immutable in Go - each time a string is assigned to a variable a new address is assigned in memory to represent the new value. This is different from languages like C++ and BASIC, where a string variable can be modified in place. In Go, each time you append to the end of a string, it must create a new string and copy the contents of both the existing string and the appended string into it. As the strings you are manipulating become large this process becomes increasingly slow. In fact, this method of naive appending is O(N2).
Method 2: strings.Join
The Go strings
standard library package provides a Join
function that takes a slice of strings and concatenates them. In our case, this requires first building a slice of strings and then calling the function on the slice. Building a slice takes time if you don’t already have one, and we therefore count this as part of the time taken to perform the concatenation in our benchmarks. The final concatenation will be O(N), but building the slice is O(N) (average) and uses O(N) memory.
Here is an example of using strings.Join
:
s := []string{}
for i := 0; i < 1000; i++ {
s = append(s, "a")
}
fmt.Println(strings.Join(s, ""))
See full code on the Go Playground
Method 3: bytes.Buffer
The last method we’ll be evaluating is using the Buffer type from the bytes package. bytes.Buffer
implements io.Writer, and can concatenate strings in O(N) time without requiring us to build a new slice.
Here is an example of using bytes.Buffer
:
var buffer bytes.Buffer
for i := 0; i < 1000; i++ {
buffer.WriteString("a")
}
fmt.Println(buffer.String())
See full code on the Go Playground
Now that we’ve defined the methods we want to test, let’s proceed to run the benchmarks. I’d like to skip right to the end and start with the winning approach. Then we’ll work back to see how and when this method is best.
And the Winner Is…
Method #3 (bytes.Buffer
) is the winner! But not by far, and not in all cases. Here is a graph of how the methods compare in the case with a 10000 strings with a total size of 37kB (small strings of <= 5 characters):
It shows how bytes.Buffer
is the winner by quite a large margin in this case. Let’s see how the methods compare when the strings are a bit longer, at a total size of 244kB (25 characters each):
With longer strings strings.Join
doesn’t perform so badly, but it’s still not as good as using bytes.Buffer
.
I also ran benchmarks on small numbers of concatenations, and in that case naive concatenation gets the job done slightly faster than the other methods, so it’s still fine to use for small numbers of concatenations and small segments.
Lastly, how big of a factor does building a string slice play in Method #2? We can exclude slice building from the strings.Join
benchmark, but make it still call nextString
on every iteration so that the tests remain fair. In this case, the results look like this:
When we don’t count the time it takes to build a string slice, strings.Join
edges out bytes.Buffer
and rules supreme.
In Summary
In summary, for very few concatenations of short strings (fewer than hundred, length shorter than 10), using Method #1, naive string appending, is just fine and can actually perform slightly better. For more heavy-duty cases where efficiency is important, Method #3, bytes.Buffer
is the best choice out of the methods evaluated. That said, strings.Join
is a good choice when you already have a string slice that just needs to be concatenated into one string.
Get the Code
Below are the raw results of running go test -bench.
for the different methods and different number of concatenations on my machine (Go 1.2 on Ubuntu 64bit, Intel® Core™ i7-3667U CPU @ 2.00GHz):
BenchmarkNaiveConcat10 500000 6403 ns/op
BenchmarkNaiveConcat100 20000 77001 ns/op
BenchmarkNaiveConcat1000 500 3105361 ns/op
BenchmarkNaiveConcat10000 5 236872134 ns/op
BenchmarkJoin10 200000 8129 ns/op
BenchmarkJoin100 50000 51891 ns/op
BenchmarkJoin1000 5000 465179 ns/op
BenchmarkJoin10000 500 5911317 ns/op
BenchmarkBufferString10 200000 7688 ns/op
BenchmarkBufferString100 50000 45027 ns/op
BenchmarkBufferString1000 5000 398311 ns/op
BenchmarkBufferString10000 500 3702906 ns/op
You can play around with the benchmarks, run them yourself and check my implementation on Github. A typical benchmark looks like this:
var global string
// benchmarkNaiveConcat provides a benchmark for basic built-in
// Go string concatenation. Because strings are immutable in Go,
// it performs the worst of the tested methods. The time taken to
// set up the array that is appended is not counted towards the
// time for naive concatenation.
func benchmarkNaiveConcat(b *testing.B, numConcat int) {
var ns string
for i := 0; i < b.N; i++ {
next := nextString()
ns = ""
for u := 0; u < numConcat; u++ {
ns += next()
}
}
// we assign to a global variable to make sure compiler
// or runtime optimizations don't skip over the operations
// we were benchmarking. This might be unnecessary, but it's
// safer.
global = ns
}
I’m a software engineer working on NLP and Machine Learning applications in Go. I like writing about Go, software engineering and languages. If you liked this article, feel free to reach out on Twitter; over there I’m known as @ironzeb. I’m currently working on my first book, Production Go, which contains lots more insight into writing effective Go.