Go strings

Strings in Go are immutable.

Under the hood, a string is a sequence of bytes.

Indexing

You can index strings:

var s string = "Hello there"
var b byte = s[6]

Slice expression notation works on strings:

var s string = "Hello there"
var s2 string = s[4:7]
var s3 string = s[:5]
var s4 string = s[6:]

There’s a catch, though, when dealing with languages or emojis where a single character takes multiple bytes in UTF-8:

var s string = "Hello 🌞"
var s2 string = s[4:7] // cuts the emoji: only the first of its four bytes is copied
var s3 string = s[:5]
var s4 string = s[6:]
 
var s string = "Hello 🌞"
fmt.Println(len(s)) // prints 10 (the length in bytes), not 7 (the number of characters)

That’s why it’s wise to index or slice a string only when you are sure that every character in it takes up exactly one byte.

Rather than use the slice and index expressions with strings, you should extract substrings and code points from strings using the functions in the strings and unicode/utf8 packages in the standard library. In the next chapter, you’ll see how to use a for-range loop to iterate over the code points in a string.
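As a minimal sketch of that advice, the example below counts and decodes code points with unicode/utf8 instead of indexing bytes directly. The helper firstRunes is a hypothetical name introduced here for illustration; it is not part of the standard library.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// firstRunes returns the first n code points of s, or all of s if it has
// fewer than n. Hypothetical helper, written only for this illustration.
func firstRunes(s string, n int) string {
	end := 0 // byte offset just past the last code point kept so far
	for i := 0; i < n && end < len(s); i++ {
		// DecodeRuneInString reports how many bytes the next code point uses.
		_, size := utf8.DecodeRuneInString(s[end:])
		end += size
	}
	return s[:end] // end always falls on a code point boundary
}

func main() {
	s := "Hello 🌞"

	fmt.Println(len(s))                    // 10: length in bytes
	fmt.Println(utf8.RuneCountInString(s)) // 7: length in code points

	// Decode the code point that starts at byte offset 6.
	r, size := utf8.DecodeRuneInString(s[6:])
	fmt.Printf("%c %d\n", r, size) // 🌞 4

	fmt.Println(firstRunes(s, 7))         // Hello 🌞
	fmt.Println(utf8.ValidString(s[4:7])) // false: the slice cuts the emoji
}
```

Because firstRunes only ever slices at an offset returned by the decoder, it cannot split a multi-byte character the way s[4:7] does.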

Conversion

A rune or a byte can be converted to a string:

var a rune = 'x'
var s string = string(a)
var b byte = 'y'
var s2 string = string(b)

But converting an int to a string yields the character with that code point, not the decimal text "<int_value>":

var x int = 65
var y = string(x)
fmt.Println(y) // will print "A" instead of "65"

go vet catches this: it flags a conversion to string from any integer type other than rune or byte.
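If the decimal text is what you want, the standard library’s strconv.Itoa produces it. The sketch below contrasts the two intents; decimal and codePoint are hypothetical wrapper names used only for this illustration.

```go
package main

import (
	"fmt"
	"strconv"
)

// decimal returns the base-10 text of x (illustrative wrapper over strconv.Itoa).
func decimal(x int) string {
	return strconv.Itoa(x)
}

// codePoint returns the character whose code point is x. Converting through
// rune makes the intent explicit (illustrative wrapper).
func codePoint(x int) string {
	return string(rune(x))
}

func main() {
	x := 65
	fmt.Println(decimal(x))           // 65
	fmt.Println(codePoint(x))         // A
	fmt.Println(fmt.Sprintf("%d", x)) // 65, an alternative to strconv.Itoa
}
```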

A string can be converted to a slice of bytes or a slice of runes:

var s string = "Hello, 🌞"
var bs []byte = []byte(s)
var rs []rune = []rune(s)
fmt.Println(bs) // [72 101 108 108 111 44 32 240 159 140 158]
fmt.Println(rs) // [72 101 108 108 111 44 32 127774]
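The conversions also work in the other direction, and each one makes a copy, so changing the slice leaves the original string untouched. As a rough sketch of why the rune slice is useful, the hypothetical reverse helper below swaps whole code points rather than bytes, so the emoji survives.

```go
package main

import "fmt"

// reverse returns s with its code points in reverse order.
// Hypothetical helper for illustration; it works on a []rune copy of s.
func reverse(s string) string {
	rs := []rune(s)
	for i, j := 0, len(rs)-1; i < j; i, j = i+1, j-1 {
		rs[i], rs[j] = rs[j], rs[i]
	}
	return string(rs)
}

func main() {
	s := "Hello, 🌞"

	bs := []byte(s) // a copy of the bytes of s
	bs[0] = 'J'     // modifies the copy only
	fmt.Println(string(bs)) // Jello, 🌞
	fmt.Println(s)          // Hello, 🌞

	fmt.Println(reverse(s)) // 🌞 ,olleH
}
```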

Encoding

Go doesn’t require a string to be written in UTF-8 but strongly encourages it.

Under the covers, Go uses a sequence of bytes to represent a string. These bytes don’t have to be in any particular character encoding, but several Go library functions (and the for-range loop that I discuss in the next chapter) assume that a string is composed of a sequence of UTF-8-encoded code points.

According to the language specification, Go source code is always written in UTF-8. Unless you use hexadecimal escapes in a string literal, your string literals are written in UTF-8.
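To make the point about hexadecimal escapes concrete, here is a small sketch: the same escapes can spell out valid UTF-8 or arbitrary bytes, and utf8.ValidString tells them apart. The invalidBytes helper is a hypothetical name introduced for this example.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// invalidBytes counts the bytes in s that are not part of a valid UTF-8
// encoding. Hypothetical helper, written only for this illustration.
func invalidBytes(s string) int {
	n := 0
	for len(s) > 0 {
		r, size := utf8.DecodeRuneInString(s)
		// RuneError with size 1 signals a byte that could not be decoded.
		if r == utf8.RuneError && size == 1 {
			n++
		}
		s = s[size:]
	}
	return n
}

func main() {
	sun := "\xf0\x9f\x8c\x9e" // the four UTF-8 bytes of 🌞, written as hex escapes
	bad := "\xff\xfe"         // two bytes that are not valid UTF-8

	fmt.Println(sun == "🌞")            // true
	fmt.Println(utf8.ValidString(sun)) // true
	fmt.Println(utf8.ValidString(bad)) // false
	fmt.Println(len(bad), []rune(bad)) // 2 [65533 65533]
	fmt.Println(invalidBytes(bad))     // 2
}
```

The 65533 values are U+FFFD, the replacement character Go substitutes for each undecodable byte when converting to runes.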

Artem Udovyk
