A Short Explanation of Generators With Go Channels
A generator in computer science is a function (sometimes called a subroutine) that produces a sequence of values to iterate over. Those values can be the result of a computation. The iteration ends either when the generator stops producing values or when the caller terminates it explicitly (with the break keyword, for instance).
In JavaScript, we can declare a generator with the function keyword followed by an asterisk (function*).
// main.js
function* gen() {
  yield 1;
  yield 2;
  yield 3;
  yield 4;
  yield 5;
}
function main() {
  const g = gen();
  // next() returns the next value wrapped in an object
  // with the following shape:
  // interface IteratorResult<T> {
  //   done: boolean;
  //   value: T;
  // }
  console.log(g.next().value);
  console.log(g.next().value);
  console.log(g.next().value);
  console.log(g.next().value);
  console.log(g.next().value);
}
main();
We’ll get the following output.
$ node main.js
1
2
3
4
5
A generator object is both an iterator and an iterable; therefore, we can use a for...of loop to get the same output.
function main() {
  for (const i of gen()) {
    console.log(i);
  }
}
Go does not have generator functions as a language construct. It is nonetheless possible to mimic similar behaviour with a channel and a goroutine. Going back to the JavaScript example, we can write a regular function that returns a channel, send the integers over it from a goroutine, and get the same result.
// main.go
package main
import "fmt"
func gen() <-chan int {
	c := make(chan int)
	go func() {
		c <- 1
		c <- 2
		c <- 3
		c <- 4
		c <- 5
		close(c)
	}()
	return c
}
func main() {
	g := gen()
	fmt.Println(<-g)
	fmt.Println(<-g)
	fmt.Println(<-g)
	fmt.Println(<-g)
	fmt.Println(<-g)
}
Running this code, we get:
$ go run main.go
1
2
3
4
5
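A plain receive does not tell us whether the generator is finished: once the channel is closed, <-g simply returns the zero value (0 here). The two-value receive form, v, ok := <-g, plays the role of JavaScript's done flag. The following is a minimal sketch. The next helper is hypothetical and is only named after the JavaScript method.

```go
package main

import "fmt"

// gen is the same channel-based generator as above.
func gen() <-chan int {
	c := make(chan int)
	go func() {
		for i := 1; i <= 5; i++ {
			c <- i
		}
		close(c)
	}()
	return c
}

// next (hypothetical helper) mimics JavaScript's g.next(): it returns
// the value and a done flag that is true once the channel is closed.
func next(g <-chan int) (value int, done bool) {
	v, ok := <-g
	return v, !ok
}

func main() {
	g := gen()
	for {
		v, done := next(g)
		if done {
			break
		}
		fmt.Println(v)
	}
	// A receive from a closed channel returns the zero value immediately.
	v, done := next(g)
	fmt.Println(v, done) // 0 true
}
```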
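In the same way, we can iterate over the channel with range. The loop keeps receiving until the channel is closed, which is why gen calls close(c) after sending the last value.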
func main() {
	for i := range gen() {
		fmt.Println(i)
	}
}
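Unlike the JavaScript version, breaking out of the range loop early does not stop the producer. Its goroutine stays blocked on the next send for the rest of the program's life, which leaks it. A common Go idiom is a done channel that the consumer closes when it no longer needs values. The sketch below assumes a hypothetical infinite Fibonacci generator, fib. It also shows that a generator can describe a sequence with no end, something a function returning all its values at once cannot do.

```go
package main

import "fmt"

// fib (hypothetical) sends Fibonacci numbers until done is closed.
// The select lets the goroutine exit instead of blocking forever
// on a send that nobody will ever receive.
func fib(done <-chan struct{}) <-chan int {
	c := make(chan int)
	go func() {
		defer close(c)
		a, b := 0, 1
		for {
			select {
			case c <- a:
				a, b = b, a+b
			case <-done:
				return
			}
		}
	}()
	return c
}

func main() {
	done := make(chan struct{})
	for v := range fib(done) {
		if v > 50 {
			break
		}
		fmt.Println(v)
	}
	// Tell the generator to stop; without this, its goroutine would leak.
	close(done)
}
```

The consumer decides when to stop, and closing done is the explicit signal that replaces the implicit cleanup a JavaScript break performs on a generator.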
OK, that’s cool, but isn’t it simpler to return an array?
When a function returns an array containing all the values, those values all have to be held in memory at once. A generator, on the other hand, hands the values out one at a time, so only the current value needs to exist. In cases where the full set of values would be large (or even unbounded), generators are much better suited. A generator also gives us more control over the iteration: the caller can start processing the first value right away and stop whenever it likes.
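To make the difference concrete, here is a sketch that sums the numbers 1 to n in two ways. The first uses a slice that holds every value (for n = 1,000,000 that is roughly 8 MB of ints on a 64-bit platform). The second uses a channel that only ever holds the value in transit. Both function names are hypothetical.

```go
package main

import "fmt"

// numbersSlice (hypothetical) allocates all n values up front.
func numbersSlice(n int) []int {
	s := make([]int, 0, n)
	for i := 1; i <= n; i++ {
		s = append(s, i)
	}
	return s
}

// numbersChan (hypothetical) hands the values out one at a time,
// so they never all exist in memory together.
func numbersChan(n int) <-chan int {
	c := make(chan int)
	go func() {
		defer close(c)
		for i := 1; i <= n; i++ {
			c <- i
		}
	}()
	return c
}

func main() {
	const n = 1_000_000
	sum1 := 0
	for _, v := range numbersSlice(n) {
		sum1 += v
	}
	sum2 := 0
	for v := range numbersChan(n) {
		sum2 += v
	}
	fmt.Println(sum1 == sum2, sum2) // true 500000500000
}
```

The trade-off is speed. Each channel send is a synchronisation between two goroutines, so per item the channel version is noticeably slower than walking a slice. Channels pay off when memory, or starting work before everything has been produced, matters more than raw throughput.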
But let’s see how we can use a generator with a concrete example. Imagine we want to process some data stored in CSV files. To look up the files, we can write a function that recursively reads a directory, finds the CSV files, and returns a slice containing their lines.
// main.go
package main
import (
	"encoding/csv"
	"fmt"
	"io"
	"os"
	"path/filepath"
	"strings"
)
func walk(dir string) []*line {
	// os.ReadDir replaces the deprecated ioutil.ReadDir
	files, err := os.ReadDir(dir)
	if err != nil {
		panic(err)
	}
	var lines []*line
	for _, f := range files {
		path := filepath.Join(dir, f.Name())
		if f.IsDir() {
			r := walk(path) // recursively read the directories
			lines = append(lines, r...)
		} else if strings.HasSuffix(f.Name(), ".csv") {
			r := getLines(path) // get the file's lines
			lines = append(lines, r...)
		}
	}
	return lines
}
The getLines function opens the file and returns its content as a slice of lines.
func getLines(path string) []*line {
	f, err := os.Open(path)
	if err != nil {
		panic(err)
	}
	defer f.Close()
	var lines []*line
	reader := csv.NewReader(f)
	for {
		r, err := reader.Read()
		if err == io.EOF {
			break
		}
		if err != nil {
			panic(err)
		}
		// Transform and append the line to the slice
		lines = append(lines, transform(r))
	}
	return lines
}
To keep it simple, we define a line struct to represent a line in a file; the transform function merely turns the slice of strings returned by the CSV reader into a line.
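type line struct {
	col1 string
	col2 string
	col3 string
}
// transform assumes every record has at least three fields;
// a shorter record would make the indexing below panic.
func transform(l []string) *line {
	return &line{
		col1: l[0],
		col2: l[1],
		col3: l[2],
	}
}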
We can then use our walk function to get all the lines.
func doSomethingWithTheLine(l *line) error {
	fmt.Println(l)
	return nil
}
func main() {
	// Load the lines into a slice and loop through it
	for _, l := range walk("./path/to/csv/files") {
		if err := doSomethingWithTheLine(l); err != nil {
			panic(err)
		}
	}
}
This code works. However, since walk loads every line of every file into memory before returning, the program can use a lot of memory on a large dataset. Depending on the use case, having to wait for the whole directory tree to be read before processing the first line can also be awkward.
Knowing how to create a generator with Go channels, we have a nice opportunity to modify the walk function and optimise the code execution. Instead of returning a slice of lines, walk can return a channel, read the directory within a goroutine, and send the lines over the channel.
func walk(dir string) <-chan *line {
	// Create a channel to send the lines over
	out := make(chan *line)
	go func() {
		files, err := os.ReadDir(dir)
		if err != nil {
			panic(err)
		}
		for _, f := range files {
			path := filepath.Join(dir, f.Name())
			if f.IsDir() {
				// walk through the child directory
				// and send the lines over the parent channel
				for p := range walk(path) {
					out <- p
				}
			} else if strings.HasSuffix(f.Name(), ".csv") {
				// read the file's content
				for _, r := range getLines(path) {
					// send the lines over the output channel
					out <- r
				}
			}
		}
		// all the files have been read,
		// so we can close the channel
		close(out)
	}()
	return out
}
Ranging over a channel yields only the values (there is no index), so we need to slightly change the main function.
func main() {
	for l := range walk("./path/to/csv/files") {
		if err := doSomethingWithTheLine(l); err != nil {
			panic(err)
		}
	}
}
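With this change, walk sends the lines over the channel as each file is read, so they can be processed immediately instead of after the whole directory tree has been loaded. Memory usage also stays at a sensible level: since getLines still loads one file at a time, it is roughly bounded by the size of the largest file rather than by the whole dataset.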
However, there are subtle differences to bear in mind. A built-in generator computes each value on demand, when the caller asks for it. The goroutine behind a Go channel runs concurrently instead: as soon as a value has been received, it starts computing the next one and then blocks until that one is received too, so the computation timing is slightly different. Furthermore, Go channels can be created with a buffer size (make(chan *line, 100), for instance). In that case, the goroutine keeps sending values until the buffer is full, and those values are held in memory until they are consumed.
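When strictly on-demand evaluation matters, a callback (push-style) iterator avoids both the goroutine and the channel. The producer computes a value, hands it to a yield function, and only continues once that function returns true. Since Go 1.23, a function with the shape func(yield func(V) bool) can be used directly in a for ... range loop (see the iter package). The sketch below calls the function explicitly so that it also compiles with older Go versions. The name count is hypothetical.

```go
package main

import "fmt"

// count (hypothetical) returns an iterator over 1..n. Each value is
// computed only after the consumer has handled the previous one, and
// the consumer stops the iteration by returning false from yield.
func count(n int) func(yield func(int) bool) {
	return func(yield func(int) bool) {
		for i := 1; i <= n; i++ {
			fmt.Println("computing", i)
			if !yield(i) {
				return
			}
		}
	}
}

func main() {
	seq := count(5)
	seq(func(v int) bool {
		fmt.Println("got", v)
		return v < 3 // stop after the third value
	})
	// Output: computing 1, got 1, computing 2, got 2, computing 3, got 3
}
```

With Go 1.23 or later, the same iterator can be written as for v := range count(5) { ... }, and a break in the loop body makes yield return false.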
In any case, using channels to generate data is a powerful pattern to help us control the code execution.