The Tale of GoIDs

2.16 And the LORD God commanded the man, saying: ‘Of every tree of the garden thou mayest freely eat;

2.17 but of the tree of the knowledge of good and evil, thou shalt not eat of it; for in the day that thou eatest thereof thou shalt surely die.’

The Book of Genesis

Go gives the programmer introspection into every aspect of the language, and of a running program. But to one thing the programmer does not have access, and it is the goroutine identifier. Because the day the programmers know the goroutine identifier, they create goroutine-local storage through shared access and mutexes, and shall surely die.

However, there are use cases beyond concurrent handling of HTTP requests, in which sharing memory by communicating through channels or passing the context around is not going to work. One such case is Infergo. Infergo transforms Go source code to enable reverse-mode automatic differentiation. Function signatures stay unchanged, but function bodies are modified to write to a so-called tape a trace of every floating point operation. A single tape must be accessible by all functions. If derivatives are computed concurrently in multiple goroutines, every goroutine must have its own tape.

The functions must know how to get to the tape. And getting to the tape must be very efficient: every floating point operation involves an access to the tape!

It is not that no one thought about goroutine identifiers before. There are Go programmers who need the identifiers, some of them admit the need, and a few find workarounds to actually obtain the identifiers. I searched for the workarounds, I found several worthy attempts, and then I had a revelation, and then I discovered someone who had the same revelation before me. And that gave Infergo efficient goroutine-local storage for concurrent automatic differentiation and inference. Here is how it went.

Worthy Attempts

From the makers of Go

Brad Fitzpatrick is a member of the Go programming language team at Google. Brad wrote a function which obtains the goroutine identifier. The function creates a stack trace and parses the identifier out of string representation of the trace. Brad needed this for debugging, “to track that functions run on the goroutine that they’re supposed to”. The function uses public API calls and an undocumented but stable format of the serialized stack trace.

Somewhat cumbersome but working. Unfortunately, too inefficient for Infergo use case. Collecting, serializing, and parsing the stack trace on every operation makes automatic differentiation 2,000 (two thousand) times slower!

From the users of Go

JT Olio wrote a goroutine-local storage library. The library “defines 16 special functions and embed base-16 tags into the stack using the call order of those 16 functions.” Then, this embedding is used to produce an unique goroutine identifier and establish goroutine-local storage. The idea blew me away! However, the library requires that Go routines are created through a library call. I could modify Infergo’s own inference algorithms, however I would not be able to pass functions and gradients to third-party code. Infergo integrates with Gonum optimization nicely, and by enabling goroutine-local tapes I strived to improve this integration, rather than sacrifice it.

Revelation

I was almost ready to give up, that is, to write code that adds an extra ‘context’ parameter to every differentiated function. But then it came down onto me that maybe Go does not want to prevent me from using the goroutine identifier. Maybe it is there, and I just do not see it.

Indeed, Go has an assembly language. The language is documented, Go functions can be implemented in Go assembly. If I wanted a system feature not available through a library, I would write an assembly function bringing that feature to me.

The same goes for Go.

Not only Go has an assembler, the assembler has a dedicated register g pointing at runtime.g, the goroutine descriptor. goid, the Go routine identifier is just one of the fields of the descriptor. I can just use the contents of g to get the goroutine identifier, and it will only be a couple instructions!

It is much easier to find something if you know what you are looking for. Tao Wen wrote yet another GLS library; and this library does exactly what I just described: uses Go assembler to access register g, and retrieves field goid from the structure pointed to by the register. I somewhat simplified the code, added support for all platforms where Go is available, and now Infergo has fast and straightforward support for multithreading.

Lessons Learned

It is sometimes easier to find a hole in the fence than to jump over.
You can do anything with Go. You just must prove (to yourself more than to others) that you are brave and skilled enough. For example, by diving into Go internals and coding in Go assembly.
My Samsung Tab S4 tablet is amazingly well fit for multithreading, in Go in particular. I did most of the development on the tablet, in Termux. The tablet’s CPU has 8 cores, and Go runs multiple goroutines in parallel with very little overhead: 8 inference threads in parallel take roughly the same time as a single thread with local goroutine storage, and only 20% slower than a single thread with a global tape, for the same amount of computation per thread.
You can run multiple goroutines in parallel in a browser via WebAssembly. WebAssembly is slower than other targets, but still quite fast.