
The Beauty of Concurrency in Go

Beyond "Hello World"

by Alexander Demin

  It's good to learn a new language every so often, but you have to get beyond "Hello, World."

It's good for you to learn a new programming language from time to time. This is true even if the language doesn't take off or is ancient. Tackling old problems in a new language pushes you to rethink your current views, approaches, and habits.

I love trying new stuff, especially programming languages. But after implementing "Hello, world!" or the Fibonacci sequence in a new language, you usually feel almost nothing, get no taste whatsoever. You could try implementing the Sieve of Eratosthenes to explore, a little, data structures and maybe performance. But I wanted something real, something I could maybe even reuse afterwards. So some time ago I invented for myself a problem that helps me to get the feel of a language in just a few hundred lines of code.

The problem involves several very important elements of a language: strings, file and network I/O, and, of course, concurrency. The problem is a TCP/IP proxy (or you could call it a network debugger). The idea is that you have a TCP/IP listener (single- or multi-threaded) accepting connections on a given port. When it receives an incoming connection, it has to connect to another host and pass the data through in both directions between the caller and the remote host. Additionally, the proxy can log the traffic in various formats to help in analyzing the data.

I stopped counting the occasions when I needed this kind of tool. Any time network programming is involved, such a tool is essential. I have implemented it many times in my life in different languages: C, C++, Perl, PHP. The two latest implementations were in Python and Erlang. It represents the kind of real problem I was looking for.

We can specify more concrete requirements. The application must serve multiple connections simultaneously. For each connection, it needs to log data in three ways: a hexadecimal dump presenting the data of both directions in sequence, and two binary logs holding the incoming and outgoing data streams in separate files.

We're going to implement the program in this article, and the language we're going to use is Go. The Go authors claim that they designed the language with concurrency and multi-threading in its bloodstream. I intend to take them at their word.

If I developed such an application in boostified C++, I would probably go for the main listener thread, plus threads for each connection. Hence, an individual connection would be fully served (I/O and logging) by a single thread.

Here are the threads I'll use to serve each connection in the Go implementation:

  • a bi-directional hex dumper thread

  • two threads logging incoming and outgoing streams in the binary form

  • two threads passing through data from the local host to the remote and vice versa

In total: 5 threads.

Again, five threads serve each individual connection. I implemented all of them not for the sake of multi-threading per se, but because Go encourages multi-threading, while C++ discourages it (even with the new C++11 standard's steroids). Strictly speaking, the "threads" here are goroutines, lightweight threads scheduled by the Go runtime. Multi-threading in Go is natural and simple. My implementation of the TCP/IP proxy in Go uses no mutexes or condition variables. Synchronization is elegantly managed by Go's channels.
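To make the point about channels concrete before diving into the proxy, here is a minimal sketch that is not part of the proxy itself. Several goroutines compute partial sums and hand them back over a channel, and no mutex is involved. The names sumRange and parallelSum are made up for this illustration, and the sketch assumes parts is greater than zero.

```go
package main

import "fmt"

// sumRange adds the integers in [lo, hi) and sends the partial sum on out.
func sumRange(lo, hi int, out chan<- int) {
	s := 0
	for i := lo; i < hi; i++ {
		s += i
	}
	out <- s
}

// parallelSum splits [0, n) into at most `parts` chunks, sums each chunk
// in its own goroutine, and collects the partial sums over a channel.
// Assumption for this sketch: parts > 0.
func parallelSum(n, parts int) int {
	out := make(chan int)
	step := (n + parts - 1) / parts
	launched := 0
	for lo := 0; lo < n; lo += step {
		hi := lo + step
		if hi > n {
			hi = n
		}
		go sumRange(lo, hi, out)
		launched++
	}
	total := 0
	for i := 0; i < launched; i++ {
		total += <-out // each receive synchronizes with exactly one sender
	}
	return total
}

func main() {
	fmt.Println(parallelSum(1000, 4)) // sum of 0..999
}
```

The only shared state is the channel, and every receive pairs with one send, so the total is complete once the loop has received as many values as goroutines were launched.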

Okay, here's the source, with explanations. If you are not familiar with Go, the commentary should help. My intention was to focus not just on the functionality of the program, but also on the Go language itself.

Let's Go

In lines 2-11 we declare the packages we are going to use. Notably, if a package is imported but not used, Go treats this as an error and forces you to remove the unused declaration (remember the last time you gave up and didn't bother to clean up the list of STL includes in your C++ project?).

   1 package main
   2 import (
   3     "encoding/hex"
   4     "flag"
   5     "fmt"
   6     "net"
   7     "os"
   8     "runtime"
   9     "strings"
  10     "time"
  11 )

In lines 12-16 we declare global variables representing the command-line flags. Further down we will see how to parse them.

  12 var (
  13     host *string = flag.String("host", "",
             "target host or address")
  14     port *string = flag.String("port", "0", "target port")
  15     listen_port *string = flag.String("listen_port", "0",
             "listen port")
  16 )
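The flag package can also parse an arbitrary argument slice, which makes the flag handling easy to try out in isolation. The following is only a sketch under that assumption: parseTarget is a hypothetical helper that does not appear in the proxy, and example.com is a placeholder host. It uses a private FlagSet instead of the global flags shown above.

```go
package main

import (
	"flag"
	"fmt"
	"net"
)

// parseTarget parses proxy-style flags from args and returns the listen
// port and the "host:port" target. A private FlagSet is used so the
// function works on any argument slice, not just os.Args.
func parseTarget(args []string) (listenPort, target string, err error) {
	fs := flag.NewFlagSet("gotcpspy", flag.ContinueOnError)
	host := fs.String("host", "", "target host or address")
	port := fs.String("port", "0", "target port")
	listen := fs.String("listen_port", "0", "listen port")
	if err := fs.Parse(args); err != nil {
		return "", "", err
	}
	// NFlag counts the flags that were actually set on the command line.
	if fs.NFlag() != 3 {
		return "", "", fmt.Errorf("expected 3 flags, got %d", fs.NFlag())
	}
	return *listen, net.JoinHostPort(*host, *port), nil
}

func main() {
	lp, target, err := parseTarget([]string{
		"-host", "example.com", "-port=110", "-listen_port", "8080",
	})
	fmt.Println(lp, target, err)
}
```

Both the "-flag value" and "-flag=value" spellings are accepted, which is why the two styles can be mixed on one command line.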

In lines 17-20 we see the syntax of variadic function arguments in Go.

  17 func die(format string, v ...interface{}) {
  18     os.Stderr.WriteString(fmt.Sprintf(format+"\n", v...))
  19     os.Exit(1)
  20 }
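As a small illustration of the same syntax outside the proxy, the sketch below shows both halves of it: a parameter declared with ... collects the trailing arguments into a slice, and v... at a call site spreads a slice back out. The functions describe and count are hypothetical names used only here.

```go
package main

import "fmt"

// describe forwards its variadic arguments unchanged with v...,
// the same way die passes them on to fmt.Sprintf.
func describe(format string, v ...interface{}) string {
	return fmt.Sprintf(format+"\n", v...)
}

// count reports how many trailing arguments it received.
// Inside the function, v is an ordinary []interface{}.
func count(v ...interface{}) int {
	return len(v)
}

func main() {
	fmt.Print(describe("port %d on %s", 8080, "localhost"))

	args := []interface{}{"a", 2, nil}
	fmt.Println(count(), count(args...)) // no arguments, then a spread slice
}
```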

In lines 21-28 there are two functions launching the hex dump and the binary loggers. The only difference is in the log name.

  21 func connection_logger(data chan []byte, conn_n int,
         local_info, remote_info string) {
  22     log_name := fmt.Sprintf("log-%s-%04d-%s-%s.log",
             format_time(time.Now()), conn_n, local_info, remote_info)
  23     logger_loop(data, log_name)
  24 }
  25 func binary_logger(data chan []byte, conn_n int, peer string) {
  26     log_name := fmt.Sprintf("log-binary-%s-%04d-%s.log",
             format_time(time.Now()), conn_n, peer)
  27     logger_loop(data, log_name)
  28 }

In lines 29-43 the real Go fun begins. The function logger_loop creates a log file and then spins in an infinite loop (lines 35-42). In line 36 the code waits for a message from the channel data. There is an interesting trick in line 34: the defer statement schedules a call that is guaranteed to run when the function returns (similar to finally in Java). If empty data is received, the function exits. Lines 44-54 define two small formatting helpers and the Channel structure, which bundles the two sockets, the logging channels, and the acknowledgment channel for one direction of a connection.

  29 func logger_loop(data chan []byte, log_name string) {
  30     f, err := os.Create(log_name)
  31     if err != nil {
  32         die("Unable to create file %s, %v\n", log_name, err)
  33     }
  34     defer f.Close()
  35     for {
  36         b := <-data
  37         if len(b) == 0 {
  38             break
  39         }
  40         f.Write(b)
  41         f.Sync()
  42     }
  43 }
  44 func format_time(t time.Time) string {
  45     return t.Format("2006.01.02-15.04.05")
  46 }
  47 func printable_addr(a net.Addr) string {
  48     return strings.Replace(a.String(), ":", "-", -1)
  49 }
  50 type Channel struct {
  51     from, to net.Conn
  52     logger, binary_logger chan []byte
  53     ack chan bool
  54 }
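The logger loop stops on an empty message. A common alternative, shown here only as a sketch and not as what the proxy does, is to close the channel and let a range loop end by itself. The helper names drain and collect are hypothetical, and an io.Writer stands in for the log file so the example runs without touching the disk.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// drain writes every chunk received on data to w. When the sender closes
// the channel the range loop ends, and the deferred send reports that the
// last write has finished.
func drain(w io.Writer, data <-chan []byte, done chan<- bool) {
	// The deferred function runs when drain returns, whatever the path.
	defer func() { done <- true }()
	for b := range data {
		w.Write(b) // write errors are ignored, as in the listing above
	}
}

// collect pushes each chunk through a channel to a drain goroutine and
// returns everything that goroutine wrote.
func collect(chunks ...string) string {
	var buf bytes.Buffer
	data := make(chan []byte)
	done := make(chan bool)
	go drain(&buf, data, done)
	for _, c := range chunks {
		data <- []byte(c)
	}
	close(data) // plays the role of the empty-slice sentinel
	<-done      // wait until the writer goroutine has finished
	return buf.String()
}

func main() {
	fmt.Printf("%q\n", collect("USER test\r\n", "", "PASS none\r\n"))
}
```

With this variant an empty chunk is just an empty write and no longer doubles as the stop signal, at the price of a rule that only the sending side may close the channel.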

In lines 55-80 there is a function that reads data from the source socket from, writes it to the logs, and sends it to the destination socket to. For each connection there are two instances of the pass_through function copying data between the local and remote sockets in opposite directions. When an I/O error occurs, it is treated as a disconnect. Finally, in line 79 this function sends the acknowledgment back to the connection's controlling thread (process_connection), signalling its termination.

  55 func pass_through(c *Channel) {
  56     from_peer := printable_addr(c.from.LocalAddr())
  57     to_peer := printable_addr(c.to.LocalAddr())
  58     b := make([]byte, 10240)
  59     offset := 0
  60     packet_n := 0
  61     for {
  62         n, err := c.from.Read(b)
  63         if err != nil {
  64             c.logger <- []byte(fmt.Sprintf("Disconnected from %s\n",
                     from_peer))
  65             break
  66         }
  67         if n > 0 {
  68             c.logger <- []byte(fmt.Sprintf(
                     "Received (#%d, %08X) %d bytes from %s\n",
                     packet_n, offset, n, from_peer))
  69             c.logger <- []byte(hex.Dump(b[:n]))
  70             c.binary_logger <- b[:n]
  71             c.to.Write(b[:n])
  72             c.logger <- []byte(fmt.Sprintf("Sent (#%d) to %s\n",
                     packet_n, to_peer))
  73             offset += n
  74             packet_n += 1
  75         }
  76     }
  77     c.from.Close()
  78     c.to.Close()
  79     c.ack <- true
  80 }
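One detail of line 70 is worth a closer look: b[:n] shares memory with the read buffer b, which the loop reuses for the next Read, so the logger could in principle still be writing a chunk when it gets overwritten. The sketch below is my own variation rather than the proxy's code: it copies each chunk before sending it. The names relay and run are hypothetical, plain io.Reader and io.Writer values stand in for the sockets, and bufSize is assumed to be greater than zero.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
)

// relay copies src to dst in chunks of at most bufSize bytes and sends a
// private copy of every chunk on tap. It closes tap when it is done and
// returns the number of bytes relayed. Assumption: bufSize > 0.
func relay(dst io.Writer, src io.Reader, tap chan<- []byte, bufSize int) int {
	buf := make([]byte, bufSize)
	total := 0
	for {
		n, err := src.Read(buf)
		if n > 0 {
			chunk := make([]byte, n)
			copy(chunk, buf[:n]) // the receiver owns chunk; buf can be reused
			tap <- chunk
			if _, werr := dst.Write(buf[:n]); werr != nil {
				break
			}
			total += n
		}
		if err != nil { // io.EOF or a real error: either way, stop
			break
		}
	}
	close(tap)
	return total
}

// run pushes input through relay and returns what reached the destination
// together with the chunks observed on the tap.
func run(input string, bufSize int) (string, []string) {
	var dst bytes.Buffer
	tap := make(chan []byte)
	done := make(chan []string)
	go func() {
		var seen []string
		for c := range tap {
			seen = append(seen, string(c))
		}
		done <- seen
	}()
	relay(&dst, strings.NewReader(input), tap, bufSize)
	return dst.String(), <-done
}

func main() {
	out, chunks := run("USER test\r\n", 4)
	fmt.Printf("%q %q\n", out, chunks)
}
```

The extra allocation per chunk is the price of letting the logger keep the data for as long as it likes.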

In lines 81-107 there is a function processing the entire connection. It connects to the remote socket (line 82), measures the duration of the connection (lines 88, 101-103), launches the loggers (lines 93-95), and finally launches the two data-transferring threads (lines 97-98). The pass_through functions run as long as both peers are active. In lines 99-100 we wait for acknowledgments from the data-transferring threads. In lines 104-106 we terminate the loggers.

  81 func process_connection(local net.Conn, conn_n int, target string) {
  82     remote, err := net.Dial("tcp", target)
  83     if err != nil {
  84         fmt.Printf("Unable to connect to %s, %v\n", target, err)
  85     }
  86     local_info := printable_addr(remote.LocalAddr())
  87     remote_info := printable_addr(remote.RemoteAddr())
  88     started := time.Now()
  89     logger := make(chan []byte)
  90     from_logger := make(chan []byte)
  91     to_logger := make(chan []byte)
  92     ack := make(chan bool)
  93     go connection_logger(logger, conn_n, local_info, remote_info)
  94     go binary_logger(from_logger, conn_n, local_info)
  95     go binary_logger(to_logger, conn_n, remote_info)
  96     logger <- []byte(fmt.Sprintf("Connected to %s at %s\n",
             target, format_time(started)))
  97     go pass_through(&Channel{remote, local, logger, to_logger, ack})
  98     go pass_through(&Channel{local, remote, logger, from_logger, ack})
  99     <-ack // Make sure that both copiers finish gracefully.
 100     <-ack //
 101     finished := time.Now()
 102     duration := finished.Sub(started)
 103     logger <- []byte(fmt.Sprintf("Finished at %s, duration %s\n",
             format_time(finished), duration.String()))
 104     logger <- []byte{} // Stop logger
 105     from_logger <- []byte{} // Stop "from" binary logger
 106     to_logger <- []byte{} // Stop "to" binary logger
 107 }
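Note that when net.Dial fails in line 82, the listing prints a message and then carries on with a nil remote connection. A more defensive variant would return at that point and close the accepted socket. Here is a sketch of that idea, not the author's code: connectOrClose and failedDialDemo are hypothetical names, the dial function is passed in so that a fake can replace the network, and net.Pipe stands in for an accepted connection.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net"
)

// dialFunc has the same shape as net.Dial so that a fake can stand in
// for the network in this sketch.
type dialFunc func(network, address string) (net.Conn, error)

// connectOrClose dials target. If that fails it closes the already
// accepted local connection and reports false, so the caller can return
// instead of using a nil remote connection.
func connectOrClose(dial dialFunc, local net.Conn, target string) (net.Conn, bool) {
	remote, err := dial("tcp", target)
	if err != nil {
		fmt.Printf("Unable to connect to %s, %v\n", target, err)
		local.Close()
		return nil, false
	}
	return remote, true
}

// failedDialDemo runs connectOrClose with a dialer that always fails. It
// reports whether the dial succeeded and whether the client side of the
// local connection saw the close as io.EOF.
func failedDialDemo() (bool, bool) {
	client, local := net.Pipe() // in-memory stand-in for an accepted socket
	defer client.Close()
	refuse := func(network, address string) (net.Conn, error) {
		return nil, errors.New("connection refused (simulated)")
	}
	_, ok := connectOrClose(refuse, local, "example.com:110")
	_, err := client.Read(make([]byte, 1))
	return ok, err == io.EOF
}

func main() {
	fmt.Println(failedDialDemo())
}
```

In the proxy itself, the equivalent would be to pass net.Dial as the dial argument and return from process_connection when the second result is false.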

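In lines 108-132 is the main function, which runs the TCP/IP listener. In line 109 we ask the Go runtime to use all available logical CPUs (since Go 1.5 this is the default). Lines 110-115 parse the command-line flags and insist that all three of them are given.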

 108 func main() {
 109     runtime.GOMAXPROCS(runtime.NumCPU())
 110     flag.Parse()
 111     if flag.NFlag() != 3 {
 112         fmt.Printf("usage: gotcpspy -host target_host -port target_port " +
                 "-listen_port listen_port\n")
 113         flag.PrintDefaults()
 114         os.Exit(1)
 115     }
 116     target := net.JoinHostPort(*host, *port)
 117     fmt.Printf("Start listening on port %s and "+
                 "forwarding data to %s\n",
                 *listen_port, target)
 118     ln, err := net.Listen("tcp", ":"+*listen_port)
 119     if err != nil {
 120         fmt.Printf("Unable to start listener, %v\n", err)
 121         os.Exit(1)
 122     }
 123     conn_n := 1
 124     for {
 125         if conn, err := ln.Accept(); err == nil {
 126             go process_connection(conn, conn_n, target)
 127             conn_n += 1
 128         } else {
 129             fmt.Printf("Accept failed, %v\n", err)
 130         }
 131     }
 132 }
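The two address strings built in lines 116 and 118 follow slightly different rules, which the following sketch spells out. The helpers targetAddr and listenAddr are hypothetical wrappers used only for this illustration, and example.com is a placeholder host.

```go
package main

import (
	"fmt"
	"net"
)

// targetAddr builds the dial address the way line 116 does.
// JoinHostPort adds brackets when the host is a literal IPv6 address.
func targetAddr(host, port string) string {
	return net.JoinHostPort(host, port)
}

// listenAddr builds the listen address the way line 118 does.
// An empty host part means "all local interfaces".
func listenAddr(port string) string {
	return ":" + port
}

func main() {
	fmt.Println(targetAddr("example.com", "110")) // example.com:110
	fmt.Println(targetAddr("::1", "110"))         // [::1]:110
	fmt.Println(listenAddr("8080"))               // :8080
}
```

Plain string concatenation would produce the ambiguous "::1:110" for an IPv6 host, which is the reason to prefer JoinHostPort for the target.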

That's it, just 132 lines. Please note: we used only the standard library, which comes out of the box.

Now we are ready to run:

 go run gotcpspy.go -host -port 110 -listen_port 8080

It should print:

 Start listening on port 8080 and forwarding data to

Then you can run in another window:

 telnet localhost 8080

and enter, for instance, USER test [ENTER] and PASS none [ENTER]. Three log files will be created (the time stamps, of course, will differ in your case).

Bi-directional hex dump log log-2012.04.20-19.55.17-0001- -49544-

 Connected to at 2012.04.20-19.55.17
 Received (#0, 00000000) 38 bytes from
 00000000 2b 4f 4b 20 50 4f 50 20 59 61 21 20 76 31 2e 30
  |+OK POP Ya! v1.0|
 00000010 2e 30 6e 61 40 32 36 20 48 74 6a 4a 69 74 63 50
  |.0na@26 HtjJitcP|
 00000020 52 75 51 31 0d 0a
  |RuQ1..|
 Sent (#0) to [--1]-8080
 Received (#0, 00000000) 11 bytes from [--1]-8080
 00000000 55 53 45 52 20 74 65 73 74 0d 0a
  |USER test..|
 Sent (#0) to
 Received (#1, 00000026) 23 bytes from
 00000000 2b 4f 4b 20 70 61 73 73 77 6f 72 64 2c 20 70 6c
  |+OK password, pl|
 00000010 65 61 73 65 2e 0d 0a
  |ease...|
 Sent (#1) to [--1]-8080
 Received (#1, 0000000B) 11 bytes from [--1]-8080
 00000000 50 41 53 53 20 6e 6f 6e 65 0d 0a
  |PASS none..|
 Sent (#1) to
 Received (#2, 0000003D) 72 bytes from
 00000000 2d 45 52 52 20 5b 41 55 54 48 5d 20 6c 6f 67 69
  |-ERR [AUTH] logi|
 00000010 6e 20 66 61 69 6c 75 72 65 20 6f 72 20 50 4f 50
  |n failure or POP|
 00000020 33 20 64 69 73 61 62 6c 65 64 2c 20 74 72 79 20
  |3 disabled, try |
 00000030 6c 61 74 65 72 2e 20 73 63 3d 48 74 6a 4a 69 74
  |later. sc=HtjJit|
 00000040 63 50 52 75 51 31 0d 0a
  |cPRuQ1..|
 Sent (#2) to [--1]-8080
 Disconnected from
 Disconnected from [--1]-8080
 Finished at 2012.04.20-19.55.17, duration 5.253979s
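Two things in this log may look odd at first: the peer name [--1]-8080 and the dotted tails in the text column. The sketch below reproduces both with the same standard library calls the proxy uses. The helper names printableAddr, loopbackName, and dumpContains are hypothetical, and the sketch assumes that the local peer was the IPv6 loopback address, which is what the name suggests.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"net"
	"strings"
)

// printableAddr mirrors printable_addr from the listing: every colon is
// replaced so that the address can be used in a file name.
func printableAddr(a net.Addr) string {
	return strings.Replace(a.String(), ":", "-", -1)
}

// loopbackName returns the printable form of the IPv6 loopback address
// with the given port.
func loopbackName(port int) string {
	return printableAddr(&net.TCPAddr{IP: net.ParseIP("::1"), Port: port})
}

// dumpContains reports whether the hex dump of data contains text.
func dumpContains(data, text string) bool {
	return strings.Contains(hex.Dump([]byte(data)), text)
}

func main() {
	// "[::1]:8080" with its colons replaced.
	fmt.Println(loopbackName(8080)) // [--1]-8080

	// hex.Dump shows non-printable bytes such as \r and \n as dots
	// in the text column.
	fmt.Print(hex.Dump([]byte("USER test\r\n")))
}
```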

Binary log of outgoing data log-binary-2012.04.20-19.55.17-0001 -

 USER test
 PASS none

Binary log of incoming data log-binary-2012.04.20-19.55.17 -0001-

 +OK POP Ya! v1.0.0na@26 HtjJitcPRuQ1
 +OK password, please.
 -ERR [AUTH] login failure or POP3 disabled, try later. sc=HtjJitcPRuQ1

It seems to work, so let's try to measure the performance by downloading a bigger binary file directly and then via our proxy.

Downloading directly (file size is about 72MB):

 time wget
 Saving to: `otp_src_R15B01.tar.gz'
 real 1m2.819s

Now let's start the proxy and then download through it:

 go run gotcpspy.go -port=80 -listen_port=8080


 time wget http://localhost:8080/download/otp_src_R15B01.tar.gz
 Saving to: `otp_src_R15B01.tar.gz.1'
 real 0m56.209s

Let's compare the results.

 diff otp_src_R15B01.tar.gz otp_src_R15B01.tar.gz.1

It matches, which means the program works correctly.

Now the performance. I repeated the experiment a few times on my MacBook Air. Surprisingly, downloading via the proxy was even a bit faster for me than downloading directly. In the example above: 1m2.819s (directly) vs. 0m56.209s (via proxy). The only explanation I can imagine is that wget is single-threaded and multiplexes incoming and outgoing streams in one thread, while the proxy processes the streams in individual threads, and perhaps this causes a tiny speedup. But the difference is small, almost negligible, and maybe on another computer or network it would disappear completely. The main observation is this: downloading via the proxy doesn't slow things down, despite the additional overhead of creating quite massive logs.

In summary, I'd like you to look at this program from the angle of simplicity and clarity. I've pointed it out above, but I'd like to underline it again: I started using threads in this application gradually. The nature of the problem gently pushed me to identify the concurrent activities in processing a connection, the ease and safety of Go's concurrency mechanisms finished the job, and eventually I used concurrency without thinking about the trade-off between efficiency and complexity (and difficulty of debugging).

Agreed, sometimes a problem simply needs to thrash bits and bytes, and the linear efficiency of the code is the only thing you care about. But more and more you encounter problems where the capability of concurrent, multi-threaded processing becomes the key factor, and for this kind of application, Go will shine.

I hope this serves as a representative example of the ease and even beauty of concurrency in Go.

Alexander Demin is a software engineer and Ph.D. in Computer Science. Constantly exploring new technologies, he believes that something amazing is always out there. He can be contacted at or through his homepage and blog.
