Since my last post I’ve been quite busy with travel and recovery from same, and thus I haven’t poked around trying to work on my Hobby SDK.

Which also means I haven’t spoken about it more widely either. And that’s just after I rebuilt the site to use GitHub Actions too, to make it easier to write more. Ah well. I still need to sort out a few CSS things though, since line wrapping is weird… I want the markdown to look good, but not have that affect the rendered output beyond formatting…

That’s for a later post. This is about how to deal with Stateful DoFns!


The next thing to do for the Hobby SDK is add State and Timers, and ultimately, stateful transforms. I was stuck on this for a little bit. Specifically how to add these to the graph construction mechanisms, while also avoiding writing too much code to replicate the type system.

The trick though is that Stateful DoFns, that is, those that make use of State and Timers in any capacity, have a hard restriction: they must take in KVs as an input. This is how the Beam model enables stateful stream processing without sacrificing parallelism.

The question, then, is how do I have the Hobby SDK enforce this?

You see, in most Beam SDKs, there is really only a single ParDo call as a part of graph construction, and it will accept DoFns of any kind: from the simple lightweight ones, to those with Side Inputs, to Splittable DoFns, and of course, stateful DoFns. But then there’s a later step that enforces the compatibility requirements for each kind.

And that’s just a lot of code. Unlike in Java, Go Generics don’t erase types, nor can I have overloaded implementations to selectively validate different constraints. It could be possible to make some way of “detecting” the KV, using reflection to validate that the outer type is a KV, but that goes against the ethos of this Hobby SDK.

I want to have the Go Compiler do the enforcement, as much as possible.

This means that I’ll end up with a separate StatefulParDo top level graph construction function. There’s a benefit to this: it provides a natural place to put all the documentation for stateful DoFns, instead of overburdening the standard ParDo function.

This StatefulParDo will use the same mechanism that GroupByKey requires: a DoFn that has a KV generic. This way, when folks are writing a pipeline and hand a stateful DoFn something that isn’t keyed, the Go Compiler’s type checker will let them know of the problem.
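Here’s a rough sketch of the idea, not the Hobby SDK’s actual API. Every name in it (`KV`, `PCollection`, `DoFn`, `ParDo`, `StatefulParDo`, `sumPerKey`, `double`) is made up for illustration, and the DoFn shape is drastically simplified. The only point is that the `KV[K, V]` in the signature is what makes the compiler reject un-keyed input.

```go
package main

import "fmt"

// KV is a hypothetical key-value pair type for this sketch.
type KV[K comparable, V any] struct {
	Key   K
	Value V
}

// PCollection is a hypothetical typed handle to a collection of E.
type PCollection[E any] struct {
	Name string
}

// DoFn is a hypothetical, heavily simplified DoFn interface.
type DoFn[I, O any] interface {
	ProcessElement(in I) O
}

// ParDo accepts a DoFn over any element type.
func ParDo[I, O any](in PCollection[I], fn DoFn[I, O]) PCollection[O] {
	return PCollection[O]{Name: in.Name + "/ParDo"}
}

// StatefulParDo only accepts keyed input: the element type must be
// KV[K, V], so the restriction lives in the signature itself.
func StatefulParDo[K comparable, V, O any](in PCollection[KV[K, V]], fn DoFn[KV[K, V], O]) PCollection[O] {
	return PCollection[O]{Name: in.Name + "/StatefulParDo"}
}

// sumPerKey stands in for a stateful DoFn; it takes KVs as input.
type sumPerKey struct{}

func (sumPerKey) ProcessElement(in KV[string, int]) string {
	return fmt.Sprintf("%s=%d", in.Key, in.Value)
}

// double is a plain DoFn over un-keyed ints.
type double struct{}

func (double) ProcessElement(in int) int { return in * 2 }

func main() {
	keyed := PCollection[KV[string, int]]{Name: "keyed"}
	unkeyed := PCollection[int]{Name: "unkeyed"}

	// Type arguments are spelled out so this doesn't lean on newer
	// inference rules.
	a := StatefulParDo[string, int, string](keyed, sumPerKey{})
	b := ParDo[int, int](unkeyed, double{})
	fmt.Println(a.Name, b.Name)

	// This line would not compile, because PCollection[int] is not a
	// PCollection[KV[K, V]] for any K and V:
	//   StatefulParDo[string, int, int](unkeyed, double{})
}
```

No reflection, no later validation pass: if the input isn’t keyed, the pipeline just doesn’t build.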

I’ll still need to detect stateful DoFns, since those features will still be modeled onto Exported fields and field types. That should be fine though. The two functions also enable a cleaner split for documenting the incompatibility of SplittableDoFns and Stateful DoFns. Sadly, there’s no good way to represent this constraint as a type. Fields don’t meaningfully participate in the Go type system.
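The field detection could look something like the following. Again, this is a minimal sketch under assumptions: `ValueState`, `Timer`, the `stateOrTimer` marker interface, and `IsStateful` are all hypothetical names, and the real SDK may model state and timers quite differently. It only shows that a walk over exported fields with the standard `reflect` package is enough to tell a stateful DoFn from a plain one.

```go
package main

import (
	"fmt"
	"reflect"
)

// stateOrTimer is a hypothetical marker interface implemented by every
// field type that represents state or a timer.
type stateOrTimer interface{ isStateOrTimer() }

// ValueState is a hypothetical state field type.
type ValueState[T any] struct{}

func (ValueState[T]) isStateOrTimer() {}

// Timer is a hypothetical timer field type.
type Timer struct{}

func (Timer) isStateOrTimer() {}

var markerType = reflect.TypeOf((*stateOrTimer)(nil)).Elem()

// IsStateful reports whether fn is a struct (or pointer to a struct)
// with at least one exported field whose type is a state or timer type.
func IsStateful(fn any) bool {
	t := reflect.TypeOf(fn)
	if t == nil {
		return false
	}
	if t.Kind() == reflect.Pointer {
		t = t.Elem()
	}
	if t.Kind() != reflect.Struct {
		return false
	}
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		if f.IsExported() && f.Type.Implements(markerType) {
			return true
		}
	}
	return false
}

// sumFn is stateful: it has an exported state field.
type sumFn struct {
	Total ValueState[int]
}

// plainFn has only ordinary configuration fields.
type plainFn struct {
	Prefix string
}

func main() {
	fmt.Println(IsStateful(sumFn{}))    // true
	fmt.Println(IsStateful(&plainFn{})) // false
}
```

With a check like this, a plain ParDo could reject a DoFn that has state fields, and a StatefulParDo could reject one that doesn’t, at pipeline construction time rather than compile time.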

We need this because a SplittableDoFn can’t have State. It becomes very difficult to reason about that way due to other transforms that happen to SplittableDoFns. After all, the key can change while splitting elements.

Sometimes, you just have to have two things that do jobs well separately, instead of trying to have a single approach for everything.


Writing the post isn’t the same as actually finishing the work, but sometimes the best thing to do is to split them out into separate posts and work. But the work does take longer.

Just gotta do it one bit at a time.