A little Earth and a little sky

Published on Jul 14, 2021

It all started back in 2020, when I was working on a stealth project and really lost my head over a database schema. (Let’s not say which database, but it started with the letter M.) It really annoyed me. It bugged me every time I worked on the project and finally, completely fed up, I started writing a database of my own. In the beginning it was really just a REPL: you’d type in SET x y and the REPL would say Okay while doing a SET behind the scenes. So what took me so long to ship the features you’d expect?
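For a feel of how little that first REPL did, here is a minimal Python sketch. It is not the original code: the names `handle_line` and `repl`, the in-memory dict, and the "Unknown command" reply are all assumptions made for illustration.

```python
# Hypothetical sketch of a SET-only REPL; not the original implementation.


def handle_line(line, store):
    """Handle one line of input and return the reply as a string."""
    parts = line.split()
    if len(parts) == 3 and parts[0].upper() == "SET":
        _, key, value = parts
        store[key] = value  # the "SET behind the scenes"
        return "Okay"
    return "Unknown command"  # assumed reply; the post does not say


def repl():
    """Read lines from stdin until EOF, printing one reply per line."""
    store = {}  # in-memory only, nothing is persisted
    while True:
        try:
            line = input("> ")
        except EOFError:
            break
        print(handle_line(line, store))
```

Calling `repl()` starts the loop; typing `SET x y` stores the pair in the dict and prints `Okay`.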

Trouble 1: The protocol dilemma

HTTP? Nah, it is too slow for me. Something fancy? Nope. Why not just build something over good ol’ TCP? Well, that was it. But the design of the protocol is an entirely different story. It took at least 10 iterations before I made up my mind on what the first version would look like, and even then I was not satisfied. The first protocol, if I remember correctly, was very silly: it used whitespace as the delimiter and had a payload size header which could be used to pre-allocate the buffer. The header and body sections were merely separated by newlines. It was inefficient to read from and write to, but it made it to 0.1! In version 0.2 (again, this is off the top of my head) that was fixed by using predefined sizes in the header of the packet itself, but that didn’t play too well with long keys. I don’t precisely remember what iterations we went through, but by 0.4 several of the problems were solved. And yes, that’s the story of Terrapipe.
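To make that first design concrete, here is a rough Python sketch of a packet with a payload size header, a newline between header and body, and whitespace-delimited tokens. The exact Terrapipe wire format is not given in this post, so the layout and the names `encode_packet` and `decode_packet` are assumptions, not the real thing.

```python
# Hypothetical framing in the spirit of the first protocol described above.
# The real Terrapipe layout is not documented here; this one is assumed.


def encode_packet(tokens):
    """Join tokens with spaces and prefix the body with its size in bytes."""
    body = " ".join(tokens).encode("utf-8")
    header = str(len(body)).encode("ascii")
    return header + b"\n" + body


def decode_packet(packet):
    """Read the size header, pre-allocate a buffer, then split the body."""
    header, sep, rest = packet.partition(b"\n")
    if not sep:
        raise ValueError("missing header/body separator")
    size = int(header)
    if len(rest) < size:
        raise ValueError("incomplete body")
    buf = bytearray(size)  # buffer sized from the header
    buf[:] = rest[:size]
    return bytes(buf).decode("utf-8").split(" ")
```

In this sketch, `encode_packet(["SET", "x", "y"])` produces the header `7`, a newline, and then `SET x y`. One weakness of this particular layout is that a value containing a space comes back as two tokens, which is the kind of thing a delimiter-based format makes you fight.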

Fun fact: Until v0.3.0, there was no persistence!

Trouble 2: The roadmap dilemma

Our goal from the beginning was to create a database that would reduce the complexity of data modeling, which was one of the things that led me to start this project. At the same time, the project had one more goal: to keep the codebase simple enough for anyone to hack on. These two goals kept conflicting as I wrote the code.

Trouble 3: The name game

So, it started with the Earth. Terrabase. My initial naming idea was based on the fact that I wanted to build a system that would fit all the data on Earth! Childish as it sounds, that was the notion behind the name, but gnarly trademark issues forced us to reconsider it. Okay, how about Skybase? Sounded like just the thing. Turns out, there is a company that makes a tool for databases (not even a database :D) under the same name, and it holds a trademark!

I laughed quite a bit at the trademark and got back to the name game. NASA’s successful Mars rover landing, and especially Ingenuity taking Linux to Mars, gave me the idea. Open source was no longer bound to the Earth; it was taking over the skies! The sky, here I come! And well, since our database would have tables: why not join them together? Heck, welcome to Skytable, folks!

Fast forward: 0.6 and the path to stability

Up to 0.5, Terrapipe was the de facto mode of communication. Then a community member raised an issue that couldn’t be solved without a redesign of the protocol. It’s not that I liked Terrapipe, but well, it worked. And if it ain’t broke, don’t fix it :D.

So I started going through a bunch of designs and finally, this time, I decided that I wanted not just to save on bandwidth but also to make encoding and decoding (or serialization and deserialization, however you’d like to put it) faster, while still being able to carry around complex objects like high-level data structures. This effort is what led to Skyhash, and this time I was very satisfied. It turns out that sometimes simplicity and elegance in protocol design can produce beautiful results, and sometimes faster ones, as in this case. The protocol could now happily ser/de complex recursive arrays, multi-typed arrays, ... you name it. It had everything that we needed, and it was faster, more compact and more robust at the same time. I announced that Skyhash had reached 1.0 and that we’d be taking this protocol to Skytable 1.0.
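To show what handling recursive, multi-typed arrays involves, here is a small Python sketch of a type-tagged, length-prefixed encoding. This is not the Skyhash wire format: the tags `i`, `s` and `a` and the names `encode_value` and `decode_value` are made up for illustration.

```python
# Hypothetical type-tagged encoding; NOT the actual Skyhash format.
# Assumed tags: i = integer, s = string (byte length), a = array (item count).


def encode_value(value):
    """Encode an int, str, or (possibly nested) list into bytes."""
    if isinstance(value, bool):
        raise TypeError("booleans are not supported in this sketch")
    if isinstance(value, int):
        return b"i" + str(value).encode("ascii") + b"\n"
    if isinstance(value, str):
        data = value.encode("utf-8")
        return b"s" + str(len(data)).encode("ascii") + b"\n" + data
    if isinstance(value, list):
        out = b"a" + str(len(value)).encode("ascii") + b"\n"
        for item in value:
            out += encode_value(item)  # recursion handles nested arrays
        return out
    raise TypeError(f"unsupported type: {type(value).__name__}")


def decode_value(buf, pos=0):
    """Decode one value starting at pos; return (value, next_pos)."""
    tag = buf[pos:pos + 1]
    end = buf.index(b"\n", pos)
    number = int(buf[pos + 1:end])
    pos = end + 1
    if tag == b"i":
        return number, pos
    if tag == b"s":
        return buf[pos:pos + number].decode("utf-8"), pos + number
    if tag == b"a":
        items = []
        for _ in range(number):
            item, pos = decode_value(buf, pos)
            items.append(item)
        return items, pos
    raise ValueError(f"unknown type tag: {tag!r}")
```

Because every element carries its own tag and length, the decoder never has to scan for delimiters inside a value, and an array can hold strings, integers and other arrays side by side.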

What’s coming in Skytable

Most people would jump straight to shipping 1.0. But are you really ready for 1.0? To me, 1.0 is not a number, but a milestone, a mark of reliability and most importantly, a mark of satisfaction (yes, I’m quite the philosophical software developer; the unusual kind). Skytable is already stable enough for you to deploy: several parts of Skytable’s own build infrastructure use Skytable itself.

There’s a lot to the future of Skytable. Right now, for example, we’re working on implementing multiple keyspaces and tables, and it’s no secret that I’m working on a new data model. Once we have this modeling set up on a single node, we can turn to what has always been our goal: a scalable system, an infinitely scalable one, so you must have already gotten the hints!

The journey of building the protocol and the database has been full of lessons learned by experience. You see, not everything works by reading the pages of a book; sometimes you just have to get out there and build it to know what things really look like. Here’s to another year of flying tables, up in the sky!

– Sayan (July, 2021)