Created 16 June 2025 | Last updated 21 June 2025

How this blog is published from Google Docs

A programming exercise.

What if I could just write this blog in Google Docs, and not have to copy and paste stuff for publishing? Can a machine do the copying and pasting for me?

I started with a vague notion of wanting to write and publish a blog, more as an exercise for organizing my own thoughts than to gain an audience. As such, I had two main requirements for what we might call a “blog system”: 1) good editing experience, and 2) automatic, hassle-free publishing.

Most blog services seem to have the second part covered. However, I couldn’t find one that also had a great editing experience (although admittedly I didn’t look particularly hard). The editors I came across were either hopelessly outdated or geared more towards web design than writing, with an assortment of bells and whistles creating far too much friction for the basics (like making lists). In the end, I concluded that the best editor for long-form writing was in fact Google Docs. To complete my blog system, all I needed was something that could automatically turn a folder of Google Docs into a website — a blog publisher.

As it happens, I was learning the Rust programming language at the time (for fun), so I figured building this blog publisher would be a nice practice project. So that’s what I did.

The Requirements

I wanted to write each blog post as a separate Google Doc, all placed in a particular Google Drive folder. I wanted the blog publisher to periodically go to this folder, list all the docs, and then generate a web page for each of them, plus a home page / blog index. The generated pages should retain roughly the same formatting as what I see when writing in Google Docs.

The System Design

Basically, I needed to periodically run a short program. There are a million ways to do that, but nowadays cloud-based functions-as-a-service is the way to go. In the AWS world, that’s Lambda.

In addition to compute, I also needed storage; the HTML files have to live somewhere. Given that my content is largely static, S3 was the only rational choice.

I also needed a domain name, which meant using a combination of Route 53, Certificate Manager and CloudFront. While I could have simply used Route 53 and a public S3 bucket, doing so would have meant losing HTTPS, which is largely unacceptable nowadays.

As for the software itself, a primary consideration was to not re-process every Doc on every run. Even for a hobby project like this, doing so would be excessively wasteful. To solve this, the program records the timestamp of the last successful run in DynamoDB. On the next run, it fully processes only the Docs that have been updated since then. For example, if the program runs every morning, then on Wednesday morning it only needs to process Docs that have been updated since Tuesday morning; it doesn’t need to worry about Docs that were updated on Monday, last Sunday or earlier. However, the program does fetch the metadata for all Docs in the folder, so that it can generate the index page. While this part could arguably be optimized as well, given that it’s only metadata, it didn’t seem worth the effort.
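The filtering step can be sketched in a few lines of Rust. This is illustrative only (the names and struct shapes are mine, not the actual program’s); it relies on RFC 3339 timestamps comparing correctly as plain strings:

```rust
// Illustrative sketch of the incremental-processing filter.
// DocMeta stands in for whatever metadata the Drive listing returns.
struct DocMeta {
    id: String,
    modified_time: String, // RFC 3339, e.g. "2025-06-20T08:00:00Z"
}

/// Keep only the docs modified after the last successful run.
/// RFC 3339 timestamps in the same zone sort lexicographically,
/// so a plain string comparison is enough here.
fn docs_to_process<'a>(all: &'a [DocMeta], last_run: &str) -> Vec<&'a DocMeta> {
    all.iter()
        .filter(|d| d.modified_time.as_str() > last_run)
        .collect()
}

fn main() {
    let docs = vec![
        DocMeta { id: "a".into(), modified_time: "2025-06-16T09:00:00Z".into() },
        DocMeta { id: "b".into(), modified_time: "2025-06-21T07:30:00Z".into() },
    ];
    // Pretend the last successful run was on the morning of the 20th.
    let stale = docs_to_process(&docs, "2025-06-20T06:00:00Z");
    assert_eq!(stale.len(), 1);
    assert_eq!(stale[0].id, "b");
    println!("{} doc(s) need processing", stale.len());
}
```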

The Development Process

I started by learning how to use the Google APIs for Drive and Docs. From the documentation it seems I can only use OAuth2 for these APIs (i.e. I can’t use API keys). This was a hassle, but to Google’s credit I was able to get it working by following their OAuth setup guide and Docs API quickstart sample code (in Java).

Long story short, by using the sample code I was able to get a refresh token. I then stored the token (and other secrets) in DynamoDB. It was then straightforward to get an access token using the refresh token, and then use the access token to call the actual Docs APIs.
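For the curious, the refresh-token exchange is a single form-encoded POST to Google’s OAuth2 token endpoint. Here is a minimal sketch that only builds the request body (the endpoint and parameter names come from Google’s OAuth2 documentation; the credential values are fake, actually sending the request needs an HTTP client, and real values should be percent-encoded):

```rust
// Builds the form body for POST https://oauth2.googleapis.com/token.
// Illustrative only: values here are not percent-encoded.
fn refresh_request_body(client_id: &str, client_secret: &str, refresh_token: &str) -> String {
    format!(
        "grant_type=refresh_token\
         &client_id={client_id}\
         &client_secret={client_secret}\
         &refresh_token={refresh_token}"
    )
}

fn main() {
    // Fake credentials, for illustration only.
    let body = refresh_request_body("my-client-id", "my-secret", "my-refresh-token");
    assert!(body.starts_with("grant_type=refresh_token"));
    println!("{body}");
}
```

The JSON response to that POST contains the short-lived `access_token`, which then goes into the `Authorization: Bearer` header of the Docs API calls.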

Once I was able to fetch one of my Docs (as JSON), I started prototyping the code to convert it to an HTML file. Since Google didn’t have a Rust SDK for Docs (there was a GCP SDK, but Docs wasn’t part of it), I wrote the Rust model by hand. I only covered the parts that I needed; the full model is huge. The JSON returned by the Docs API could then be parsed into this Rust model using the serde and serde_json crates.

Once I had a working prototype, I cleaned up the code and pushed it to GitHub. Thereafter, whenever I thought of something that needed to be done, I created a GitHub issue. Every day, I looked through the issue list and worked on a few of them.

The Result

Well, you’re looking at it!

Thoughts

Rust is fun. Writing Rust is sort of like playing mini-games with the borrow-checker. I can see how a lot of people might hate that, especially if they’re trying to do paid work under time pressure, but I personally find it a lot of fun. Each compile error is like a little puzzle to solve.

⚞☯⚟