Over the past few weeks, I’ve been deep in the engine room of our infrastructure, quietly preparing a major upgrade to the database that powers everything behind the scenes. On the surface, nothing changed. Kanka loaded, pages rendered, notifications arrived on time. But underneath, a careful operation was unfolding, involving planning, testing, backups, and some coffee-fueled troubleshooting while my daughter was fast asleep.
The original setup has served us well-ish: a single powerful database server, the heart of Kanka’s data. It handled everything from your logins and entities to permissions and campaign data. But software ages. Security patches stop coming, performance improvements are left on the table, and new features aren’t accessible unless we move forward.
The headaches
While preparing for our next update, I found that one of our third-party plugins required a newer version of the software powering our database. I could of course have simply run the update and crossed my fingers, but I had been bothering Jon for months about wanting a more resilient setup altogether.
I explored many paths, each with its own string of sidequests. Eventually, I decided that adding a second, replica database server was the way to go. Our software (MariaDB) allows a replica server to run a (slightly) different version than the primary server. This would let Kanka users test the new server version without even knowing, while I ironed out any issues.
Challenges
Setting up a replication server wasn’t a walk in the park. You can’t just install it and tell it “here’s your primary, you’re a replica, take care of everything”. No, it first needs to import a database backup. I have two backups a day, but transferring backups was a mess and led me to my first sidequest.
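To give a flavour of the “take care of everything” part: once a backup has been imported, the replica still has to be told where its primary lives and where in the binary log to pick up from. Here’s a minimal sketch of that step, written in Python against MariaDB, with invented hosts, credentials, and binlog coordinates (the real setup differs):

```python
import pymysql  # any MariaDB/MySQL client would do; this one is just for illustration

# Invented connection details, not Kanka's real servers.
replica = pymysql.connect(host="db-replica.internal", user="root", password="secret")

with replica.cursor() as cur:
    # Point the freshly restored replica at the primary. The binlog file
    # and position come from the backup itself (e.g. a dump taken with
    # --master-data records them in the dump).
    cur.execute("""
        CHANGE MASTER TO
            MASTER_HOST = 'db-primary.internal',
            MASTER_USER = 'replication',
            MASTER_PASSWORD = 'secret',
            MASTER_LOG_FILE = 'mysql-bin.000042',
            MASTER_LOG_POS = 1337
    """)
    cur.execute("START SLAVE")  # begin applying the primary's changes

    # Sanity check: Seconds_Behind_Master should trend towards zero.
    cur.execute("SHOW SLAVE STATUS")
    print(cur.fetchone())
```

From there, the replica catches up on everything that happened on the primary since the backup was taken.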
Backups are compressed and sent to a storage service, but retrieving them was a manual and painful process. So I ended up writing a script to automate the process, in case something with the new setup failed. This was a full-day effort, but it’s very satisfying to see the automation happen.
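The script itself is nothing fancy. Something along these lines, assuming an S3-compatible storage service and invented bucket names (the real setup differs):

```python
import gzip
import shutil

import boto3  # assumes an S3-compatible storage service

BUCKET = "kanka-db-backups"  # invented name
PREFIX = "mariadb/"

s3 = boto3.client("s3")

def fetch_latest_backup(dest: str = "latest.sql") -> str:
    """Download the most recent compressed backup and decompress it."""
    objects = s3.list_objects_v2(Bucket=BUCKET, Prefix=PREFIX)["Contents"]
    latest = max(objects, key=lambda obj: obj["LastModified"])
    archive = dest + ".gz"
    s3.download_file(BUCKET, latest["Key"], archive)
    # Decompress the dump so it can be piped straight into the server.
    with gzip.open(archive, "rb") as src, open(dest, "wb") as out:
        shutil.copyfileobj(src, out)
    return dest
```

Having that one command ready also doubles as an insurance policy: if the new setup ever fails, restoring the latest backup is a single step instead of a scavenger hunt.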
Adapting the code
Fast-forward a bit and the new replica server was ready to be tested. I first set up the plugins library to send all visitors (logged-out users) to it so that I could monitor errors and problems. Shortly after, I tried the same with Kanka but was quickly hit with a roadblock: for permissions, a temporary database table is created to store a simplified permission setup, even for visitors. Visitors still need to load their “public” permissions from somewhere.
A few coffees later, the code was adapted to not rely on the temporary table for visitors and went live. This worked, but another problem popped up: visitors trying to log in were unable to do so. Whoops. A few similar issues cropped up, but eventually everything was set up and stable.
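I can’t reproduce the actual change here (and Kanka isn’t written in Python), but the shape of it is easy to sketch with invented names: a visitor’s “public” permissions get resolved in memory instead of being materialised into a temporary table, which a read-only replica would refuse to create anyway.

```python
from dataclasses import dataclass

@dataclass
class Permission:
    entity_id: int
    can_read: bool

# Invented stand-in for a campaign's "public" role permissions.
public_role = [Permission(1, True), Permission(2, False)]

def visitor_permissions(role_permissions):
    """Resolve a visitor's permissions in memory; no temporary table,
    so the whole lookup stays read-only."""
    return {p.entity_id: p.can_read for p in role_permissions}

assert visitor_permissions(public_role) == {1: True, 2: False}
```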
Visitors now read data from a separate, read-only replica. This means that visitors browsing public campaigns or exploring content no longer add pressure to the main database. It’s a subtle shift, but one that helps the app stay fast and responsive even during high traffic.
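Under the hood, that split is really just connection routing: writes and logged-in users go to the primary, while visitor reads can safely hit the replica. A hedged sketch of the idea, again with invented host names rather than Kanka’s real stack:

```python
import pymysql

# Invented connection details for illustration.
PRIMARY = dict(host="db-primary.internal", user="kanka", password="secret", database="kanka")
REPLICA = dict(host="db-replica.internal", user="kanka", password="secret", database="kanka")

def connection_for(user, writing: bool = False):
    """Route all writes and logged-in traffic to the primary;
    visitors' reads go to the read-only replica."""
    if writing or user is not None:
        return pymysql.connect(**PRIMARY)
    return pymysql.connect(**REPLICA)
```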
The big day
Eventually, after some more testing and a mix of nervousness and confidence, the big day came. My guesstimate was that everything would take about 30 minutes, but I doubled that when notifying users, because some issue always crops up. I always had the option to roll back to a single database if something went wrong after the hour mark.
I took the app offline briefly, promoted the new database server to primary, reconfigured the old server to use a newer version of MariaDB and become a replica, and brought everything back online. In the end, it worked exactly as planned (minus a few last-minute hiccups, like a typo in a config file on both servers). The app was offline for just over 30 minutes, and now runs on newer, faster, more resilient and better-supported infrastructure.
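For anyone wondering what “promoting” a server actually entails, the sequence looks roughly like this. It’s a sketch with invented hosts and credentials, assuming GTID-based replication; the real run also involved updating those config files on both machines:

```python
import pymysql

# Invented connection details, not the real servers.
new_primary = pymysql.connect(host="db-new.internal", user="root", password="secret")
old_server = pymysql.connect(host="db-old.internal", user="root", password="secret")

with new_primary.cursor() as cur:
    # 1. Once the replica has caught up, detach it from the old primary
    #    and open it up for writes.
    cur.execute("STOP SLAVE")
    cur.execute("RESET SLAVE ALL")
    cur.execute("SET GLOBAL read_only = 0")

with old_server.cursor() as cur:
    # 2. Turn the old primary into a read-only replica of the new one.
    #    (This is also where MariaDB itself got upgraded in the real run.)
    cur.execute("SET GLOBAL read_only = 1")
    cur.execute("""
        CHANGE MASTER TO
            MASTER_HOST = 'db-new.internal',
            MASTER_USER = 'replication',
            MASTER_PASSWORD = 'secret',
            MASTER_USE_GTID = slave_pos
    """)
    cur.execute("START SLAVE")
```

The nice thing about this arrangement is that the roles are symmetrical: had something gone wrong after the hour mark, the same dance in reverse would have put the old server back in charge.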
In the future
This is just the first step, but it’s an important one. With this new setup in place, I now have the flexibility to add more replicas, and even swap out old servers for new ones seamlessly, avoiding lengthy downtime. Future upgrades will be smoother, safer, and faster. It’s the kind of foundation that opens the door to scaling, resilience, and long-term improvements.
I know all this happens behind the scenes, but I wanted to share it with the community. Every once in a while, it’s worth surfacing the quiet work that keeps the app stable, secure, and ready for what’s next. Thanks for trusting us to run this show, and for sticking around while we keep making things a little better, one upgrade at a time.

