Migrating an application to multi-tenancy

Going from a database where all information from all users/teams is mixed to an application that uses multi-tenancy to separate the information per user can be quite difficult. Especially when those users created a bunch of data already. In this article, we will go over the steps I took to transition my SaaS to a multi-tenancy structure.

A bit of backstory: back in the day, when I started my SaaS business, I had no idea what multi-tenancy meant. Honestly, the code was a big mess and I was unintentially almost asking for a data breach at some point. I was filtering every query by myself. No helpers, no middleware and no multi-tenancy - huge mistake. No data ever leaked from my application, fortunately, and nowadays, there are very strict privacy and security guide lines that protect customers and users.

A friend of mine showed me multi-tenancy and I was blown away. It was exactly what I needed to have to make data filtering easy and make data breaches much more difficult. Not to mention the fact that queries would be faster due to the smaller tables. Transitioning to a multi-tenancy set up with an app that has over 100 businesses registered isn't easy though and it took me quite some time to figure it all out. Another friend of mine wanted to make this same transition and we had quite a fun time figuring it all out.

What is multi-tenancy?

Basically, multi-tenancy is a software architect that allows us to separate code and data specifically for different instances or user groups. This means that queries made over a multi-tenancy set up could give different results for different people or groups.

You will need to use schemas for that. A schema separates the data from other people/groups. You can set this up in three ways:

Create a separate database for each schema.
One database, but with separated schemas. Meaning that you will not be able to use any other schema.
One database, but with shared schemas. This means that you can have both some separated tables, but also a few shared tables. The shared tables are easily accessible for all users.

I picked the 3rd option.

There are two ways to migrate the site to multi-tenancy. One would be through Django and the other would be through plan SQL statements. I prefer and went with the first one. My friend went with the second option. Both can work, but since I don't have his code, I will only explain the way that I did this.

How to do the migration

Make sure to take a backup of your data and preferably run this on a staging server, before running this on the production server!

As mentioned before, I have tables that will be shared by all tenants, but most of them are private. We need a name for every scheme (so make sure that you have a field that is not nullable and always unique!). I created the names of the schemas based on one of the shared tenants - the one where the name of the organizations is stored.

Set up multi-tenancy, then copy and paste all of the models that should not be shared to one of the not shared apps and run the migrations. Now you have identical models and changed one schema (the default one).

Up next, we will create all of the schemas and then move over our data. We will have to loop over the records of the table that we picked earlier. In my case, this is the table Org and I will use Org.name as the unique name to use in the schema.

from django.db import connection
from customers.models import Client

for org in Org.objects.all():
    connection.set_schema('public')
    tenant = Client.objects.create(schema_name=org.name, name=org.name)

That's all we need to create the tenants. Don't run it yet, we will need to add more in the loop. As you can see, we first set the right schema. This means that with every new iteration in the loop, we will start out in the public schema. We do this because we need the data from the public schema first to be able to move it.

Up next, we need to get the data that we need.

from django.core.cache import cache

cache.set('TableName', TableName.objects.filter(org=org).values())

This will create a queryset in the cache. You will have to do this with every model you have that you want to get in the new schema. We need to have it in the cache as we will not be able to get it from the model once we are running in the schema. We are currently in the public schema and will need to go now to the specific with connection.set_schema(schema_name).

So after that cache.set(...) part, add this:

connection.set_schema(org.name)

for j in cache.get('TableName'):
    TableName.objects.create(**j)

You will have to repeat the second part for every model you want to move. It will then add the new data in the new model.

There is one issue with doing that; the id column is not indexed by your database. Will have to do that manually with this:

from django.db import connection
cursor = connection.cursor()
cursor.execute("SELECT setval(pg_get_serial_sequence('" + tenant.schema_name + ".project_app_model','id'), CAST(max(id) AS INT), max(id) IS NOT null) FROM " + tenant.schema_name + ".project_app_model")

Replace project with your project name, app, with the app name and model with the model name. You will have to repeat the last line for every model you are moving.

And that's all it takes to change your project from no multi-tenancy to multi-tenancy.

Django multi-tenancy

Written by Stan Triepels

Stan is professional web developer working mainly with Django and VueJS. With years of experience under the belt, he is comfortable writing about his past mistakes and ongoing learnings.