Migrating an application to multi-tenancy

Going from a database where all information from all users/teams is mixed to an application that uses multi-tenancy to separate the information per user can be quite difficult. Especially if those users created a bunch of data already. In this article, we will go over the steps I took to migrate my SaaS to a multi-tenancy structure.

A bit of backstory: back in the day, when I started my SaaS business, I had no idea what multi-tenancy meant. Honestly, the code was a big mess and I was unintentially almost asking for a data breach at some point. I was filtering every query by myself. No helpers, no middleware and no multi-tenancy - huge mistake.

A friend of mine explained multi-tenancy to me and I was blown away. It was exactly what I needed to have to make data filtering easy and make mistakes much more difficult to make. Not to mention the fact that queries would be faster due to the smaller tables. Moving to a multi-tenancy architecture with an app that has over 100 businesses registered isn’t easy though and it took me quite some time to figure it all out. Another friend of mine wanted to make this same move and we had quite a fun time figuring it all out.

What is multi-tenancy?

Basically, multi-tenancy is a software architect that allows us to separate code and data specifically for different instances or user groups. This means that queries made over a multi-tenancy set up could give different results for different people or groups.

You will need to use schemas for that. A schema separates the data from other people/groups. You can set this up in three ways:

  1. Create a separate database for each schema.
  2. One database, but with separated schemas. Meaning that you will not be able to use any other schema.
  3. One database, but with shared schemas. This means that you can have both some separated tables, but also a few shared tables. The shared tables are easily accessible for all users.

I picked the 3rd option.

There are two ways to migrate the site to multi-tenancy. One would be through Django and the other would be through plain SQL statements. I prefer and went with the first one. My friend went with the second option. Both can work, but since I don’t have his code, I will only explain the way that I did this.

How to do the migration

Make sure to take a backup of your data and preferably run this on a staging server, before running this on the production server!

As mentioned before, I have tables that will be shared by all tenants, but most of them are private. We need a name for every scheme (so make sure that you have a field that is not nullable and always unique!). I created the names of the schemas based on one of the shared tenants - the one where the name of the organizations is stored.

Set up multi-tenancy, then copy and paste all of the models that should not be shared to one of the not shared apps and run the migrations. Now you have identical models and created one schema (the default one).

Up next, we will create all of the schemas and then move over our data. We will have to loop over the records of the table that we picked earlier. In my case, this is the table Org and I will use Org.name as the unique name to use in the schema.

from django.db import connection
from customers.models import Client

for org in Org.objects.all():
    connection.set_schema('public')
    tenant = Client.objects.create(schema_name=org.name, name=org.name)

That’s all we need to create the tenants. Don’t run it yet, we will need to add more in the loop. As you can see, we first set the right schema. This means that with every new iteration in the loop, we will start out in the public schema. We do this because we need the data from the public schema first to be able to move it.

Up next, we need to get the data that we need.

from django.core.cache import cache

cache.set('TableName', TableName.objects.filter(org=org).values())

This will create a queryset in the cache. You will have to do this with every model you have that you want to get in the new schema. We need to have it in the cache as we will not be able to get it from the model once we are running in the schema. We are currently in the public schema and will need to go now to the specific with connection.set_schema(schema_name).

So after that cache.set(...) part, add this:

connection.set_schema(org.name)

for j in cache.get('TableName'):
	TableName.objects.create(**j)

You will have to repeat the second part for every model you want to move. It will then add the new data in the new tenant.

There is one issue with doing that; the id column is not indexed by your database. Will have to do that manually with this:

from django.db import connection
cursor = connection.cursor()
cursor.execute("SELECT setval(pg_get_serial_sequence('" + tenant.schema_name + ".<project>_<app>_<model>','id'), CAST(max(id) AS INT), max(id) IS NOT null) FROM " + tenant.schema_name + ".<project>_<app>_<model>")

Replace project with your project name, app, with the app name and model with the model name. You will have to repeat the last line for every model you are moving. At last, you can simply remove the data from the public tenant by removing the models in the shared apps.

And that’s all it takes to change your project from no multi-tenancy to multi-tenancy.

Stan Triepels's Picture

About Stan Triepels

Stan is professional web developer working mainly with Django and VueJS. Blogging about my past mistakes. Check out my Django and VueJS course. Peace <3

The Netherlands https://djangowaves.com

Comments