Migrate Django models to UUID primary key

Thu, Sep 28, 2017 5-minute read

Sometimes, old design decisions come back to bite you. This is one of those tales.

A project I’m working on had a Django model similar to this:

class Municipality(models.Model):
    code = models.CharField(max_length=2, primary_key=True)
    name = models.CharField(max_length=100)

and it was used by other models such as:

class ZipCode(models.Model):
    code = models.CharField(max_length=2, primary_key=True)
    municipality = models.ForeignKey(Municipality)

And all was good, until we needed to expand the municipality model to support different countries. A single unique field holding only the code, which may collide across countries, was no longer enough.

For all the modern parts of the code base we use UUIDs as the primary key, so we wanted to migrate the primary key of Municipality to a UUID, while retaining all the relations.

As of September 2017, Django does not support migrating primary keys in any nice manner, so we’re on our own here.

We tried a lot of magic solutions, but we always got stuck with the migrations system not being able to detect and handle the changes nicely.

After some research and quite a bit of trial and error, we settled on the following approach. It has some drawbacks that I’ll get back to, but it works reasonably well.

A quick reminder on how the world looks from a database perspective. When you define a ForeignKey-field in Django, it creates a database column of the same type as the primary key of the referenced model on the referring model, and adds the foreign key constraints. So in the example above, we have two tables (in pseudo SQL):

CREATE TABLE municipality (
   code varchar(2) PRIMARY KEY NOT NULL,
   name varchar(100)
);

CREATE TABLE zipcode (
   code varchar(2) PRIMARY KEY NOT NULL,
   municipality_id varchar(2) NOT NULL REFERENCES municipality (code)
);

So, we need to break the foreign key constraints, alter the root model, then map the new primary key ids to the old ones, and re-apply the foreign keys.
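
To make the plan a bit more concrete, here is a rough sketch of the data side of it using Python's built-in sqlite3 module instead of Django. It is only an illustration: the sample rows are made up, the tables start in the state where the foreign key constraint has already been removed, and re-adding the constraint is left out since SQLite cannot alter constraints in place.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE municipality (code TEXT PRIMARY KEY, name TEXT);
    -- Constraint already broken: the old code is stored as plain text.
    CREATE TABLE zipcode (code TEXT PRIMARY KEY, municipality TEXT NOT NULL);
    INSERT INTO municipality VALUES ('01', 'Oslo'), ('02', 'Bergen');
    INSERT INTO zipcode VALUES ('10', '01'), ('11', '01'), ('20', '02');
""")

# Alter the root table: add the new id column and fill it row by row.
conn.execute("ALTER TABLE municipality ADD COLUMN id TEXT")
codes = [row[0] for row in conn.execute("SELECT code FROM municipality")]
for code in codes:
    conn.execute(
        "UPDATE municipality SET id = ? WHERE code = ?",
        (str(uuid.uuid4()), code),
    )

# Map the new ids onto the referring table by looking up the old code.
conn.execute("ALTER TABLE zipcode ADD COLUMN temp_muni TEXT")
conn.execute("""
    UPDATE zipcode
    SET temp_muni = (
        SELECT id FROM municipality
        WHERE municipality.code = zipcode.municipality
    )
""")

# Joining on the new id should give the same pairs as the old code did.
rows = conn.execute("""
    SELECT z.code, m.name
    FROM zipcode z JOIN municipality m ON m.id = z.temp_muni
    ORDER BY z.code
""").fetchall()
print(rows)
```

The rest of the post does the same thing, but through Django migrations so that the model state and the constraints follow along.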

We start by breaking the foreign keys in the referring models.

class ZipCode(models.Model):
    code = ...  # Same as before
    municipality = models.CharField(max_length=2)

python manage.py makemigrations -n break_zipcode_muni_foreignkey

Now that the Municipality model is free from any external referring models, we can go to work on it.

Start by adding the new id field:

class Municipality(models.Model):
    id = models.UUIDField(default=uuid.uuid4)  # needs `import uuid`
    # ... code and name stay as before

python manage.py makemigrations -n add_id_field_to_muni

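For some reason, the default didn’t work in my case (most likely because Django evaluates the default only once when it adds the column, so every existing row ends up with the same value), so I added a step in the created migration to create new unique ids for all rows.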

import uuid


def create_ids(apps, schema_editor):
    Municipality = apps.get_model('loc', 'Municipality')
    # Give every existing row its own UUID.
    for m in Municipality.objects.all():
        m.id = uuid.uuid4()
        m.save()
        
# ...

operations = [
    migrations.AddField(...),
    migrations.RunPython(code=create_ids),
]
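
Since the next step turns id into the primary key, it can be worth checking that the ids really are unique before moving on. A minimal sketch, assuming the same 'loc' app label as above; the check_unique_ids name is made up for this example and is not something Django provides:

```python
def check_unique_ids(apps, schema_editor):
    Municipality = apps.get_model('loc', 'Municipality')
    # Fetch only the id column and count how many values are repeated.
    ids = list(Municipality.objects.values_list('id', flat=True))
    duplicates = len(ids) - len(set(ids))
    if duplicates:
        raise RuntimeError(
            '{} municipality rows share an id with another row'.format(duplicates)
        )
```

It can be added as one more migrations.RunPython step right after create_ids, so the migration fails early instead of at the primary key switch.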

Now we have a UUID id-field on Municipality, and we should be able to switch the primary key around.

class Municipality(models.Model):
    id = models.UUIDField(default=uuid.uuid4, primary_key=True)
    code = models.CharField(max_length=2, unique=True)

Create the migration, and make sure that the AlterField operation for code is run before the AlterField on id. We’ve added primary_key to the id field and unique=True to the code field, since we still want to enforce that constraint for now, and we lost it when we removed the primary_key attribute from it.

Congratulations, we now have a new UUID primary key. But we still need to restore the foreign keys we broke in the referring models.

Let’s start by creating an empty migration:

python manage.py makemigrations --empty -n fix_zipcode_fk_to_muni_uuid loc

Open the file, and let us begin:

def match(apps, schema_editor):
    ZipCode = apps.get_model('loc', 'ZipCode')
    Muni = apps.get_model('loc', 'Municipality')
    for zip_code in ZipCode.objects.all():
        # temp_muni is a plain UUIDField, so store the id, not the instance.
        zip_code.temp_muni = Muni.objects.get(code=zip_code.municipality).id
        zip_code.save()
    
# ...
operations = [
    migrations.AddField(
        model_name='zipcode',
        name='temp_muni',
        field=models.UUIDField(null=True),
    ),
    migrations.RunPython(code=match),
    migrations.RemoveField(model_name='zipcode', name='municipality'),
    migrations.RenameField(
        model_name='zipcode', old_name='temp_muni', new_name='municipality'),
    migrations.AlterField(
        model_name='zipcode',
        name='municipality',
        field=models.ForeignKey(
            on_delete=django.db.models.deletion.PROTECT,
            to='loc.Municipality'),
    ),
]

Let us go through the steps here.

  1. Add a temporary field for storing the UUID of Municipality that we want to connect to. We don’t make it a ForeignKey-field just yet, as Django gets confused about the naming later on.
  2. We run the match function for looking up the new ids by the old lookup key, and store it in the temp field.
  3. Remove the old municipality field
  4. Rename the temp field to municipality
  5. And finally migrate the type to a foreign key to create all the database constraints we need.
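
The lookup in step 2 runs one query and one save per zip code, which can get slow on a large table. A possible alternative, sketched under the assumption that the field names are the same as above (match_in_bulk is just a name picked for this example), is to loop over the municipalities instead and update all their zip codes in one go:

```python
def match_in_bulk(apps, schema_editor):
    ZipCode = apps.get_model('loc', 'ZipCode')
    Muni = apps.get_model('loc', 'Municipality')
    # One query fetches the whole mapping from old code to new UUID.
    for code, muni_id in Muni.objects.values_list('code', 'id'):
        # One UPDATE per municipality instead of one save() per zip code.
        ZipCode.objects.filter(municipality=code).update(temp_muni=muni_id)
```

It would replace match in the RunPython operation. The trade-off is that update() bypasses save() and signals, which is usually fine inside a data migration.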

And there you go. All done.

There are some downsides here. Since we split the work into several migrations, we leave ourselves vulnerable if any of the later migrations fail. This will probably leave the application in a pretty unworkable state, so make sure to test the migrations thoroughly. You can reduce the risk by hand-editing all the steps into one migration, but if you have references from multiple different apps, then you need to do the breaking and restoring in separate steps anyway.

Logging / Debugging

You’ll most likely end up with some SQL errors during the process of creating these, so a nice trick I like to do is to create a simple logging migration operation.

def log(message):
    def fake_op(apps, schema_editor):
        print(message)
    return fake_op
    
    
# ...

operations = [
    migrations.RunPython(log('Step 1')),
    migrations.AlterField(...),
    migrations.RunPython(log('Step 2')),
    # ...
]

This allows you to see where in the process you fail.
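
A small variation on the same trick, in case you also want to see what the data looks like between steps: print the row count of a model. This is only a sketch, and log_count is a made-up helper name, not part of Django.

```python
def log_count(app_label, model_name):
    def fake_op(apps, schema_editor):
        # Use the historical model so the count matches the migration state.
        model = apps.get_model(app_label, model_name)
        print('{}.{}: {} rows'.format(app_label, model_name, model.objects.count()))
    return fake_op
```

It is used the same way as log, for example migrations.RunPython(log_count('loc', 'ZipCode')) placed between two operations.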

To see what SQL Django creates for a given migration, run python manage.py sqlmigrate <appname> <migration_number> - this is super useful for checking whether operations are run in the order that you expect.