What are database migrations?
Database schemas define the structure and interrelations of data managed by relational databases. While it is important to develop a well-thought out schema at the beginning of your projects, evolving requirements make changes to your initial schema difficult or impossible to avoid. And since the schema manages the shape and boundaries of your data, changes must be carefully applied to match the expectations of the applications that use it and avoid losing data currently held by the database system.
In this article, we'll go over what database schema migrations are, what problems they solve, and different implementation strategies. We'll take a look at the various pain points they address as well as some potential pitfalls they introduce.
Database migrations, also known as schema migrations, database schema migrations, or simply migrations, are controlled sets of changes developed to modify the structure of the objects within a relational database. Migrations help transition database schemas from their current state to a new desired state, whether that involves adding tables and columns, removing elements, splitting fields, or changing types and constraints.
Migrations manage incremental, often reversible, changes to data structures in a programmatic way. The goals of database migration software is to make database changes repeatable, shareable, and testable without loss of data. Generally, migration software produces artifacts that describe the exact set of operations required to transform a database from a known state to the new state. These can be checked into and managed by normal version control software to track changes and share among team members.
While preventing data loss is generally one of the goals of migration software, changes that drop or destructively modify structures that currently house data can result in deletion. To cope with this, migration is often a supervised process involving inspecting the resulting change scripts and making any modifications necessary to preserve important information.
Migrations are helpful because they allow database schemas to evolve as requirements change. They help developers plan, validate, and safely apply schema changes to their environments. These compartmentalized changes are defined on a granular level and describe the transformations that must take place to move between various "versions" of the database.
In general, migration systems create artifacts or files that can be shared, applied to multiple database systems, and stored in version control. This helps construct a history of modifications to the database that can be closely tied to accompanying code changes in the client applications. The database schema and the application's assumptions about that structure can evolve in tandem.
Some other benefits include being allowed (and sometimes required) to manually tweak the process by separating the generation of the list of operations from the execution of them. Each change can be audited, tested, and modified to ensure that the correct results are obtained while still relying on automation for the majority of the process.
Migration systems are not without their faults but fortunately, most of these can be mitigated through processes and oversight.
Because migrations modify existing database structures, care must be taken to avoid data loss. This can be caused by incorrect assumptions made by the tooling or by transformations that require understanding of the meaning behind the data structures. Although it may be tempting to presume that the generated migration files are correct, since the migrations files are primarily concerned with the data structures, it is your responsibility to ensure that the data is also preserved or transformed correctly.
Migrations can also go poorly if they are applied to a database that is in a different state than assumed. For example, this can happen when changes are made to the database structure outside of the bounds of the migration system or when migrations are applied in the wrong order. All migration systems rely on understanding the current state of the database to correctly modify the existing structures. If the actual state diverges from the assumed state, the migrations may fail or may change the database in undesirable ways.
Another potential problem with migrations is that they are often very tool specific. Migration systems generate artifacts that represent the state of databases or the changes required. Although some tools produce results in plain SQL, they sometimes encode additional details for the tool to parse in comments or may use special filename formats to indicate ordering. Changing between migration tools can be difficult without a clean break between the previous and current generation of migration files.
There are two main types of schema transformation strategies employed by migration software: state based migrations and change based migrations. Either can be used effectively to manage the process and sometimes, it's helpful to employ a combination of both. The strengths of each approach, however, often dictate which fits best for a project based on how well it matches the development style of the team involved.
State based migration software creates artifacts that describe how to recreate the desired database state from scratch. The files that it produces can be applied to an empty relational database system to bring it fully up to date.
After the describing the desired state are created, the actual migration involves comparing the generated files against the current state of the database. This process allows the software to analyze the difference between the two states and generate a new file or files to bring the current database schema in line with the schema described by the files. These change operations are then applied to the database to reach the goal state.
Like almost all migrations, state based migration files must be carefully examined by knowledgeable developers to oversee the process. Both the files describing the desired final state and the files that outline the operations to bring the current database into compliance must be reviewed to ensure that the transformations will not lead to data loss. For example, if the generated operations attempt to rename a table by deleting the current one and recreating it with its new name, a knowledgable human must recognize this and intervene to prevent data loss.
State based migrations can feel rather clumsy if there are frequent major changes to the database schema that require this type of manual intervention. Because of this overhead, this technique is often better suited for scenarios where the schema is well-thought out ahead of time with fundamental changes occurring infrequently.
However, state based migrations do have the advantage of producing files that fully describe the database state in a single context. This can help new developers onboard more quickly and works well with workflows in version control systems since conflicting changes introduced by code branches can be resolved easily.
The major alternative to state based migrations is a change based migration system. Change based migrations also produce files that alter the existing structures in a database to arrive at the desired state. Rather than discovering the differences between the desired database state and the current one, this approach builds off of a known database state to define the operations to bring it into the new state. Successive migration files are produced to modify the database further, creating a series of change files that can reproduce the final database state when applied consecutively.
Because change based migrations work by outlining the operations required from a known database state to the desired one, an unbroken chain of migration files is necessary from the initial starting point. This system requires an initial state, which may be an empty database system or a files describing the starting structure, the files describing the operations that take the schema through each transformation, and a defined order which the migration files must be applied.
Change based migrations trace the provenance of the database schema design back to the original structure through the series of transformation scripts that it creates. This can help illustrate the evolution of the database structure, but is less helpful for understanding the complete state of the database at any one point since the changes described in each file modify the structure produced by the last migration file.
Since the previous state is so important to change based systems, the system often uses a database within the database system itself to track which migration files have been applied. This helps the software understand what state the system is currently in without having to analyze the current structure and compare it against the desired state, known only by compiling the entire series of migration files.
The disadvantage of this approach is that the current state of the database isn't described in the codebase after the initial point. Each migration file builds off of the previous one, so while the changes are nicely compartmentalized, the entire database state at any one point is much harder to reason about. Furthermore, because the order of operations is so important, it can be more difficult to resolve conflicts produced by developers making conflicting changes.
Change based systems, however, do have the advantage of allowing for quick, iterative changes to the database structure. Instead of the time intensive process of analyzing the current state of the database, comparing it to the desired state, creating files to perform the necessary operations, and applying them to the database, change based systems assume the current state of the database based on the previous changes. This generally makes changes more light weight, but does make out of band changes to the database especially dangerous since migrations can leave the target systems in an undefined state.
Now that we've introduced the two major classes of migration systems, let's take a look at how you actually use these tools during application development, deployment, and collaboration.
Database migrations become potentially relevant from the moment you make your first modification to the initial data schema. In fact, most migration tools generate migration files to bring a relational database from a completely empty state to the initial schema.
As you develop your application, the shape of the data you wish to store, the requirements of the database system, and the specific details you want to preserve often change. This can happen as you understand the problem space more completely or as additional features require additional data or a different layout to current information. It is important to define these schema changes as they occur during development to ensure that the artifacts are synchronized and deployed with the associated code.
Storing and managing migration files with the associated codebase allows the database schema changes to go through the same review process that code changes receive. As mentioned earlier, state based migrations can make this process easier because conflicts between the desired state are apparent by diffing the files. Change based systems require more care and hands on resolution when conflicts occur since ordering is important and the entire data structure is not visible in any of the iterative migration files.
In general, it's best practice to be conservative in the way you define database changes in development and remember that your changes may eventually be run against production data. This is especially important for destructive changes. Safer options to deletion should be considered, such as renaming columns instead of deleting them, performing "soft" deletes by unpublishing the data instead of removing it, or copying data structures when transforming it instead of overwriting it.
Put another way, database schema migrations are one of the changes you'll be maintaining with your code changes. Schema migrations should generally be considered in tandem with code changes to ensure that all changes are documented and applicable to the current database design.
Once your migrations have been created, you will usually deploy the them in increasingly important environments to ensure that they work as expected and that all of your code's requirements are satisfied. While you likely developed and applied the migrations in a development environment, you will probably need to test it in additional environments with more realistic data, integrations, and load, just as you would with code changes.
At this point, you need to ensure that the migration files apply cleanly to the target database, that the initial target state shares the structure and the assumptions you had while developing the migration files, and that the data held by the database is not negatively impacted by the changes you are making. A mixture of manual and automated checks may be performed to confirm these requirements are satisfied.
Once the changes have been validated as safe and functional in the staging environments, they may be ready to applied to production database systems. Before applying the migrations to production systems, it is important to consider performing a full backup of the current data if a recent copy is not currently available. Backups are vital in the event that the migration has unforeseen consequences or if differences between the staging environments and production cause data loss.
Given all of the information we've covered so far, how do you actually go about deciding on the migration solutions best suited for your projects? There are a number of different considerations that you can use to narrow in on your best choices.
Sometimes, the codebase language, the development framework, or the database backend you are using will strongly suggest the tools that you should use for managing migrations. For instance, while often not hard requirements, many web frameworks include their own migration systems for managing the schema of their backing databases. These often have good integration within the rest of the framework's tooling and can help reduce the friction of managing database changes as a separate process.
Another consideration that might affect your selection is the level of maturity and the amount of support provided for a given tool. As we mentioned above, the migration files produced by various tools tend to not be directly compatible with one another, though they may be able to be applied directly as pure SQL. Choosing a stable option can help you avoid scenarios where a tool falls into disuse and is no longer maintained, forcing you to change your migration process midway through development.
A further point that you might want to consider is the format of the actual migration artifacts. If they are well-formatted and well-organized SQL files, you can probably rely on being able to understand the changes that the files produce. Likewise, if the migrations are written in the programming language that the rest of your project is using, the meaning can often be easy to parse and require less of a context switch from other development tasks. Avoid tools that produce migration files that are difficult to understand or modify manually.
Another factor in your decision may involve additional features provided by migration systems. Some tools offer great integration with other tools or offer flexible options like the ability to squash previous change based migration files into a "checkpoint" state file. Tools have varying support for inspecting migration failures or rolling back changes when problems occur. Your requirements and development philosophy can impact which of these features you need.
In this article, we took a look at the concept of database schema migrations and discussed some of the strategies used to manage changes to data structures. We talked about why migrations are helpful, different classes of migration systems, and some benefits that each approach offers. Finally, we talked about how to actually incorporate migrations into your development workflow and how to find the tools that best match your style and needs.
Generally speaking, while schema migration tools are optional, the organization, functionality, and control they offer make them some of the most widely used database tools for development. Being able to plan and execute changes to the database structure and record those modifications with the rest of your code is incredibly valuable. While migration tools might vary in the features they offer or the strategy they use to analyze differences and apply changes, the role they fulfill is an important component to reliable and predictable development.