Have you ever been in a situation where you had to migrate a lot of data quickly from one Postgres database to another? For example, you may need to perform a version upgrade of Postgres. Another common situation is the desire to move from a self-hosted database to a managed service. At iSeatz we were recently facing this problem. We had an old Postgres 9.1 database we wanted to migrate to Amazon Web Services (AWS) Relational Database Service (DMS) with minimal downtime. Typically, this task is very easy, you set up AWS DMS, wait for all the data to copy over, take a small outage to allow the data to be replicated fully, and then cut DNS over to your new database. The problem we had with this approach is that DMS does not support Postgres versions older than 9.4.

We wanted to keep our total outage window as small as possible but were struggling to find good solutions. There are tools like Slony I that allow you to stream data from a primary database to a secondary but nearly all of them required having secure shell access to the target database. Since our target database was in RDS and RDS does not allow ssh access, this wasn’t an option for us. Another option we considered was to use the replication services to copy from our old database to a more up to date version of Postgres and then use DMS to replicate from the newer DB. After testing we noticed some data inconsistencies when using this strategy.

This left us with only one good option, use a standard Postgres pg_dump. After attempting a pg_dumpall we timed the dump at 10 hours and a restore, using pg_restore, at 16 hours. This was far longer than our acceptable outage window. So what do you do when the only option available for performing the migration is too slow? You have to find ways to make it faster. For simplicity’s sake I’m going to skip all of the trials we performed and outline the winning strategy that got us migrated the fastest. For these steps we are assuming that the source database can be taken offline for the duration of the migration.

1. Dump your schema separately from the data

The reason for this will be more clear as your read on but simply put, dumping the schema separately allows us to have multiple concurrent dump and restore operations going at once.

2. Disable foreign keys prior to beginning a restore

A Postgres dump, typically, is a sequence of insert statements. If the table you are inserting into has foreign keys, then the rules of the keys must be evaluated for every insert. I found an article on Stack Overflow that outlined the procedure for disabling foreign keys:

 

create table if not exists dropped_foreign_keys (

       seq bigserial primary key,

       sql text

);

do $$ declare t record;

begin

   for t in select conrelid::regclass::varchar table_name, conname constraint_name,

           pg_catalog.pg_get_constraintdef(r.oid, true) constraint_definition

           from pg_catalog.pg_constraint r

           where r.contype = ‘f’

           — current schema only:

           and r.connamespace = (select n.oid from pg_namespace n where n.nspname = current_schema())

       loop

       insert into dropped_foreign_keys (sql) values (

           format(‘alter table %s add constraint %s %s’,

               quote_ident(t.table_name), quote_ident(t.constraint_name), t.constraint_definition));

       execute format(‘alter table %s drop constraint %s’, quote_ident(t.table_name), quote_ident(t.constraint_name));

   end loop;

end $$;

 

After you are done restoring data, you can re-enable the foreign keys by running the following command:

 

do $$ declare t record;

begin

   — order by seq for easier troubleshooting when data does not satisfy FKs

   for t in select * from dropped_foreign_keys order by seq loop

       execute t.sql;

       delete from dropped_foreign_keys where seq = t.seq;

   end loop;

end $$;

 

3. Flags that improve performance of the dump and restore

There are a number of flags that can be utilized with pg_dump and pg_restore that can drastically improve the performance of the runtime. Here’s what I ended up using:

pg_dump

–jobs=<NUMBER OF CORES> – This allows the dump to run multiple jobs. My recommendation is to set this equal to the number of cores available on the machine the dump is running on. You must either use the directory or custom output formats. Speaking of which:

–format=d – This tells pg_dump that you want to use the directory output format. This is required for parallel jobs.

–data-only – This is the flag to dump only the data from the tables and not the schema information.

–no-synchronized-snapshots – Prior to Postgres 9.2 this a requirement for running jobs in parallel

–schema=public – Instructs pg_dump to only dump the public schemas, for most cases this is all you need.

–file=<NAME OF DUMP> – The file/directory to output to.

–table=<NAME OF TABLE> – This flag specifies which table to dump. Multiple –table flag may be used together.

pg_restore

-jobs=<NUMBER OF CORES> – Just like we did with pg_dump, we can run restore commands in parallel.

–format=d – We have to tell pg_restore that we are using the directory format to restore from.

–disable-triggers – This temporarily disables the triggers that get fired when new data is inserted into the database. By not running these triggers you drastically improve insert speed.

4. Dump large tables by themselves and group small tables together

Doing this allows you to run multiple dumps and restores at the same time. You can specify the tables you want to group together by using the –table flag in pg_dump.

Conclusions

By using the techniques described above, I reduced the dump and restore time for the database from 26 hours to 4.5 hours. This technique assumes that you don’t have active connections to the database. While I have found these configurations to be beneficial, your needs and configuration may differ and you must be extremely careful when disabling triggers and foreign keys. Specifically, depending on your database configuration triggers may be an essential component of how data is manipulated in the database. Make sure to validate that it is safe to disable these features prior to using the above strategy.