Converting from MySQL to Postgres with pgloader, for Heroku
This post touches on what I did to convert an existing MySQL database to Postgres, which I eventually loaded onto Heroku. The hopes is that it’ll highlight some of the hiccups that I ran into. As much as I can I’m going to use real examples and not just quote documentation.
Who is this article for?
For anyone migrating a relatively simple database from MySQL to Postgres. I didn’t get too deep into pgloader (and there’s seriously a lot you can customize), but I did run into a few snags.
For beginner programmers like myself. I couldn’t find a guide I liked and wish I had.
Why MySQL to Postgres?
I recently decided to rewrite and move a website I maintain called Treelib from a Digital Ocean Droplet to a Heroku Dyno. It was just something I wanted to try and it makes things easier for the scale of traffic I’d be dealing with. Heroku works out of the box with Postgres, but you can use add-ons like ClearDB if you’d prefer to use MySQL. I didn’t prefer MySQL, and so the story begins…
A few other things worth mentioning:
The rewrite I did was from a Sinatra Ruby app to a Phoenix Elixir app Ecto (database client for Elixir) likes timestamps(), so moving over to Phoenix required some database migrations. I ran those migrations while still in MySQL, then ported over to Postgres.
Step 1: Get pgloader
To convert from MySQL to Postgres I used pgloader, which is a great tool. There’s plenty of documentation to sift through and the founder Dimitri seems to be quite active.
Step 2: Converting to Postgres
Create a postgres db to move your MySQL db into. Then to convert use pgloader to convert to Postgres (it sounds easy because it is). Here’s an example from pgloader’s documentation:
createdb pagila pgloader mysql://user@localhost/sakila postgresql:///pagila
What I actually ended up using was:
createdb treelib pgloader mysql://root:mysql@localhost/treelib postgresql://localhost/treelib # my local mysql db has a password of "mysql"
You can then test that it worked by running:
psql -h localhost -d [your database name] # once in postgres, list all tables by running: \dt
That’s it! You might be in the clear at this point, but I ran into some snags (which is really why I wrote this article). And so the story continues…
Step 3: Not quite there yet
These are a few of the issues I encountered.
Schema issues, aka my Heroku woes
Now depending on your setup, when you run the \dt command above (in the postgres console) you’ll notice that your schema is set to your database name or something similar. Here’s what I saw for my Treelib project after converting to postgres:
List of relations Schema | Name | Type | Owner - - - - + - - - - - - - - - -+ - - - -+ - - - - - - - - treelib | families | table | nathanwillson treelib | genera | table | nathanwillson treelib | photo_albums | table | nathanwillson ... (full table list omitted)
It might still work in your dev environment, but Heroku doesn’t like that. What you want is a “public” schema. Dimitri, the maintainer of pgloader, addresses the solution here. But to summarize, the easiest (and most maintainable) way is create a .load file:
LOAD DATABASE FROM mysql://user@host/dbname INTO pgsql://user@host/dbname ALTER schema 'dbname' rename to 'public';
Replace those mysql and pgsql addresses with your own database urls and don’t forget to fill in ‘dbname’ with the schema you’re replacing. Again, here’s my load file looked like:
LOAD DATABASE FROM mysql://root:mysql@localhost/treelib INTO postgresql://localhost/treelib ALTER schema 'treelib' rename to 'public';
Then run the pgloader via:
If everything checks out then psql into the database, then run
List of relations Schema | Name | Type | Owner - - - - + - - - - - - - - - -+ - - - -+ - - - - - - - - public | families | table | nathanwillson public | genera | table | nathanwillson public | photo_albums | table | nathanwillson ... (full table list omitted)
Converting MySQL bigints (aka: casting considerations)
When converting from MySQL to Postgres, pgloader has a number of default cast types. The source code for those defaults is here and there’s some documentation
here. In my case, converting from MySQL bigint resolved as a numeric, but I’d rather it resolve as a Postgres bigint still. Here’s what one of my tables looked like in MySQL:
MySQL: +--------------+---------------------+------+-----+---------+ | Field | Type | Null | Key | Default | | +--------------+---------------------+------+-----+---------+ | id | bigint(20) unsigned | NO | PRI | NULL | | flickr_id | bigint(20) unsigned | NO | | NULL | | photoset_id | bigint(20) unsigned | NO | | NULL | ... (I've ommited the "Extra" column to make the text fit)
After initial conversion to Postgres:
Postgres: Column | Type | --------------+------------------------+ id | bigint | flickr_id | numeric | <-- not what I wanted! photoset_id | numeric | <-- also not what I wanted! ... (I've omitted the "Modifiers" column to make the text fit)
Easy fix? You bet. Just a one liner to add to your .load file (the one from Schema issues) and Bob’s yer uncle:
CAST type bigint to bigint drop typemod;
And here’s my final .load file:
LOAD DATABASE FROM mysql://root:mysql@localhost/treelib INTO postgresql://localhost/treelib ALTER schema 'treelib' rename to 'public' CAST type bigint to bigint drop typemod;
So converting from MySQL to Postgres out of the box was pretty simple, but there were still a few hiccups to sift though. Hopefully this guide can offer some hints as to what to look out for.
To summarize what was covered:
Converting from Postgres to MySQL using pgloader
Schema “public” issues
Overriding default cast types
Also thank you Dimitri for creating pgloader.