Oh boi, I’m restoring this page since it still gets traffic. It will be published as a page, not as a blog post, because it no longer fits the blog’s topic.

My name showed up on The University of Liverpool lecture slides, and this is the story of how I found out about it.

However, I’m receiving daily requests for this link (mainly for Part 1), so here we go! This is the first draft; I’ll try to re-style it as it was before.

Original publishing date: December 16, 2012

Okay, here I will continue my story about the database merging. In this second part, I’ll focus more on the technical stuff: how it was realized, how to get certain values, how to execute the steps, and so on.

The idea…

After the discussion about the potential ways to execute this migration, we agreed to write MySQL scripts and execute them one by one. The main focus was on the entity ID fields. As I wrote in Part 1, we analyzed two specific groups of entities: one with common rules and the other with its own rules. For the first group we used the Carbon 5 maven plugin [archived version of that page], and for the other group we wrote and executed scripts one by one, manually. After the database is prepared (all the needed fields updated and incremented), you export the dump (but without DROP statements) and import it into the target database.

Foreign key checks

As I mentioned in Part 1, we have products, orders and order_products (many-to-many). If you want to prepare ID values for migration, you’ll need to change (increment) product_id both in the products and the order_products table. By default, foreign key checks are enabled, meaning you cannot simply change an ID field that is referenced somewhere else as a foreign key. That’s why you need to tell MySQL that you know what you’re doing:

SET foreign_key_checks = 0;

Maybe the best place to change this is at the beginning (set it to 0) and at the end (set it back to 1) of every script. Imagine, for example, that you turn this check off on a test server and forget about it; during testing you write some scripts that pass, but when you execute them on the production server they fail, because the foreign key check there is still turned on. That’s why I think it’s best to always turn the check back on after every script.
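In practice, every migration script can follow the same skeleton:

SET foreign_key_checks = 0;

-- ... all the UPDATE statements for this step ...

SET foreign_key_checks = 1;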

Last id as @last_id

If you have two databases, dbA and dbB, and you want to move products from dbB to dbA, first you need to find out what the latest ID in dbA is, so you can freely move products from dbB after incrementing their ID values (find more details in Part 1). To get the last ID, simply execute the following on dbA:

SELECT MAX(product_id) FROM products;

After this, I recommend writing this number down somewhere and assigning it to a variable.

SET @last_id = 100;
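If you’d rather not copy the number by hand, MySQL also lets you assign the result of a query to the variable directly:

SELECT MAX(product_id) INTO @last_id FROM products;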

Now you need to update the ID values in the products and order_products tables.

UPDATE products SET product_id = product_id + @last_id;
UPDATE order_products SET product_id = product_id + @last_id; 

Now you can see why we assigned this value to the variable @last_id. This example is simple, but imagine you have many more references (and many more UPDATE statements); it would be really tedious to change the last ID value everywhere it appears.

How to find all references?

Ok, I have to update the ID values, but how do I find all the places where the value appears? I’ll show you a way that can help (though not completely; I’ll explain why later). Go into your database and check the information_schema. Execute the following:

SELECT * FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'your_db'
AND COLUMN_NAME = 'product_id';

If you want fewer details (only the names of the tables where the specific column appears), you can type this:

SELECT TABLE_NAME FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'your_db'
AND COLUMN_NAME = 'product_id';

Now, the explanation of why this won’t give you all the needed results. For example, you have a users table, and somewhere in your database you have a seller_id column. The seller is a user, of course, but in this case the column is named seller_id, so the previous query won’t find it. For such cases you’ll need to investigate your database further and look at the foreign keys. Of course, knowing your application will help you here.
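If your foreign keys are declared as actual constraints (e.g., on InnoDB tables), information_schema can still help with these cases: the KEY_COLUMN_USAGE table lists every column that references the users table, no matter what the column is called.

SELECT TABLE_NAME, COLUMN_NAME
FROM information_schema.KEY_COLUMN_USAGE
WHERE TABLE_SCHEMA = 'your_db'
AND REFERENCED_TABLE_NAME = 'users';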

User migration

Ok, now it’s time for another example. This time it’s not simple updating; I’ll explain a problem with user migration. There are users in dbA and users in dbB. Some of them exist only in dbB, and some exist in both databases. The first group of users will have their ID values incremented, and for the second group (the same users in both databases) you’ll need to find the appropriate ID in dbA. Let’s say you have the user Bruno with ID 12 in dbA, and the same user in dbB has ID 23. You need to change user_id from 23 to 12 in the users table (and in all tables with a reference to user_id in the users table).

To find out which users are the same and which are different, you’ll need an extra step here. Usually you would do a JOIN to see the same users, but in order to JOIN two user tables, you need to have them both in one schema. That means you need to copy the users table from dbA and move it to dbB, renamed to temp_users. Use the MySQL Administrator tools for this: extract only that one table, then edit the file to rename the table, and import it into dbB. Now you’ll have users and temp_users in dbB. The idea is to create a table containing old and new user IDs; that table will be used as a reference when updating user_id.
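As a side note: if both schemas happen to live on the same MySQL server, you can skip the export/edit/import dance and copy the table directly. A sketch (it assumes you have privileges on both schemas; note that create table ... as doesn’t copy indexes, which is fine for a throwaway comparison table):

create table dbB.temp_users as select * from dbA.users;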

-- say you have 20000 users in dbA
set @last_id = 20000;

drop table if exists old_new_users;
-- First group: users that will simply be moved (existing in dbB only)
create table old_new_users as
select
  user_id as 'old_id',
  user_id + @last_id as 'new_id'
from users
-- right now we are comparing only usernames; you can add more
-- parameters if you want (first name, last name, etc.)
where username not in (select username from temp_users);

-- Second group: same users in both databases
insert into old_new_users (old_id, new_id)
select
  u.user_id as 'old_id',
  tmp.user_id as 'new_id'
from temp_users tmp
join users u on u.username = tmp.username;
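Before going further, a quick sanity check is worth it; assuming usernames are unique in both tables, every user in dbB should end up with exactly one mapping:

-- both counts should match
select
  (select count(*) from users) as users_total,
  (select count(*) from old_new_users) as mapped_total;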

After creating this old/new ID table, you can perform your updates.

update users set user_id = user_id + 100000;

update users u
join old_new_users onu on u.user_id = onu.old_id + 100000
set u.user_id = onu.new_id
where u.user_id = onu.old_id + 100000;

You are probably wondering what that 100 000 is doing here. I’ll explain it. During this user_id change you might end up with duplicate user_id values, because some are incremented while others are set according to dbA ID values: for example, Bruno has to change from 23 to 12 while another dbB user might still be holding ID 12 at that moment. That’s why we first moved all users to a “safe zone”: we incremented all user ID values by 100 000 and then set their new values safely, one by one. That also explains the funny join condition and the WHERE clause.
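And remember, the users table is not the only place where user_id lives; every referencing table needs the same two-step treatment. Here is a sketch for a hypothetical orders table (assuming it carries a user_id column):

-- move to the safe zone first
update orders set user_id = user_id + 100000;

-- then map to the final values
update orders o
join old_new_users onu on o.user_id = onu.old_id + 100000
set o.user_id = onu.new_id;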

Unique keys

Adapting user_id values to the existing user_id values in dbA can cause unique key errors later, when you try to import dbB into dbA. You should consider removing those entries from some other tables, because dbA already has them for those users.
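A minimal sketch, assuming a hypothetical user_settings table with a unique key on user_id: after the remapping, the users that already exist in dbA are exactly those whose new_id is at or below @last_id, so you can drop their dbB rows before exporting:

delete from user_settings
where user_id in (
  select new_id from old_new_users where new_id <= @last_id
);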

Auto increment

If you have some specific tables where you have to change ID values in dbA (note: dbA, not dbB, where we usually changed ID values), you’ll want to check the auto increment value for that table.

Let’s say you have a table with specific dates, and for your application it is important to have those records in a specific order in your database. For example, you have 100 records in dbA, 20 records in dbB, and for some reason you need to put those 20 records starting at ID 81 (instead of 101). In order to free some space, you’ll move the last 20 records, and you’ll end up with old records from 1-80, new records from 81-100, and old records again from 101-120. You need to update the auto increment value too if you want your next record to have ID 121; otherwise it stays at 101 and you’ll get an error the next time you insert something, because there is already a record with ID 101.
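Checking and fixing the counter is straightforward (the table name here is a placeholder):

-- check the current auto increment value
SELECT AUTO_INCREMENT
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'your_db' AND TABLE_NAME = 'your_table';

-- make the next inserted record get ID 121
ALTER TABLE your_table AUTO_INCREMENT = 121;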

Once again, this applies only if you are making changes to ID fields in dbA. All the tables in dbB (and their auto increment values) will be taken care of by the exporting tool.

Creating dump

Finally, when everything is prepared, you’ll want to export your data. We used MySQL Administrator for this. There are a few things I want to mention here. If you are selecting only specific tables to be exported, make sure that “Complete backup” is unchecked; if it is checked, it doesn’t matter which tables you selected.

Also, go through all the other options carefully; you don’t want to include DROP statements or anything like that.
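If you prefer the command line over MySQL Administrator, mysqldump should give you a similar data-only dump; --no-create-info skips the CREATE TABLE (and DROP TABLE) statements, and the table list here is just illustrative:

mysqldump -u your_username -p --no-create-info dbB products orders order_products > dbB.sql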

After that, import all your data into dbA:

mysql -u your_username -p dbA < dbB.sql

Epilogue

That would be all for now. I wrote about a few specific situations I went through. If you have similar experiences, I would like to hear your side of the story.