Oh boi, I’m restoring this page since it still has traffic. It will be published as a page, not as a blogpost because it’s not inline anymore with the blog topic.
My name showed up on The University of Liverpool lecture slides and this is the story how I found about it.
However, I’m receiving daily request for this link (mainly on part 1), so here we go! This is the first draft, I’ll try to re-style it as it was before.
Original publishing date: December 16, 2012
Okay, here I will continue with my story about the database merging. In the second part, I’ll focus more on the technical stuff, how it is realized, how to get some values, how to execute some steps etc.
After the discussion about the potential ways how to execute this migration, we agreed to write MySQL scripts and execute them one by one. Main focus was on the entity ID fields. As I wrote in the Part 1, we analyzed two specific groups of entities, one with common rules and the other with their own rules. For the first group we used Carbon 5 maven plugin [archived version of that page] and for the other group we wrote and execute scripts one by one manually. After the database is prepared (all needed fields are updated, incremented), you’ll export the dump (but without drop statements), and import it in the target database.
As I mentioned in the Part 1, we have
order_products (many to many). If you want to prepare
ID values for migration, you’ll need to change (increment)
product_id both in
order_products table. By default, foreign checks are enabled, meaning you cannot simply change the ID field which is marked somewhere else as a foreign key. That’s why you need to tell your MySQL that you know what you’re doing.
SET foreign_key_checks = 0;
Maybe the best place for changing this is at the beginning (set to 0) and at the end (set this value to 1) of every script. Imagine, for example that you turn off this check on a test server, forget about it, and during the testing you write some scripts that are passing, but when you execute them on production server, they will fail, because the foreign check on the production server is turned on. That’s why I think it’s the best way to always turn the check back on after every script.
Last id as @last_id
If you have two databases,
dbB, and you want to move products from
dbA, first you need to find out what the latest
ID is in the
dbA, so you can freely move products from
dbB after you increment
ID values (find more details in Part 1). To get the last
ID, you simply execute the following line on
SELECT max(product_id) FROM products p;
After this, I recommend you to write it somewhere and to assign this number to a variable.
SET @last_id = 100
Now, you need to update
ID values in products, and
UPDATE products SET product_id = product_id + @last_id; UPDATE order_products SET product_id = product_id + @last_id;
Now you can see why we assigned this value to a variable
@last_id. This example is simple, but imagine you have much more references (and much more UPDATE statements). It will be really borring to change the last ID value wherever it appears.
How to find all references?
Ok, I have to update
ID values, but how to find all the places where that value appears? I’ll show you a way which can help you (but not completely, I’ll explain later why). Go into your database and check the
information_schema. Execute the following:
SELECT * FROM information_schema.COLUMNS WHERE TABLE_SCHEMA = 'your_db' AND COLUMN_NAME = 'product_id';
If you want less details (only the table names where the specific columns appears) you can type this:
SELECT TABLE_NAME FROM information_schema.COLUMNS WHERE TABLE_SCHEMA = 'your_db' AND COLUMN_NAME = 'product_id';
Now the explanation why this won’t give you all needed results. For example, you have table users, and in your database you have
seller_id. Seller is the user of course, but in this case, the column has name
seller_id. Those are the cases where can’t find the previous query useful. For these cases you’ll need to investigate your database more, to see the foreign keys. Of course, knowing your application will help you here.
Ok, now is time for another example. This time it’s not simple updating. I’ll explain you a problem with user migration. There are users in
dbA and users in
dbB. Some of them exists only in
dbB, and some of them exists in both databases. First group of users will have their
ID values incremented, and for second group of users (same users in both databases) you’ll need to find the appropriate
dbA. Let’s say you have user Bruno with the
ID 12 in
dbA and same user in
dbB has ID 23. You need to set user_id from 23 to 12 in
users table (and in all tables with reference to
To find out which
users are same, which are different, you’ll need an extra step here. Usually, you would do a JOIN to see the same users, but in order to JOIN two user tables, you need to have them both in one schema. That means, you need to copy users table from from
dbA and move it to
dbB but rename it to
temp_users. Use the MySQL Administrator tools for this. Extract only one table. After that, edit the file to rename the table name and import it in the
dbB. Now you’ll have users and
dbB. The idea is to create a table containing old and new user IDs and that table will be used as a reference when updating
-- say you have 20000 users in dbA set @last_id = 20000; drop table if exists old_new_users; -- First group, users that will simple be moved (existing in dbB only) create table old_new_users as select user_id as 'old_id', user_id + @last_id as 'new_id' from users u -- right now we are comparing only usernames, you can add more parameters -- if you want (first name, last name etc.) where username not in (select username from temp_users); -- Second group (same users in both databases) insert into old_new_users (old_id, new_id) select u.user_id as 'old_id', tmp.user_id as 'new_id' from temp_users tmp join users u on u.username = tmp.username;
After creating this old/new ID table, you can perform your updates.
update users set user_id = user_id + 100000; update users u join old_new_users onu on u.user_id = onu.old_id + 100000 set u.user_id = onu.new_id where u.user_id = onu.old_id + 100000;
You are probably wondering what is this 100 000 doing here. I’ll explain it. During this
user_id change you might end up with duplicate user_id values (because some are incremented, some are set according to
dbA ID values). That’s why we first moved all users to a “safe zone”. We incremented all user
ID values by 100 000 and then set their new values safely, one by one. That explains funny join statements and where clause too.
user_id values to the existing
dbA can cause some unique key errors later when you try to import the
dbA. You should consider removing those entries in some other tables, because you already have them in
dbA for those users.
If you have some specific tables and you have to change
ID values in
dbA (note the
dbB where we usually changed
ID values), you’ll want to check the auto increment value for that table.
Let’s say you have some table with specific dates, and for your application it is important to have those records in a specific order in your databes. For example, you have 100 records in
dbA, 20 records in
dbB and for some reasons, you need to put those 20 records starting with id 81 (instead of 101). In order to free some space, you’ll move last 20 records and you’ll have old records from 1-80, new records from 81-100 and old records again from 101-120. You need to update the auto increment value too if you want your next record to have ID 121 (otherwise it is left to be 101 and you’ll have an error next time you write something, because there is already a record with id 101).
Once again, this is only if you are doing some changes in the
ID fields in
dbA. All those tables in
dbB (and auto increment values) will be taken care with exporting tool.
Finally, when all is prepared, you’ll want to export your data. We used MySQL Administrator for this. There is few things I want to mention here. If you are selecting only specific tables to be exported, then make sure that “Complete backup” is unchecked. If this is checked, it doesn’t matter which tables you selected.
Make sure to check all the options. You don’t want to include DROP statements or something like that.
After that, import all your data into the
mysql -u your_username -p dbA < dbB.sql
That would be all for now. I wrote about few specific situations I went through. If you have similar experiences I would like to hear your side of story.