Обсуждение: Database design: Data synchronization

Поиск
Список
Период
Сортировка

Database design: Data synchronization

От
David
Дата:
Hi list.

2 cases I'm interested in:

1) Migrating data from one database to another

2) Distributing data over many databases, and later merging

In what ways can you design tables to easier facilitate the above cases?

I am aware of multi-master replication software, as described here:

http://en.wikipedia.org/wiki/Multi-master_replication

For this question, I'm more interested in schema design, so that a
home-brewed database synchronization can perform synchronization.

I have some experience with this. One of my previous projects was to
reliably migrate data from one database to another, where the 2
databases had problems like:

- Many foreign keys weren't enforced

- Some fields needed special treatment (eg: should be unique, or
behave like a foreign key ref, even if db schema doesn't specify it.
In other cases they need to be updated during the migration).

- Most auto-incrementing primary keys (and related foreign key
references) needed to be updated during migration, because they are
already used in the destination database for other records.

- Many tables are undocumented, some fields have an unknown purpose

- Some tables didn't have fields that can be used as a 'natural' key
for the purpose of migration (eg: tables which only exist to link
together other tables, or tables where there are duplicate records).

I wrote a Python script (using SQLAlchemy and Elixir) to do the above
for our databases.

Are there any existing migration tools which could have helped with
the above? (it would have required a *lot* of user help).

Are there recommended ways of designing tables so that synchronization
is easier?

The main thing I've read about is ensuring that all records have a
natural key of some kind, eg GUID. Also, your migration app needs to
have rules for conflict resolution.

David.

Re: Database design: Data synchronization

От
Decibel!
Дата:
On Jun 18, 2008, at 7:07 AM, David wrote:
> - Many foreign keys weren't enforced
>
> - Some fields needed special treatment (eg: should be unique, or
> behave like a foreign key ref, even if db schema doesn't specify it.
> In other cases they need to be updated during the migration).
>
> - Most auto-incrementing primary keys (and related foreign key
> references) needed to be updated during migration, because they are
> already used in the destination database for other records.
>
> - Many tables are undocumented, some fields have an unknown purpose
>
> - Some tables didn't have fields that can be used as a 'natural' key
> for the purpose of migration (eg: tables which only exist to link
> together other tables, or tables where there are duplicate records).
>
> I wrote a Python script (using SQLAlchemy and Elixir) to do the above
> for our databases.
>
> Are there any existing migration tools which could have helped with
> the above? (it would have required a *lot* of user help).
>
> Are there recommended ways of designing tables so that synchronization
> is easier?
>
> The main thing I've read about is ensuring that all records have a
> natural key of some kind, eg GUID. Also, your migration app needs to
> have rules for conflict resolution.


Well, it sounds like you've got a good list of what NOT to do. The
first step is to make sure that you have a good database design,
outside of replication considerations. Most tables should have
natural unique keys; make sure you have FK's, documment things (see
the COMMENT ON command), etc. If you have low data quality to start
with, spreading that all over is just going to make things worse.

For the actual replication, there isn't really a multi-master
solution for Postgres. Your best bet is to try and design the system
so that you don't have conflicts (ie: if you have a bunch of branch
offices, each one is responsible for their own data). You can then
build something akin to multi-master using londiste and pgq.
--
Decibel!, aka Jim C. Nasby, Database Architect  decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828



Вложения