Re: Database design: Data synchronization

From: Decibel! <decibel(at)decibel(dot)org>
To: David <wizzardx(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Database design: Data synchronization
Date: 2008-06-19 15:54:31
Message-ID: D929C937-343F-416A-87BF-959EC730BEED@decibel.org
Lists: pgsql-general

On Jun 18, 2008, at 7:07 AM, David wrote:
> - Many foreign keys weren't enforced
>
> - Some fields needed special treatment (eg: should be unique, or
> behave like a foreign key ref, even if db schema doesn't specify it.
> In other cases they need to be updated during the migration).
>
> - Most auto-incrementing primary keys (and related foreign key
> references) needed to be updated during migration, because they are
> already used in the destination database for other records.
>
> - Many tables are undocumented, some fields have an unknown purpose
>
> - Some tables didn't have fields that can be used as a 'natural' key
> for the purpose of migration (eg: tables which only exist to link
> together other tables, or tables where there are duplicate records).
>
> I wrote a Python script (using SQLAlchemy and Elixir) to do the above
> for our databases.
>
> Are there any existing migration tools which could have helped with
> the above? (it would have required a *lot* of user help).
>
> Are there recommended ways of designing tables so that synchronization
> is easier?
>
> The main thing I've read about is ensuring that all records have a
> natural key of some kind, eg GUID. Also, your migration app needs to
> have rules for conflict resolution.

Well, it sounds like you've got a good list of what NOT to do. The
first step is to make sure that you have a good database design,
independent of any replication considerations. Most tables should have
natural unique keys; make sure you have FKs, document things (see
the COMMENT ON command), etc. If your data quality is poor to start
with, replicating it everywhere is just going to make things worse.
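
A minimal sketch of that advice, using Python's built-in sqlite3 module only so it runs without a Postgres server (an assumption for illustration; SQLite has no COMMENT ON, so the "documentation" here is just SQL comments). The `office` and `customer` tables are hypothetical. Each table declares a natural key as UNIQUE and the FK is actually enforced, so bad rows are rejected at insert time instead of being copied to every other database.

```python
import sqlite3

# Hypothetical schema: surrogate ids for joins, UNIQUE natural keys for matching rows.
SCHEMA = """
CREATE TABLE office (
    office_id INTEGER PRIMARY KEY,              -- surrogate key
    code      TEXT NOT NULL UNIQUE              -- natural key, e.g. 'NYC'
);
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,            -- surrogate key
    office_id   INTEGER NOT NULL REFERENCES office (office_id),
    email       TEXT NOT NULL,
    UNIQUE (office_id, email)                   -- natural key
);
"""


def build_schema(conn: sqlite3.Connection) -> None:
    # SQLite only enforces foreign keys when this pragma is enabled,
    # and it must be set per connection, outside a transaction.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)


def try_insert(conn: sqlite3.Connection, sql: str, params: tuple) -> bool:
    """Return True if the row was stored, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_schema(conn)
    add_office = "INSERT INTO office (code) VALUES (?)"
    add_customer = "INSERT INTO customer (office_id, email) VALUES (?, ?)"
    print(try_insert(conn, add_office, ("NYC",)))                  # True
    print(try_insert(conn, add_office, ("NYC",)))                  # False: duplicate natural key
    print(try_insert(conn, add_customer, (999, "a@example.com")))  # False: no such office
```

In Postgres itself the same constraints apply, and COMMENT ON TABLE / COMMENT ON COLUMN would carry the documentation.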

For the actual replication, there isn't really a multi-master
solution for Postgres. Your best bet is to design the system so
that you don't have conflicts (i.e., if you have a bunch of branch
offices, each one is responsible for its own data). You can then
build something akin to multi-master using Londiste and PgQ.
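
To illustrate the ownership idea, here is a minimal Python sketch under stated assumptions: every row carries a hypothetical `origin` office plus an id from that office's own sequence, and only the origin office may send changes for it. This is not how Londiste or PgQ work internally; it only shows why per-office ownership with composite keys avoids both write conflicts and the primary-key collisions described in the quoted message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    origin: str     # hypothetical: the office that owns this row
    local_id: int   # allocated from the origin office's own sequence
    payload: str


Key = tuple[str, int]


def merge(central: dict[Key, Row], incoming: list[Row], sender: str) -> list[Key]:
    """Apply rows replicated from `sender` into `central`; return rejected keys."""
    rejected: list[Key] = []
    for row in incoming:
        key = (row.origin, row.local_id)
        if row.origin != sender:
            # Only the owning office may publish changes to its rows, so two
            # offices can never send competing versions of the same key.
            rejected.append(key)
            continue
        # (origin, local_id) cannot collide across offices, even though each
        # office numbers its own rows from 1.
        central[key] = row
    return rejected


if __name__ == "__main__":
    central: dict[Key, Row] = {}
    merge(central, [Row("nyc", 1, "alice"), Row("nyc", 2, "bob")], "nyc")
    merge(central, [Row("sfo", 1, "carol")], "sfo")
    print(len(central))                                       # 3
    print(merge(central, [Row("nyc", 1, "mallory")], "sfo"))  # [('nyc', 1)]
```

Because no key is ever written by two offices, the central copy never needs conflict resolution rules; it only has to apply each office's stream in order.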
--
Decibel!, aka Jim C. Nasby, Database Architect decibel(at)decibel(dot)org
Give your computer some brain candy! www.distributed.net Team #1828