Re: syncing - between databases

From: Steven Crandell <steven(dot)crandell(at)gmail(dot)com>
To: John Fabiani <johnf(at)jfcomputer(dot)com>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: syncing - between databases
Date: 2012-05-14 05:32:18
Message-ID: CALvesgkzc1a0kkJUeyAJxxdywafjjkLAHsbUeapQ8fSCrYauKA@mail.gmail.com
Lists: pgsql-sql

Having all the dblink destinations on the same server makes this a much
more viable option, since the chances of a dblink update failing are
greatly reduced. That said, here's a rundown of the queue system I
described, with some specifics for your situation.

You can add a flag to each customer row that denotes whether or not the
data has been synced to the other tables:

-- watch out for how this default might affect existing customers
ALTER TABLE customer ADD synced BOOLEAN DEFAULT FALSE;

and then have your sync process flip the flag to true once it has safely
written the data to all of the other tables.
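The flag flip itself is a one-liner; as a sketch, with a hypothetical
primary key column id:

```sql
-- Mark the row as synced only after it has been confirmed written to
-- all of the other tables ("id" is a hypothetical primary key column).
UPDATE customer SET synced = TRUE WHERE id = 123;
```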
Alternatively, if making changes to the customer table isn't a viable
option, you can store the data that needs to be synced in a separate table:

CREATE TABLE customer_queue (LIKE customer);
ALTER TABLE customer_queue ADD created_time TIMESTAMP DEFAULT now();
ALTER TABLE customer_queue ADD operation CHAR(1); -- 'i' = insert, 'u' = update, etc.
ALTER TABLE customer_queue ADD processed_time TIMESTAMP;
ALTER TABLE customer_queue ADD processed BOOLEAN DEFAULT FALSE;
...or something similar.

If patching the application to write new/updated customer data to the
customer_queue table (in addition to, or instead of, the customer table) is
out of scope, you could populate it via a trigger.
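If a trigger is the route you take, a minimal sketch might look like the
following. It assumes customer_queue was created exactly as above, so its
columns are customer's columns followed by the four bookkeeping columns:

```sql
-- Copy each new/updated customer row into customer_queue.
-- NEW.* expands to customer's columns, which come first in the queue
-- table thanks to LIKE; the remaining values fill the added columns.
CREATE OR REPLACE FUNCTION customer_enqueue() RETURNS trigger AS $$
BEGIN
    INSERT INTO customer_queue
    SELECT NEW.*,
           now(),                                   -- created_time
           CASE TG_OP WHEN 'INSERT' THEN 'i'
                      WHEN 'UPDATE' THEN 'u' END,   -- operation
           NULL,                                    -- processed_time
           FALSE;                                   -- processed
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER customer_enqueue_trg
    AFTER INSERT OR UPDATE ON customer
    FOR EACH ROW EXECUTE PROCEDURE customer_enqueue();
```

An AFTER ROW trigger is used so the queueing happens alongside, and cannot
block, the original write to customer.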

Once you have a data structure that stores your customer data along with
the metadata capturing whether each row has been safely synced out to the
other tables, it's just a matter of writing a script that reads that sync
metadata (the queue table, or your customer table where not yet synced) and
processes all pending rows:

SELECT foo, bar, baz FROM customer_queue WHERE NOT processed ORDER BY created_time; -- FIFO
The script should be able to verify that a given row was safely written to
all destinations before setting processed to true.
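As a sketch, the push-and-mark step could look like this. It requires the
dblink extension, and the connection string, the foo/bar columns, and the
single destination shown are all placeholders for your actual setup:

```sql
-- Process pending queue rows: push each one over dblink, and mark it
-- processed only if every push succeeded. dblink_exec raises an error
-- on failure, which aborts the block before the row is marked.
DO $$
DECLARE
    q RECORD;
BEGIN
    FOR q IN
        SELECT ctid AS queue_ctid, *
        FROM customer_queue
        WHERE NOT processed
        ORDER BY created_time          -- FIFO
    LOOP
        -- a real version would branch on q.operation ('i' inserts,
        -- 'u' updates) and repeat for each destination database
        PERFORM dblink_exec('dbname=customers_2',
            format('INSERT INTO customer (foo, bar) VALUES (%L, %L)',
                   q.foo, q.bar));

        UPDATE customer_queue
        SET processed = TRUE, processed_time = now()
        WHERE ctid = q.queue_ctid;
    END LOOP;
END;
$$;
```

Using ctid to match the queue row back avoids having to add a key column
to customer_queue, at the cost of only being safe within this one pass.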

Anyway, that's one of many ways to accomplish this, and it's surely far
from the best, but I hope it's helpful.

regards
-steve

On Sun, May 13, 2012 at 1:01 PM, John Fabiani <johnf(at)jfcomputer(dot)com> wrote:

> All on the same cluster (only one server). Although, it would be nice to
> have only one table there are real business reasons to dup the databases.
>
> I am interested in how you dealt with a queue table. Would you take a
> little time to describe the way it worked?
>
> Johnf
>
> On Saturday, May 12, 2012 08:53:52 PM you wrote:
> > Are these 5 databases on different servers and at different locations,
> > or are they on the same local cluster?
> > If they are all on the same local cluster you may want to rethink how
> > you are storing customer data. The design you describe seems redundant.
> >
> > If you are dealing with multiple servers (and perhaps business rules
> > that require duplicate, writable user tables at each location?) then
> > your plan needs to account for network failure. A synchronous
> > cross-network dblink trigger mechanism left to its own devices will
> > eventually fail and you will be left with inconsistent data. Nothing
> > wrong with dblink, but you need to build in some error handling.
> >
> > I've built systems that accomplished similar things by writing data to
> > a queue table (in addition to your local master customer table) which
> > is then reconciled/synced out to the other nodes by a periodic script
> > that is able to deal with, or alert on, the locking/dupe-key/network
> > and other errors that keep it from properly syncing a row to all other
> > nodes. This introduces added durability to your sync mechanism but
> > also introduces some lag time. Pick your poison.
> >
> > -steve
> >
> > On Sat, May 12, 2012 at 7:28 AM, John Fabiani <johnf(at)jfcomputer(dot)com>
> wrote:
> > > I need to maintain a synced table across several databases. For
> > > example, I have a customer table in 5 databases. If a user of any of
> > > the databases inserts a new customer I need to insert the new record
> > > into the other four databases. But the question is updates and
> > > deletes.
> > >
> > > I can use a trigger and dblink to update the other databases when
> > > the action is an insert, because in each of the other databases I
> > > don't have to worry about a locked record. But what happens if a
> > > user is updating at the same moment as a different user in a
> > > different database is updating the same customer? Can a race
> > > condition occur?
> > >
> > > I was thinking I could create a master database, and have all the
> > > other databases use dblink to execute the master trigger.
> > >
> > >
> > > Any advice would be helpful,
> > >
> > > Johnf
> > >
> > > --
> > > Sent via pgsql-sql mailing list (pgsql-sql(at)postgresql(dot)org)
> > > To make changes to your subscription:
> > > http://www.postgresql.org/mailpref/pgsql-sql
>
