Re: syncing - between databases

From: Trinath Somanchi <trinath(dot)somanchi(at)gmail(dot)com>
To: Steven Crandell <steven(dot)crandell(at)gmail(dot)com>
Cc: John Fabiani <johnf(at)jfcomputer(dot)com>, pgsql-sql(at)postgresql(dot)org
Subject: Re: syncing - between databases
Date: 2012-05-14 11:52:12
Message-ID: CAH8HgmYcTg+pNyYt7BNyS-YJApY1TOh3fY9u_+4TGH5RW0ke3A@mail.gmail.com
Lists: pgsql-sql

Hi-

You can use PgCluster with Slony-II for this type of requirement.

On Mon, May 14, 2012 at 11:02 AM, Steven Crandell <steven(dot)crandell(at)gmail(dot)com> wrote:

> Having all the dblink destinations on the same server makes it a much more
> viable option, since the chances of a dblink update failing are greatly
> reduced. That said, here's a rundown of the queue system I described, with
> some specifics for your situation.
>
> You can add a flag to each customer row that denotes whether or not the
> data has been synced to the other tables,
>
> ALTER TABLE customer ADD synced BOOLEAN DEFAULT FALSE; -- watch out for
> how this default might affect existing customers
>
> and then have your sync process flip the flag to true when it has safely
> written the data to all other tables.
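> For example, once a row has been written to every other table, the sync
> job could mark it like this (illustrative only; I'm assuming customer has
> an id primary key):
>
> UPDATE customer SET synced = TRUE WHERE id = ...;  -- mark the row as synced
>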
> Alternatively, you can store the data that needs to be sync'd in a
> separate table if making changes to the customer table isn't a viable
> option.
>
> CREATE TABLE customer_queue (LIKE customer);
> ALTER TABLE customer_queue ADD created_time TIMESTAMP DEFAULT now();
> ALTER TABLE customer_queue ADD operation CHAR(1); -- 'i' = insert, 'u' = update, etc.
> ALTER TABLE customer_queue ADD processed_time TIMESTAMP;
> ALTER TABLE customer_queue ADD processed BOOLEAN DEFAULT FALSE;
> ...or something similar
>
> If patching the application to write new/updated customer data to the
> customer_queue table (in addition to, or instead of, the customer table) is
> out of scope, you could populate it with a trigger.
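>
> A rough sketch of such a trigger (untested, and it only covers inserts and
> updates; it assumes customer_queue has exactly the customer columns plus
> the four metadata columns added above, in that order):
>
> CREATE OR REPLACE FUNCTION customer_queue_capture() RETURNS trigger AS $$
> BEGIN
>   -- copy the new/changed customer row into the queue,
>   -- tagging it with the operation ('i' or 'u') and leaving it unprocessed
>   INSERT INTO customer_queue
>     SELECT NEW.*, now(), lower(substr(TG_OP, 1, 1)), NULL::timestamp, FALSE;
>   RETURN NEW;
> END;
> $$ LANGUAGE plpgsql;
>
> CREATE TRIGGER customer_queue_capture
>   AFTER INSERT OR UPDATE ON customer
>   FOR EACH ROW EXECUTE PROCEDURE customer_queue_capture();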
>
> Once you have a data structure that stores your customer data and the
> metadata that captures whether each row has been safely synced out to the
> other tables, it's just a matter of writing a script that reads the sync
> metadata (the queue table, or your customer table where not yet synced)
> and processes all pending rows:
>
> SELECT foo, bar, baz FROM customer_queue WHERE NOT processed ORDER BY
> created_time; -- FIFO
>
> The script should verify that a given row was safely written to all
> destinations before setting processed to true.
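>
> For each pending row, the processing might look roughly like this
> (illustrative only; dblink_exec() comes from the dblink contrib module,
> and the connection string, column list, and cust_id are placeholders for
> your own setup):
>
> -- push the row to one of the other databases
> SELECT dblink_exec('dbname=other_db',
>        'INSERT INTO customer (foo, bar, baz) VALUES (...)');
>
> -- only after every destination has succeeded, mark the queue row done
> UPDATE customer_queue
>    SET processed = TRUE, processed_time = now()
>  WHERE cust_id = ...;  -- however you identify the row in your schema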
>
> Anyway, that's one of many ways to accomplish this, and it's surely far
> from the best, but I hope it's helpful.
>
> regards
> -steve
>
> On Sun, May 13, 2012 at 1:01 PM, John Fabiani <johnf(at)jfcomputer(dot)com> wrote:
>
>> All on the same cluster (only one server). Although it would be nice to
>> have only one table, there are real business reasons to dup the databases.
>>
>> I am interested in how you dealt with a queue table. Would you take a
>> little time to describe how it worked?
>>
>> Johnf
>>
>> On Saturday, May 12, 2012 08:53:52 PM you wrote:
>> > Are these 5 databases on different servers and at different locations,
>> > or are they on the same local cluster?
>> > If they are all on the same local cluster, you may want to rethink how
>> > you are storing customer data. The design you describe seems redundant.
>> >
>> > If you are dealing with multiple servers (and perhaps business rules
>> > that require duplicate, writable user tables at each location?) then
>> > your plan needs to account for network failure. A synchronous
>> > cross-network dblink trigger mechanism left to its own devices will
>> > eventually fail, and you will be left with inconsistent data. Nothing
>> > wrong with dblink, but you need to build in some error handling.
>> >
>> > I've built systems that accomplished similar things by writing data to
>> > a queue table (in addition to your local master customer table) which
>> > is then reconciled/synced out to the other nodes by a periodic script
>> > that is able to deal with or alert on locking, duplicate-key, network,
>> > and other errors that keep it from properly syncing a row to all other
>> > nodes. This adds durability to your sync mechanism but also introduces
>> > some lag time. Pick your poison.
>> >
>> > -steve
>> >
>> > On Sat, May 12, 2012 at 7:28 AM, John Fabiani <johnf(at)jfcomputer(dot)com> wrote:
>> > > I need to maintain a synced table across several databases. For
>> > > example, I have a customer table in 5 databases. If a user of any of
>> > > the databases inserts a new customer, I need to insert the new record
>> > > into the other four databases. But the question is updates and deletes.
>> > >
>> > > I can use a trigger and dblink to update the other databases when the
>> > > action is an insert, because in each of the other databases I don't
>> > > have to worry about a locked record. But what happens if a user is
>> > > updating at the same moment that a different user in a different
>> > > database is updating the same customer? Can a race condition occur?
>> > >
>> > > I was thinking I could create a master database and have all the
>> > > other databases use dblink to execute the master trigger.
>> > >
>> > >
>> > > Any advice would be helpful,
>> > >
>> > > Johnf
>> > >
>> > > --
>> > > Sent via pgsql-sql mailing list (pgsql-sql(at)postgresql(dot)org)
>> > > To make changes to your subscription:
>> > > http://www.postgresql.org/mailpref/pgsql-sql
>>
>
>

--
Regards,
----------------------------------------------
Trinath Somanchi,
+91 9866 235 130
