Re: [HACKERS] database replication

From: Ryan Kirkpatrick <pgsql(at)rkirkpat(dot)net>
To: Damond Walker <dwalker(at)black-oak(dot)com>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [HACKERS] database replication
Date: 1999-12-27 01:05:02
Message-ID: Pine.LNX.4.10.9912261742550.7666-100000@excelsior.rkirkpat.net
Lists: pgsql-hackers

On Sun, 26 Dec 1999, Damond Walker wrote:

> > How about a single, separate table with the fields of 'database',
> >'tablename', 'oid', 'last_changed', that would store the same data as your
> >PGR_TIME field. It would be separated from the actual data tables, and
...
> The problem with OIDs is that they are unique at the local level but if
> you try and use them between servers you can run into overlap.

Yeah, I forgot about that point, but it became dead obvious once you
mentioned it. Boy, I feel stupid now. :)

> Using the PGR_TIME field with an index will allow us to find rows which
> have changed VERY quickly. All we need to do now is somehow programmatically
> find the primary key for a table so the person setting up replication (or
> syncing) doesn't have to have an in-depth knowledge of the schema in order to
> set up a syncing schedule.
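To illustrate why an index on PGR_TIME makes changed rows cheap to find, here is a minimal Python sketch. It assumes rows are plain dicts carrying a `pgr_time` value and uses a sorted list plus `bisect` as a rough stand-in for a database index; against PostgreSQL itself the equivalent would be an indexed `WHERE pgr_time > <last sync time>` query. The class name `PgrTimeIndex` is hypothetical.

```python
import bisect
from datetime import datetime


class PgrTimeIndex:
    """Rows kept sorted by pgr_time: a rough in-memory stand-in for a
    B-tree index on a PGR_TIME column (hypothetical, for illustration)."""

    def __init__(self, rows):
        self._rows = sorted(rows, key=lambda r: r["pgr_time"])
        self._keys = [r["pgr_time"] for r in self._rows]

    def changed_since(self, last_sync):
        # bisect_right jumps past every row stamped at or before last_sync
        # in O(log n), instead of scanning the whole table.
        start = bisect.bisect_right(self._keys, last_sync)
        return self._rows[start:]


rows = [
    {"id": 1, "pgr_time": datetime(1999, 12, 20)},
    {"id": 2, "pgr_time": datetime(1999, 12, 26)},
    {"id": 3, "pgr_time": datetime(1999, 12, 24)},
]
index = PgrTimeIndex(rows)
changed = index.changed_since(datetime(1999, 12, 24))  # only row 2
```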

Hmm... Yeah, maybe look to see which field(s) have a primary or unique
index on them? Then use those field(s) as the primary key. Just require
any table to be synchronized to have some set of fields that uniquely
identify each row. Either that, or add another field to each table with
our own, cross-system-consistent identification system. I don't know which
would be more efficient and easier to work with.
The former could potentially get sticky if it takes a lot of
fields to generate a unique key value, but it has the smallest effect on the
table to be synced. The latter could be difficult to keep straight between
systems (local vs. remote), and would require a trigger on inserts to
generate a new, unique ID number that does not exist locally or
remotely (a nasty issue there), but it would remove the uniqueness
requirement.
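A sketch of the "look for a primary or unique index" rule, in Python. The input format is an assumption: a list of dicts describing each index, which could in principle be built from the `indisprimary` and `indisunique` flags of PostgreSQL's `pg_index` catalog (check the catalog of the server version in use). Preferring the unique index with the fewest columns is an illustrative choice, not something settled in this thread, and `choose_sync_key` is a hypothetical name.

```python
def choose_sync_key(indexes):
    """Pick the columns that identify a row for synchronization.

    `indexes` is a hypothetical list of dicts such as
    {"columns": ["id"], "primary": True, "unique": True}.
    Returns a list of column names, or None if no key is usable.
    """
    primary = [ix for ix in indexes if ix["primary"]]
    if primary:
        return list(primary[0]["columns"])
    unique = [ix for ix in indexes if ix["unique"]]
    if unique:
        # Fewest columns first: less data to ship when comparing rows.
        best = min(unique, key=lambda ix: len(ix["columns"]))
        return list(best["columns"])
    # No usable key: the table would need an added identification column.
    return None
```

Returning `None` marks exactly the case where the "add our own identification field" alternative would have to kick in.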

> Oops...how about defining a trigger for this? With deletion I guess we
> would have to move a flag into another table saying we deleted record 'X'
> with this primary key from this table.
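The deletion-flag idea can be sketched in Python as an in-memory simulation. It assumes a hypothetical `TrackedTable` class: in the database itself the logging would be done by a trigger fired on delete, writing to a separate table rather than to the `deleted_log` list used here.

```python
from datetime import datetime


class TrackedTable:
    """In-memory sketch of a table whose deletes are logged to a side table.

    `deleted_log` plays the role of the separate table recording
    "deleted record X with this primary key from this table".
    """

    def __init__(self, name):
        self.name = name
        self.rows = {}         # primary key -> row values
        self.deleted_log = []  # (table name, primary key, deleted_at)

    def insert(self, key, values):
        self.rows[key] = values

    def delete(self, key, when):
        # Only log deletes of rows that actually existed.
        if key in self.rows:
            del self.rows[key]
            self.deleted_log.append((self.name, key, when))


customers = TrackedTable("customers")
customers.insert(42, {"name": "Acme"})
customers.delete(42, datetime(1999, 12, 26, 17, 42))
```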

Or, according to my logic below, if a row is missing on one side
or the other, just compare the remaining row's timestamp to the last
synchronization time (stored in a separate table/db elsewhere). The
result of that comparison, together with which side still has the row,
tells us whether the row was inserted or deleted since the last sync, and
what should be done to perform the sync.

> > Yea, this is indeed the sticky part, and would indeed require some
> >fine-tuning. Basically, the way I see it, if the two timestamps for a
> >single row do not match (or even if the row and therefore timestamp is
> >missing on one side or the other altogether):
> > local ts > remote ts => Local row is exported to remote.
> > remote ts > local ts => Remote row is exported to local.
> > local ts > last sync time && no remote ts =>
> > Local row is inserted on remote.
> > local ts < last sync time && no remote ts =>
> > Local row is deleted.
> > remote ts > last sync time && no local ts =>
> > Remote row is inserted on local.
> > remote ts < last sync time && no local ts =>
> > Remote row is deleted.
> >where the synchronization process is running on the local machine. By
> >exported, I mean the local values are sent to the remote machine, and the
> >row on that remote machine is updated to the local values. How does this
> >sound?
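The six rules above translate almost directly into a small decision function. A minimal Python sketch, assuming `None` stands for "row missing on that side" and that timestamps are mutually comparable (ints or `datetime` values); the function name and the action strings are hypothetical. The rules as written say nothing about a timestamp exactly equal to the last sync time, so the sketch reports that case as undecided rather than guessing.

```python
def sync_action(local_ts, remote_ts, last_sync):
    """Decide what to do with one row, run from the local machine.

    local_ts / remote_ts: the row's timestamp on each side, or None
    if the row is missing there. last_sync: time of the previous sync.
    """
    if local_ts is not None and remote_ts is not None:
        if local_ts > remote_ts:
            return "export local row to remote"
        if remote_ts > local_ts:
            return "export remote row to local"
        return "in sync"
    if local_ts is not None:  # row missing on the remote side
        if local_ts > last_sync:
            return "insert local row on remote"
        if local_ts < last_sync:
            return "delete local row"
    elif remote_ts is not None:  # row missing on the local side
        if remote_ts > last_sync:
            return "insert remote row on local"
        if remote_ts < last_sync:
            return "delete remote row"
    # Timestamp equal to last_sync, or row missing on both sides:
    # not covered by the rules.
    return "undecided"
```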

> Having said that, a good algo will have to be written to cut down on
> network traffic and to keep database conversations down to a minimum. This
> will be appreciated by people with low bandwidth connections I'm sure
> (dial-ups, fractional T1's, etc).

Of course! On reflection, the assigned identification number I
mentioned above might be the best then, instead of having to transfer the
entire set of key fields back and forth.
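One common way around the "unique both locally and remotely" problem, not proposed in this thread, is to give each server a distinct site number and build IDs from that number plus a purely local counter, so no cross-server check is needed at insert time. A minimal Python sketch of that substituted technique; the class name, the 16-bit site range, and the 32-bit counter split are all assumptions.

```python
import itertools


class SiteIdGenerator:
    """Hypothetical scheme: ID = (site number << 32) | local sequence.

    Two servers with different site numbers can never produce the
    same ID, even without talking to each other.
    """

    SEQ_BITS = 32

    def __init__(self, site_id, start=1):
        if not 0 <= site_id < 2 ** 16:
            raise ValueError("site_id out of range")
        self.site_id = site_id
        self._counter = itertools.count(start)

    def next_id(self):
        seq = next(self._counter)
        if seq >= 2 ** self.SEQ_BITS:
            raise OverflowError("local sequence exhausted")
        return (self.site_id << self.SEQ_BITS) | seq

    @classmethod
    def split(cls, row_id):
        # Recover (site number, local sequence) from a combined ID.
        return row_id >> cls.SEQ_BITS, row_id & (2 ** cls.SEQ_BITS - 1)
```

Each ID is a single integer, which is cheaper to send over a slow link than a multi-column key.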

> What would a vacuum do to a system being used by many people?

Probably lock them out of tables while they are vacuumed... Maybe it's
not really required in the end, or possibly optional?

> It could probably be named either way...but the one thing I really don't
> want to do is start hacking server code. The PostgreSQL people have enough
> to do without worrying about trying to meld anything I've done to their
> server. :)

Yeah, they would probably appreciate that. They already have enough
on their plate for 7.x as it is! :)

> Besides, I like the idea of having it operate as a stand-alone product.
> The only PostgreSQL feature we would require would be triggers and
> plpgsql...what was the earliest version of PostgreSQL that supported
> plpgsql? Even then I don't see the triggers being that complex to boot.

No, provided that we don't do the identification number idea
(which, the more I think about it, probably will not work). As for which
version first supported plpgsql, I don't know; one of the more hard-core
pgsql hackers can probably tell us that.

> The only thing we'd need for Python is the Python extensions for
> PostgreSQL...which in turn requires libpq and that's about it. So, it
> should be able to run on any platform supported by Python and libpq.

Of course. If it ran on NT as well as Linux/Unix, that would be
even better. :)

> Unix folks should be happy....assuming they have X running on the
> machine doing the replication or syncing. Even then I wrote a curses
> based Python interface a while back which allows buttons, progress
> bars, input fields, etc (I called it tinter and it's available at
> http://iximd.com/~dwalker). It's a simple interface and could
> probably be cleaned up a bit but it works. :)

Why would we want any type of GUI (X11 or curses) for this sync
program? I imagine just a command-line program with a few options (local
machine, remote machine, db name, etc.), and nothing else.
Though I will take a look at your curses interface, as I have been
wanting to make a curses interface to a few db interfaces I have, in as
simple a manner as possible.
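A command-line front end of the kind described could be as small as the following Python `argparse` sketch. The program name `pgsync` is borrowed from the suggested virtual host name, and all option names are hypothetical; only the local machine, remote machine, and database name come from the message.

```python
import argparse


def build_parser():
    # Hypothetical option names; the message only lists
    # "local machine, remote machine, db name, etc."
    parser = argparse.ArgumentParser(
        prog="pgsync",
        description="Synchronize a PostgreSQL database between two machines.",
    )
    parser.add_argument("--local-host", default="localhost",
                        help="machine the sync runs on (default: localhost)")
    parser.add_argument("--remote-host", required=True,
                        help="machine to synchronize with")
    parser.add_argument("dbname", help="name of the database to synchronize")
    return parser
```

For example, `pgsync --remote-host remote.example.com mydb` would sync `mydb` on the local machine with that remote host.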

> That would be a Good Thing. Have webspace somewhere? If I can get
> permission from the "powers that be" at the office I could host a website on
> our (Domino) webserver.

Yeah, I have my own web server (www.rkirkpat.net) with 1GB+ of disk
space available, sitting on a decent-speed DSL line. I can even set up a
virtual server if we want (e.g. pgsync.rkirkpat.net :). A CVS repository,
email lists, etc. are possible with some effort (and time).
So, where should we start? TTYL.

PS. The current pages on my web site are very out of date at the
moment (save for the pgsql information). I hope to have updated ones up
within the week.

---------------------------------------------------------------------------
| "For to me to live is Christ, and to die is gain." |
| --- Philippians 1:21 (KJV) |
---------------------------------------------------------------------------
| Ryan Kirkpatrick | Boulder, Colorado | http://www.rkirkpat.net/ |
---------------------------------------------------------------------------
