Re: Follow-up on replication hooks for PostgreSQL

From: Robert Hodges <robert(dot)hodges(at)continuent(dot)com>
To: Marko Kreen <markokr(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Follow-up on replication hooks for PostgreSQL
Date: 2008-07-10 14:44:20
Message-ID: C49B6E54.C4AD%robert.hodges@continuent.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-advocacy pgsql-hackers

Hi Marko,

No fear, we definitely will discuss on pgsql-hackers. I just wanted to make sure that people understood we are still committed to solving this problem and will one way or another commit resources to help.

Just to be clear, by logical replication I mean replication based on sending SQL or near-SQL (e.g., generic DML events) between servers. Physical replication on the other hand uses internal formats to replicate changes without intervening conversion to SQL, for example by shipping WAL records. There are advantages to each for different applications. BTW, I heard this nomenclature from Simon Riggs. It seems quite helpful.

The DDL trigger proposal is interesting and would be a very useful feature addition to PostgreSQL. To execute correctly it may also be necessary to know which database you were using at the time the SQL was issued.

For our part we are looking for ways to replicate most or all data on a server as efficiently as possible. Generic call-outs at commit time or reading the log directly are attractive approaches. Depending on the implementation you can avoid double writes of replicated data on the master host. Also, it avoids the management headache of ensuring that triggers are correctly installed. It seems as if one of these generic approaches could hook into WAL record transport.

Cheers, Robert

On 7/10/08 4:56 AM, "Marko Kreen" <markokr(at)gmail(dot)com> wrote:

On 7/10/08, Robert Hodges <robert(dot)hodges(at)continuent(dot)com> wrote:
> This is a quick update on a promise I made early in June to suggest
> requirements as well as ways to add replication hooks that would support
> logical replication, as opposed to the physical replication work currently
> underway based on NTT's code.
>
> Well, June was a pretty busy month, so it has taken a while to get back to
> this. However, we are now beginning to examine options for PostgreSQL
> logical replication. To make a long story short we are willing to commit
> resources to this problem or fund other people to do it for us. If you are
> interested please contact me directly. Meanwhile, we are quite serious
> about this problem and intend to work on helpful additions to PostgreSQL in
> this area. I will post more as we make progress.

Well, I'm not exactly sure what you are planning. It's OK to do
draft design privately, but before actually starting coding, the
design should be discussed in -hackers.

And I'm not exactly sure what you mean in "logical replication"?
Way to log DDL statements? Do you want to log DML also?

FWIW, here's very draft design for functionality that could be used
to make current Slony-I/Londiste-like solutions to replicate DDL also.

1. CREATE DDL TRIGGER statement that allows to call function
for all DDL statements.

Only filtering that makes sense here is filtering by area:
tables/functions/views/etc.

It must be possible to do AFTER trigger. Whether BEFORE
trigger for DDL make sense or not, I'm not sure.

2. When function is called, following information is given:

- Object type the event was for (table/view/function)
- Array of object names.
- SQL statement as text.

The trigger function can filter further based on object names
whether it does want to log the event or not.

Trying to make the trigger run on only subset of events is complex,
and parsing the SQL to pieces for trigger to understand it better
is also complex and neither is needed. Unless there are some common
situation where such simple design fails to work, I would not make
the scheme more complex.

Also the design should be based on assumption that the target side
is exactly in sync. Eg. DROP CASCADE should be replicated as DROP CASCADE.
We should not make scheme more complex to survive cases where target
is not in sync. That way madness lies. The effect should be like
same SQL statements are applied to target by hand, no more, no less.

--
marko

--
Robert Hodges, CTO, Continuent, Inc.
Email: robert(dot)hodges(at)continuent(dot)com
Mobile: +1-510-501-3728 Skype: hodgesrm

In response to

Browse pgsql-advocacy by date

  From Date Subject
Next Message Gabriele Bartolini 2008-07-11 11:53:16 European PostgreSQL Day 2008
Previous Message Marko Kreen 2008-07-10 11:56:31 Re: Follow-up on replication hooks for PostgreSQL

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2008-07-10 14:52:29 Re: CommitFest rules
Previous Message Kless 2008-07-10 14:37:13 UUID - Data type inefficient