Re: Clustering features for upcoming developer meeting -- please claim yours!

From: Hannu Krosing <hannu(at)2ndquadrant(dot)com>
To: Jan Wieck <JanWieck(at)Yahoo(dot)com>
Cc: Marko Kreen <markokr(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-cluster-hackers(at)postgresql(dot)org
Subject: Re: Clustering features for upcoming developer meeting -- please claim yours!
Date: 2010-05-10 22:40:21
Message-ID: 1273531221.15431.60.camel@hvost
Lists: pgsql-cluster-hackers

On Mon, 2010-05-10 at 17:04 -0400, Jan Wieck wrote:
> On 5/10/2010 4:25 PM, Marko Kreen wrote:
> > AFAICS the "agreeable order" should take care of positioning:
> >
> > http://wiki.postgresql.org/wiki/ModificationTriggerGDQ#Suggestions_for_Implementation
> >
> > This combined with DML triggers that react to invalidate events (like
> > PgQ ones) should already work fine?
> >
> > Are there situations where such setup fails?
> >
>
> That explanation of an agreeable order only solves the problems of
> placing the DDL into the replication stream between transactions,
> possibly done by multiple clients.

Why only "between transactions" (whatever that means)?

If all transactions get their event ids from the same non-cached
sequence, then the event id _is_ a reliable ordering within a set of
concurrent transactions.

Event ids get serialized (where it matters) by the very locks taken by
DDL/DML statements on the objects they manipulate.

Once more, for this to work over more than one backend, the sequence
providing the event ids needs to be non-cached.
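
A minimal sketch of the setup I mean (the sequence and table names are
made up for illustration):

  -- CACHE 1 means no backend pre-allocates a block of values, so
  -- nextval() results across concurrent backends reflect the order
  -- in which the calls were actually made.
  CREATE SEQUENCE repl_event_id_seq CACHE 1;

  CREATE TABLE repl_event_queue (
      ev_id   bigint PRIMARY KEY DEFAULT nextval('repl_event_id_seq'),
      ev_txid bigint NOT NULL DEFAULT txid_current(),
      ev_data text   NOT NULL
  );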

> It does in no way address the problem of one single client executing a
> couple of updates, modifies the object, then continues with updates. In
> this case, there isn't even a transaction boundary at which the DDL
> happened on the master. And this one transaction could indeed alter the
> object several times.

How is DDL here different from DML?

You need to replay DML in the right order too, no?
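
To illustrate with a made-up example (table t and the event ids are
hypothetical, and the DDL trigger is the thing we don't have yet):
assuming DML and DDL triggers log into the same event-id-ordered queue,
a single transaction like

  BEGIN;
  UPDATE t SET a = 1;                -- captured as ev_id 101
  ALTER TABLE t ADD COLUMN c int;    -- captured as ev_id 102
  UPDATE t SET c = 2;                -- captured as ev_id 103
  COMMIT;

replays correctly on the subscriber simply by applying events in ev_id
order; the DDL needs no transaction boundary of its own.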

> This means that a generalized data queue needs to have hooks, so that
> DDL triggers can inject their payload into it.

Anything that needs to be replicated needs "to have hooks" in the
generalized data queue (see the sketch after this list), so that

a) they get replicated in the right order for each affected object
a.1) this can be relaxed for related objects in case FK-s are
disabled or deferred until transaction end
b) they get committed on the subscriber side at transaction (set)
boundaries of the provider.
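
For DML the hook can be an ordinary row trigger; a rough sketch against
the queue table from above (the row serialization is deliberately
simplified, and table t is again hypothetical):

  CREATE OR REPLACE FUNCTION repl_capture() RETURNS trigger AS $$
  BEGIN
      -- log table name, operation and the new row image as one event
      INSERT INTO repl_event_queue (ev_data)
          VALUES (TG_TABLE_NAME || ' ' || TG_OP || ' ' || NEW::text);
      RETURN NEW;
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER t_repl_capture
      AFTER INSERT OR UPDATE ON t
      FOR EACH ROW EXECUTE PROCEDURE repl_capture();

A DDL trigger would need the equivalent hook to insert its payload (the
statement text or some normalized form of it) into the same queue.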

If you implement the data queue as something non-transactional (non
pgQ-like), then you need to replicate (i.e. copy over and replay):

c) events from both committed and rolled-back transactions
d) commits/rollbacks themselves
e) and apply and/or roll back each individual transaction separately

IOW you mostly re-implement WAL, except at the logical level. Which may
or may not be a good thing, depending on other requirements of the system.
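
Concretely, the subscriber of such a WAL-like logical stream would see
something along these lines (a made-up listing; txids and events are
purely illustrative):

   ev_id | ev_txid | ev_data
  -------+---------+--------------------------------
     101 |     900 | BEGIN
     102 |     900 | UPDATE t SET a = 1 ...
     103 |     901 | BEGIN
     104 |     900 | COMMIT    <- apply tx 900 now
     105 |     901 | UPDATE t SET a = 2 ...
     106 |     901 | ABORT     <- throw tx 901 away

and has to keep each transaction's events apart until the matching
commit or abort record arrives.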

If you do it using pgQ you save by not copying rolled-back data, but you
do slightly more work on the provider side. You also end up without
dead tuples from aborted transactions on the subscriber.
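
With PgQ the consumer loop is roughly this (function names as documented
in Skytools; treat this as a from-memory sketch):

  -- provider side, once:
  SELECT pgq.create_queue('repl_queue');
  -- subscriber side, once:
  SELECT pgq.register_consumer('repl_queue', 'subscriber1');
  -- consumer loop, repeated:
  SELECT pgq.next_batch('repl_queue', 'subscriber1');  -- batch id, or NULL
  SELECT * FROM pgq.get_batch_events(<batch id>);      -- committed events only
  SELECT pgq.finish_batch(<batch id>);

Because the provider-side triggers insert events inside the producing
transaction, a rollback takes the events with it and nothing from
aborted transactions ever reaches the subscriber.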

--
Hannu Krosing http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability
Services, Consulting and Training
