Re: GDQ implementation (was: Re: Clustering features for upcoming developer meeting -- please claim yours!)

From: Marko Kreen <markokr(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Jan Wieck <JanWieck(at)yahoo(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-cluster-hackers(at)postgresql(dot)org
Subject: Re: GDQ implementation (was: Re: Clustering features for upcoming developer meeting -- please claim yours!)
Date: 2010-05-11 14:03:40
Message-ID: AANLkTilYfuwLplIc8nWqBrj_suUoxYOtwPkcZHNUZUEO@mail.gmail.com
Lists: pgsql-cluster-hackers

On 5/11/10, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On Tue, 2010-05-11 at 08:33 -0400, Jan Wieck wrote:
> > What are the advantages of anything proposed over the current
> > implementations used by Londiste and Slony?
>
> It would be good to have a core technology that provided a generic
> transport to other remote databases.

I suspect there should still be some sort of middleware code
that reads the data from Postgres and writes it to the other database.

So the task of the GDQ should be to make data available to that
reader, not to be a "transport to remote databases", no?

And if we are talking about a "Generalized Data Queue", one important
aspect is that it should be easy to write both queue readers and
writers in whatever language the user wants. That means it should be
possible to do both reading and writing with ordinary SQL queries.
Even requiring COPY is out, as it is not available in many adapters.
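
For illustration, PGQ (in Skytools) already passes that test: a writer
and a reader need nothing beyond plain SQL function calls. Queue and
consumer names below are invented:

    -- writer side: enqueue an event
    SELECT pgq.create_queue('my_queue');                      -- one-time setup
    SELECT pgq.insert_event('my_queue', 'user_update', '42');

    -- reader side: register once, then poll batches in plain SQL
    SELECT pgq.register_consumer('my_queue', 'my_reader');
    SELECT pgq.next_batch('my_queue', 'my_reader');           -- batch id, or NULL
    -- :batch_id below is the value returned by pgq.next_batch()
    SELECT ev_id, ev_type, ev_data FROM pgq.get_batch_events(:batch_id);
    SELECT pgq.finish_batch(:batch_id);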

Of course it's OK to have such extensions available optionally.
E.g. Londiste does bulk event inserts with COPY, but that is not
required for ordinary clients. And you can always turn SELECT output
into COPY format, as shown below.
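
For example, with a hypothetical queue table, the server can emit
COPY format directly from a query:

    COPY (SELECT ev_id, ev_type, ev_data
            FROM my_queue_events
           ORDER BY ev_id)
      TO STDOUT;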

> We already have WALSender and WALReceiver, which use the COPY protocol
> as a transport mechanism. It would be easy to extend that so we could
> send other forms of data.
>
> We can do that in two ways:
>
> * Alter triggers so that Slony/Londiste write directly to WAL rather
> than log tables, using a new WAL record for custom data blobs.
>
> * Alter WALSender so it can read Slony/Londiste log tables for
> consumption by an architecture similar to WALreceiver/Startup. Probably
> easier.
>
> We can also alter the WAL format itself to include the information in
> WAL that is required to do what Slony/Londiste already do, so we don't
> need to specifically write anything at all, just read WAL at the other
> end. Even more efficient.

Hm. You'd need to tie WAL rotation to reader positions.

And to read a largish batch from WAL, you would also need to process
unrelated data?

Reading from WAL is OK for full replication, but what about a smallish
queue that gets only a small percentage of the overall traffic?

> The advantages of these options would be
>
> * integration of core technologies
> * greater efficiency for trigger based logging via WAL
>
> In other RDBMS "replication" has long meant "data transport, either for
> HA or application use". We should be looking beyond the pure HA aspects,
> as pgq does.
>
> I would certainly like to see a system that wrote data on master and
> then constructed the SQL on receiver-side (i.e. on slave), so the
> integration was less tight. That would allow data to be sent and for it
> to be consumed to a variety of purposes, not just HA replay.

pgq.logutriga()? It writes all columns, including NULLed ones, into
the queue in urlencoded format. Londiste actually even knows how to
generate SQL from those.
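
For reference, attaching it to a table is a single trigger definition
(table and queue names here are invented):

    CREATE TRIGGER my_queue_log
        AFTER INSERT OR UPDATE OR DELETE ON mytable
        FOR EACH ROW
        EXECUTE PROCEDURE pgq.logutriga('my_queue');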

We use it for most non-replication queues, where we want to process
the event more intelligently than simply executing it on some other
connection.

--
marko
