Re: Postgres-R: tuple serialization

From: Decibel! <decibel(at)decibel(dot)org>
To: Markus Wanner <markus(at)bluegap(dot)ch>
Cc: PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: Postgres-R: tuple serialization
Date: 2008-07-22 21:32:36
Message-ID: D697DBAF-D495-4602-A1EF-E013DB804B0F@decibel.org
Lists: pgsql-hackers

On Jul 22, 2008, at 3:04 AM, Markus Wanner wrote:
> Yesterday, I promised to outline the requirements of Postgres-R for
> tuple serialization, which we have been talking about before. There
> are basically three ways to serialize a tuple change, depending on
> whether it originates from an INSERT, UPDATE or DELETE. For updates
> and deletes, the serialized change stores the old pkey as well as
> the origin (a global transaction id) of the tuple (required for
> consistent serialization on remote nodes). For inserts and updates,
> all added or changed attributes need to be serialized as well.
>
>            pkey+origin   changes
>   INSERT        -           x
>   UPDATE        x           x
>   DELETE        x           -
>
> Note that the pkey attributes may never be NULL, so an isnull bit
> field can be skipped for those attributes. For the insert case, all
> attributes (including primary key attributes) are serialized.
> Updates require an additional bit field (well, I'm using chars ATM)
> to store which attributes have changed. Only those should be
> transferred.
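
For illustration, the per-operation layout described above might look roughly like this. The field names, fixed-width integer encoding, and `serialize_change` helper are hypothetical sketches of the scheme, not Postgres-R's actual code:

```python
import struct

OP_INSERT, OP_UPDATE, OP_DELETE = 0, 1, 2

def serialize_change(op, origin_xid=None, old_pkey=None,
                     changed_attrs=None, values=None):
    """Pack one tuple change: old pkey + origin for UPDATE/DELETE,
    a changed-attribute bitmap plus attribute data for INSERT/UPDATE."""
    buf = struct.pack("B", op)
    if op in (OP_UPDATE, OP_DELETE):
        # pkey attributes are never NULL, so no isnull bits are needed here
        buf += struct.pack("!Ii", origin_xid, old_pkey)
    if op in (OP_INSERT, OP_UPDATE):
        bitmap, payload = 0, b""
        for i, v in enumerate(values):
            # inserts carry every attribute; updates only the changed ones
            if changed_attrs is None or i in changed_attrs:
                bitmap |= 1 << i
                payload += struct.pack("!i", v)
        buf += struct.pack("!I", bitmap) + payload
    return buf
```

A DELETE thus costs only the op byte plus origin and pkey, while an UPDATE adds the bitmap and just the attributes that actually changed.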
>
> I'm tempted to unify that, so that inserts are serialized as the
> difference against the default values or NULL. That would make
> things easier for Postgres-R. However, how about other uses of such
> a fast tuple applicator? Does such a use case exist at all? I mean,
> for parallelizing COPY FROM STDIN, one certainly doesn't want to
> serialize all input tuples into that format before feeding multiple
> helper backends. Instead, I'd recommend letting the helper backends
> do the parsing and therefore parallelize that as well.
>
> For other features, like parallel pg_dump or even parallel query
> execution, this tuple serialization code doesn't help much, IMO. So
> I'm thinking that optimizing it for Postgres-R's internal use is
> the best way to go.
>
> Comments? Opinions?

ISTM that both londiste and Slony would be able to make use of these
improvements as well. A modular replication system should be able to
use a variety of methods for logging data changes and then applying
them on a subscriber, so long as some kind of common transport can be
agreed upon (such as text). So having a change capture and apply
mechanism that isn't dependent on a lot of extra stuff would be
generally useful to any replication mechanism.
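
As a toy illustration of what such an agreed-upon text transport could look like, here is a hypothetical tab-separated line format with an encoder and decoder. This is purely a sketch of the idea, not the format Slony or londiste actually use:

```python
# Hypothetical text transport for change records: one change per
# tab-separated line, so any apply mechanism can parse it.
def to_text(op, table, pkey, cols):
    """Render one change as a single tab-separated line."""
    return "\t".join([op, table, str(pkey)] +
                     ["%s=%s" % (k, v) for k, v in cols])

def from_text(line):
    """Parse such a line back into its parts on the subscriber side."""
    op, table, pkey, *rest = line.split("\t")
    return op, table, int(pkey), dict(c.split("=", 1) for c in rest)
```

The point being that capture and apply only need to agree on this one intermediate representation, not on each other's internals.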
--
Decibel!, aka Jim C. Nasby, Database Architect decibel(at)decibel(dot)org
Give your computer some brain candy! www.distributed.net Team #1828
