Re: Re: xReader, double-effort (was: Temporary tables under hot standby)

From: Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Andres Freund <andres(at)2ndquadrant(dot)com>, Aakash Goel <aakash(dot)bits(at)gmail(dot)com>, Josh Berkus <josh(at)postgresql(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: xReader, double-effort (was: Temporary tables under hot standby)
Date: 2012-04-29 22:00:14
Message-ID: 1335736814.3919.92.camel@hvost
Lists: pgsql-hackers

On Sun, 2012-04-29 at 16:33 -0400, Robert Haas wrote:
> On Sat, Apr 28, 2012 at 11:06 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> >> Translating WAL is a very hard task.
> >
> > No kidding. I would think it's impossible on its face. Just for
> > starters, where will you get table and column names from? (Looking at
> > the system catalogs is cheating, and will not work reliably anyway.)
> >
> > IMO, if we want non-physical replication, we're going to need to build
> > it in at a higher level than after-the-fact processing of WAL.
> > I foresee wasting quite a lot of effort on the currently proposed
> > approaches before we admit that they're unworkable.
>
> I think the question we should be asking ourselves is not whether WAL
> as it currently exists is adequate for logical replication, but rather
> whether or not it could be made adequate.

Agreed.

> For example, suppose that we were
> to arrange things so that, after each checkpoint, the first insert,
> update, or delete record for a given relfilenode after each checkpoint
> emits a special WAL record that contains the relation name, schema
> OID, attribute names, and attribute type OIDs.

Not just the first after a checkpoint, but also the first after a schema
change. Even though this duplicates the WAL records for the changes to
the system catalog, it is likely much cheaper overall to always have a
fresh structure description in the WAL stream.
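
A rough sketch of that bookkeeping, to make the rule concrete. This is a toy model in Python, not PostgreSQL's WAL format: the names WalStream, RelationMeta and the "RELMETA" record tag are all invented for illustration, and the OIDs are only example values. The point is just that the "already described" set is reset by a checkpoint and invalidated per relation by a schema change.

```python
from dataclasses import dataclass, field


@dataclass
class RelationMeta:
    """Hypothetical self-describing record for one relation."""
    relname: str
    schema_oid: int
    attnames: tuple
    atttype_oids: tuple


@dataclass
class WalStream:
    """Toy stand-in for a WAL stream (assumption, not the real record layout)."""
    catalog: dict                                  # relfilenode -> RelationMeta
    records: list = field(default_factory=list)
    described: set = field(default_factory=set)    # described since last reset

    def checkpoint(self):
        self.records.append(("CHECKPOINT",))
        # After a checkpoint every relation must be described again.
        self.described.clear()

    def schema_change(self, relfilenode, new_meta):
        self.catalog[relfilenode] = new_meta
        # Force a fresh description before the next data change.
        self.described.discard(relfilenode)

    def log_change(self, relfilenode, op, tuple_data):
        if relfilenode not in self.described:
            meta = self.catalog[relfilenode]
            self.records.append(("RELMETA", relfilenode, meta))
            self.described.add(relfilenode)
        self.records.append((op, relfilenode, tuple_data))
```

With this rule a reader that starts at any checkpoint sees a description of a relation before the first change to it, at the cost of one extra record per relation per checkpoint or schema change.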

And if we really want to do WAL-->logical-->SQL_text conversion on a
host separate from the master, we also need to insert there the type
definitions of user-defined types, together with at least the types'
output functions in some form.

So you basically need a large part of postgres for reliably making sense
of WAL.
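
To illustrate why the output functions matter, here is a minimal sketch of the last conversion step. It assumes a hypothetical registry OUTPUT_FUNCS keyed by type OID and a made-up raw value encoding; neither is PostgreSQL's real on-disk format or API. Only the OIDs 23 (int4) and 25 (text) correspond to real built-in types. A value whose type has no registered output function simply cannot be turned into SQL text.

```python
# Hypothetical registry: type OID -> function turning raw bytes into a SQL literal.
# The byte encodings used here are assumptions chosen for the example.
OUTPUT_FUNCS = {
    23: lambda raw: str(int.from_bytes(raw, "little", signed=True)),      # int4
    25: lambda raw: "'" + raw.decode("utf-8").replace("'", "''") + "'",   # text
}


def render_insert(relname, attnames, atttype_oids, raw_values, output_funcs):
    """Render one decoded tuple as an INSERT statement, or fail loudly."""
    literals = []
    for attname, type_oid, raw in zip(attnames, atttype_oids, raw_values):
        try:
            out = output_funcs[type_oid]
        except KeyError:
            # Without the type's output function the value is opaque bytes.
            raise LookupError(
                f"no output function for type OID {type_oid} (column {attname})"
            ) from None
        literals.append(out(raw))
    columns = ", ".join(attnames)
    values = ", ".join(literals)
    return f"INSERT INTO {relname} ({columns}) VALUES ({values});"
```

For a user-defined type the conversion host would have to receive both the type definition and some executable form of its output function before render_insert could succeed.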

> Well, now we are much
> closer to being able to do some meaningful decoding of the tuple data,
> and it really doesn't cost us that much. Handling DDL (and manual
> system catalog modifications) seems pretty tricky, but I'd be very
> reluctant to give up on it without banging my head against the wall
> pretty hard.

The most straightforward way is to have a more or less full copy of
pg_catalog also on the "WAL-filtering / WAL-conversion" node, and to use
it in 1:1 replicas of transactions recreated from the WAL.
This way we can avoid recreating any alternate views of the master's
schema.

Then again, we could do it all on the master, inside the WAL-writing
transaction, and thus avoid a large chunk of the problems.

If the receiving side is also PostgreSQL with the same catalog structure
(i.e. the same major version), then we don't actually need to "handle
DDL" in any complicated way. It would be enough to just carry over the
changes to the system tables.

The main reason we don't do it currently for trigger-based logical
replication is the restriction of not being able to have triggers on
system tables.

I hope it is much easier to have the triggerless record generation also
work on system tables.

> The trouble with giving up on WAL completely and moving
> to a separate replication log is that it means a whole lot of
> additional I/O, which is bound to have a negative effect on
> performance.

Why would you give up WAL?

Or do you mean that the new "logical-wal" needs to have the same
commit-time behaviour as WAL to be reliable?

I'd envision a scenario where the logical-wal is sent to a slave or
distribution hub directly and not written at the local host at all.
An optional sync mode, similar to the current sync WAL replication,
could be configured. I hope this would run mostly in parallel with local
WAL generation, so not much extra wall-clock time would be wasted.
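
A minimal sketch of that commit path, under stated assumptions: Committer, write_local_wal and send_logical are invented names, and threads stand in for whatever mechanism would really ship the records. It only shows the shape of the idea, that the remote send starts alongside the local WAL write and the commit waits for it only when the sync option is on.

```python
from concurrent.futures import ThreadPoolExecutor


class Committer:
    """Toy commit path: local WAL write and logical send run in parallel."""

    def __init__(self, write_local_wal, send_logical, sync_logical=False):
        self._write_local_wal = write_local_wal   # callable(record), assumed
        self._send_logical = send_logical         # callable(record), assumed
        self._sync_logical = sync_logical
        self._pool = ThreadPoolExecutor(max_workers=2)

    def commit(self, physical_record, logical_record):
        # Start both operations before waiting on either one.
        remote = self._pool.submit(self._send_logical, logical_record)
        local = self._pool.submit(self._write_local_wal, physical_record)
        local.result()                 # the local WAL write is always awaited
        if self._sync_logical:
            remote.result()            # sync mode: also wait for the hub
        return remote                  # caller may inspect it later

    def close(self):
        # Waits for any sends still in flight.
        self._pool.shutdown(wait=True)
```

In sync mode the commit latency is roughly the slower of the two operations rather than their sum, which is the "mostly in parallel" hope expressed above.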

> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>

--
-------
Hannu Krosing
PostgreSQL Unlimited Scalability and Performance Consultant
2ndQuadrant Nordic
PG Admin Book: http://www.2ndQuadrant.com/books/
