Re: Re: xReader, double-effort (was: Temporary tables under hot standby)

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Hannu Krosing <hannu(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Andres Freund <andres(at)2ndquadrant(dot)com>, Aakash Goel <aakash(dot)bits(at)gmail(dot)com>, Josh Berkus <josh(at)postgresql(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: xReader, double-effort (was: Temporary tables under hot standby)
Date: 2012-04-30 03:26:01
Message-ID: CA+Tgmoa1HriwRfbuYxcHCARtTchqk_fW_9a4KfF_4UdmUv6_qw@mail.gmail.com
Lists: pgsql-hackers

On Sun, Apr 29, 2012 at 6:00 PM, Hannu Krosing <hannu(at)2ndquadrant(dot)com> wrote:
>> I think the question we should be asking ourselves is not whether WAL
>> as it currently exists is adequate for logical replication, but rather
>> or not it could be made adequate.
>
> Agreed.

And of course I meant "but rather whether or not it could be made
adequate", but I dropped a word.

>> For example, suppose that we were
>> to arrange things so that, after each checkpoint, the first insert,
>> update, or delete record for a given relfilenode after each checkpoint
>> emits a special WAL record that contains the relation name, schema
>> OID, attribute names, and attribute type OIDs.
>
> Not just the first after a checkpoint, but also the first after a schema
> change. Even though this will duplicate the WAL records that change the
> system catalog, it is likely much cheaper overall to always have a fresh
> structure in the WAL stream.

Yes.
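
To make that concrete, here's a rough sketch of what such a metadata
record might look like (everything below is invented for illustration;
no record like this exists in the WAL format today):

    /*
     * Hypothetical "relation metadata" record, emitted before the first
     * insert/update/delete touching a relfilenode after a checkpoint, or
     * after a schema change on that relation.  All names are made up.
     */
    typedef unsigned int Oid;           /* stand-in for the real Oid */

    typedef struct xl_rel_metadata_att
    {
        Oid         atttypid;           /* attribute type OID */
        char        attname[64];        /* attribute name (NAMEDATALEN) */
    } xl_rel_metadata_att;

    typedef struct xl_rel_metadata
    {
        Oid         relfilenode;        /* what the heap records point at */
        Oid         relnamespace;       /* schema OID */
        char        relname[64];        /* relation name */
        int         natts;              /* number of attributes */
        /* natts copies of xl_rel_metadata_att follow */
    } xl_rel_metadata;

A reader that has seen one of these for a given relfilenode can turn the
raw tuple data in subsequent heap records into column name/value pairs
without ever touching pg_class or pg_attribute.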

> And if we really want to do WAL --> logical --> SQL_text conversion on a
> host separate from the master, we also need to insert the type
> definitions of user-defined types there, together with at least the
> types' output functions in some form.

Yes.

> So you basically need a large part of postgres for reliably making sense
> of WAL.

Agreed, but I think that's a problem we need to fix and not a
tolerable situation at all. If a user can create a type-output
function that goes and looks at the state of the database to determine
what to output, then we are completely screwed, because that basically
means you would need to have a whole Hot Standby instance up and
running just to make it possible to run type output functions. Now
you might be able to build a mechanism around that that is useful to
some people in some situations, but wow does that sound painful. What
I want is for the master to be able to cheaply rattle off the tuples
that got inserted, updated, or deleted as those things happen; needing
a whole second copy of the database just to do that does not meet my
definition of "cheap". Furthermore, it's not really clear that it's
sufficient anyway, since there are problems with what happens before
the HS instance reaches consistency, what happens when it crashes and
restarts, and what happens when the system catalog we need to examine
to generate the logical replication records is access-exclusive-locked.
Seems like a house of cards.

Some of this might be possible to mitigate contractually, by putting
limits on what type input/output functions are allowed to do. Or we
could invent a new analog of type input/output functions that is
explicitly limited in this way, and support only types that provide
it. But I think the real key is that we can't rely on catalog access:
the WAL stream has to have enough information to allow the reader to
construct some set of in-memory hash tables with sufficient detail to
reliably decode WAL. Or at least that's what I'm thinking.
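
To be a little more concrete about what I mean by that (purely
illustrative; the toy hash table below just stands in for whatever we'd
actually use), the reader would maintain something like:

    /*
     * Decoder-side cache, populated only from metadata records seen in
     * the WAL stream -- never from catalog lookups.
     */
    #define DECODER_CACHE_SIZE 1024     /* toy fixed-size table */

    typedef unsigned int Oid;

    typedef struct DecoderAttr
    {
        Oid         atttypid;
        char        attname[64];
    } DecoderAttr;

    typedef struct DecoderRelEntry
    {
        Oid         relfilenode;        /* hash key; 0 means empty slot */
        char        relname[64];
        int         natts;
        DecoderAttr *attrs;             /* natts entries, copied from the
                                         * metadata record */
    } DecoderRelEntry;

    static DecoderRelEntry cache[DECODER_CACHE_SIZE];

    static DecoderRelEntry *
    decoder_lookup(Oid relfilenode)
    {
        unsigned int i = relfilenode % DECODER_CACHE_SIZE;

        while (cache[i].relfilenode != 0)
        {
            if (cache[i].relfilenode == relfilenode)
                return &cache[i];
            i = (i + 1) % DECODER_CACHE_SIZE;   /* linear probing */
        }
        return NULL;    /* no metadata seen yet -- can't decode this rel */
    }

Every heap insert/update/delete record would then be decoded against the
entry for its relfilenode, and a missing entry would be an error rather
than something we paper over by silently producing garbage.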

> The most straightforward way is to have a more or less full copy of
> pg_catalog also on the "WAL-filtering / WAL-conversion" node, and to use
> it in 1:1 replicas of the transactions recreated from the WAL.
> This way we can avoid recreating any alternate views of the master's
> schema.

See above; I have serious doubts that this can ever be made to work robustly.

> Then again, we could do it all on the master, inside the WAL-writing
> transaction, and thus avoid a large chunk of the problems.
>
> If the receiving side is also PostgreSQL with the same catalog structure
> (i.e. the same major version), then we don't actually need to "handle DDL"
> in any complicated way; it would be enough to just carry over the changes
> to the system tables.

I agree it'd be preferable to handle DDL in terms of system catalog
updates, rather than saying, well, this is an ALTER TABLE .. RENAME.
But you need to be able to decode tuples using the right tuple
descriptor, even while that's changing under you.
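
One way to attack that, continuing the earlier sketch (still purely
hypothetical, and reusing the DecoderAttr struct from above): version the
cached descriptors by the WAL position at which their metadata record
appeared, and decode each heap record with the newest descriptor that
precedes it.

    typedef unsigned long long XLogRecPtr;  /* stand-in for the real type */

    typedef struct DescriptorVersion
    {
        XLogRecPtr  valid_from;             /* LSN of the metadata record */
        int         natts;
        DecoderAttr *attrs;                 /* as in the earlier sketch */
        struct DescriptorVersion *older;    /* previous version, if any */
    } DescriptorVersion;

    /* Pick the descriptor in effect when a given heap record was written. */
    static DescriptorVersion *
    descriptor_for_lsn(DescriptorVersion *newest, XLogRecPtr rec_lsn)
    {
        DescriptorVersion *v;

        for (v = newest; v != NULL; v = v->older)
        {
            if (v->valid_from <= rec_lsn)
                return v;
        }
        return NULL;    /* record predates any metadata we've seen */
    }

Since a fresh metadata record would be emitted right after every schema
change, the decoder never has to guess which version of the tuple
descriptor a given heap record was written under.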

> Why would you give up WAL ?

For lack of ability to make it work. Don't underestimate how hard
it's going to be to nail this down.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
