Re: Status of the table access method work

From: Andres Freund <andres(at)anarazel(dot)de>
To: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
Cc: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Status of the table access method work
Date: 2019-04-09 12:32:20
Message-ID: 20190409123220.26cqpntectrbo7df@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2019-04-09 11:17:29 +0200, Dmitry Dolgov wrote:
> I'm also curious about that. As far as I can see the main objection against
> that was that in this case the recovery process will depend on an extension,
> which could violate reliability.

I don't think that's a primary concern - although it is one. The mapping
from types of records to the handler function needs to be accessible at
a very early stage, when the cluster isn't yet in a consistent state. So
we can't just go and look into pg_am, and look up a handler function, etc
- crash recovery happens much earlier than that is possible. Nor do we
want the mapping of 'rmgr id' -> 'extension' to be defined in the config
file, that's way too likely to be wrong. So there needs to be a
different type of mapping, accessible outside the catalog. I suspect we'd
have to end up with something very roughly like the relmapper
infrastructure. A tertiary problem is then how to identify extensions
in that mapping - although I suspect just using any library name that
can be passed to load_library() will be OK.
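
A minimal sketch of what such a catalog-independent mapping could look
like, assuming a small fixed-size structure that is persisted in a file
and read before replay starts. Every name and limit here (ExtRmgrMap,
EXT_RMGR_MIN_ID, the entry count, the name length) is hypothetical and
only illustrates the 'rmgr id' -> library name idea; it is not an
existing PostgreSQL facility.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* All names and limits below are hypothetical, for illustration only. */
#define EXT_RMGR_MIN_ID      128   /* assumed start of the extension id range */
#define EXT_RMGR_MAX_ENTRIES 8
#define EXT_LIBNAME_MAX      64

typedef struct ExtRmgrMapEntry
{
    uint8_t rmid;                       /* rmgr id found in WAL records */
    char    libname[EXT_LIBNAME_MAX];   /* library that replays them */
} ExtRmgrMapEntry;

typedef struct ExtRmgrMap
{
    uint32_t        nentries;
    ExtRmgrMapEntry entries[EXT_RMGR_MAX_ENTRIES];
} ExtRmgrMap;

/* Look up the library registered for rmid; NULL if there is none. */
static const char *
ext_rmgr_map_lookup(const ExtRmgrMap *map, uint8_t rmid)
{
    assert(map != NULL);
    for (uint32_t i = 0; i < map->nentries; i++)
    {
        if (map->entries[i].rmid == rmid)
            return map->entries[i].libname;
    }
    return NULL;
}

/* Register rmid -> libname. Returns 0 on success, -1 if rejected. */
static int
ext_rmgr_map_add(ExtRmgrMap *map, uint8_t rmid, const char *libname)
{
    assert(map != NULL && libname != NULL);
    if (rmid < EXT_RMGR_MIN_ID)
        return -1;              /* ids below the assumed range are reserved */
    if (map->nentries >= EXT_RMGR_MAX_ENTRIES)
        return -1;              /* map is full */
    if (libname[0] == '\0' || strlen(libname) >= EXT_LIBNAME_MAX)
        return -1;              /* name empty or too long to store */
    if (ext_rmgr_map_lookup(map, rmid) != NULL)
        return -1;              /* id already taken */
    map->entries[map->nentries].rmid = rmid;
    strcpy(map->entries[map->nentries].libname, libname);
    map->nentries++;
    return 0;
}
```

In this sketch, recovery would read the structure at startup, load the
named library for each entry, and only then begin replay. A real version
would presumably also need a checksum and crash-safe updates, which are
left out here.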

> But I wonder if this argument is still valid for AM's, since the whole
> data is kind of depends on it, not only the recovery.

I don't buy that argument. If you have an AM that registers, using a new
facility, replay routines, and then it errors out / crashes during
those, there's no way to get the cluster back into a consistent
state. So it's not just the one table in that AM that's gone, it's the
entire cluster that's impacted.

> Btw, can someone elaborate, why exactly generic_xlog is not efficient enough?
> I've went through the corresponding thread, looks like generic WAL records are
> bigger than normal one - is it the only reason?

That's one big reason. But also, you just can't do much more than "write
this block into that file" during recovery with it. A lot of our replay
routines intentionally do more complicated tasks.
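
To illustrate the difference, here is a toy sketch, not PostgreSQL
code: a "generic" replay routine that can only copy bytes into a block,
next to a "custom" routine that also maintains state outside the block
(a per-page free-space counter stands in for that here). All types and
function names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 16
#define TOY_NPAGES    4

enum { TOY_GENERIC = 0, TOY_CUSTOM = 1, TOY_NKINDS = 2 };

typedef struct ToyCluster
{
    uint8_t pages[TOY_NPAGES][TOY_PAGE_SIZE];
    int     free_bytes[TOY_NPAGES];   /* side state kept outside the pages */
} ToyCluster;

typedef struct ToyRecord
{
    uint8_t kind;                     /* TOY_GENERIC or TOY_CUSTOM */
    uint8_t blkno;
    uint8_t offset;
    uint8_t len;
    uint8_t data[TOY_PAGE_SIZE];
} ToyRecord;

static void
toy_cluster_init(ToyCluster *c)
{
    memset(c, 0, sizeof(*c));
    for (int i = 0; i < TOY_NPAGES; i++)
        c->free_bytes[i] = TOY_PAGE_SIZE;
}

/* Generic replay: all it can do is put bytes into a block. */
static int
toy_replay_generic(ToyCluster *c, const ToyRecord *r)
{
    if (r->blkno >= TOY_NPAGES || r->offset + r->len > TOY_PAGE_SIZE)
        return -1;
    memcpy(&c->pages[r->blkno][r->offset], r->data, r->len);
    return 0;
}

/* Custom replay: change the block, then also update the side state. */
static int
toy_replay_custom(ToyCluster *c, const ToyRecord *r)
{
    int used = 0;

    if (toy_replay_generic(c, r) != 0)
        return -1;
    for (int i = 0; i < TOY_PAGE_SIZE; i++)
        used += (c->pages[r->blkno][i] != 0);
    c->free_bytes[r->blkno] = TOY_PAGE_SIZE - used;
    return 0;
}

typedef int (*ToyRedoFn) (ToyCluster *c, const ToyRecord *r);

static const ToyRedoFn toy_redo[TOY_NKINDS] = {
    toy_replay_generic,
    toy_replay_custom,
};

/* Dispatch a record to the replay routine for its kind. */
static int
toy_replay(ToyCluster *c, const ToyRecord *r)
{
    assert(c != NULL && r != NULL);
    if (r->kind >= TOY_NKINDS)
        return -1;
    return toy_redo[r->kind](c, r);
}
```

With only the generic routine, the free-space counter would never be
touched during recovery; anything beyond the block contents needs a
record type with its own replay function.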

Greetings,

Andres Freund
