From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Dmitry Dolgov <9erthalion6(at)gmail(dot)com> |
Cc: | Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com> |
Subject: | Re: Status of the table access method work |
Date: | 2019-04-09 12:32:20 |
Message-ID: | 20190409123220.26cqpntectrbo7df@alap3.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On 2019-04-09 11:17:29 +0200, Dmitry Dolgov wrote:
> I'm also curious about that. As far as I can see the main objection against
> that was that in this case the recovery process will depend on an extension,
> which could violate reliability.
I don't think that's a primary concern - although it is one. The mapping
from types of records to the handler function needs to be accessible at
a very early state, when the cluster isn't yet in a consistent state. So
we can't just go an look into pg_am, and look up a handler function, etc
- crash recovery happens much earlier than that is possible. Nor do we
want the mapping of 'rmgr id' -> 'extension' to be defined in the config
file, that's way too likely to be wrong. So there needs to be a
different type of mapping, accessible outside the catalog. I supect we'd
have to end up with something very roughly like the relmapper
infrastructure. A tertiary problem is then how to identify extensions
in that mapping - although I suspect just using any library name that
can be passed to load_library() will be OK.
> But I wonder if this argument is still valid for AM's, since the whole
> data is kind of depends on it, not only the recovery.
I don't buy that argument. If you have an AM that registers, using a new
facility, replay routines, and then it errors out / crashes during
those, there's no way to get the cluster back into a consistent
state. So it's not just the one table in that AM that's gone, it's the
entire cluster that's impacted.
> Btw, can someone elaborate, why exactly generic_xlog is not efficient enough?
> I've went through the corresponding thread, looks like generic WAL records are
> bigger than normal one - is it the only reason?
That's one big reason. But also, you just can't do much more than "write
this block into that file" during recovery with. A lot of our replay
routines intentionally do more complicated tasks.
Greetings,
Andres Freund
From | Date | Subject | |
---|---|---|---|
Next Message | Jesper Pedersen | 2019-04-09 12:43:57 | Re: COLLATE: Hash partition vs UPDATE |
Previous Message | Christoph Berg | 2019-04-09 12:01:12 | Re: PGCOLOR? (Re: pgsql: Unified logging system for command-line programs) |