Re: BUG #6041: Unlogged table was created bad in slave node

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Euler Taveira de Oliveira <euler(at)timbira(dot)com>, Emanuel <postgres(dot)arg(at)gmail(dot)com>, pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #6041: Unlogged table was created bad in slave node
Date: 2011-06-01 18:28:37
Message-ID: BANLkTim433vF5HWjbJ0FSWm_-xA8DDaGNg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

On Thu, May 26, 2011 at 12:34 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
>> Excerpts from Euler Taveira de Oliveira's message of jue may 26 12:00:05 -0400 2011:
>>> I think we should emit the real cause in those cases, if possible (not too
>>> much overhead). The message would be "Unlogged table content is not available
>>> in standby server".
>
>> I guess what it should do is create an empty file in the slave.
>
> Probably it should, because won't the table malfunction after the slave
> is promoted to master, if there's no file at all there?  Or will the
> process of coming live create an empty file even if there was none?

Coming live creates an empty file.

> But Euler is pointing out a different issue, which is usability.  If the
> slave just acts like the table is present but empty, we are likely to
> get bug reports about that too.  An error telling you you aren't allowed
> to access such a table on slaves would be more user-friendly, if we can
> do it without too much pain.

I looked into this a bit. A few observations:

(1) This problem is actually not confined to unlogged tables;
temporary tables have the same issue. For example, if you create a
temporary table on the master and then, on the slave, do SELECT * FROM
pg_temp_3.hi_mom (or whatever the name of the temp schema where the
temp table is) you get the same error. In fact I suspect if you took
a base backup that included the temporary relation and matched the
backend ID you could even manage to read out the old contents (modulo
any fun and exciting XID wraparound issues). But the problem is of
course more noticeable for unlogged tables since they're not hidden
away in a special funny schema.

(2) It's somewhat difficult to get a clean fix for this error because
it's coming from deep down in the bowels of the system, inside
mdnblocks(). At that point, we haven't got a Relation available, just
an SMgrRelation, so the information that this relation is unlogged is
not really available, at least not without some sort of gross
modularity violation like calling smgrexists() on the init fork. It
doesn't particularly seem like a good idea to conditionalize the
behavior of mdnblocks() on relpersistence anyway; that's a pretty
gross modularity violation all by itself.

(3) mdnblocks() gets called twice during the process of doing a simple
select on a relation - once from the planner, via estimate_rel_size,
and again from the executor, via initscan. A fairly obvious fix for
this problem is to skip the call to estimate_rel_size() if we're
dealing with a temporary or unlogged relation and are in recovery; and
make heap_beginscan_internal() throw a suitable error under similar
circumstances (or do we want to throw the error at plan time? tossing
it out from all the way down in estimate_rel_size() seems a bit
wacky). Patch taking this approach attached.

(4) It strikes me that it might be possible to address this problem a
bit more cleanly by allowing mdnblocks() and smgrnblocks() and
RelationGetNumberOfBlocksInFork() to take a boolean argument
indicating whether or not an error should be thrown if the underlying
physical file happens not to exist. When no error is to be signaled,
we simply return 0 when the main fork doesn't exist, rather than
throwing an error. We could then make estimate_rel_size() use this
variant, eliminating the need for an explicit test in
get_relation_info(). I'm sort of inclined to go this route even
though it would require touching a bit more code. We've already
whacked around RelationGetNumberOfBlocks() a bit in this release (the
function that formerly had that name is now
RelationGetNumberOfBlocksInFork(), and RelationGetNumberOfBlocks() is
now a macro that calls that function w/MAIN_FORKNUM) so it doesn't
seem like a bad time to get any other API changes that might be useful
out of our system.

Thoughts?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
reject-unlogged-during-recovery.patch application/octet-stream 2.0 KB

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Peter Eisentraut 2011-06-01 18:28:56 Re: [BUGS] BUG #6034: pg_upgrade fails when it should not.
Previous Message Tom Lane 2011-06-01 17:21:03 Re: [BUGS] BUG #6034: pg_upgrade fails when it should not.

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2011-06-01 18:28:56 Re: [BUGS] BUG #6034: pg_upgrade fails when it should not.
Previous Message Steve Crawford 2011-06-01 18:13:34 Re: storing TZ along timestamps