Re: Error with index on unlogged table

From: Andres Freund <andres(at)anarazel(dot)de>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Thom Brown <thom(at)linux(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Fabrízio Mello <fabriziomello(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Error with index on unlogged table
Date: 2015-12-10 16:32:30
Message-ID: 20151210163230.GA11331@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Michael, Robert,

On 2015-12-10 21:10:57 +0900, Michael Paquier wrote:
> On Thu, Dec 10, 2015 at 9:07 PM, Michael Paquier
> <michael(dot)paquier(at)gmail(dot)com> wrote:
> >>> Note that in both cases the patches are not complete, we need to fix
> >>> as well copy_relation_data(at)tablecmds(dot)c so as the INIT_FORKNUM pages
> >>> are logged all the time.
> >>
> >> Agreed. It's probably better treated as an entirely different - pretty
> >> ugly - bug though. I mean it's not some issue of a race during replay,
> >> it's entirely missing WAL logging for SET TABLESPACE of unlogged
> >> relations.
>
> For the second issue, I would just need to extract the fix from one of
> the patches upthread.

I've, pushed the fix for the promotion related issue. I'm afraid that
the ALTER TABLE <unlogged-table> SET TABLESPACE issue is a bit bigger
than it though.

Robert, to catch you up: The isssue, discovered by Michael somewhere in
this thread, is that copy_relation_data(), used by SET TABLESPACE, does
not deal with unlogged tables.

Michael, what your earlier patch did was basically

@@ -9892,9 +9892,14 @@ copy_relation_data(SMgrRelation src, SMgrRelation dst,

/*
* We need to log the copied data in WAL iff WAL archiving/streaming is
- * enabled AND it's a permanent relation.
- */
- use_wal = XLogIsNeeded() && relpersistence == RELPERSISTENCE_PERMANENT;
+ * enabled AND it's a permanent relation. Unlogged relations need to have
+ * their init fork logged as well, to ensure a consistent on-disk state
+ * when reset at the end of recovery.
+ */
+ use_wal = XLogIsNeeded() &&
+ (relpersistence == RELPERSISTENCE_PERMANENT ||
+ (relpersistence == RELPERSISTENCE_UNLOGGED &&
+ forkNum == INIT_FORKNUM));

...

nblocks = smgrnblocks(src, forkNum);

for (blkno = 0; blkno < nblocks; blkno++)
{
...
if (use_wal)
log_newpage(&dst->smgr_rnode.node, forkNum, blkno, page);

But that's not sufficient for a bunch of reasons:

First, and that's the big one, that approach doesn't guarantee that the
file will be created in the new tablespace if the file does not have any
blocks. Like e.g. the heap relation itself. This will lead to errors
like:
ERROR: 58P01: could not open file "pg_tblspc/16384/PG_9.6_201511071/12384/16438": No such file or
LOCATION: mdopen, md.c:602

The real problem there imo isn't that the copy_relation_data() doesn't
deal with 0 block tables, but that ATExecSetTableSpace() doesn't have a
unlogged table specific codepath like heap_create_with_catalog() has.

A second problem is that the smgrimmedsync() in copy_relation_data()
isn't called for the init fork of unlogged relations, even if it needs
to.

As ATExecSetTableSpace() is also used for indices, the problem exists
there as well.

Yuck.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2015-12-10 16:36:52 Re: mdnblocks() sabotages error checking in _mdfd_getseg()
Previous Message Robert Haas 2015-12-10 16:18:09 Re: Remaining 9.5 open items