Re: Error with index on unlogged table

From: Andres Freund <andres(at)anarazel(dot)de>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Andres Freund <andres(at)2ndquadrant(dot)com>, Fabrízio Mello <fabriziomello(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Error with index on unlogged table
Date: 2015-12-04 08:10:19
Message-ID: 20151204081019.GC28762@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2015-12-04 17:00:13 +0900, Michael Paquier wrote:
> Andres Freud wrote:
> >> @@ -450,14 +450,21 @@ ginbuildempty(PG_FUNCTION_ARGS)
> >> START_CRIT_SECTION();
> >> GinInitMetabuffer(MetaBuffer);
> >> MarkBufferDirty(MetaBuffer);
> >> - log_newpage_buffer(MetaBuffer, false);
> >> + log_newpage_buffer(MetaBuffer, false, false);
> >> GinInitBuffer(RootBuffer, GIN_LEAF);
> >> MarkBufferDirty(RootBuffer);
> >> - log_newpage_buffer(RootBuffer, false);
> >> + log_newpage_buffer(RootBuffer, false, true);
> >> END_CRIT_SECTION();
> >>
> > Why isn't the metapage flushed to disk?
>
> I was not sure if we should only flush only at the last page, as pages
> just before would be already replayed. Switched.

Uh, that's not how it works! The earlier pages would just be in shared
buffers. Neither smgrwrite, nor smgrimmedsync does anything about that!

> >> extern void InitXLogInsert(void);
> >> diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
> >> index ad1eb4b..91445f1 100644
> >> --- a/src/include/catalog/pg_control.h
> >> +++ b/src/include/catalog/pg_control.h
> >> @@ -73,6 +73,7 @@ typedef struct CheckPoint
> >> #define XLOG_END_OF_RECOVERY 0x90
> >> #define XLOG_FPI_FOR_HINT 0xA0
> >> #define XLOG_FPI 0xB0
> >> +#define XLOG_FPI_FOR_SYNC 0xC0
> >
> >
> > I'm not a big fan of the XLOG_FPI_FOR_SYNC name. Syncing is a bit too
> > ambigous for my taste. How about either naming it XLOG_FPI_FLUSH or
> > instead adding actual record data and a 'flags' field in there? I
> > slightly prefer the latter - XLOG_FPI and XLOG_FPI_FOR_HINT really are
> > different, XLOG_FPI_FOR_SYNC not so much.
>
> Let's go for XLOG_FPI_FLUSH.

I think the other way is a bit better, because we can add new flags
without changing the WAL format.

> diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
> index 99337b0..b646101 100644
> --- a/src/backend/access/brin/brin.c
> +++ b/src/backend/access/brin/brin.c
> @@ -682,7 +682,15 @@ brinbuildempty(PG_FUNCTION_ARGS)
> brin_metapage_init(BufferGetPage(metabuf), BrinGetPagesPerRange(index),
> BRIN_CURRENT_VERSION);
> MarkBufferDirty(metabuf);
> - log_newpage_buffer(metabuf, false);
> +
> + /*
> + * For unlogged relations, this page should be immediately flushed
> + * to disk after being replayed. This is necessary to ensure that the
> + * initial on-disk state of unlogged relations is preserved when
> + * they get reset at the end of recovery.
> + */
> + log_newpage_buffer(metabuf, false,
> + index->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED);
> END_CRIT_SECTION();

Maybe write the last sentence as '... as the on disk files are copied
directly at the end of recovery.'?

> @@ -336,7 +336,8 @@ end_heap_rewrite(RewriteState state)
> MAIN_FORKNUM,
> state->rs_blockno,
> state->rs_buffer,
> - true);
> + true,
> + false);
> RelationOpenSmgr(state->rs_new_rel);
>
> PageSetChecksumInplace(state->rs_buffer, state->rs_blockno);
> @@ -685,7 +686,8 @@ raw_heap_insert(RewriteState state, HeapTuple tup)
> MAIN_FORKNUM,
> state->rs_blockno,
> page,
> - true);
> + true,
> + false);

Did you verify that that's ok when a unlogged table is clustered/vacuum
full'ed?

> @@ -181,6 +183,9 @@ xlog_identify(uint8 info)
> case XLOG_FPI_FOR_HINT:
> id = "FPI_FOR_HINT";
> break;
> + case XLOG_FPI_FLUSH:
> + id = "FPI_FOR_SYNC";
> + break;
> }

Old string.

> --- a/src/backend/access/transam/xloginsert.c
> +++ b/src/backend/access/transam/xloginsert.c
> @@ -932,10 +932,13 @@ XLogSaveBufferForHint(Buffer buffer, bool buffer_std)
> * If the page follows the standard page layout, with a PageHeader and unused
> * space between pd_lower and pd_upper, set 'page_std' to TRUE. That allows
> * the unused space to be left out from the WAL record, making it smaller.
> + *
> + * If 'is_flush' is set to TRUE, relation will be requested to flush
> + * immediately its data at replay after replaying this full page image.
> */

s/is_flush/flush_immed/? And maybe say that it 'will be flushed to the
OS immediately after replaying the record'?

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro HORIGUCHI 2015-12-04 08:37:54 Re: psql: add \pset true/false
Previous Message Amit Kapila 2015-12-04 08:07:25 Re: parallel joins, and better parallel explain