Re: Should buffer of initialization fork have a BM_PERMANENT flag

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Wang Hao <whberet(at)gmail(dot)com>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Should buffer of initialization fork have a BM_PERMANENT flag
Date: 2017-01-26 00:14:03
Message-ID: CAB7nPqRfMS13gm2dN7DTfOfnhsXe2LtpJ=RA2xYZhw2yzYmMSw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

(Adding Robert in CC.)

On Thu, Jan 26, 2017 at 4:34 AM, Wang Hao <whberet(at)gmail(dot)com> wrote:
> An unlogged table has an initialization fork. The initialization fork does
> not have an BM_PERMANENT flag when get a buffer.
> In checkpoint (not shutdown or end of recovery), it will not write to disk.
> after a crash recovery, the page of initialization fork will not correctly,
> then make the main fork not correctly too.

For init forks the flush need absolutely to happen, so that's really
not good. We ought to fix BufferAlloc() appropriately here.

> Here is an example for GIN index.
>
> create unlogged table gin_test_tbl(i int4[]);
> create index gin_test_idx on gin_test_tbl using gin (i);
> checkpoint;
>
> kill all the postgres process, and restart again.
>
> vacuum gin_test_tbl; -- crash.
>
> It seems have same problem in BRIN, GIN, GiST and HASH index which using
> buffer for meta page initialize in ambuildempty function.

Yeah, other index AMs deal directly with the sync of the page, that's
why there is no issue for them.

So the patch attached fixes the problem by changing BufferAlloc() in
such a way that initialization forks are permanently written to disk,
which is what you are suggesting. As a simple fix for back-branches
that's enough, though on HEAD I think that we should really rework the
empty() routines so as the write goes through shared buffers first,
that seems more solid than relying on the sgmr routines to do this
work. Robert, what do you think?
--
Michael

Attachment Content-Type Size
unlogged-flush-fix.patch application/octet-stream 708 bytes

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2017-01-26 00:16:34 Re: safer node casting
Previous Message Peter Geoghegan 2017-01-26 00:14:00 Re: Checksums by default?