Re: Postgres, fsync, and OSs (specifically linux)

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, Asim R P <apraveen(at)pivotal(dot)io>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Postgres, fsync, and OSs (specifically linux)
Date: 2018-11-08 20:03:47
Message-ID: CAEepm=1CMxCyd8QiQhgTZPdyyFUFEFruXbU=omdL05RSCYodWQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Nov 9, 2018 at 7:07 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Wed, Nov 7, 2018 at 9:41 PM Thomas Munro
> <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> > My plan is do a round of testing and review of this stuff next week
> > once the dust is settled on the current minor releases (including
> > fixing a few typos I just spotted and some word-smithing). All going
> > well, I will then push the resulting patches to master and all
> > supported stable branches, unless other reviews or objections appear.
> > At some point not too far down the track I hope to be ready to
> > consider committing that other patch that will completely change all
> > of this code in the master branch, but in any case Craig's patch will
> > get almost a full minor release cycle to sit in the stable branches
> > before release.
>
> I did a read-through of these patches.
>
> + new_requests = entry->requests[forknum];
> + entry->requests[forknum] =
> + bms_join(new_requests, requests);
>
> What happens if bms_join fails, too?

My reasoning for choosing bms_join() is that it cannot fail, assuming
the heap is not corrupted. It simply ORs the two bit-strings into
whichever is the longer input string, and frees the shorter input
string. (In an earlier version I used bms_union(), this function's
non-destructive sibling, but then realised that it could fail to
allocate() causing us to lose track of a 1 bit).

Philosophical point: if pfree() throws, then bms_join() throws, but
(assuming AllocSetFree() implementation) it can only throw if the heap
is corrupted, eg elog(ERROR, "could not find block containing chunk
%p", chunk) and possibly other errors. Of course it's impossible to
make guarantees of any kind in case of arbitrary corruption. But
perhaps we could do this in a critical section, so errors are promoted
to PANIC.

> + recover from the WAL after any failure is reported, preferrably
>
> preferably.

Thanks.

--
Thomas Munro
http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2018-11-08 20:06:40 Re: Postgres, fsync, and OSs (specifically linux)
Previous Message Alvaro Herrera 2018-11-08 19:39:40 Re: BUG #15212: Default values in partition tables don't work as expected and allow NOT NULL violation