Re: Subtle bug in "Simplify tape block format" commit

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Subtle bug in "Simplify tape block format" commit
Date: 2017-01-30 23:51:36
Message-ID: CAH2-Wz=EM1it3TYAOdxLHiZ_Aiv21_7wOTOBdzBnU-yp8mhQiA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jan 30, 2017 at 10:01 AM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> Let me take another look at this later today before proceeding. I want
> to run it against a custom test suite I've developed.

I've done so. Some more thoughts:

* I don't think that this is really any less efficient than my patch.
It's just that the write of a block happens within ltsWriteBlock()
automatically, rather than being an explicit thing that comes at the
tail-end of a run. This is more general, but otherwise very similar.
The downside is that it will work when something that we might want to
break happens, but I do concede that that's worth it.

* If you're writing out any old bit pattern, I suggest using 0x7F
bytes rather than nul bytes (I do agree that it does seem worth making
this some particular bit pattern, so as to not have to bother with
Valgrind suppressions and so on). That pattern immediately gives you a
hint on where to look for more information when there is a problem. To
me, it suggests "this is something weird". We don't want this to
happen very often.

* I think that the name lts->nBlocksAllocated works better than
lts->nBlocksReserved. After all, you are still writing stuff out in a
way that respects BufFile limitations around holes, and still
reporting that to the user via trace_sort as the number of blocks
actually written.

* I think that you should put the new code into a new function, called
ltsAllocBlocksUntil(), or similar -- this can do the BufFileWrite()
stuff itself, with a suitable custom defensive elog(ERROR) message.
That way, the only new branch needed in ltsWriteBlock() is "if
(blocknum > lts->nBlocksWritten)" (maybe use unlikely() too?), and you
can make it clear that ltsAllocBlocksUntil() is a rarely needed thing,
which seems appropriate.

We really don't want ltsAllocBlocksUntil() logic to be called very
often, and certainly not to write more than 1 or 2 blocks at a time
(no more than 1 with current usage). After all, that would mean
writing to the same position twice or more, for no reason at all.
Maybe note in comments that it's only called at the end of a run in
practice, or something to that effect. Keeping writes sequential is
very important, to keep logtape block reclamation effective.

I was actually planning on introducing a distinction between blocks
allocated and blocks written anyway, for the benefit of logtape.c
parallel sort, where holes can be left behind post-unification (the
start of each workers space needs to be aligned to
MAX_PHYSICAL_FILESIZE, so multi-block ranges of holes will be common).
So, your approach makes sense to me, and I now slightly prefer your
patch to my original.

Thanks
--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2017-01-30 23:54:50 Re: Query fails when SRFs are part of FROM clause (Commit id: 69f4b9c85f)
Previous Message Andres Freund 2017-01-30 23:47:49 Re: Query fails when SRFs are part of FROM clause (Commit id: 69f4b9c85f)