Quick Links

Re: Parallel tuplesort (for parallel B-Tree index creation)

From:	Peter Geoghegan <pg(at)heroku(dot)com>
To:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc:	Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Subject:	Re: Parallel tuplesort (for parallel B-Tree index creation)
Date:	2016-09-09 22:59:58
Message-ID:	CAM3SWZQpbS1ytO77fW81QmZoZ8nFRRTGLJOb8x7TifobDsfs7g@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Wed, Sep 7, 2016 at 2:36 PM, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
> 3. If we do that, we'll still have to reserve the tape buffers for all the
> tapes that we use during merge. So after we've built the initial runs, we'll
> need to reserve memory for those buffers. That might require shrinking
> memtuples. But that's OK: after building the initial runs, memtuples is
> empty, so we can shrink it.

Do you really think all this is worth the effort? Given how things are
going to improve for merging anyway, I tend to doubt it. I'd rather
just apply the cap (not necessarily 501 tapes, but something), and be
done with it. As you know, Knuth never advocated more than 7 tapes at
once, which I don't think had anything to do with the economics of
tape drives in the 1970s (or problems with tape operators getting
repetitive strange injuries). There is a chart in volume 3 about this.
Senior hackers talked about a cap like this from day one, back in
2006, when Simon and Tom initially worked on scaling the number of
tapes. Alternatively, we could make MERGE_BUFFER_SIZE much larger,
which I think would be a good idea independent of whatever waste
logically allocation of never-used tapes presents us with. It's
currently 1/4 of 1MiB, which is hardly anything these days, and
doesn't seem to have much to do with OS read ahead trigger sizes.

If we were going to do something like you describe here, I'd prefer it
to be driven by an observable benefit in performance, rather than a
theoretical benefit. Not doing everything in one pass isn't
necessarily worse than having a less cache efficient heap -- it might
be quite a bit better, in fact. You've seen how hard it can be to get
a sort that is I/O bound. (Sorting will tend to not be completely I/O
bound, unless perhaps parallelism is used).

Anyway, this patch (patch 0001-*) is by far the least important of the
3 that you and Claudio are signed up to review. I don't think it's
worth bending over backwards to do better. If you're not comfortable
with a simple cap like this, than I'd suggest that we leave it at
that, since our time is better spent elsewhere. We can just shelve it
for now -- "returned with feedback". I wouldn't make any noise about
it (although, I actually don't think that the cap idea is at all
controversial).

--
Peter Geoghegan

In response to

Re: Parallel tuplesort (for parallel B-Tree index creation) at 2016-09-07 21:36:18 from Heikki Linnakangas

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Langley, Scott E	2016-09-09 23:03:51	Re: zero knowledge users
Previous Message	Kevin Grittner	2016-09-09 22:28:58	Re: delta relations in AFTER triggers