Tuplesort merge pre-reading

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Peter Geoghegan <pg(at)heroku(dot)com>
Subject: Tuplesort merge pre-reading
Date: 2016-09-06 12:20:30
Message-ID: f298f77a-cf06-e70c-d5a4-a20b472b4627@iki.fi
Views: Raw Message | Whole Thread | Download mbox
Lists: pgsql-hackers

While reviewing Peter's latest round of sorting patches, and trying to
understand the new "batch allocation" mechanism, I started to wonder how
useful the pre-reading in the merge stage is in the first place.

I'm talking about the code that reads a bunch of from each tape, loading
them into the memtuples array. That code was added by Tom Lane, back in

commit cf627ab41ab9f6038a29ddd04dd0ff0ccdca714e
Author: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Date: Sat Oct 30 17:27:15 1999 +0000

Further performance improvements in sorting: reduce number of
during initial run formation by keeping both current run and next-run
tuples in the same heap (yup, Knuth is smarter than I am). And, during
merge passes, make use of available sort memory to load multiple tuples
from any one input 'tape' at a time, thereby improving locality of
access to the temp file.

So apparently there was a benefit back then, but is it still worthwhile?
The LogicalTape buffers one block at a time, anyway, how much gain are
we getting from parsing the tuples into SortTuple format in batches?

I wrote a quick patch to test that, attached. It seems to improve
performance, at least in this small test case:

create table lotsofints(i integer);
insert into lotsofints select random() * 1000000000.0 from
generate_series(1, 10000000);
vacuum freeze;

select count(*) FROM (select * from lotsofints order by i) t;

On my laptop, with default work_mem=4MB, that select takes 7.8 s on
unpatched master, and 6.2 s with the attached patch.

So, at least in some cases, the pre-loading hurts. I think we should get
rid of it. This patch probably needs some polishing: I replaced the
batch allocations with a simpler scheme with a buffer to hold just a
single tuple for each tape, and that might need some more work to allow
downsizing those buffers if you have a few very large tuples in an
otherwise narrow table. And perhaps we should free and reallocate a
smaller memtuples array for the merging, now that we're not making use
of the whole of it. And perhaps we should teach LogicalTape to use
larger buffers, if we can't rely on the OS to do the buffering for us.
But overall, this seems to make the code both simpler and faster.

Am I missing something?

- Heikki

Attachment Content-Type Size
0001-Don-t-bother-to-pre-read-tuples-into-slots-during-me.patch application/x-patch 24.3 KB


Browse pgsql-hackers by date

  From Date Subject
Next Message Christian Ullrich 2016-09-06 12:21:00 Re: pgsql: Add putenv support for msvcrt from Visual Studio 2013
Previous Message Robert Haas 2016-09-06 12:19:28 Re: Declarative partitioning - another take