| From: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> | 
|---|---|
| To: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Peter Geoghegan <pg(at)heroku(dot)com> | 
| Subject: | Tuplesort merge pre-reading | 
| Date: | 2016-09-06 12:20:30 | 
| Message-ID: | f298f77a-cf06-e70c-d5a4-a20b472b4627@iki.fi | 
| Lists: | pgsql-hackers | 
While reviewing Peter's latest round of sorting patches, and trying to 
understand the new "batch allocation" mechanism, I started to wonder how 
useful the pre-reading in the merge stage is in the first place.
I'm talking about the code that reads a bunch of tuples from each tape, loading 
them into the memtuples array. That code was added by Tom Lane, back in 
1999:
commit cf627ab41ab9f6038a29ddd04dd0ff0ccdca714e
Author: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Date:   Sat Oct 30 17:27:15 1999 +0000
     Further performance improvements in sorting: reduce number of comparisons
     during initial run formation by keeping both current run and next-run
     tuples in the same heap (yup, Knuth is smarter than I am).  And, during
     merge passes, make use of available sort memory to load multiple tuples
     from any one input 'tape' at a time, thereby improving locality of
     access to the temp file.
So apparently there was a benefit back then, but is it still worthwhile? 
The LogicalTape code buffers one block at a time anyway, so how much do we 
gain by parsing the tuples into SortTuple format in batches?
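To make the question concrete, here is a minimal sketch (invented names, not the actual tuplesort.c API) of a merge that keeps only one current tuple per input tape. The point: if the tape layer already buffers a block at a time, fetching the next tuple on demand stays cheap and sequential, and no batch pre-read is needed:

```c
/* Sketch only: a k-way merge with a single buffered tuple per tape.
 * Tape, tape_next and merge_runs are invented for illustration. */
#include <assert.h>
#include <stdbool.h>

#define NTAPES 3

typedef struct Tape
{
	const int  *run;	/* stand-in for a sorted run on a block-buffered tape */
	int			pos;
	int			len;
} Tape;

/* Fetch the next tuple from a tape; false when the run is exhausted. */
static bool
tape_next(Tape *t, int *out)
{
	if (t->pos >= t->len)
		return false;
	*out = t->run[t->pos++];
	return true;
}

/* Merge the runs, holding just one tuple per tape (linear scan for the
 * minimum here; the real code uses a heap).  Returns the tuple count. */
static int
merge_runs(Tape *tapes, int ntapes, int *out)
{
	int			current[NTAPES];
	bool		valid[NTAPES];
	int			n = 0;

	for (int i = 0; i < ntapes; i++)
		valid[i] = tape_next(&tapes[i], &current[i]);

	for (;;)
	{
		int			min = -1;

		for (int i = 0; i < ntapes; i++)
			if (valid[i] && (min < 0 || current[i] < current[min]))
				min = i;
		if (min < 0)
			break;				/* all tapes exhausted */
		out[n++] = current[min];
		valid[min] = tape_next(&tapes[min], &current[min]);
	}
	return n;
}
```

With this shape, memory per tape is one tuple plus the tape layer's block buffer, instead of a pre-read batch.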
I wrote a quick patch to test that, attached. It seems to improve 
performance, at least in this small test case:
create table lotsofints(i integer);
insert into lotsofints select random() * 1000000000.0 from 
generate_series(1, 10000000);
vacuum freeze;
select count(*) FROM (select * from lotsofints order by i) t;
On my laptop, with default work_mem=4MB, that select takes 7.8 s on 
unpatched master, and 6.2 s with the attached patch.
So, at least in some cases, the pre-loading hurts; I think we should get 
rid of it. This patch probably needs some polishing: I replaced the 
batch allocations with a simpler scheme that holds just a single tuple 
per tape in a buffer, and that might need some more work to allow 
downsizing those buffers if you have a few very large tuples in an 
otherwise narrow table. And perhaps we should free and reallocate a 
smaller memtuples array for the merging, now that we're not making use 
of the whole of it. And perhaps we should teach LogicalTape to use 
larger buffers, if we can't rely on the OS to do the buffering for us. 
But overall, this seems to make the code both simpler and faster.
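The buffer-downsizing point above could look something like this hypothetical sketch (TapeBuf, tapebuf_fit and the sizes are invented, not from the patch): grow on demand when an oversized tuple arrives, then shrink back once smaller tuples follow, so a few wide outliers don't pin memory for the whole merge:

```c
/* Sketch only: a per-tape tuple buffer that grows for an oversized tuple
 * and shrinks again afterwards.  All names and thresholds are assumptions. */
#include <stdlib.h>

#define TAPEBUF_INITSIZE 1024	/* assumed starting size, for illustration */

typedef struct TapeBuf
{
	char	   *buf;
	size_t		capacity;
} TapeBuf;

static void
tapebuf_init(TapeBuf *tb)
{
	tb->buf = malloc(TAPEBUF_INITSIZE);
	tb->capacity = TAPEBUF_INITSIZE;
}

/* Ensure the buffer can hold 'need' bytes; shrink when the buffer is
 * much larger than needed, but never below the initial size. */
static void
tapebuf_fit(TapeBuf *tb, size_t need)
{
	size_t		newcap = tb->capacity;

	if (need > newcap)
	{
		while (newcap < need)
			newcap *= 2;		/* grow geometrically for an outlier */
	}
	else
	{
		while (newcap > TAPEBUF_INITSIZE && need <= newcap / 4)
			newcap /= 2;		/* release memory after the outlier */
	}
	if (newcap != tb->capacity)
	{
		tb->buf = realloc(tb->buf, newcap);
		tb->capacity = newcap;
	}
}
```

The shrink threshold (a quarter of capacity) is just one plausible policy; the real trade-off is between realloc traffic and holding on to memory the merge no longer needs.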
Am I missing something?
- Heikki
| Attachment | Content-Type | Size | 
|---|---|---|
| 0001-Don-t-bother-to-pre-read-tuples-into-slots-during-me.patch | application/x-patch | 24.3 KB | 