Re: Progress on fast path sorting, btree index creation time

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Geoghegan <peter(at)2ndquadrant(dot)com>, Jay Levitt <jay(dot)levitt(at)gmail(dot)com>, "Jim Decibel! Nasby" <decibel(at)decibel(dot)org>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Progress on fast path sorting, btree index creation time
Date: 2012-02-08 17:58:51
Message-ID: CA+TgmobFz8SqTGTaaRYEryWUZFODGETprbuYgBhsPLR4+Li6TQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Feb 8, 2012 at 10:59 AM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> On Wed, Feb 08, 2012 at 10:17:36AM -0500, Robert Haas wrote:
>> On Wed, Feb 8, 2012 at 9:51 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> > IMO this patch is already well past the point of diminishing returns in
>> > value-per-byte-added.  I'd like to see it trimmed back to provide a fast
>> > path for just single-column int4/int8/float4/float8 sorts.  The other
>> > cases aren't going to offer enough of a win to justify the code space.
>>
>> I'm curious about how much we're gaining from the single-column
>> specializations vs. the type-specific specializations.  I think I'm
>> going to go try to characterize that.
>
> Yes, please.  That would be a big help.   Is there no optimization for
> strings?  I assume they are sorted a lot.

They are, but the value of the optimization is expected to diminish
for more complex comparators.

I did a quick test of this using the same setup I mentioned upthread:

create table nodups (g) as select (g%10)*1000+g/10 from
generate_series(1,10000) g;

This was the actual test query:

select * from nodups order by g offset 10001;

I did a ten-second warmup on after starting each postmaster, followed
by three one-minute test runs:

master:
tps = 298.824321 (including connections establishing)
tps = 301.741619 (including connections establishing)
tps = 302.238016 (including connections establishing)

Peter's patch, but with the type-specific optimizations disabled, so
using just the single-sortkey optimizations:
tps = 357.553690 (including connections establishing)
tps = 358.765875 (including connections establishing)
tps = 358.825051 (including connections establishing)

Peter's patch, intact:
tps = 394.566994 (including connections establishing)
tps = 394.228667 (including connections establishing)
tps = 394.492677 (including connections establishing)

So that's about a 19% speedup for just the single sort-key
optimizations, and about a 30% speedup in total. Compared to a
baseline that includes the single sort-key optimizations, the
additional type-specific optimization is buying about 10% on this
test.

I tried changing the test query to return the data, like this:

select * from nodups order by g;

master:
tps = 181.301129 (including connections establishing)
tps = 180.348333 (excluding connections establishing)
tps = 179.600825 (including connections establishing)

Peter's patch, but with the type-specific optimizations disabled, so
using just the single-sortkey optimizations:
tps = 201.728269 (including connections establishing)
tps = 198.022527 (including connections establishing)
tps = 199.663082 (including connections establishing)

Peter's patch, intact:
tps = 214.037387 (including connections establishing)
tps = 212.895145 (including connections establishing)
tps = 211.596558 (including connections establishing)

So, with the overhead of actually returning the sorted data, we get
10% from the generic single-sortkey optimizations and 18% from the two
combined. Compared to a baseline that includes the single-sortkey
optimizations, the incremental improvement from adding the
type-specific optimizations is about 6.6%. It would be less on a
wider table, presumably.

I also tested a two-column sort:

create table twocol (a, b) as select g%101, g%17 from
generate_series(1,10000) g;
select * from twocol order by a, b offset 10000;

master:
tps = 270.218604 (including connections establishing)
tps = 265.634964 (including connections establishing)
tps = 268.520805 (including connections establishing)

Peter's patch, intact:
tps = 285.528626 (including connections establishing)
tps = 285.140345 (including connections establishing)
tps = 282.388457 (including connections establishing)

So here the type-specific optimizations are buying us even less: only
about 6%, and that's with no client overhead.

And I tested float8:

create table f8 (g) as select random() from generate_series(1,10000);
select * from f8 order by g offset 10000;

master:
tps = 228.373239 (including connections establishing)
tps = 225.117704 (including connections establishing)
tps = 225.196197 (including connections establishing)

Peter's patch, but with the type-specific optimizations disabled, so
using just the single-sortkey optimizations:
tps = 263.060820 (including connections establishing)
tps = 262.788926 (including connections establishing)
tps = 263.041186 (including connections establishing)

Peter's patch, intact:
tps = 283.274797 (including connections establishing)
tps = 280.597965 (including connections establishing)
tps = 280.151349 (including connections establishing)

That's +17% for the single-sortkey stuff, +25% for everything, and the
incremental improvement of the type-specific optimizations over the
generic single-key optimizations is about 6.7%.

It seems clear that the single sort-key optimizations are a much
better value per byte of code than the type-specific optimizations.
Ignoring client overhead, we get between half and two-thirds of the
benefit, and it costs us just one extra copy of the quicksort logic
for all data types. The type-specific optimizations only apply to a
subset of the available data types, produce a lot more machine code
(one extra copy of quicksort per type), and the absolute performance
benefit is less. My gut feeling is that the few extra percentage
points we can get by including lots of extra copies of quicksort is
not worth it. There are probably many cases where we can squeeze a
few extra points out of inlining (Tom and I have discussed a few of
them in the past, in particular one related to index-only scans) and I
don't have any confidence that this is the most important one, or even
in the top ten. If we were talking about speeding up COPY or index
lookups, I might well feel differently, but very few people are going
to quicksort 10,000 rows on a sufficiently regular basis to justify
doing this. To even measure the benefit of this, you have to set up a
totally artificial test case (such as the above), and you have to run
it via pgbench, because if you just run it interactively, the
run-to-run variation swamps the actual improvement, and without doing
a statistical analysis of the results you can't even be sure whether
anything's changed. I just can't get excited about that. However, I
find the single-key optimizations much more compelling, for the
reasons stated above, and feel we ought to include those.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2012-02-08 18:02:16 Re: Vacuum rate limit in KBps
Previous Message Tom Lane 2012-02-08 17:53:54 Re: Text-any concatenation volatility acting as optimization barrier