Re: Abbreviated keys for Numeric

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: Peter Geoghegan <pg(at)heroku(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Abbreviated keys for Numeric
Date: 2015-02-21 05:18:17
Message-ID: 54E81519.308@2ndquadrant.com
Lists: pgsql-hackers

Hi,

On 21.2.2015 02:06, Tomas Vondra wrote:
> On 21.2.2015 02:00, Andrew Gierth wrote:
>>>>>>> "Tomas" == Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> writes:
>>
>> >> Right...so don't test a datum sort case, since that isn't supported
>> >> at all in the master branch. Your test case is invalid for that
>> >> reason.
>>
>> Tomas> What do you mean by 'Datum sort case'?
>>
>> A case where the code path goes via tuplesort_begin_datum rather than
>> tuplesort_begin_heap.
>>
>> Tomas> The test I was using is this:
>>
>> Tomas> select percentile_disc(0) within group (order by randnum) from stuff;
>>
>> Sorting single columns in aggregate calls uses the Datum sort path (in
>> fact I think it's currently the only place that does).
>>
>> Do that test with _both_ the Datum and Numeric sort patches in place,
>> and you will see the effect. With only the Numeric patch, the numeric
>> abbrev code is not called.
>
> D'oh! Thanks for the explanation.

OK, so I've repeated the benchmarks with both patches applied, and I
think the results are interesting. I extended the benchmark a bit - see
the SQL script attached.

1) multiple queries

select percentile_disc(0) within group (order by val) from stuff

select count(distinct val) from stuff

select * from
(select * from stuff order by val offset 100000000000) foo

2) multiple data types - int, float, text and numeric

3) multiple scales - 1M, 2M, 3M, 4M and 5M rows

Each query was executed 10x and the timings were averaged. I do know some
of the data types don't benefit from the patches, but I included them to
get a sense of how noisy the results are.
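
The procedure described above (run each query 10 times, average the timings) could look roughly like the sketch below. The helper name `average_runtime` and the stand-in workload are hypothetical; the actual benchmark is the attached bench.sh, which is not reproduced here.

```python
import time
from statistics import mean


def average_runtime(run_once, runs=10):
    """Call run_once() `runs` times and return the mean wall-clock time in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        run_once()
        timings.append(time.perf_counter() - start)
    return mean(timings)


if __name__ == "__main__":
    # Stand-in workload (assumption): sort a list instead of running a SQL query.
    data = list(range(100000, 0, -1))
    print(f"{average_runtime(lambda: sorted(data)):.6f} s")
```

Averaging over several runs smooths out per-run jitter, which is also why including data types that should not be affected helps: they show how much variation remains after averaging.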

I did the measurements for

1) master
2) master + datum_sort_abbrev.patch
3) master + datum_sort_abbrev.patch + numeric_sortsup.patch

and then computed the speedup for each type/scale combination (the
impact on all the queries is almost exactly the same).
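
One plausible way to derive such percentages is to divide the master timing by the patched timing, so that values above 100% mean the patched build is faster. This formula is an assumption (the message does not spell it out), and the timings in the sketch are made-up placeholders, not measured results.

```python
def speedup_percent(master_seconds, patched_seconds):
    """Return master/patched as a rounded percentage; >100 means patched is faster."""
    if patched_seconds <= 0:
        raise ValueError("patched_seconds must be positive")
    return round(100.0 * master_seconds / patched_seconds)


# Placeholder timings in seconds (hypothetical, not from the benchmark),
# keyed by (data type, scale in millions of rows).
master = {("numeric", 1): 4.0, ("text", 1): 2.0}
patched = {("numeric", 1): 1.0, ("text", 1): 2.1}

speedups = {key: speedup_percent(master[key], patched[key]) for key in master}
for (dtype, scale), pct in sorted(speedups.items()):
    print(f"{dtype:8s} scale={scale} {pct}%")
```

Under this convention a value such as 95% would read as "about 5% slower than master".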

Complete results are available here: http://bit.ly/1EA4mR9

I'll post the whole summary here, although some of the numbers are about
the other abbreviated keys patch.

1) datum_sort_abbrev.patch vs. master

 scale    float     int    numeric    text
---------------------------------------------
  1M      101%      99%     105%      404%
  2M      101%      98%      96%       98%
  3M      101%     101%      99%       97%
  4M      100%     101%      98%       95%
  5M       99%      98%      93%       95%

2) datum_sort_abbrev.patch + numeric_sortsup.patch vs. master

 scale    float     int    numeric    text
---------------------------------------------
  1M       97%      98%     374%      396%
  2M      100%     101%     407%       96%
  3M       99%     102%     407%       95%
  4M       99%     101%     423%       92%
  5M       95%      99%     411%       92%

I think the gains are pretty awesome - I mean, 400% speedup for Numeric
across the board? Yes please!

The gains for text are also very nice, although in this case that only
happens for the smallest scale (1M rows), and for larger scales it's
actually slower than current master :-(

It's not just rainbows and unicorns, though. With both patches applied,
text sorts get even slower (up to ~8% slower than master). It also seems
to impact float (which gets up to ~5% slower, for some reason). I don't
see how that could happen, so I suspect this might be noise.

I'll repeat the tests on another machine after the weekend, and post an
update on whether the results are the same or significantly different.

regards

--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
bench.sh application/x-shellscript 2.8 KB
