Re: Abbreviated keys for Numeric

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: Peter Geoghegan <pg(at)heroku(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Abbreviated keys for Numeric
Date: 2015-02-21 16:13:01
Message-ID: 54E8AE8D.6030006@2ndquadrant.com
Lists: pgsql-hackers

Hi Gavin,

On 21.2.2015 06:35, Gavin Flower wrote:
> On 21/02/15 18:18, Tomas Vondra wrote:
>>
>> OK, so I've repeated the benchmarks with both patches applied, and I
>> think the results are interesting. I extended the benchmark a bit - see
>> the SQL script attached.
>>
>> 1) multiple queries
>>
>> select percentile_disc(0) within group (order by val) from stuff
>>
>> select count(distinct val) from stuff
>>
>> select * from
>> (select * from stuff order by val offset 100000000000) foo
>>
>> 2) multiple data types - int, float, text and numeric
>>
>> 3) multiple scales - 1M, 2M, 3M, 4M and 5M rows
>>
>> Each query was executed 10x and the timings were averaged. I know some
>> of the data types don't benefit from the patches, but I included them to
>> get a sense of how noisy the results are.
>>
>> I did the measurements for
>>
>> 1) master
>> 2) master + datum_sort_abbrev.patch
>> 3) master + datum_sort_abbrev.patch + numeric_sortsup.patch
>>
>> and then computed the speedup for each type/scale combination (the
>> impact on all the queries is almost exactly the same).
>>
>> Complete results are available here: http://bit.ly/1EA4mR9
>>
>> I'll post the whole summary here, although some of the numbers are about
>> the other abbreviated keys patch.
>>
>>
>> 1) datum_sort_abbrev.patch vs. master
>>
>>  scale    float     int    numeric    text
>> ---------------------------------------------
>>      1     101%     99%       105%    404%
>>      2     101%     98%        96%     98%
>>      3     101%    101%        99%     97%
>>      4     100%    101%        98%     95%
>>      5      99%     98%        93%     95%
>>
>> 2) numeric_sortsup.patch vs. master
>>
>>  scale    float     int    numeric    text
>> ---------------------------------------------
>>      1      97%     98%       374%    396%
>>      2     100%    101%       407%     96%
>>      3      99%    102%       407%     95%
>>      4      99%    101%       423%     92%
>>      5      95%     99%       411%     92%
>>
>>
>> I think the gains are pretty awesome - I mean, a 400% speedup for Numeric
>> across the board? Yes please!
>>
>> The gains for text are also very nice, although in this case that only
>> happens for the smallest scale (1M rows), and for larger scales it's
>> actually slower than current master :-(
>>
>> It's not just rainbows and unicorns, though. With both patches applied,
>> text sorts get even slower (up to ~8% slower than master). It also seems
>> to impact float (which gets ~5% slower, for some reason) - I don't see
>> how that could happen, so I suspect this might just be noise.
>>
>> I'll repeat the tests on another machine after the weekend and post an
>> update on whether the results are the same or significantly different.
>>
>> regards
>>
>
> What are the standard deviations?

I checked that the results were consistent before sending them to the
list, but I didn't want to dump a huge pile of stddev/min/max/... numbers
into that e-mail. So I just mentioned that the results are available in a
spreadsheet (http://bit.ly/1EA4mR9).

See, for example, the "details" sheet, which aggregates all the numbers -
that's where the figures for the answers below come from; I'll reference
the relevant columns.

For all three cases (master, datum, datum+numeric), the STDDEV numbers are
~1-2% of the average. See columns T-V in the table.
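
To be clear about what that column computes - it's roughly this, assuming
the raw timings were loaded into a hypothetical raw_timings(type_name,
scale, patchset, run, duration_ms) table (the spreadsheet does the same
thing, just in formulas):

  select type_name, scale, patchset,
         avg(duration_ms) as avg_ms,
         stddev_samp(duration_ms) as stddev_ms,
         round((100 * stddev_samp(duration_ms)
                    / avg(duration_ms))::numeric, 1) as stddev_pct
  from raw_timings
  group by type_name, scale, patchset
  order by type_name, scale, patchset;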

>
> Do the arithmetic means change much if you exclude the 2 fastest & 2
> slowest?

No. The change (compared to the plain average) is ~1-2% of the value. See
columns W-Y in the table.
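
Same idea as the query above, against the same hypothetical raw_timings
table - drop the two fastest and two slowest runs of each combination and
average the rest:

  select type_name, scale, patchset,
         avg(duration_ms) as trimmed_avg_ms
  from (
      select *,
             row_number() over (partition by type_name, scale, patchset
                                order by duration_ms) as rn,
             count(*) over (partition by type_name, scale, patchset) as cnt
      from raw_timings
  ) t
  where rn > 2 and rn <= cnt - 2
  group by type_name, scale, patchset;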

>
> How do the arithmetic means compare to their respective medians?

I haven't included medians in the current table (I will for the next run),
but all the other metrics seem quite consistent.
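
It should be trivial to add, though - with the same hypothetical
raw_timings table it's just an ordered-set aggregate (the same feature the
percentile_disc benchmark query exercises):

  select type_name, scale, patchset,
         percentile_cont(0.5) within group (order by duration_ms) as median_ms
  from raw_timings
  group by type_name, scale, patchset;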

> Essentially, how consistent are the results, or how great is the
> noise? There may be better indicators than the ones I've suggested
> above.

I believe the results are very consistent (you can check the raw data in
the spreadsheet), but let me verify that once I repeat the tests. I'll
also run the same test suite on another machine (I'm curious how a
different CPU will perform).

regards

--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
