Re: Abbreviated keys for text cost model fix

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Abbreviated keys for text cost model fix
Date: 2015-02-23 19:17:05
Message-ID: 54EB7CB1.8000308@2ndquadrant.com
Lists: pgsql-hackers

On 23.2.2015 19:22, Peter Geoghegan wrote:
> On Mon, Feb 23, 2015 at 8:40 AM, Tomas Vondra
> <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>> The durations are much higher than without the single unsorted row added
>> at the end. Queries often take 20x longer to finish (on the same code),
>> depending on the scale.
>
> I knew this would happen. :-)
>
>> What's interesting here is that some queries are much faster, but query
>> #3 is slow until we hit 2M rows:
>>
>> select * from (select * from stuff_int_desc order by randint
>> offset 100000000000) foo
>
> Not sure why that would be. Can you see what a "#define
> DEBUG_ABBREV_KEYS" (within varlena.c) build shows is happening with
> the cost model? Enable debug1 output for that.

test=# select * from (select * from stuff_text_asc order by randtxt
offset 100000000000) foo;
DEBUG: abbrev_distinct after 160: 152.862464 (key_distinct: 155.187096,
norm_abbrev_card: 0.955390, prop_card: 0.200000)
DEBUG: abbrev_distinct after 320: 314.784336 (key_distinct: 313.425344,
norm_abbrev_card: 0.983701, prop_card: 0.200000)
DEBUG: abbrev_distinct after 640: 629.050490 (key_distinct: 643.945298,
norm_abbrev_card: 0.982891, prop_card: 0.200000)
DEBUG: abbrev_distinct after 1280: 1194.271440 (key_distinct:
1257.153875, norm_abbrev_card: 0.933025, prop_card: 0.200000)
DEBUG: abbrev_distinct after 2560: 2402.820431 (key_distinct:
2631.772790, norm_abbrev_card: 0.938602, prop_card: 0.200000)
DEBUG: abbrev_distinct after 5120: 4490.142750 (key_distinct:
5261.865309, norm_abbrev_card: 0.876981, prop_card: 0.200000)
DEBUG: abbrev_distinct after 10240: 8000.884171 (key_distinct:
10123.941172, norm_abbrev_card: 0.781336, prop_card: 0.200000)
DEBUG: abbrev_distinct after 20480: 13596.546899 (key_distinct:
20563.364696, norm_abbrev_card: 0.663894, prop_card: 0.130000)
DEBUG: abbrev_distinct after 40960: 23369.899021 (key_distinct:
40500.600680, norm_abbrev_card: 0.570554, prop_card: 0.084500)
DEBUG: abbrev_distinct after 81920: 30884.544030 (key_distinct:
81466.848436, norm_abbrev_card: 0.377009, prop_card: 0.054925)
randtxt
---------
(0 rows)
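
To decode the trace: abbrev_distinct and key_distinct are HyperLogLog
estimates of the number of distinct abbreviated and full keys,
norm_abbrev_card is abbrev_distinct divided by the number of tuples seen
so far, and prop_card is the minimum abbrev_distinct/key_distinct ratio
required to keep abbreviating. Below is a rough, self-contained sketch
of the decision this trace is driving - a simplification of
bttext_abbrev_abort() in varlena.c, with the function name and the
thresholds (first check at 100 tuples, initial prop_card of 0.2, 0.65
decay once past 10000 tuples) inferred from the trace above rather than
copied from the source, so treat them as assumptions:

    /*
     * Simplified sketch of the abbreviation abort decision; not the
     * authoritative varlena.c code.  Constants are inferred from the
     * DEBUG trace above.
     */
    #include <stdbool.h>

    static double prop_card = 0.2;  /* required abbrev/full cardinality ratio */

    static bool
    abbrev_should_abort(int memtupcount,        /* tuples seen so far */
                        double abbrev_distinct, /* HLL estimate, abbreviated keys */
                        double key_distinct)    /* HLL estimate, full keys */
    {
        /* too few tuples to judge yet */
        if (memtupcount < 100)
            return false;

        /*
         * While the abbreviated keys retain enough of the full keys'
         * cardinality, carry on; on larger inputs, decay the required
         * ratio so the test gets easier to keep passing as the sort
         * grows (visible above as prop_card: 0.2 -> 0.13 -> 0.0845 ...).
         */
        if (abbrev_distinct > key_distinct * prop_card)
        {
            if (memtupcount > 10000)
                prop_card *= 0.65;
            return false;
        }

        /* cardinality has collapsed - abbreviation is unlikely to pay off */
        return true;
    }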

>> Looking at the previous tests, I see this is exactly what's
>> happening to this query with the 'random' dataset - it's slightly
>> slower than master up until 2M rows, when it suddenly jumps to the
>> same speedup as the other queries. Can we do something about that?
>
> Possibly.
>
>> Anyway, I'm wondering what conclusion we can draw from this? I believe
>> the vast majority of datasets in production won't be perfectly sorted,
>> because when the table is CLUSTERed by an index we tend to use an index
>> scan to do the sort (so no problem), or the data are not actually
>> perfectly sorted (and here we get a significant speedup).
>
> One conclusion might be that commit a3f0b3d6 should not have added the
> pre-check for perfectly sorted input, which is not in the Bentley &
> McIlroy original - exploiting the fact that the set is already
> partially sorted is a fine thing, but we're not doing the necessary
> bookkeeping.

Would that pre-check hurt even the random test case? I can imagine it
might behave strangely with almost perfectly sorted data (when only the
very last row is unsorted), but not with random data (but see below).
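
For reference, here is a minimal sketch of the kind of pre-check
a3f0b3d6 added - a simplified take on the check in src/port/qsort.c,
not the exact code. It makes the failure mode obvious: a single
inversion near the end of the array turns the whole scan into a wasted
O(n) pass, which is exactly what the 'ASC + one unsorted row' case
exercises.

    /*
     * Loose sketch of the perfectly-sorted pre-check; simplified, not
     * the exact src/port/qsort.c code.
     */
    #include <stddef.h>

    static int
    array_is_presorted(const void *a, size_t n, size_t es,
                       int (*cmp) (const void *, const void *))
    {
        const char *pm;

        for (pm = (const char *) a + es;
             pm < (const char *) a + n * es;
             pm += es)
        {
            if (cmp(pm - es, pm) > 0)
                return 0;       /* inversion found: fall through to qsort */
        }
        return 1;               /* fully sorted: caller can return early */
    }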

> But that's not really why I suggested that test case.
> Rather, I wanted to put the possible regression in the event of
> perfectly sorted input in perspective. Maybe that regression isn't
> worth worrying about, if in the event of a single tuple being out of
> order at the end of the set the outcome dramatically shifts to
> strongly favor abbreviation. Another way of looking at it would be
> that the pre-sorted case is an argument against strxfrm() sorting in
> general more than abbreviated keys in particular, an argument that
> seems new and strange to me. The analysis of algorithms focuses on the
> worst case and average case for a reason, and (say) the glibc
> documentation never considers this when discussing strxfrm().

Yeah, every algorithm has a worst case - either you get very fast
execution on average with a few rare cases that are much slower, or you
protect against those cases at the cost of worse average performance.

But after digging into the data a bit more, I actually think this is a
red herring. Looking at the actual runtimes (and not just the
speedups), I get this:

   scale  query#   master  cost-model  speedup
  100000       1      884         103     856%
 1000000       1    12153        1506     807%
 2000000       1    26187        3894     673%
 3000000       1    38147        6601     578%
 4000000       1    56332       10976     513%
 5000000       1    77009       16409     469%

  100000       2      883         110     805%
 1000000       2    12114        1574     770%
 2000000       2    26104        4038     646%
 3000000       2    38028        6828     557%
 4000000       2    56138       11294     497%
 5000000       2    76720       16829     456%

  100000       3      102         106      97%
 1000000       3     1507        1537      98%
 2000000       3    26984        3977     678%
 3000000       3    39090        6753     579%
 4000000       3    57239       11228     510%
 5000000       3    78150       16769     466%

(scale = row count; master and cost-model are total query durations;
speedup = master / cost-model)

So while it's true that the 3rd query fares much worse than the other
queries (a ~3% slowdown against master instead of a >400% speedup),
that's largely because master performs exceptionally well for this
query on small datasets. Once we get to 2M rows, master's performance
drops significantly, while the cost-model branch keeps its performance
characteristics and the speedup jumps back to ~700%, which is nice.

These numbers are for the 'ASC + unsorted row' test, but I do get
exactly the same pattern for the 'random' tests done previously.

It would be nice if we could address the 3% regression for the last
query, but I guess it's not a big deal. The numbers in general are
absolutely impressive. Kudos.

--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
