Re: Abbreviated keys for Numeric

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Abbreviated keys for Numeric
Date: 2015-02-21 20:07:13
Message-ID: 54E8E571.2030507@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 21.2.2015 19:57, Peter Geoghegan wrote:
> On Fri, Feb 20, 2015 at 9:18 PM, Tomas Vondra
> <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>> The gains for text are also very nice, although in this case that only
>> happens for the smallest scale (1M rows), and for larger scales it's
>> actually slower than current master :-(
>
> That's odd. I have a hard time thinking of why the datum sort patch
> could be at fault, though. I bet the cost model of the text
> sortsupport routine is somehow hitting a snag on those larger sized
> sets. They should be just as accelerated, and probably more so, than
> your 1M sized set that was sped up 4x here.
>
> Can you see what is output with debugging of text abbreviation turned
> on? Put "#define DEBUG_ABBREV_KEYS" at the top of varlena.c and
> rebuild. Report on the debug1 output, and see if and when abbreviation
> is aborted.
>
> I suspected that the cost model was too conservative (or, more
> lightly, just too simplistic). I ought to revisit my patch to give the
> ad-hoc cost model a sense of proportion about how far along we are,
> which was previously deferred [1]. When there is a strong
> physical/logical correlation, that can be essential.

I don't think there's a correlation - at least not in the usual sense
that the data are stored 'sorted' by the column. The tables were
generated as random.

Hmmm, maybe we should add such correlated cases into the test script, to
see how that behaves with those patches ...

>
> Did you first index the text field, and then run CLUSTER for the
> larger sized sets on that index (to test abbreviation)? That would
> cause there to be a lot of abbreviated keys that seemed to poorly
> capture the entropy of their underlying values, when in fact that was
> entirely down to our only considering the first 10 tuples in a 100
> million tuple set. Having some patience is important there, and a hint
> at how far in we are gives the ad-hoc cost model a much better sense
> of proportion...it then has a sense of how patient it should be.

There are no indexes in the test script.

--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2015-02-21 20:15:49 Re: Abbreviated keys for Numeric
Previous Message Andres Freund 2015-02-21 20:01:19 Re: Bootstrap DATA is a pita