Re: index problems (again)

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Geoff Winkless <pgsqladmin(at)geoff(dot)dj>
Cc: "Peter J(dot) Holzer" <hjp-pgsql(at)hjp(dot)at>, Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: index problems (again)
Date: 2016-03-13 06:40:10
Message-ID: CAMkU=1zjUhF+b=tCscWhJ+fMm=W_sHWTQ_i9LV3ohPX+cbo9SQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Tue, Mar 8, 2016 at 2:16 AM, Geoff Winkless <pgsqladmin(at)geoff(dot)dj> wrote:
> On 7 March 2016 at 20:40, Peter J. Holzer <hjp-pgsql(at)hjp(dot)at> wrote:
>> As Tom wrote, the estimate of having to read only about 140 rows is only
>> valid if sc_id and sc_date are uncorrelated. In reality your query has
>> to read a lot more than 140 rows, so it is much slower.
>
> But as I've said previously, even if I do select from scdate values
> that I know to be in the first 1% of the data (supposedly the perfect
> condition) the scan method is insignificantly quicker than the index
> (scdate,scid) method.

That is sure not the case in my hands. If I select from the first 1%,
I get the (scid) index being 10 times faster than (scdate,scid), and
(scid,scdate) being 50 times faster.

> Even with the absolute perfect storm (loading in the entire index for
> the full range) it's still not too bad (1.3 seconds or so).
>
> The point is that to assume, knowing nothing about the data, that the
> data is in an even distribution is only a valid strategy if the worst
> case (when that assumption turns out to be wildly incorrect) is not
> catastrophic. That's not the case here.

That just makes someone else's catastrophic worst case come to the fore.

Cheers,

Jeff

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Jeff Janes 2016-03-13 06:46:27 Re: index problems (again)
Previous Message Melvin Davidson 2016-03-13 03:10:20 Re: Distributed Table Partitioning