Re: Adding additional index causes 20,000x slowdown for certain select queries - postgres 9.0.3

From: Timothy Garnett <tgarnett(at)panjiva(dot)com>
To: sthomas(at)peak6(dot)com
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Adding additional index causes 20,000x slowdown for certain select queries - postgres 9.0.3
Date: 2011-03-17 16:55:10
Message-ID: AANLkTinBzY5U69UkfTV6MnkFHVbeUp=iccYEVYg8hOJ0@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi all,

The bl_number is nearly a unique value per a row (some small portion are
duplicated on a handful or rows).

We need the unique on pair of bl_number and month, but evaluating current
usage we don't make use of selecting on just month currently (though we
expect to have usage scenarios that do that in the not too distant future,
i.e. pulling out all the records that match a given month date). But for
the time being we've gone with the suggestion here of flipping the order of
the index columns to (bl_number, month) which rescues the original
performance (since the new index can no longer be used with the query).

We'd still be interested in other suggestions for convincing the query
planner not to pick the bad plan in this case (since we'll eventually need
an index on month) without having to use the slower CTE form. To me the
problem seems two fold,
(1) planner doesn't know there's a correlation between month and particular
buyer_ids (some are randomly distributed across month)
(2) even in cases where there isn't a correlation (not all of our buyer
id's are correlated with month) it still seems really surprising to me the
planner thought this plan would be faster, the estimated selectivity of the
buyer fields is 48k / 45million ~ 1/1000 so for limit 100 it should expect
to backward index scan ~100K rows, vs. looking up the expected 48k rows and
doing a top-100 sort on them, I'd expect the latter plan to be faster in
almost all situations (unless we're clustered on month perhaps, but we're
actually clustered on supplier_id, buyer_id which would favor the latter
plan as well I'd think).

(an aside) there's also likely some benefit from clustering in the original
plan before the new index, since we cluster on supplier_id, buyer_id and a
given buyer_id while having up to 100k rows will generally only have a few
supplier ids

Tim

On Wed, Mar 16, 2011 at 1:05 PM, Shaun Thomas <sthomas(at)peak6(dot)com> wrote:

> On 03/15/2011 01:23 PM, Timothy Garnett wrote:
>
> Column | Type
>> --------------------------+------------------------+
>> id | integer |
>>
>> bl_number | character varying(16) |
>> month | date |
>> buyer_id | integer |
>> supplier_id | integer |
>>
>
> Ok. In your table description, you don't really talk about the distribution
> of bl_number. But this part of your query:
>
>
> ORDER BY month DESC LIMIT 100 OFFSET 0
>
> Is probably tricking the planner into using that index. But there's the fun
> thing about dates: we almost always want them in order of most recent to
> least recent. So you might want to try again with your
> index_customs_records_on_month_and_bl_number declared like this instead:
>
> CREATE INDEX index_customs_records_on_month_and_bl_number
> ON customs_records (month DESC, bl_number);
>
> Or, if bl_number is more selective anyway, but you need both columns for
> other queries and you want this one to ignore it:
>
> CREATE INDEX index_customs_records_on_month_and_bl_number
> ON customs_records (bl_number, month DESC);
>
> Either way, I bet you'll find that your other queries that use this index
> are also doing a backwards index scan, which will always be slower by about
> two orders of magnitude, since backwards reads act basically like random
> reads.
>
> The effect you're getting is clearly exaggerated, and I've run into it on
> occasion for effectively the entire history of PostgreSQL. Normally
> increasing the statistics on the affected columns and re-analyzing fixes it,
> but on a composite index, that won't necessarily be the case.
>
> --
> Shaun Thomas
> OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
> 312-676-8870
> sthomas(at)peak6(dot)com
>
> ______________________________________________
>
> See http://www.peak6.com/email_disclaimer.php
> for terms and conditions related to this email
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Kevin Grittner 2011-03-17 18:13:53 Re: Adding additional index causes 20,000x slowdown for certain select queries - postgres 9.0.3
Previous Message Craig James 2011-03-17 16:54:17 Re: Xeon twice the performance of opteron