Re: : PostgreSQL Index behavior

From: Venkat Balaji <venkat(dot)balaji(at)verse(dot)in>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: PGSQL Performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: : PostgreSQL Index behavior
Date: 2012-09-12 06:57:55
Message-ID: CAFrxt0iehqe4m3jympYeCMc1xrjHUan=kcEBXpXqZKe0_UzJZw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Thank you Jeff !

My comments are inline.

> explain SELECT tv.short_code, tv.chn as pkg_subscription_chn,
> > tv.vert as pkg_vert, ubs.campaign_id as campaign,
> > 'none'::varchar as referer,
> > CAST('CAMPAIGNWISE_SUBSCRIBER_BASE' AS VARCHAR) as vn,
> > count(tv.msisdn) as n_count, '0'::numeric AS tot_revenue
> > FROM campaign_base ubs
> > JOIN tab_current_day_v2 tv
> > ON ubs.ubs_seq_id = tv.ubs_seq_id
> > AND tv.dt = CAST('2012-09-08' AS DATE)
> > GROUP BY tv.short_code, tv.vert, tv.chn, ubs.campaign_id,
> vn;
> ...
> >
> > The above plan shows "seq scan" on tab_current_day_v2 table, though
> there is
> > an index on "ubs_seq_id" column which is an unique column.
> >
> > Can anyone please help us understand, why PostgreSQL optimizer is not
> > prioritizing the unique column and hitting ubs_seq_id_idx Index here ?
>
> The query where clause does not specify a constant value for
> ubs_seq_id. So it is likely that the only way to use that index would
> be to reverse the order of the nested loop and seq scan the other
> table. Is there any reason to think that doing that would be faster?
>

I believe, I missed an important point here. Yes, since the constant value
is not provided for ubs_seq_id, Index scan is not a prime preference. Makes
sense. Further analysis is explained below.

> > Later -
> >
> > We have created composite Index on "dt" (one distinct value) and
> > "ubs_seq_id" (no duplicate values) and the index has been picked up.
>
> Postgres seems to think that "dt" has no duplicate values, the
> opposite of having one distinct value.
> That is based on the estimates given in the explain plan, that teh seq
> scan will return only one row after the filter on Filter: "(dt =
> '2012-09-08'::date)". This does seem to conflict with what you
> report from pg_stats, but I'm not familiar with that view, and you
> haven't told us what version of pgsql you are using.
>

We are using PostgreSQL-9.0.1.

Yes, "dt" has one distinct value all the time is generated on daily basis.
"2012-09-08" is an non-existent value, so, Postgres seems to think there
are no duplicates.

If i pass on the value which is existing in the table "2012-09-11",
PostgreSQL optimizer is picking up "Seq Scan" (what ever Index is existing).

In our scenario, we cannot expect an Index scan to happen, because I
believe, following are the reasons -

ubs_seq_id column in campaign_base table has 1.2 m rows -- all distinct
ubs_seq_id column in tab_current_day_v2 table has 1.9 m rows -- all distinct
dt has only 1 distinct value.

All being used with AND operator, extracted rows will be minimum 1.2 m. So,
I believe, "seq scan" is the best choice PG is opting for.

I got the point. Thanks !

Regards,
Venkat

--

DISCLAIMER:

Please note that this message and any attachments may contain confidential
and proprietary material and information and are intended only for the use
of the intended recipient(s). If you are not the intended recipient, you
are hereby notified that any review, use, disclosure, dissemination,
distribution or copying of this message and any attachments is strictly
prohibited. If you have received this email in error, please immediately
notify the sender and delete this e-mail , whether electronic or printed.
Please also note that any views, opinions, conclusions or commitments
expressed in this message are those of the individual sender and do not
necessarily reflect the views of *Ver sé Innovation Pvt Ltd*.

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Scott Marlowe 2012-09-12 14:12:59 Re: : PostgreSQL Index behavior
Previous Message Tom Lane 2012-09-11 17:19:26 Re: Planner selects different execution plans depending on limit