: PostgreSQL Index behavior

From: Venkat Balaji <venkat(dot)balaji(at)verse(dot)in>
To: PGSQL Performance <pgsql-performance(at)postgresql(dot)org>
Subject: : PostgreSQL Index behavior
Date: 2012-09-10 12:39:35
Message-ID: CAFrxt0go0hb7bJqPjxicZc6G6gJeRHmYM7rJowE1vzZd7t3BkQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hello Community,

I intend to understand further on PostgreSQL Index behavior on a "SELECT"
statement.

We have a situation where-in Index on unique column is not being picked up
as expected when used with-in the WHERE clause with other non-unique
columns using AND operator.

explain SELECT tv.short_code, tv.chn as pkg_subscription_chn,
tv.vert as pkg_vert, ubs.campaign_id as campaign,
'none'::varchar as referer,
CAST('CAMPAIGNWISE_SUBSCRIBER_BASE' AS VARCHAR) as vn,
count(tv.msisdn) as n_count, '0'::numeric AS tot_revenue
FROM campaign_base ubs
JOIN tab_current_day_v2 tv
ON ubs.ubs_seq_id = tv.ubs_seq_id
AND tv.dt = CAST('2012-09-08' AS DATE)
GROUP BY tv.short_code, tv.vert, tv.chn, ubs.campaign_id, vn;

QUERY PLAN
----------------------------------------------------------------------------------------------------
HashAggregate (cost=77754.57..77754.58 rows=1 width=38)
-> Nested Loop (cost=0.00..77754.56 rows=1 width=38)
-> Seq Scan on tab_current_day_v2 tv (cost=0.00..77746.26 rows=1
width=39)
Filter: (dt = '2012-09-08'::date)
-> Index Scan using cb_ubs_id_idx on campaign_base ubs
(cost=0.00..8.28 rows=1 width=15)
Index Cond: (ubs.ubs_seq_id = tv.ubs_seq_id)
(6 rows)

The above plan shows "seq scan" on tab_current_day_v2 table, though there
is an index on "ubs_seq_id" column which is an unique column.

Can anyone please help us understand, why PostgreSQL optimizer is not
prioritizing the unique column and hitting ubs_seq_id_idx Index here ?

Later -

We have created composite Index on "dt" (one distinct value) and
"ubs_seq_id" (no duplicate values) and the index has been picked up.

Below is the scenario where-in the same query's plan picking up the
composite Index.

prod-db=# create index concurrently tab_dt_ubs_seq_id_idx on
tab_current_day_v2(dt,ubs_seq_id);
CREATE INDEX

prod-db=# explain SELECT tv.short_code, tv.chn as pkg_subscription_chn,
tv.vert as pkg_vert, ubs.campaign_id as campaign,
'none'::varchar as referer,
CAST('CAMPAIGNWISE_SUBSCRIBER_BASE' AS VARCHAR) as vn,
count(tv.msisdn) as n_count, '0'::numeric AS tot_revenue
FROM campaign_base ubs
JOIN tab_current_day_v2 tv
ON ubs.ubs_seq_id = tv.ubs_seq_id
AND tv.dt = CAST('2012-09-08' AS DATE)
GROUP BY tv.short_code, tv.vert, tv.chn, ubs.campaign_id, vn;

QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=16.88..16.89 rows=1 width=38)
-> Nested Loop (cost=0.00..16.86 rows=1 width=38)
-> Index Scan using tab_dt_ubs_seq_id_idx on tab_current_day_v2
tv (cost=0.00..8.57 rows=1 width=39)
Index Cond: (dt = '2012-09-08'::date)
-> Index Scan using cb_ubs_id_idx on campaign_base ubs
(cost=0.00..8.28 rows=1 width=15)
Index Cond: (ubs.ubs_seq_id = tv.ubs_seq_id)
(6 rows)

I was expecting the above behavior without a composite Index. A column with
most unique values must be picked up when multiple columns are used in
WHERE clause using AND operator. Any thoughts ?

prod-db# \d tab_current_day_v2

Table "public.tab_current_day_v2"
Column | Type | Modifiers
--------------------------+--------------------------+-----------
dt | date |
chn | character varying(10) |
vert | character varying(20) |
isdn | character varying |
bc | character varying(40) |
status | text |
is_rene | boolean |
age_in_sys | integer |
age_in_grace | integer |
has_prof | boolean |
short_code | character varying |
sub_vert | character varying(30) |
mode | character varying |
ubs_seq_id | bigint |
pkg_name | character varying(200) |
pkg_id | integer |
subs_charge | money |
subs_time | timestamp with time zone |
ulq_seq_id | bigint |
valid_till_time | timestamp with time zone |
valid_from_time | timestamp with time zone |
latest_ube_seq_id | bigint |
latest_pkg_id | integer |
price | integer |
Indexes:
"tab_dt_ubs_seq_id_idx" btree (dt, ubs_seq_id)
"tab_isdn_idx" btree (msisdn)
"tab_status_idx" btree (status)
"ubs_seq_id_idx" btree (ubs_seq_id)

Below is the table structure and the uniqueness of each of the columns.

airtel_user_data_oltp=# select attname, n_distinct from pg_Stats where
tablename='tab_current_day_v2';

attname | n_distinct
--------------------------+------------
dt | 1
chn | 7
vert | 94
isdn | -0.727331
bc | 4
status | 3
is_rene | 2
age_in_sys | 1018
age_in_grac | 369
has_prof | 2
short_code | 23
sub_vert | 5
mode | 0
ubs_seq_id | -1
pkg_name | 461
pkg_id | 461
subs_charge | 7
subs_time | -1
ulq_seq_id | 122887
valid_till_time | -0.966585
valid_from_time | -0.962563
latest_ube_seq_id | -1
latest_pkg_id | 475
price | 18

(24 rows)

This is not an issue, but, would like to understand how PostgreSQL
optimizer picks up Indexes in SELECT queries.

In an other scenario, we had used 4 columns in WHERE clause with AND
operator with an Index on the column with most unique values -- The Index
was picked up.

Looking forward for your help !

Regards,
VB

--

DISCLAIMER:

Please note that this message and any attachments may contain confidential
and proprietary material and information and are intended only for the use
of the intended recipient(s). If you are not the intended recipient, you
are hereby notified that any review, use, disclosure, dissemination,
distribution or copying of this message and any attachments is strictly
prohibited. If you have received this email in error, please immediately
notify the sender and delete this e-mail , whether electronic or printed.
Please also note that any views, opinions, conclusions or commitments
expressed in this message are those of the individual sender and do not
necessarily reflect the views of *Ver sé Innovation Pvt Ltd*.

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message bill_martin 2012-09-10 14:24:30 Planner selects different execution plans depending on limit
Previous Message Ireneusz Pluta 2012-09-10 09:01:28 Re: libpq or postgresql performance