Re: Wrong query plan when using a left outer join

From: Filip Rembiałkowski <plk(dot)zuber(at)gmail(dot)com>
To: Feike Steenbergen <feikesteenbergen(at)gmail(dot)com>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: Wrong query plan when using a left outer join
Date: 2012-01-17 11:53:18
Message-ID: CAP_rwwm_WCQN5n+kcKOZV7RSN59rn_MKNJxLfia4t4Ae4aTaow@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

On Tue, Jan 17, 2012 at 7:54 AM, Feike Steenbergen
<feikesteenbergen(at)gmail(dot)com> wrote:
> I have the following setup:
>
> A table called hand:
>
>
>                                        Table "stage.hand_meta"
>    Column     |           Type           |
> Modifiers
> ---------------+--------------------------+-------------------------------------------------------------
>  hand_id       | integer                  | not null default
> nextval('hand_meta_hand_id_seq'::regclass)
>  hand_no       | bigint                   | not null
>  site_id       | smallint                 | not null
>  game_id       | smallint                 | not null
>  time          | timestamp with time zone | not null
>  tournament_id | bigint                   |
> Indexes:
>    "hand_meta_pkey" PRIMARY KEY, btree (hand_id) CLUSTER
>    "hand_meta_hand_no_site_unq" UNIQUE, btree (hand_no, site_id)
>    "hand_meta_time_idx" btree ("time")
>    "hand_meta_tournament_id_idx" btree (tournament_id)
> Referenced by:
>    TABLE "handhistory_plain" CONSTRAINT
> "handhistory_plain_hand_id_fkey" FOREIGN KEY (hand_id) REFERENCES
> hand_meta(hand_id)
>    TABLE "handhistory_staged" CONSTRAINT "staged_hand_hand_id_fkey"
> FOREIGN KEY (hand_id) REFERENCES hand_meta(hand_id)
>
> Getting the max hand_id (primary key) results in using an index:
>
>
> feiketracker=> explain analyze select max(hand_id) from stage.hand;
>
>  QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------------------------------------------------
>  Result  (cost=0.03..0.04 rows=1 width=0) (actual time=0.379..0.383
> rows=1 loops=1)
>   InitPlan 1 (returns $0)
>     ->  Limit  (cost=0.00..0.03 rows=1 width=4) (actual
> time=0.337..0.340 rows=1 loops=1)
>           ->  Index Scan Backward using hand_meta_pkey on hand_meta
> (cost=0.00..82667.12 rows=2479440 width=4) (actual time=0.319..0.319
> rows=1 loops=1)
>                 Index Cond: (hand_id IS NOT NULL)
>  Total runtime: 0.823 ms
> (6 rows)
>
>
> Now, if i create a view which left outer joins another table and
> select max hand_id it uses a seq_scan, which I think it should'nt use,
> as it only needs to query hand_meta and then use the index:
>
>
> feiketracker=> create view seqscan_example as (select * from hand_meta
> left join handhistory_plain using(hand_id));
> CREATE VIEW
> Time: 72.736 ms
>
> feiketracker=> explain analyze select max(hand_id) from seqscan_example;
>                                                         QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=49261.00..49261.01 rows=1 width=4) (actual
> time=34672.052..34672.054 rows=1 loops=1)
>   ->  Seq Scan on hand_meta  (cost=0.00..43062.40 rows=2479440
> width=4) (actual time=0.180..16725.109 rows=2479440 loops=1)
>  Total runtime: 34672.874 ms
> (3 rows)
>
>
> feiketracker=> select version();
>                                                              version
> ------------------------------------------------------------------------------------------------------------------------------------
>  PostgreSQL 9.0.6 on armv5tejl-unknown-linux-gnueabi, compiled by GCC
> gcc (GCC) 3.4.4 (release) (CodeSourcery ARM 2005q3-2), 32-bit
> (1 row)
>
>
> I cannot think of a reason to use a seqscan, the left join should
> indicate all results from hand_meta should be used, hand_id is the
> primary key, so selecting max(hand_id) from the table or the view
> should result in the same execution plan or am I thinking wrong?
>

it's not always so simple for the planner to eliminate left join...
imagine that the view on the right side of join has some side effects.

so postgres will never "cut off" the right join side. but postgres
will still try to choose best execution plan. seq scan may simply be
faster here. breaking point is somewhere near 50% selectivity.

when handhistory_plain starts geting much bigger, plan will change.

try to experiment with SET enable_seqscan TO false; - and see what happens.

BTW, add a foreign key and index on handhistory_plain.hand_id (unless
you have it already).
BTW2, if you really don't care on handhistory you can just use
original query with no join.

Filip

In response to

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Andreas Kretschmer 2012-01-17 13:35:19 Re: sql query problem
Previous Message Thomas Kellerer 2012-01-17 08:49:44 Re: UPDATE COMPATIBILITY