Re: bad plan and LIMIT

From: Adam Ruth <adamruth(at)mac(dot)com>
To: "'pgsql-performance(at)postgresql(dot)org'" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: bad plan and LIMIT
Date: 2009-05-01 08:22:59
Message-ID: 583FD9F7-2250-499D-BCA7-8518A3561BD4@mac.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

You could try changing the IN to an EXISTS, that may alter how the
optimizer weighs the limit.

SELECT ID FROM ps_image WHERE EXISTS (SELECT null FROM
ps_gallery_image WHERE gallery_id ='G00007ejKGoWS_cY' and image_id =
ps_image.id) ORDER BY LOWER(FILE_NAME) ASC

On 30/04/2009, at 3:51 AM, James Nelson wrote:

>
> Hi, I'm hoping you guys can help with improving this query I'm
> having a problem with. The main problem is that the query plan
> changes depending on the value of the LIMIT clause, with small
> values using a poor plan and running very slowly. The two times are
> roughly 5 minutes for the bad plan and 1.5 secs for the good plan.
>
> I have read a little about how the query planner takes into account
> the limit clause, and I can see the effect this has on the costs
> shown by explain. The problem is that the estimated cost ends up
> being wildly inaccurate. I'm not sure if this a problem with the
> planner or if it is something I am doing wrong on my end.
>
> the query (without the limit clause):
>
> SELECT ID FROM ps_image WHERE id IN (SELECT image_id FROM
> ps_gallery_image WHERE gallery_id='G00007ejKGoWS_cY') ORDER BY
> LOWER(FILE_NAME) ASC
>
> The ps_image table has about 24 million rows, ps_gallery_image has
> about 14 million. The query above produces roughly 50 thousand rows.
>
> When looking at the explain with the limit, I can see the
> interpolation that the planner does for the limit node (arriving at
> a final cost of 458.32 for this example) but not sure why it is
> inaccurate compared to the actual times.
>
> Thanks in advance for taking a look at this, let me know if there is
> additional information I should provide.
>
> Some information about the tables and the explains follow below.
>
> James Nelson
>
> [james(at)db2 ~] psql --version
> psql (PostgreSQL) 8.3.5
> contains support for command-line editing
>
> photoshelter=# \d ps_image
> Table "public.ps_image"
> Column | Type | Modifiers
> ---------------+--------------------------
> +-------------------------------------------
> id | character varying(16) | not null
> user_id | character varying(16) |
> album_id | character varying(16) | not null
> parent_id | character varying(16) |
> file_name | character varying(200) |
> file_size | bigint |
> .... 20 rows snipped ....
> Indexes:
> "ps_image_pkey" PRIMARY KEY, btree (id)
> "i_file_name_l" btree (lower(file_name::text))
> .... indexes, fk constraints and triggers snipped ....
>
> photoshelter=# \d ps_gallery_image
> Table "public.ps_gallery_image"
> Column | Type | Modifiers
> ---------------+--------------------------+------------------------
> gallery_id | character varying(16) | not null
> image_id | character varying(16) | not null
> display_order | integer | not null default 0
> caption | character varying(2000) |
> ctime | timestamp with time zone | not null default now()
> mtime | timestamp with time zone | not null default now()
> id | character varying(16) | not null
> Indexes:
> "ps_gallery_image_pkey" PRIMARY KEY, btree (id)
> "gi_gallery_id" btree (gallery_id)
> "gi_image_id" btree (image_id)
> Foreign-key constraints:
> "ps_gallery_image_gallery_id_fkey" FOREIGN KEY (gallery_id)
> REFERENCES ps_gallery(id) ON DELETE CASCADE
> "ps_gallery_image_image_id_fkey" FOREIGN KEY (image_id) REFERENCES
> ps_image(id) ON DELETE CASCADE
> Triggers:
> ps_image_gi_sync AFTER INSERT OR DELETE OR UPDATE ON
> ps_gallery_image FOR EACH ROW EXECUTE PROCEDURE ps_image_sync()
>
> =
> =
> =
> =
> =
> =
> =
> =
> =
> =
> =
> =
> =
> =
> =
> ======================================================================
> explain analyze for bad plan
>
> photoshelter=# explain analyze SELECT ID FROM ps_image WHERE id IN
> (SELECT image_id FROM ps_gallery_image WHERE
> gallery_id='G00007ejKGoWS_cY') ORDER BY LOWER(FILE_NAME) ASC limit 1;
> QUERY
> PLAN
> -------------------------------------------------------------------------------------------------------------------------------------------------------------
> Limit (cost=0.00..458.32 rows=1 width=36) (actual
> time=709831.847..709831.847 rows=1 loops=1)
> -> Nested Loop IN Join (cost=0.00..17700128.78 rows=38620
> width=36) (actual time=709831.845..709831.845 rows=1 loops=1)
> -> Index Scan using i_file_name_l on ps_image
> (cost=0.00..1023863.22 rows=24460418 width=36) (actual
> time=0.063..271167.293 rows=8876340 loops=1)
> -> Index Scan using gi_image_id on ps_gallery_image
> (cost=0.00..0.85 rows=1 width=17) (actual time=0.048..0.048 rows=0
> loops=8876340)
> Index Cond: ((ps_gallery_image.image_id)::text =
> (ps_image.id)::text)
> Filter: ((ps_gallery_image.gallery_id)::text =
> 'G00007ejKGoWS_cY'::text)
> Total runtime: 709831.932 ms
>
> =
> =
> =
> =
> =
> =
> =
> =
> =
> =
> =
> =
> =
> =
> =
> ======================================================================
> explain analyze for good plan
>
> photoshelter=# explain analyze SELECT ID FROM ps_image WHERE id IN
> (SELECT image_id FROM ps_gallery_image WHERE
> gallery_id='G00007ejKGoWS_cY') ORDER BY LOWER(FILE_NAME) ASC limit
> 600;
> QUERY
> PLAN
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Limit (cost=154650.99..154652.49 rows=600 width=36) (actual
> time=1886.038..1886.404 rows=600 loops=1)
> -> Sort (cost=154650.99..154747.54 rows=38619 width=36) (actual
> time=1886.038..1886.174 rows=600 loops=1)
> Sort Key: (lower((ps_image.file_name)::text))
> Sort Method: top-N heapsort Memory: 75kB
> -> Nested Loop (cost=42394.02..152675.86 rows=38619
> width=36) (actual time=135.132..1838.491 rows=50237 loops=1)
> -> HashAggregate (cost=42394.02..42780.21 rows=38619
> width=17) (actual time=135.079..172.563 rows=50237 loops=1)
> -> Index Scan using gi_gallery_id on
> ps_gallery_image (cost=0.00..42271.79 rows=48891 width=17) (actual
> time=0.063..97.539 rows=50237 loops=1)
> Index Cond: ((gallery_id)::text =
> 'G00007ejKGoWS_cY'::text)
> -> Index Scan using ps_image_pkey on ps_image
> (cost=0.00..2.83 rows=1 width=36) (actual time=0.031..0.031 rows=1
> loops=50237)
> Index Cond: ((ps_image.id)::text =
> (ps_gallery_image.image_id)::text)
> Total runtime: 1886.950 ms
>
>
>
>
>
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org
> )
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Grzegorz Jaśkiewicz 2009-05-01 08:32:47 Re: bad plan and LIMIT
Previous Message Whit Armstrong 2009-04-30 01:15:45 Re: partition question for new server setup