Re: Place of subselect

From: "Guillaume Bog" <guibog(at)gmail(dot)com>
To: "A(dot) Kretschmer" <andreas(dot)kretschmer(at)schollglas(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Place of subselect
Date: 2008-11-25 08:44:34
Message-ID: bc5951d00811250044we609343i36a714863f003097@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Tue, Nov 25, 2008 at 15:56, A. Kretschmer <
andreas(dot)kretschmer(at)schollglas(dot)com> wrote:

> am Tue, dem 25.11.2008, um 15:34:57 +0800 mailte Guillaume Bog folgendes:
> > Hi dear Postgres users.
> >
> > I have performance issues if I do the following pseudo-query:
> >
> > SELECT a, b, (SELECT count(*) FROM t2 WHERE something) AS c
> > FROM t1 ORDER BY a LIMIT 10;
> >
> > After some tests, it seems to me that the subquery on t2 is computed for
> all
> > rows of t1. As I don't "ORDER BY c", there is no need to compute c for
> every
> > row. I know I can (or should ?) work with joins or with a subquery in the
> from
> > clause, but I'd like to make sure there is no other way before changing
> my
> > sqls.
>
> Please check your presumption with explain analyse <your query>.
>
> For example:
>
> test=*# explain analyse select t1.*, (select count(1) from t2) from t1
> order by 1 limit 5;
> QUERY PLAN
>
> --------------------------------------------------------------------------------------------------------------
> Limit (cost=186.54..186.55 rows=5 width=4) (actual time=0.087..0.104
> rows=3 loops=1)
> InitPlan
> -> Aggregate (cost=36.75..36.76 rows=1 width=0) (actual
> time=0.022..0.024 rows=1 loops=1)
> -> Seq Scan on t2 (cost=0.00..31.40 rows=2140 width=0) (actual
> time=0.004..0.008 rows=1 loops=1)
> -> Sort (cost=149.78..155.13 rows=2140 width=4) (actual
> time=0.082..0.088 rows=3 loops=1)
> Sort Key: i
> -> Seq Scan on t1 (cost=0.00..31.40 rows=2140 width=4) (actual
> time=0.046..0.056 rows=3 loops=1)
> Total runtime: 0.197 ms
> (8 rows)
>
>
> Both tables executes only one scan.
>

It seems that you are right. By further testing I found that a WHERE
condition in the subquery was making the query hundred times slower. As I'm
not very familiar with explain analyze, I paste them below. Why do I have
"merge join" and "merge cond" in one case and "subplan" in the other case?
Note that "u_xref_ug_id" is a reference and therefore b-tree indexed.

vf_cn2fr=# EXPLAIN ANALYZE SELECT ug_id AS id, ug_en AS name, ug_type AS
type,
(SELECT count(*) FROM forms_groups JOIN users ON fg_xref_u_id = u_id WHERE
u_xref_ug_id = ug_id) as groupes
FROM users_groups ORDER BY "ug_type","ug_en" LIMIT 5;

QUERY
PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=513022.74..513022.76 rows=5 width=26) (actual
time=31172.258..31172.271 rows=5 loops=1)
-> Sort (cost=513022.74..513023.74 rows=397 width=26) (actual
time=31172.255..31172.259 rows=5 loops=1)
Sort Key: ug_type, ug_en
-> Seq Scan on users_groups (cost=0.00..513005.61 rows=397
width=26) (actual time=83.273..31167.266 rows=397 loops=1)
SubPlan
-> Aggregate (cost=1292.18..1292.19 rows=1 width=0)
(actual time=78.498..78.499 rows=1 loops=397)
-> Hash Join (cost=146.18..1290.52 rows=663
width=0) (actual time=30.023..78.389 rows=102 loops=397)
Hash Cond: (forms_groups.fg_xref_u_id =
users.u_id)
-> Seq Scan on forms_groups
(cost=0.00..985.88 rows=40488 width=4) (actual time=0.005..42.046 rows=40490
loops=372)
-> Hash (cost=137.10..137.10 rows=726
width=4) (actual time=0.306..0.306 rows=112 loops=397)
-> Index Scan using
users_u_xref_ug_id_idx on users (cost=0.00..137.10 rows=726 width=4)
(actual time=0.013..0.169 rows=112 loops=397)
Index Cond: (u_xref_ug_id = $0)
Total runtime: 31172.363 ms
(13 rows)

vf_cn2fr=# EXPLAIN ANALYZE SELECT ug_id AS id, ug_en AS name, ug_type AS
type,
(SELECT count(*) FROM forms_groups JOIN users ON fg_xref_u_id = u_id) as
groupes
FROM users_groups ORDER BY "ug_type","ug_en" LIMIT 5;

QUERY
PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=4530.86..4530.87 rows=5 width=26) (actual
time=325.353..325.365 rows=5 loops=1)
InitPlan
-> Aggregate (cost=4505.74..4505.75 rows=1 width=0) (actual
time=320.673..320.674 rows=1 loops=1)
-> Merge Join (cost=0.00..4404.52 rows=40488 width=0) (actual
time=0.176..281.602 rows=40490 loops=1)
Merge Cond: (forms_groups.fg_xref_u_id = users.u_id)
-> Index Scan using forms_groups_fg_xref_u_id_idx on
forms_groups (cost=0.00..1576.38 rows=40488 width=4) (actual
time=0.020..64.556 rows=40490 loops=1)
-> Index Scan using users_pkey on users
(cost=0.00..2212.00 rows=44313 width=4) (actual time=0.015..73.373
rows=47689 loops=1)
-> Sort (cost=25.11..26.10 rows=397 width=26) (actual
time=325.350..325.355 rows=5 loops=1)
Sort Key: ug_type, ug_en
-> Seq Scan on users_groups (cost=0.00..7.97 rows=397 width=26)
(actual time=320.693..321.192 rows=397 loops=1)
Total runtime: 325.457 ms
(11 rows)

>
>
> Andreas
> --
> Andreas Kretschmer
> Kontakt: Heynitz: 035242/47150, D1: 0160/7141639 (mehr: -> Header)
> GnuPG-ID: 0x3FFF606C, privat 0x7F4584DA http://wwwkeys.de.pgp.net
>
> --
> Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message A. Kretschmer 2008-11-25 09:02:35 Re: Place of subselect
Previous Message Michael Hall 2008-11-25 08:09:49 Re: Serial/sequence problem