Re: potential performance gain by query planner optimization

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: "Kneringer, Armin" <Armin(dot)Kneringer(at)fabasoft(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: potential performance gain by query planner optimization
Date: 2010-07-20 19:39:04
Message-ID: AANLkTilwPLJssKY8TMc5fjYTTOPQsVsHSEgrjj_46MSM@mail.gmail.com
Lists: pgsql-performance

Hello

2010/7/20 Kneringer, Armin <Armin(dot)Kneringer(at)fabasoft(dot)com>:
> Hi there.
>
> I think I found a potential performance gain through a query planner optimization. All tests have been performed with 8.4.1 (and earlier versions) on CentOS 5.3 (x64).
>
> The following query runs on my database (~250 GB) for ca. 1600 seconds, and the sort results in an on-disk merge that writes ca. 200 GB of data to the local disk (ca. 180,000 temp files).

Can you try checking the EXPLAIN output with hash joins disabled (SET enable_hashjoin TO off;)?

Regards

Pavel Stehule
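Pavel's suggestion can be tried without touching the server configuration by disabling hash joins for a single transaction. The following is a minimal sketch that builds such a psql script; the function name `hashjoin_off_script` is hypothetical, and it assumes the bare SELECT (without the leading `explain`) is passed in as a string. `SET LOCAL` only lasts until the end of the enclosing transaction, so the session returns to its previous `enable_hashjoin` value after `ROLLBACK`.

```python
def hashjoin_off_script(query: str, analyze: bool = False) -> str:
    """Build a psql script that runs EXPLAIN on `query` with hash joins disabled.

    Hypothetical helper. SET LOCAL limits the planner setting to the enclosing
    transaction, so nothing persists after ROLLBACK.
    """
    # EXPLAIN ANALYZE really executes the query, so it is opt-in here.
    explain = "EXPLAIN ANALYZE" if analyze else "EXPLAIN"
    body = query.strip().rstrip(";")
    return "\n".join([
        "BEGIN;",
        "SET LOCAL enable_hashjoin TO off;",
        f"{explain} {body};",
        "ROLLBACK;",
    ])
```

The generated text can be written to a file and run with `psql -f`. Plain EXPLAIN is the default because, as the original message notes, the slow variant cannot be run to completion on the test system.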

>
> explain SELECT DISTINCT t4.objid
> FROM fscsubfile t4, cooobject t6
> WHERE t6.objid = t4.objid AND
>  t4.fileresporgid = 573936067464397682 AND
>   NOT EXISTS (
>   SELECT 1
>   FROM ataggval q1_1,
>   atdateval t5
>   WHERE q1_1.objid = t4.objid AND
>   q1_1.attrid = 281479288456451 AND
>   q1_1.aggrid = 0 AND
>   t5.aggrid = q1_1.aggval AND
>   t5.objid = t4.objid AND
>   t5.attrid = 281479288456447 ) AND
>  ((t6.objclassid IN (285774255832590,285774255764301))) AND
>  ((t4.objid > 573936097512390656 and t4.objid < 573936101807357952))
>  ORDER BY t4.objid;
>
>                                                                                  QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Unique  (cost=2592137103.99..2592137104.00 rows=1 width=8)
>   ->  Sort  (cost=2592137103.99..2592137104.00 rows=1 width=8)
>         Sort Key: t4.objid
>         ->  Nested Loop  (cost=1105592553.38..2592137103.98 rows=1 width=8)
>               ->  Hash Anti Join  (cost=1105592553.38..2592137095.75 rows=1 width=8)
>                     Hash Cond: ((t4.objid = q1_1.objid) AND (t4.objid = t5.objid))
>                     ->  Bitmap Heap Scan on fscsubfile t4  (cost=154.42..14136.40 rows=5486 width=8)
>                           Recheck Cond: ((fileresporgid = 573936067464397682::bigint) AND (objid > 573936097512390656::bigint) AND (objid < 573936101807357952::bigint))
>                           ->  Bitmap Index Scan on ind_fscsubfile_filerespons  (cost=0.00..153.05 rows=5486 width=0)
>                                 Index Cond: ((fileresporgid = 573936067464397682::bigint) AND (objid > 573936097512390656::bigint) AND (objid < 573936101807357952::bigint))
>                     ->  Hash  (cost=11917516.57..11917516.57 rows=55006045159 width=16)
>                           ->  Nested Loop  (cost=0.00..11917516.57 rows=55006045159 width=16)
>                                 ->  Seq Scan on atdateval t5  (cost=0.00..294152.40 rows=1859934 width=12)
>                                       Filter: (attrid = 281479288456447::bigint)
>                                 ->  Index Scan using ind_ataggval on ataggval q1_1  (cost=0.00..6.20 rows=4 width=12)
>                                       Index Cond: ((q1_1.attrid = 281479288456451::bigint) AND (q1_1.aggval = t5.aggrid))
>                                       Filter: (q1_1.aggrid = 0)
>               ->  Index Scan using cooobjectix on cooobject t6  (cost=0.00..8.22 rows=1 width=8)
>                     Index Cond: (t6.objid = t4.objid)
>                     Filter: (t6.objclassid = ANY ('{285774255832590,285774255764301}'::bigint[]))
> (20 rows)
>
>
> As disk space is limited on my test system, I can't provide the EXPLAIN ANALYZE output for this query.
> If I change the query as follows, it takes only 12 seconds and needs only 2 temp files for sorting.
> (Changed lines are marked with [!!!!!], as I don't know whether HTML mails will be delivered without conversion.)
>
> explain SELECT DISTINCT t4.objid
> FROM fscsubfile t4, cooobject t6
> WHERE t6.objid = t4.objid AND
> t4.fileresporgid = 573936067464397682 AND
>   NOT EXISTS (
>   SELECT 1
>   FROM ataggval q1_1,
>   atdateval t5
>   WHERE q1_1.objid = t4.objid AND
>   q1_1.attrid = 281479288456451 AND
>   q1_1.aggrid = 0 AND
>   t5.aggrid = q1_1.aggval AND
>   t5.objid = q1_1.objid AND                 [!!!!!]
>   t5.attrid = 281479288456447 ) AND
>   ((t6.objclassid IN (285774255832590,285774255764301))) AND
>   ((t4.objid > 573936097512390656 and t4.objid < 573936101807357952))
>  ORDER BY t4.objid;
>                                                                            QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Unique  (cost=918320.29..971968.88 rows=1 width=8)
>   ->  Nested Loop  (cost=918320.29..971968.88 rows=1 width=8)
>         ->  Merge Anti Join  (cost=918320.29..971960.65 rows=1 width=8)
>               Merge Cond: (t4.objid = q1_1.objid)
>               ->  Index Scan using ind_fscsubfile_filerespons on fscsubfile t4  (cost=0.00..19016.05 rows=5486 width=8)
>                     Index Cond: ((fileresporgid = 573936067464397682::bigint) AND (objid > 573936097512390656::bigint) AND (objid < 573936101807357952::bigint))
>               ->  Materialize  (cost=912418.42..956599.36 rows=22689 width=8)
>                     ->  Merge Join  (cost=912418.42..956372.47 rows=22689 width=8)
>                           Merge Cond: ((t5.objid = q1_1.objid) AND (t5.aggrid = q1_1.aggval))
>                           ->  Sort  (cost=402024.80..406674.63 rows=1859934 width=12)
>                                 Sort Key: t5.objid, t5.aggrid
>                                 ->  Bitmap Heap Scan on atdateval t5  (cost=43749.07..176555.24 rows=1859934 width=12)
>                                       Recheck Cond: (attrid = 281479288456447::bigint)
>                                       ->  Bitmap Index Scan on ind_atdateval  (cost=0.00..43284.08 rows=1859934 width=0)
>                                             Index Cond: (attrid = 281479288456447::bigint)
>                           ->  Materialize  (cost=510392.25..531663.97 rows=1701738 width=12)
>                                 ->  Sort  (cost=510392.25..514646.59 rows=1701738 width=12)
>                                       Sort Key: q1_1.objid, q1_1.aggval
>                                       ->  Bitmap Heap Scan on ataggval q1_1  (cost=44666.00..305189.47 rows=1701738 width=12)
>                                             Recheck Cond: (attrid = 281479288456451::bigint)
>                                             Filter: (aggrid = 0)
>                                             ->  Bitmap Index Scan on ind_ataggval  (cost=0.00..44240.56 rows=1860698 width=0)
>                                                   Index Cond: (attrid = 281479288456451::bigint)
>         ->  Index Scan using cooobjectix on cooobject t6  (cost=0.00..8.22 rows=1 width=8)
>               Index Cond: (t6.objid = t4.objid)
>               Filter: (t6.objclassid = ANY ('{285774255832590,285774255764301}'::bigint[]))
> (26 rows)
>
> explain analyze SELECT DISTINCT t4.objid
> FROM fscsubfile t4, cooobject t6
> WHERE t6.objid = t4.objid AND
> t4.fileresporgid = 573936067464397682 AND
>  NOT EXISTS (
>  SELECT 1
>  FROM ataggval q1_1,
>  atdateval t5
>  WHERE q1_1.objid = t4.objid AND
>  q1_1.attrid = 281479288456451 AND
>  q1_1.aggrid = 0 AND
>  t5.aggrid = q1_1.aggval AND
>  t5.objid = q1_1.objid AND                 [!!!!!]
>  t5.attrid = 281479288456447 ) AND
> ((t6.objclassid IN (285774255832590,285774255764301))) AND
> ((t4.objid > 573936097512390656 and t4.objid < 573936101807357952))
> ORDER BY t4.objid;
>                                                                                     QUERY PLAN
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Unique  (cost=918320.29..971968.88 rows=1 width=8) (actual time=12079.598..12083.048 rows=64 loops=1)
>   ->  Nested Loop  (cost=918320.29..971968.88 rows=1 width=8) (actual time=12079.594..12083.010 rows=64 loops=1)
>         ->  Merge Anti Join  (cost=918320.29..971960.65 rows=1 width=8) (actual time=12037.524..12081.989 rows=108 loops=1)
>               Merge Cond: (t4.objid = q1_1.objid)
>               ->  Index Scan using ind_fscsubfile_filerespons on fscsubfile t4  (cost=0.00..19016.05 rows=5486 width=8) (actual time=0.073..83.498 rows=63436 loops=1)
>                     Index Cond: ((fileresporgid = 573936067464397682::bigint) AND (objid > 573936097512390656::bigint) AND (objid < 573936101807357952::bigint))
>               ->  Materialize  (cost=912418.42..956599.36 rows=22689 width=8) (actual time=8866.253..11753.055 rows=1299685 loops=1)
>                     ->  Merge Join  (cost=912418.42..956372.47 rows=22689 width=8) (actual time=8866.246..11413.397 rows=1299685 loops=1)
>                           Merge Cond: ((t5.objid = q1_1.objid) AND (t5.aggrid = q1_1.aggval))
>                           ->  Sort  (cost=402024.80..406674.63 rows=1859934 width=12) (actual time=3133.362..3774.076 rows=1299685 loops=1)
>                                 Sort Key: t5.objid, t5.aggrid
>                                 Sort Method:  external merge  Disk: 47192kB
>                                 ->  Bitmap Heap Scan on atdateval t5  (cost=43749.07..176555.24 rows=1859934 width=12) (actual time=282.454..1079.038 rows=1857906 loops=1)
>                                       Recheck Cond: (attrid = 281479288456447::bigint)
>                                       ->  Bitmap Index Scan on ind_atdateval  (cost=0.00..43284.08 rows=1859934 width=0) (actual time=258.749..258.749 rows=1857906 loops=1)
>                                             Index Cond: (attrid = 281479288456447::bigint)
>                           ->  Materialize  (cost=510392.25..531663.97 rows=1701738 width=12) (actual time=5732.872..6683.784 rows=1299685 loops=1)
>                                 ->  Sort  (cost=510392.25..514646.59 rows=1701738 width=12) (actual time=5732.866..6387.188 rows=1299685 loops=1)
>                                       Sort Key: q1_1.objid, q1_1.aggval
>                                       Sort Method:  external merge  Disk: 39920kB
>                                       ->  Bitmap Heap Scan on ataggval q1_1  (cost=44666.00..305189.47 rows=1701738 width=12) (actual time=1644.983..3634.044 rows=1857906 loops=1)
>                                             Recheck Cond: (attrid = 281479288456451::bigint)
>                                             Filter: (aggrid = 0)
>                                             ->  Bitmap Index Scan on ind_ataggval  (cost=0.00..44240.56 rows=1860698 width=0) (actual time=1606.325..1606.325 rows=1877336 loops=1)
>                                                   Index Cond: (attrid = 281479288456451::bigint)
>         ->  Index Scan using cooobjectix on cooobject t6  (cost=0.00..8.22 rows=1 width=8) (actual time=0.009..0.009 rows=1 loops=108)
>               Index Cond: (t6.objid = t4.objid)
>               Filter: (t6.objclassid = ANY ('{285774255832590,285774255764301}'::bigint[]))
> Total runtime: 12108.663 ms
> (29 rows)
>
>
> Another way to optimize my query is to change it as follows:
> (Once again, changes are marked with [!!!!!].)
>
> explain SELECT DISTINCT t4.objid
> FROM fscsubfile t4, cooobject t6
> WHERE t6.objid = t4.objid AND
> t4.fileresporgid = 573936067464397682 AND
>   NOT EXISTS (
>   SELECT 1
>   FROM ataggval q1_1,
>   atdateval t5
>   WHERE q1_1.objid = t5.objid AND                 [!!!!!]
>   q1_1.attrid = 281479288456451 AND
>   q1_1.aggrid = 0 AND
>   t5.aggrid = q1_1.aggval AND
>   t5.objid = t4.objid AND
>   t5.attrid = 281479288456447 ) AND
>  ((t6.objclassid IN (285774255832590,285774255764301))) AND
>  ((t4.objid > 573936097512390656 and t4.objid < 573936101807357952))
>  ORDER BY t4.objid;
>                                                                            QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Unique  (cost=916978.86..969139.72 rows=1 width=8)
>   ->  Nested Loop  (cost=916978.86..969139.72 rows=1 width=8)
>         ->  Merge Anti Join  (cost=916978.86..969131.49 rows=1 width=8)
>               Merge Cond: (t4.objid = t5.objid)
>               ->  Index Scan using ind_fscsubfile_filerespons on fscsubfile t4  (cost=0.00..19016.05 rows=5486 width=8)
>                     Index Cond: ((fileresporgid = 573936067464397682::bigint) AND (objid > 573936097512390656::bigint) AND (objid < 573936101807357952::bigint))
>               ->  Materialize  (cost=912418.42..956599.36 rows=22689 width=8)
>                     ->  Merge Join  (cost=912418.42..956372.47 rows=22689 width=8)
>                           Merge Cond: ((t5.objid = q1_1.objid) AND (t5.aggrid = q1_1.aggval))
>                           ->  Sort  (cost=402024.80..406674.63 rows=1859934 width=12)
>                                 Sort Key: t5.objid, t5.aggrid
>                                 ->  Bitmap Heap Scan on atdateval t5  (cost=43749.07..176555.24 rows=1859934 width=12)
>                                       Recheck Cond: (attrid = 281479288456447::bigint)
>                                       ->  Bitmap Index Scan on ind_atdateval  (cost=0.00..43284.08 rows=1859934 width=0)
>                                             Index Cond: (attrid = 281479288456447::bigint)
>                           ->  Materialize  (cost=510392.25..531663.97 rows=1701738 width=12)
>                                 ->  Sort  (cost=510392.25..514646.59 rows=1701738 width=12)
>                                       Sort Key: q1_1.objid, q1_1.aggval
>                                       ->  Bitmap Heap Scan on ataggval q1_1  (cost=44666.00..305189.47 rows=1701738 width=12)
>                                             Recheck Cond: (attrid = 281479288456451::bigint)
>                                             Filter: (aggrid = 0)
>                                             ->  Bitmap Index Scan on ind_ataggval  (cost=0.00..44240.56 rows=1860698 width=0)
>                                                   Index Cond: (attrid = 281479288456451::bigint)
>         ->  Index Scan using cooobjectix on cooobject t6  (cost=0.00..8.22 rows=1 width=8)
>               Index Cond: (t6.objid = t4.objid)
>               Filter: (t6.objclassid = ANY ('{285774255832590,285774255764301}'::bigint[]))
> (26 rows)
>
>
> explain analyze SELECT DISTINCT t4.objid
> FROM fscsubfile t4, cooobject t6
> WHERE t6.objid = t4.objid AND
> t4.fileresporgid = 573936067464397682 AND
>  NOT EXISTS (
>  SELECT 1
>  FROM ataggval q1_1,
>  atdateval t5
>  WHERE q1_1.objid = t5.objid AND                 [!!!!!]
>  q1_1.attrid = 281479288456451 AND
>  q1_1.aggrid = 0 AND
>  t5.aggrid = q1_1.aggval AND
>  t5.objid = t4.objid AND
>  t5.attrid = 281479288456447 ) AND
> ((t6.objclassid IN (285774255832590,285774255764301))) AND
> ((t4.objid > 573936097512390656 and t4.objid < 573936101807357952))
> ORDER BY t4.objid;
>                                                                                     QUERY PLAN
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Unique  (cost=916978.86..969139.72 rows=1 width=8) (actual time=12102.964..12106.409 rows=64 loops=1)
>   ->  Nested Loop  (cost=916978.86..969139.72 rows=1 width=8) (actual time=12102.959..12106.375 rows=64 loops=1)
>         ->  Merge Anti Join  (cost=916978.86..969131.49 rows=1 width=8) (actual time=12060.916..12105.374 rows=108 loops=1)
>               Merge Cond: (t4.objid = t5.objid)
>               ->  Index Scan using ind_fscsubfile_filerespons on fscsubfile t4  (cost=0.00..19016.05 rows=5486 width=8) (actual time=0.080..81.397 rows=63436 loops=1)
>                     Index Cond: ((fileresporgid = 573936067464397682::bigint) AND (objid > 573936097512390656::bigint) AND (objid < 573936101807357952::bigint))
>               ->  Materialize  (cost=912418.42..956599.36 rows=22689 width=8) (actual time=8874.492..11778.254 rows=1299685 loops=1)
>                     ->  Merge Join  (cost=912418.42..956372.47 rows=22689 width=8) (actual time=8874.484..11437.175 rows=1299685 loops=1)
>                           Merge Cond: ((t5.objid = q1_1.objid) AND (t5.aggrid = q1_1.aggval))
>                           ->  Sort  (cost=402024.80..406674.63 rows=1859934 width=12) (actual time=3117.555..3756.062 rows=1299685 loops=1)
>                                 Sort Key: t5.objid, t5.aggrid
>                                 Sort Method:  external merge  Disk: 39920kB
>                                 ->  Bitmap Heap Scan on atdateval t5  (cost=43749.07..176555.24 rows=1859934 width=12) (actual time=289.475..1079.624 rows=1857906 loops=1)
>                                       Recheck Cond: (attrid = 281479288456447::bigint)
>                                       ->  Bitmap Index Scan on ind_atdateval  (cost=0.00..43284.08 rows=1859934 width=0) (actual time=265.720..265.720 rows=1857906 loops=1)
>                                             Index Cond: (attrid = 281479288456447::bigint)
>                           ->  Materialize  (cost=510392.25..531663.97 rows=1701738 width=12) (actual time=5756.915..6707.864 rows=1299685 loops=1)
>                                 ->  Sort  (cost=510392.25..514646.59 rows=1701738 width=12) (actual time=5756.909..6409.819 rows=1299685 loops=1)
>                                       Sort Key: q1_1.objid, q1_1.aggval
>                                       Sort Method:  external merge  Disk: 39920kB
>                                       ->  Bitmap Heap Scan on ataggval q1_1  (cost=44666.00..305189.47 rows=1701738 width=12) (actual time=1646.955..3628.918 rows=1857906 loops=1)
>                                             Recheck Cond: (attrid = 281479288456451::bigint)
>                                             Filter: (aggrid = 0)
>                                             ->  Bitmap Index Scan on ind_ataggval  (cost=0.00..44240.56 rows=1860698 width=0) (actual time=1608.233..1608.233 rows=1877336 loops=1)
>                                                   Index Cond: (attrid = 281479288456451::bigint)
>         ->  Index Scan using cooobjectix on cooobject t6  (cost=0.00..8.22 rows=1 width=8) (actual time=0.008..0.009 rows=1 loops=108)
>               Index Cond: (t6.objid = t4.objid)
>               Filter: (t6.objclassid = ANY ('{285774255832590,285774255764301}'::bigint[]))
> Total runtime: 12129.613 ms
> (29 rows)
>
>
>
> As the query runs in roughly 12 seconds in both modified cases, you might advise me to change my queries :-)
> (In fact, we are working on this.)
> As the main performance problem can also be reproduced on a small database (query time > 1 minute), I checked this issue on MS SQL Server and Oracle. On MS SQL Server, the execution plan does not change when the query is rewritten, and performance is good. Oracle shows a slight difference in the plan, but performance is also good.
> As I mentioned, we are going to change our query, but in my opinion there could be a general performance gain if this issue were addressed in the planner (especially because you may not notice that you are running into it as long as query performance seems sufficient).
>
> greets
> Armin
>
>
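Both rewrites in the quoted message only replace one correlation predicate with an equality that already follows by transitivity: `q1_1.objid = t4.objid` and `t5.objid = t4.objid` together imply `t5.objid = q1_1.objid`. They should therefore return the same rows. The sketch below checks this on toy data in an in-memory SQLite database via Python's standard `sqlite3` module. The tables are hypothetical, heavily simplified stand-ins (only the referenced columns), and small IDs such as `1`, `10`, `100`, `200` replace the real constants. It demonstrates result equivalence only and says nothing about PostgreSQL plans or timings.

```python
import sqlite3

# Hypothetical, simplified stand-ins for the tables in the thread.
SCHEMA = """
CREATE TABLE fscsubfile (objid INTEGER, fileresporgid INTEGER);
CREATE TABLE cooobject  (objid INTEGER, objclassid INTEGER);
CREATE TABLE ataggval   (objid INTEGER, attrid INTEGER, aggrid INTEGER, aggval INTEGER);
CREATE TABLE atdateval  (objid INTEGER, attrid INTEGER, aggrid INTEGER);
"""

# Shared query shape; {first} and {second} are the two correlation
# predicates that differ between the original query and the rewrites.
QUERY = """
SELECT DISTINCT t4.objid
FROM fscsubfile t4, cooobject t6
WHERE t6.objid = t4.objid
  AND t4.fileresporgid = 1
  AND NOT EXISTS (
      SELECT 1
      FROM ataggval q1_1, atdateval t5
      WHERE {first}
        AND q1_1.attrid = 100
        AND q1_1.aggrid = 0
        AND t5.aggrid = q1_1.aggval
        AND {second}
        AND t5.attrid = 200)
  AND t6.objclassid IN (10, 11)
  AND t4.objid > 0 AND t4.objid < 1000
ORDER BY t4.objid
"""

VARIANTS = {
    "original":  ("q1_1.objid = t4.objid", "t5.objid = t4.objid"),
    "rewrite_1": ("q1_1.objid = t4.objid", "t5.objid = q1_1.objid"),
    "rewrite_2": ("q1_1.objid = t5.objid", "t5.objid = t4.objid"),
}


def run_variants():
    """Run all three formulations against the same toy data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO fscsubfile VALUES (?, ?)",
                     [(1, 1), (2, 1), (3, 1), (4, 2)])
    conn.executemany("INSERT INTO cooobject VALUES (?, ?)",
                     [(1, 10), (2, 11), (3, 10), (4, 10)])
    # objid 1 has a matching aggregate/date pair, so NOT EXISTS removes it;
    # the date row (3, 200, 8) matches objid 2's aggval only across objids.
    conn.executemany("INSERT INTO ataggval VALUES (?, ?, ?, ?)",
                     [(1, 100, 0, 7), (2, 100, 0, 8)])
    conn.executemany("INSERT INTO atdateval VALUES (?, ?, ?)",
                     [(1, 200, 7), (2, 200, 9), (3, 200, 8)])
    results = {}
    for name, (first, second) in VARIANTS.items():
        sql = QUERY.format(first=first, second=second)
        results[name] = [row[0] for row in conn.execute(sql)]
    conn.close()
    return results
```

On this data every variant returns `[2, 3]`: objid 1 is excluded by the NOT EXISTS, and objid 4 fails the `fileresporgid` filter.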

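The planner improvement Armin asks for amounts to recovering the join clause that the rewrites spell out by hand. One textbook way to find such clauses is to group columns into equivalence sets with union-find and list the pairs that were never written explicitly. The sketch below is a toy illustration of that idea under the assumption that predicates are given as column-name pairs. The function `implied_equalities` is hypothetical; it is not PostgreSQL's equivalence-class code and does not model why 8.4 produced the plan it did for the original NOT EXISTS subquery.

```python
from itertools import combinations


def implied_equalities(equalities):
    """Return column pairs equal by transitivity but not written explicitly.

    Toy union-find over column names (hypothetical helper, illustration only).
    """
    parent = {}

    def find(col):
        parent.setdefault(col, col)
        while parent[col] != col:
            parent[col] = parent[parent[col]]  # path halving
            col = parent[col]
        return col

    # Union the two sides of every written equality.
    for left, right in equalities:
        parent[find(left)] = find(right)

    # Group columns by their representative.
    classes = {}
    for col in list(parent):
        classes.setdefault(find(col), set()).add(col)

    written = {frozenset(pair) for pair in equalities}
    implied = set()
    for members in classes.values():
        for a, b in combinations(sorted(members), 2):
            if frozenset((a, b)) not in written:
                implied.add((a, b))
    return implied
```

For the original subquery's predicates, `[("q1_1.objid", "t4.objid"), ("t5.objid", "t4.objid"), ("t5.aggrid", "q1_1.aggval")]`, the function reports `("q1_1.objid", "t5.objid")`, which is exactly the clause the first rewrite adds.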