Re: performance change from 8.3.1 to later releases

From: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To: Roger Ging <rging(at)musicreports(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: performance change from 8.3.1 to later releases
Date: 2010-04-20 19:33:32
Message-ID: x2gdcc563d11004201233yea1a2e46i9c97b3a9f6269dea@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Tue, Apr 20, 2010 at 12:38 PM, Roger Ging <rging(at)musicreports(dot)com> wrote:
> Hi,
>
> I have access to servers running 8.3.1, 8.3.8, 8.4.2 and 8.4.3.  I have
> noticed that on the 8.4.* versions, a lot of our code is either taking much
> longer to complete, or never completing.  I think I have isolated the
> problem to queries using in(), not in() or not exists().  I've put together
> a test case with one particular query that demonstrates the problem.
>
> select count(*) from traderhank.vendor_catalog = 147,352
>
> select count(*) from traderhank.xc_products = 8,610
>
> The sub query (select vc1.th_sku from traderhank.vendor_catalog vc1
> join traderhank.vendor_catalog vc2 on vc2.short_desc_75 = vc1.short_desc_75
> and vc2.th_sku != vc1.th_sku
> where vc1.cutoff_date is null and vc2.cutoff_date is null
> group by vc1.th_sku
> )  yields 54,390 rows
>
> The sub query (select vc_th_Sku from traderhank.xc_products where vc_th_sku
> is not null) yields 5,132 rows
>
> These 2 tables have been loaded from a pg_dump on all servers, vacuum
> analyze run after load.
>
> 1st case: pg 8.3.1 using left join finishes the query in about 3.5 seconds
>
> explain analyze
> select vc.* from traderhank.vendor_catalog vc
> left join
> (
> select vc1.th_sku from traderhank.vendor_catalog vc1
> join traderhank.vendor_catalog vc2 on vc2.short_desc_75 = vc1.short_desc_75
> and vc2.th_sku != vc1.th_sku
> where vc1.cutoff_date is null and vc2.cutoff_date is null
> group by vc1.th_sku
> ) vcj on vcj.th_sku = vc.th_sku
> left join traderhank.xc_products xc on xc.vc_th_sku = vc.th_sku
> where vcj.th_sku is null
> and xc.vc_th_sku is null
>
> "Merge Left Join  (cost=71001.53..72899.35 rows=36838 width=310) (actual
> time=9190.446..10703.509 rows=78426 loops=1)"
> "  Merge Cond: ((vc.th_sku)::text = (vc1.th_sku)::text)"
> "  Filter: (vc1.th_sku IS NULL)"
> "  ->  Merge Left Join  (cost=19362.72..20201.46 rows=73676 width=310)
> (actual time=917.947..1784.593 rows=141962 loops=1)"
> "        Merge Cond: ((vc.th_sku)::text = (xc.vc_th_sku)::text)"
> "        Filter: (xc.vc_th_sku IS NULL)"
> "        ->  Sort  (cost=17630.88..17999.26 rows=147352 width=310) (actual
> time=871.130..1114.453 rows=147352 loops=1)"
> "              Sort Key: vc.th_sku"
> "              Sort Method:  quicksort  Memory: 45285kB"
> "              ->  Seq Scan on vendor_catalog vc  (cost=0.00..4981.52
> rows=147352 width=310) (actual time=0.020..254.023 rows=147352 loops=1)"
> "        ->  Sort  (cost=1731.84..1753.37 rows=8610 width=8) (actual
> time=46.783..62.347 rows=9689 loops=1)"
> "              Sort Key: xc.vc_th_sku"
> "              Sort Method:  quicksort  Memory: 734kB"
> "              ->  Seq Scan on xc_products xc  (cost=0.00..1169.10 rows=8610
> width=8) (actual time=0.013..25.490 rows=8610 loops=1)"
> "  ->  Sort  (cost=51638.80..51814.57 rows=70309 width=32) (actual
> time=8272.483..8382.258 rows=66097 loops=1)"
> "        Sort Key: vc1.th_sku"
> "        Sort Method:  quicksort  Memory: 4086kB"

So here we get a hash agg in ~4M memory:

> "        ->  HashAggregate  (cost=44572.25..45275.34 rows=70309 width=8)
> (actual time=7978.928..8080.317 rows=54390 loops=1)"

And the row estimate is similar.

(much deleted)

> on any version from 8.3.8 on, this query has never returned, and explain
> analyze never returns, so I am only posting explain output

We get a Seq Scan with a huge cost, and no hash agg or quick sort. Is
the work_mem the same or similar? I'd crank it up for testing just to
see if it helps. 16Meg is pretty safe on a low traffic machine.

> "Seq Scan on vendor_catalog vc  (cost=140413.05..91527264.28 rows=36838
> width=309)"
> "  Filter: ((NOT (hashed SubPlan 2)) AND (NOT (SubPlan 1)))"
> "  SubPlan 2"
> "    ->  Seq Scan on xc_products  (cost=0.00..1716.99 rows=5132 width=8)"
> "          Filter: (vc_th_sku IS NOT NULL)"
> "  SubPlan 1"
> "    ->  Materialize  (cost=138683.23..139734.64 rows=75541 width=8)"
> "          ->  Group  (cost=134997.43..138311.69 rows=75541 width=8)"
> "                ->  Sort  (cost=134997.43..136654.56 rows=662853 width=8)"
> "                      Sort Key: vc1.th_sku"
> "                      ->  Merge Join  (cost=39600.73..52775.08 rows=662853
> width=8)"
> "                            Merge Cond: ((vc1.short_desc_75)::text =
> (vc2.short_desc_75)::text)"
> "                            Join Filter: ((vc2.th_sku)::text <>
> (vc1.th_sku)::text)"
> "                            ->  Sort  (cost=19800.37..20062.75 rows=104954
> width=27)"
> "                                  Sort Key: vc1.short_desc_75"
> "                                  ->  Seq Scan on vendor_catalog vc1
>  (cost=0.00..8534.52 rows=104954 width=27)"
> "                                        Filter: ((cutoff_date IS NULL) AND
> (th_sku IS NOT NULL))"
> "                            ->  Materialize  (cost=19800.37..21112.29
> rows=104954 width=27)"
> "                                  ->  Sort  (cost=19800.37..20062.75
> rows=104954 width=27)"
> "                                        Sort Key: vc2.short_desc_75"
> "                                        ->  Seq Scan on vendor_catalog vc2
>  (cost=0.00..8534.52 rows=104954 width=27)"
> "                                              Filter: (cutoff_date IS
> NULL)"
>
>
>
>
> I've also tried changing the code to not exists, but that query never comes
> back on any version I have available:
>
> explain --analyze
> select vc.* from traderhank.vendor_catalog vc
> where not exists
> (
> select 1 from traderhank.vendor_catalog vc1
> join traderhank.vendor_catalog vc2 on vc2.short_desc_75 = vc1.short_desc_75
> and vc2.th_sku != vc1.th_sku
> where vc1.cutoff_date is null and vc2.cutoff_date is null and vc1.th_sku =
> vc.th_sku
> group by vc1.th_sku
> )
> and not exists
> (select 1 from traderhank.xc_products where vc_th_sku is not null and
> vc_th_sku = vc.th_sku)
>
> "Nested Loop Anti Join  (cost=63650.74..93617.53 rows=1 width=309)"
> "  Join Filter: ((xc_products.vc_th_sku)::text = (vc.th_sku)::text)"
> "  ->  Hash Anti Join  (cost=63650.74..91836.39 rows=1 width=309)"
> "        Hash Cond: ((vc.th_sku)::text = (vc1.th_sku)::text)"
> "        ->  Seq Scan on vendor_catalog vc  (cost=0.00..8534.52 rows=147352
> width=309)"
> "        ->  Hash  (cost=52775.08..52775.08 rows=662853 width=8)"
> "              ->  Merge Join  (cost=39600.73..52775.08 rows=662853
> width=8)"
> "                    Merge Cond: ((vc1.short_desc_75)::text =
> (vc2.short_desc_75)::text)"
> "                    Join Filter: ((vc2.th_sku)::text <>
> (vc1.th_sku)::text)"
> "                    ->  Sort  (cost=19800.37..20062.75 rows=104954
> width=27)"
> "                          Sort Key: vc1.short_desc_75"
> "                          ->  Seq Scan on vendor_catalog vc1
>  (cost=0.00..8534.52 rows=104954 width=27)"
> "                                Filter: (cutoff_date IS NULL)"
> "                    ->  Materialize  (cost=19800.37..21112.29 rows=104954
> width=27)"
> "                          ->  Sort  (cost=19800.37..20062.75 rows=104954
> width=27)"
> "                                Sort Key: vc2.short_desc_75"
> "                                ->  Seq Scan on vendor_catalog vc2
>  (cost=0.00..8534.52 rows=104954 width=27)"
> "                                      Filter: (cutoff_date IS NULL)"
> "  ->  Seq Scan on xc_products  (cost=0.00..1716.99 rows=5132 width=8)"
> "        Filter: (xc_products.vc_th_sku IS NOT NULL)"
>
>
>
> So, my question is, do I need  to re-write all of my in() and not in ()
> queries to left joins, is this something that might get resolved in another
> release in the future?
>
> Thanks for any help.
>
> Roger Ging
>
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

--
When fascism comes to America, it will be intolerance sold as diversity.

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Nikolas Everett 2010-04-20 19:57:18 Re: SOLVED ... Re: Getting rid of a cursor from JDBC .... Re: [PERFORM] Re: HELP: How to tame the 8.3.x JDBC driver with a biq guery result set
Previous Message Dave Crooke 2010-04-20 19:29:28 Re: SOLVED ... Re: Getting rid of a cursor from JDBC .... Re: [PERFORM] Re: HELP: How to tame the 8.3.x JDBC driver with a biq guery result set