Quick Links

Re: performance change from 8.3.1 to later releases

From:	Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To:	Roger Ging <rging(at)musicreports(dot)com>
Cc:	pgsql-performance(at)postgresql(dot)org
Subject:	Re: performance change from 8.3.1 to later releases
Date:	2010-04-20 19:33:32
Message-ID:	x2gdcc563d11004201233yea1a2e46i9c97b3a9f6269dea@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

On Tue, Apr 20, 2010 at 12:38 PM, Roger Ging <rging(at)musicreports(dot)com> wrote:
> Hi,
>
> I have access to servers running 8.3.1, 8.3.8, 8.4.2 and 8.4.3. I have
> noticed that on the 8.4.* versions, a lot of our code is either taking much
> longer to complete, or never completing. I think I have isolated the
> problem to queries using in(), not in() or not exists(). I've put together
> a test case with one particular query that demonstrates the problem.
>
> select count(*) from traderhank.vendor_catalog = 147,352
>
> select count(*) from traderhank.xc_products = 8,610
>
> The sub query (select vc1.th_sku from traderhank.vendor_catalog vc1
> join traderhank.vendor_catalog vc2 on vc2.short_desc_75 = vc1.short_desc_75
> and vc2.th_sku != vc1.th_sku
> where vc1.cutoff_date is null and vc2.cutoff_date is null
> group by vc1.th_sku
> ) yields 54,390 rows
>
> The sub query (select vc_th_Sku from traderhank.xc_products where vc_th_sku
> is not null) yields 5,132 rows
>
> These 2 tables have been loaded from a pg_dump on all servers, vacuum
> analyze run after load.
>
> 1st case: pg 8.3.1 using left join finishes the query in about 3.5 seconds
>
> explain analyze
> select vc.* from traderhank.vendor_catalog vc
> left join
> (
> select vc1.th_sku from traderhank.vendor_catalog vc1
> join traderhank.vendor_catalog vc2 on vc2.short_desc_75 = vc1.short_desc_75
> and vc2.th_sku != vc1.th_sku
> where vc1.cutoff_date is null and vc2.cutoff_date is null
> group by vc1.th_sku
> ) vcj on vcj.th_sku = vc.th_sku
> left join traderhank.xc_products xc on xc.vc_th_sku = vc.th_sku
> where vcj.th_sku is null
> and xc.vc_th_sku is null
>
> "Merge Left Join (cost=71001.53..72899.35 rows=36838 width=310) (actual
> time=9190.446..10703.509 rows=78426 loops=1)"
> " Merge Cond: ((vc.th_sku)::text = (vc1.th_sku)::text)"
> " Filter: (vc1.th_sku IS NULL)"
> " -> Merge Left Join (cost=19362.72..20201.46 rows=73676 width=310)
> (actual time=917.947..1784.593 rows=141962 loops=1)"
> " Merge Cond: ((vc.th_sku)::text = (xc.vc_th_sku)::text)"
> " Filter: (xc.vc_th_sku IS NULL)"
> " -> Sort (cost=17630.88..17999.26 rows=147352 width=310) (actual
> time=871.130..1114.453 rows=147352 loops=1)"
> " Sort Key: vc.th_sku"
> " Sort Method: quicksort Memory: 45285kB"
> " -> Seq Scan on vendor_catalog vc (cost=0.00..4981.52
> rows=147352 width=310) (actual time=0.020..254.023 rows=147352 loops=1)"
> " -> Sort (cost=1731.84..1753.37 rows=8610 width=8) (actual
> time=46.783..62.347 rows=9689 loops=1)"
> " Sort Key: xc.vc_th_sku"
> " Sort Method: quicksort Memory: 734kB"
> " -> Seq Scan on xc_products xc (cost=0.00..1169.10 rows=8610
> width=8) (actual time=0.013..25.490 rows=8610 loops=1)"
> " -> Sort (cost=51638.80..51814.57 rows=70309 width=32) (actual
> time=8272.483..8382.258 rows=66097 loops=1)"
> " Sort Key: vc1.th_sku"
> " Sort Method: quicksort Memory: 4086kB"

So here we get a hash agg in ~4M memory:

> " -> HashAggregate (cost=44572.25..45275.34 rows=70309 width=8)
> (actual time=7978.928..8080.317 rows=54390 loops=1)"

And the row estimate is similar.

(much deleted)

> on any version from 8.3.8 on, this query has never returned, and explain
> analyze never returns, so I am only posting explain output

We get a Seq Scan with a huge cost, and no hash agg or quick sort. Is
the work_mem the same or similar? I'd crank it up for testing just to
see if it helps. 16Meg is pretty safe on a low traffic machine.

> "Seq Scan on vendor_catalog vc (cost=140413.05..91527264.28 rows=36838
> width=309)"
> " Filter: ((NOT (hashed SubPlan 2)) AND (NOT (SubPlan 1)))"
> " SubPlan 2"
> " -> Seq Scan on xc_products (cost=0.00..1716.99 rows=5132 width=8)"
> " Filter: (vc_th_sku IS NOT NULL)"
> " SubPlan 1"
> " -> Materialize (cost=138683.23..139734.64 rows=75541 width=8)"
> " -> Group (cost=134997.43..138311.69 rows=75541 width=8)"
> " -> Sort (cost=134997.43..136654.56 rows=662853 width=8)"
> " Sort Key: vc1.th_sku"
> " -> Merge Join (cost=39600.73..52775.08 rows=662853
> width=8)"
> " Merge Cond: ((vc1.short_desc_75)::text =
> (vc2.short_desc_75)::text)"
> " Join Filter: ((vc2.th_sku)::text <>
> (vc1.th_sku)::text)"
> " -> Sort (cost=19800.37..20062.75 rows=104954
> width=27)"
> " Sort Key: vc1.short_desc_75"
> " -> Seq Scan on vendor_catalog vc1
> (cost=0.00..8534.52 rows=104954 width=27)"
> " Filter: ((cutoff_date IS NULL) AND
> (th_sku IS NOT NULL))"
> " -> Materialize (cost=19800.37..21112.29
> rows=104954 width=27)"
> " -> Sort (cost=19800.37..20062.75
> rows=104954 width=27)"
> " Sort Key: vc2.short_desc_75"
> " -> Seq Scan on vendor_catalog vc2
> (cost=0.00..8534.52 rows=104954 width=27)"
> " Filter: (cutoff_date IS
> NULL)"
>
>
>
>
> I've also tried changing the code to not exists, but that query never comes
> back on any version I have available:
>
> explain --analyze
> select vc.* from traderhank.vendor_catalog vc
> where not exists
> (
> select 1 from traderhank.vendor_catalog vc1
> join traderhank.vendor_catalog vc2 on vc2.short_desc_75 = vc1.short_desc_75
> and vc2.th_sku != vc1.th_sku
> where vc1.cutoff_date is null and vc2.cutoff_date is null and vc1.th_sku =
> vc.th_sku
> group by vc1.th_sku
> )
> and not exists
> (select 1 from traderhank.xc_products where vc_th_sku is not null and
> vc_th_sku = vc.th_sku)
>
> "Nested Loop Anti Join (cost=63650.74..93617.53 rows=1 width=309)"
> " Join Filter: ((xc_products.vc_th_sku)::text = (vc.th_sku)::text)"
> " -> Hash Anti Join (cost=63650.74..91836.39 rows=1 width=309)"
> " Hash Cond: ((vc.th_sku)::text = (vc1.th_sku)::text)"
> " -> Seq Scan on vendor_catalog vc (cost=0.00..8534.52 rows=147352
> width=309)"
> " -> Hash (cost=52775.08..52775.08 rows=662853 width=8)"
> " -> Merge Join (cost=39600.73..52775.08 rows=662853
> width=8)"
> " Merge Cond: ((vc1.short_desc_75)::text =
> (vc2.short_desc_75)::text)"
> " Join Filter: ((vc2.th_sku)::text <>
> (vc1.th_sku)::text)"
> " -> Sort (cost=19800.37..20062.75 rows=104954
> width=27)"
> " Sort Key: vc1.short_desc_75"
> " -> Seq Scan on vendor_catalog vc1
> (cost=0.00..8534.52 rows=104954 width=27)"
> " Filter: (cutoff_date IS NULL)"
> " -> Materialize (cost=19800.37..21112.29 rows=104954
> width=27)"
> " -> Sort (cost=19800.37..20062.75 rows=104954
> width=27)"
> " Sort Key: vc2.short_desc_75"
> " -> Seq Scan on vendor_catalog vc2
> (cost=0.00..8534.52 rows=104954 width=27)"
> " Filter: (cutoff_date IS NULL)"
> " -> Seq Scan on xc_products (cost=0.00..1716.99 rows=5132 width=8)"
> " Filter: (xc_products.vc_th_sku IS NOT NULL)"
>
>
>
> So, my question is, do I need to re-write all of my in() and not in ()
> queries to left joins, is this something that might get resolved in another
> release in the future?
>
> Thanks for any help.
>
> Roger Ging
>
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

--
When fascism comes to America, it will be intolerance sold as diversity.

In response to

performance change from 8.3.1 to later releases at 2010-04-20 18:38:04 from Roger Ging

Responses

Re: performance change from 8.3.1 to later releases at 2010-04-21 03:47:01 from Tom Lane

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Nikolas Everett	2010-04-20 19:57:18	Re: SOLVED ... Re: Getting rid of a cursor from JDBC .... Re: [PERFORM] Re: HELP: How to tame the 8.3.x JDBC driver with a biq guery result set
Previous Message	Dave Crooke	2010-04-20 19:29:28	Re: SOLVED ... Re: Getting rid of a cursor from JDBC .... Re: [PERFORM] Re: HELP: How to tame the 8.3.x JDBC driver with a biq guery result set