Re: Table partition for very large table

From: Yudie Pg <yudiepg(at)gmail(dot)com>
To: Scott Marlowe <smarlowe(at)g2switchworks(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Table partition for very large table
Date: 2005-03-29 00:07:26
Message-ID: e460d0c05032816073a4100b7@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

> > create index prdt_new_url_dx on prdt_new (url)
> > create index prdt_new_sku_dx on prdt_new (sku)
> > create index prdt_old_sku_dx on prdt_old (sku)
> > create index prdt_new_url_null_dx on prdt_new (url) where prdt_new.url
> > IS NULL

I added indexes & redo the analyze - Query plan looks better,
But when I execute the query it still can't finish all at once. (i've
been waiting more than 30 minutes) and seems it time out (Error
"canceling query due to user request") :-B
Maybe you can help to analyze this query plan (the second one) to see
what make it slow?

QUERY PLAN (BEFORE ANALYZE):
Merge Left Join (cost=1886617.54..1960855.12 rows=4979571 width=19)
Merge Cond: (("outer"."?column3?" = "inner"."?column4?") AND
("outer".groupnum = "inner".groupnum))
Filter: (("inner".url IS NULL) OR (("inner".url)::text = ''::text))
-> Sort (cost=969258.98..981707.91 rows=4979571 width=19)
Sort Key: (mc.sku)::text, mc.groupnum
-> Seq Scan on prdt_old mc (cost=0.00..297611.71
rows=4979571 width=19)
-> Sort (cost=917358.56..928785.51 rows=4570779 width=82)
Sort Key: (mi.sku)::text, mi.groupnum
-> Seq Scan on prdt_new mi (cost=0.00..126438.79
rows=4570779 width=82)

QUERY PLAN AFTER ANALYZE:
Nested Loop Left Join (cost=0.00..233277.42 rows=23747 width=17)
Join Filter: ("inner".groupnum = "outer".groupnum)
Filter: (("inner".url IS NULL) OR (("inner".url)::text = ''::text))
-> Index Scan using idx_prdtold_sku on prdt_old mc
(cost=0.00..112211.85 rows=23747 width=17)
-> Index Scan using idx_prdtnew_sku on prdt_new mi
(cost=0.00..5.08 rows=1 width=82)
Index Cond: ((mi.sku)::text = ("outer".sku)::text)

I work around with indexes before and I did not sure to create index
for the url field because it is a text field and not the main key for
this query. But I just know that we can create index with condition
(in this case where url is null), I guess it may cut some of the query
time.

I seperate the query by groupnum instead querying the whole table that
cause the bottle neck effect. It works but yes it's slow (less than 1
hour) but goes thru.
I wish could do this simultaniously.

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Richard Villalón 2005-03-29 01:06:39 Error en la instalacion pgsql 8 en xp
Previous Message Scott Marlowe 2005-03-28 22:27:29 Re: Table partition for very large table