Quick Links

Re: Shards + hash = forever running queries

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Daniele Varrazzo <daniele(dot)varrazzo(at)gmail(dot)com>
Cc:	pgsql-performance(at)postgresql(dot)org
Subject:	Re: Shards + hash = forever running queries
Date:	2012-07-23 15:07:45
Message-ID:	29911.1343056065@sss.pgh.pa.us
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

Daniele Varrazzo <daniele(dot)varrazzo(at)gmail(dot)com> writes:
> Since "big" was sharded, the query plan results in something like:

> Hash Join (cost=10000000001.23..30038997974.72 rows=10 width=753)
> Hash Cond: (b.id = i.big_id)
> -> Append (cost=0.00..20038552251.23 rows=118859245 width=11)
> -> Index Scan using big_201207_pkey on big_201207 b
> (cost=0.00..2224100.46 rows=1609634 width=12)
> -> Index Scan using big_201101_pkey on big_201101 b
> (cost=0.00..404899.71 rows=5437497 width=12)
> -> Index Scan using big_201104_pkey on big_201104 b
> (cost=0.00..349657.58 rows=4625181 width=12)
> -> [...all the shards]
> -> Hash (cost=10000000001.10..10000000001.10 rows=10 width=742)
> -> Seq Scan on small i (cost=10000000000.00..10000000001.10
> rows=10 width=742)

[ squint... ] 9.1 certainly ought to be able to find a smarter plan for
such a case. For instance, if I try this on 9.1 branch tip:

regression=# create table p (id int primary key);
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "p_pkey" for table "p"
CREATE TABLE
regression=# create table c1 (primary key (id)) inherits(p);
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "c1_pkey" for table "c1"
CREATE TABLE
regression=# create table c2 (primary key (id)) inherits(p);
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "c2_pkey" for table "c2"
CREATE TABLE
regression=# explain select * from p,int4_tbl where id=f1;
QUERY PLAN
--------------------------------------------------------------------------------
Nested Loop (cost=0.00..53.25 rows=120 width=8)
Join Filter: (public.p.id = int4_tbl.f1)
-> Seq Scan on int4_tbl (cost=0.00..1.05 rows=5 width=4)
-> Append (cost=0.00..10.40 rows=3 width=4)
-> Index Scan using p_pkey on p (cost=0.00..1.87 rows=1 width=4)
Index Cond: (id = int4_tbl.f1)
-> Index Scan using c1_pkey on c1 p (cost=0.00..4.27 rows=1 width=4)
Index Cond: (id = int4_tbl.f1)
-> Index Scan using c2_pkey on c2 p (cost=0.00..4.27 rows=1 width=4)
Index Cond: (id = int4_tbl.f1)
(10 rows)

You have evidently got enable_seqscan turned off, so I wonder whether
the cost penalties applied by that are swamping the estimates. Do you
get any better results if you re-enable that?

regards, tom lane

In response to

Shards + hash = forever running queries at 2012-07-23 10:03:11 from Daniele Varrazzo

Responses

Re: Shards + hash = forever running queries at 2012-07-24 09:06:15 from Daniele Varrazzo

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Tom Lane	2012-07-23 15:09:16	Re: Odd blocking (or massively latent) issue - even with EXPLAIN
Previous Message	Jeff Janes	2012-07-23 15:02:51	Re: Checkpointer split has broken things dramatically (was Re: DELETE vs TRUNCATE explanation)