Quick Links

Shards + hash = forever running queries

From:	Daniele Varrazzo <daniele(dot)varrazzo(at)gmail(dot)com>
To:	pgsql-performance(at)postgresql(dot)org
Subject:	Shards + hash = forever running queries
Date:	2012-07-23 10:03:11
Message-ID:	CA+mi_8ZWi8MUvpuiqse+wYMreBF_j0vLDK35=H-ALwfVm3fPmA@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

Hello,

We are using Postgres 9.1.4. We are struggling with a class of queries
that got impossible to run after sharding a large table. Everything
like:

select small.something, big.anything
from small join big on small.big_id = big.id;

and variation such as "select * from big where id in (select big_id from small)"

Since "big" was sharded, the query plan results in something like:

Hash Join (cost=10000000001.23..30038997974.72 rows=10 width=753)
Hash Cond: (b.id = i.big_id)
-> Append (cost=0.00..20038552251.23 rows=118859245 width=11)
-> Index Scan using big_201207_pkey on big_201207 b
(cost=0.00..2224100.46 rows=1609634 width=12)
-> Index Scan using big_201101_pkey on big_201101 b
(cost=0.00..404899.71 rows=5437497 width=12)
-> Index Scan using big_201104_pkey on big_201104 b
(cost=0.00..349657.58 rows=4625181 width=12)
-> [...all the shards]
-> Hash (cost=10000000001.10..10000000001.10 rows=10 width=742)
-> Seq Scan on small i (cost=10000000000.00..10000000001.10
rows=10 width=742)

Postgres ends up in never-ending reads: even if "small" has only three
rows I've never seen such query finishing, the time passed being even
longer than a full scan on big.

The plan looks sub-optimal, as it seems it first does a huge indexscan
of all the partitions, then it joins the result against a small hash.

1. Can we fix the queries to work around this problem?

2. Could the planner be fixed for this scenario for PG 9.2 (or 9.3)?
Creating the hash beforehand, performing an hash join for each
partition and merging the results looks like it would bring it back
into the realm of the runnable queries. Am I wrong?

Thank you very much.

-- Daniele

Responses

Re: Shards + hash = forever running queries at 2012-07-23 11:43:38 from Daniele Varrazzo
Re: Shards + hash = forever running queries at 2012-07-23 15:07:45 from Tom Lane

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Daniele Varrazzo	2012-07-23 11:43:38	Re: Shards + hash = forever running queries
Previous Message	Jim Vanns	2012-07-23 08:41:45	Odd blocking (or massively latent) issue - even with EXPLAIN