Re: Abysmal hash join

From: Florian Weimer <fweimer(at)bfk(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Abysmal hash join
Date: 2006-09-11 16:28:49
Message-ID: 82sliyweny.fsf@mid.bfk.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

* Tom Lane:

> Yeah, n_distinct estimation from a sample is inherently hard :-(. Given
> that you have such a long tail on the distribution, it might be worth
> your while to crank the stats target for that column all the way to the
> maximum (1000).

I've done that. Fortunately, ANALYZE time didn't increase by that
much, compared to the default (by just a factor of 10). The bitmap
scan estimate is still way off (around 8000), but let's hope that it
won't matter in practice.

> Also you need to experiment with extending the stats for the smaller
> table.

Yeah, the situation is quite similar, but on a much smaller scale.

> I believe what's happening here is that the smaller table joins only to
> less-frequent entries in the big table (correct?).

Almost. We won't select the rows based on these values, at least not
in queries of that type. The reason is simply that the result set is
too large to be useful.

> Of course, large stats targets will slow down planning to some extent,
> so you should also keep an eye on how long it takes to plan the query.

These queries are mostly ad-hoc, so a delay of a couple of seconds
doesn't matter. Only if you need to wait five minutes, it's a
different story.

It seems that the situation is under control now. Thanks.

--
Florian Weimer <fweimer(at)bfk(dot)de>
BFK edv-consulting GmbH http://www.bfk.de/
Durlacher Allee 47 tel: +49-721-96201-1
D-76131 Karlsruhe fax: +49-721-96201-99

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Piñeiro 2006-09-11 18:14:13 Performance problem with Sarge compared with Woody
Previous Message Brian Wipf 2006-09-11 15:50:43 Re: Configuring System for Speed