Re: Logarithmic data frequency distributions and the query planner

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jerry Gamache <jerry(dot)gamache(at)idilia(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Logarithmic data frequency distributions and the query planner
Date: 2010-07-07 21:22:00
Message-ID: 16024.1278537720@sss.pgh.pa.us
Lists: pgsql-performance

Jerry Gamache <jerry(dot)gamache(at)idilia(dot)com> writes:
> On 8.1, I have a very interesting database where the distributions of
> some values in a multi-million-row table are logarithmic (i.e. the most
> frequent value is an order of magnitude more frequent than the next
> ones). If I analyze the table, the statistics become extremely skewed
> towards the most frequent values and this prevents the planner from
> giving any good results on queries that do not target these entries.

Highly skewed distributions are hardly unusual, and I'm not aware that
the planner is totally incapable of dealing with them. You do need a
large enough statistics target to get down into the tail of the
distribution (the 8.1 default_statistics_target of 10 is probably too
small for you).
It might be that there have been some other relevant improvements since
8.1, too ...
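Why the target matters for the tail can be illustrated with a deliberately simplified model. It assumes the exact column frequencies are known, that the most-common-values (MCV) list holds the top `target` values, and that every value outside it gets an equal share of the leftover frequency. That last part is a rough stand-in for how the planner treats non-MCV values; the real ANALYZE works from a sample (about 300 rows per unit of target) and may keep fewer MCV entries. The Zipf-shaped column, the 5,000,000-row table size, and all function names are hypothetical, chosen only for illustration.

```python
"""Simplified model of how the statistics target limits MCV coverage.

Assumptions (not PostgreSQL source code): exact column frequencies are
known, the MCV list keeps the top `target` values, and every value outside
the MCV list gets an equal share of the leftover frequency.
"""


def zipf_frequencies(n_distinct, s=2.0):
    """Hypothetical skewed column: frequency of rank r is proportional to 1/r**s."""
    weights = [1.0 / rank ** s for rank in range(1, n_distinct + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def estimate_frequency(freqs, rank, target):
    """Estimate the frequency of the value at 1-based `rank` (freqs sorted descending).

    Values inside the idealized MCV list get their exact frequency; the rest
    share the leftover frequency evenly.
    """
    if not 1 <= rank <= len(freqs):
        raise ValueError("rank out of range")
    k = min(target, len(freqs))
    if rank <= k:
        return freqs[rank - 1]
    leftover = 1.0 - sum(freqs[:k])
    return leftover / (len(freqs) - k)


def error_factor(estimate, actual):
    """How many times off the estimate is, in either direction (always >= 1)."""
    return max(estimate / actual, actual / estimate)


def sample_rows(target):
    """ANALYZE samples roughly 300 rows per unit of statistics target."""
    return 300 * target


if __name__ == "__main__":
    table_rows = 5_000_000          # hypothetical "multi-million row" table
    freqs = zipf_frequencies(1000)  # hypothetical column with 1000 distinct values
    for target in (10, 100, 1000):
        for rank in (50, 900):
            est = estimate_frequency(freqs, rank, target)
            actual = freqs[rank - 1]
            print(f"target={target:4d} rank={rank:3d} "
                  f"true_rows={actual * table_rows:9.1f} "
                  f"est_rows={est * table_rows:9.1f} "
                  f"off_by={error_factor(est, actual):5.1f}x "
                  f"sample_rows={sample_rows(target)}")
```

In this model, a target of 10 underestimates mid-rank values and overestimates far-tail values by large factors, and the error shrinks as the target grows. In PostgreSQL itself the knobs are `default_statistics_target` and the per-column `ALTER TABLE ... ALTER COLUMN ... SET STATISTICS n`, followed by a fresh `ANALYZE`.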

regards, tom lane
