Re: Proposal - improve eqsel estimates by including histogram bucket numdistinct statistics

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Nathan Boley" <npboley(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal - improve eqsel estimates by including histogram bucket numdistinct statistics
Date: 2008-06-08 23:03:13
Message-ID: 18634.1212966193@sss.pgh.pa.us
Lists: pgsql-hackers

"Nathan Boley" <npboley(at)gmail(dot)com> writes:
> ... There are two potential problems that I see with this approach:

> 1) It assumes the = is equivalent to <= and >= . This is certainly
> true for real numbers, but is it true for every equality relation that
> eqsel predicts for?

The cases that compute_scalar_stats is used in have that property, since
the < and = operators are taken from the same btree opclass.
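
[Illustration, not part of the original message.] A minimal sketch of why the property holds when every operator is derived from one consistent ordering, as it is within a single btree opclass. The comparator cmp_ci and the sample values are hypothetical and only stand in for an opclass comparison function; none of this is PostgreSQL code.

```python
from itertools import product


def cmp_ci(a, b):
    """Hypothetical three-way comparison (case-insensitive text).

    Returns a negative number, zero, or a positive number.
    """
    x, y = a.lower(), b.lower()
    return (x > y) - (x < y)


# All operators are defined from the same comparator, so they cannot disagree.
def lt(a, b):
    return cmp_ci(a, b) < 0


def le(a, b):
    return cmp_ci(a, b) <= 0


def eq(a, b):
    return cmp_ci(a, b) == 0


def ge(a, b):
    return cmp_ci(a, b) >= 0


samples = ["apple", "Apple", "banana", "BANANA", "cherry"]

# a = b holds exactly when both a <= b and a >= b hold.
consistent = all(
    eq(a, b) == (le(a, b) and ge(a, b))
    for a, b in product(samples, repeat=2)
)
```

Note that eq here is coarser than byte-wise equality ("apple" equals "Apple"), yet it still agrees with <= and >= because all three come from the same ordering.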

> Do people think that the improved estimates would be worth the
> additional overhead?

Your argument seems to consider only columns having a normal
distribution. How badly does it fall apart for non-normal
distributions? (For instance, Zipfian distributions seem to be pretty
common in database work, from what I've seen.)
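
[Illustration, not part of the original message.] One way to probe this question is a small simulation that compares equality-selectivity estimates on differently shaped data. The sketch below is written under loud assumptions: the per-bucket estimator, (bucket row fraction) / (distinct values in bucket), is only one reading of the proposal's subject line and not its actual formula; the baseline is a plain 1/ndistinct; and the most-common-values list that PostgreSQL keeps separately is ignored entirely. All function names and parameters are hypothetical.

```python
import bisect
import random
from collections import Counter


def equi_depth_bounds(sorted_vals, nbuckets):
    """Return nbuckets + 1 boundaries splitting sorted_vals into equal-count buckets."""
    n = len(sorted_vals)
    return [sorted_vals[min(n - 1, (i * n) // nbuckets)] for i in range(nbuckets + 1)]


def mean_abs_errors(values, nbuckets=10):
    """Return (global_error, bucket_error): mean absolute selectivity error
    over the distinct values, for the two assumed estimators."""
    n = len(values)
    counts = Counter(values)
    bounds = equi_depth_bounds(sorted(values), nbuckets)

    def bucket_of(v):
        # Equal values always map to the same bucket, even if boundaries repeat.
        return min(nbuckets - 1, max(0, bisect.bisect_right(bounds, v) - 1))

    rows = Counter()      # rows falling in each bucket
    distinct = Counter()  # distinct values falling in each bucket
    for v, c in counts.items():
        b = bucket_of(v)
        rows[b] += c
        distinct[b] += 1

    global_est = 1.0 / len(counts)  # assumption: uniform over all distinct values
    err_global = err_bucket = 0.0
    for v, c in counts.items():
        true_sel = c / n
        b = bucket_of(v)
        bucket_est = (rows[b] / n) / distinct[b]  # assumption: uniform within bucket
        err_global += abs(true_sel - global_est)
        err_bucket += abs(true_sel - bucket_est)
    return err_global / len(counts), err_bucket / len(counts)


if __name__ == "__main__":
    rng = random.Random(0)
    n = 20000
    # Rounded Gaussian values as a stand-in for a "normal" column.
    normal = [round(rng.gauss(0, 50)) for _ in range(n)]
    # Zipf-like column: value k is drawn with weight 1 / k.
    keys = range(1, 1001)
    zipf = rng.choices(keys, weights=[1.0 / k for k in keys], k=n)
    for name, data in (("normal", normal), ("zipf", zipf)):
        g, b = mean_abs_errors(data)
        print(f"{name}: global={g:.6f} bucket={b:.6f}")
```

The sketch deliberately reports errors for both shapes rather than asserting which estimator wins; a fair comparison would also need to strip out the most common values first, as the real statistics code does.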

regards, tom lane
