Re: benchmarking the query planner

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, "jd(at)commandprompt(dot)com" <jd(at)commandprompt(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Greg Smith <gsmith(at)gregsmith(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: benchmarking the query planner
Date: 2009-04-03 02:26:46
Message-ID: 603c8f070904021926g92eb55sdfc68141133957c1@mail.gmail.com
Lists: pgsql-hackers

On Thu, Mar 19, 2009 at 4:04 AM, ITAGAKI Takahiro
<itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> >> Works for me. Especially if you want to think more about ANALYZE before
>> >> changing that.
>> >
>> > Well, it's something that would be sane to contemplate adding in 8.4.
>> > It's way too late for any of this other stuff to happen in this release.
>>
>> I'm thinking about trying to implement this, unless someone else is
>> already planning to do it.  I'm not sure it's practical to think about
>> getting this into 8.4 at this point, but it's worth doing whether it
>> does or not.
>
> Can we use get_relation_stats_hook on 8.4? The pg_statistic catalog
> will be still modified by ANALYZEs, but we can rewrite the statistics
> just before it is used.
>
> your_relation_stats_hook(root, rte, attnum, vardata)
> {
>    Call default implementation;
>    if (rte->relid == YourRelation && attnum == YourColumn)
>        ((Form_pg_statistic) (vardata->statsTuple))->stadistinct = YourNDistinct;
> }

I don't know. Can you run a query from inside the stats hook? It
sounds like this could be made to work for a hard-coded relation and
column, but ideally you'd like to get this data out of a table
somewhere.

I started implementing this by adding attdistinct to pg_attribute and
making it a float8, with 0 meaning "don't override the results of the
normal stats computation" and any other value meaning "override the
results of the normal stats computation with this value". I'm not
sure, however, whether I can count on the result of an equality test
against a floating-point zero to be reliable on every platform. It
also seems like something of a waste of space, since the only positive
values that are useful are integers (and presumably less than 2^31-1)
and the only negative values that are useful are >= -1. So I'm
thinking about making it an integer, to be interpreted as follows:

0 => compute ndistinct normally
positive value => use this value for ndistinct
negative value => use this value * 10^-6 for ndistinct
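
As a rough illustration of the three cases above, here is a minimal sketch in C. The function name decode_attdistinct, its arguments, and the lower-bound assertion are hypothetical and exist only for this example; nothing here is taken from an actual patch.

```c
#include <assert.h>

/*
 * Hypothetical helper (not from any patch): turn an integer attdistinct
 * override into the ndistinct value to use.
 *
 *   0        => keep the normally computed value
 *   positive => use the value itself as an absolute distinct count
 *   negative => scale by 10^-6, giving a fraction between -1 and 0
 */
static double
decode_attdistinct(int attdistinct, double computed_ndistinct)
{
    /* Assumption: anything below -1000000 would decode to less than -1. */
    assert(attdistinct >= -1000000);

    if (attdistinct == 0)
        return computed_ndistinct;      /* no override requested */
    if (attdistinct > 0)
        return (double) attdistinct;    /* absolute number of distinct values */
    return attdistinct / 1000000.0;     /* negative: value * 10^-6 */
}
```

With this sketch, decode_attdistinct(-500000, 42.0) yields -0.5 and decode_attdistinct(0, 42.0) yields 42.0. The "don't override" test is an integer comparison, so the floating-point equality question does not come up.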

Any thoughts?

...Robert
