Re: estimating # of distinct values

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: estimating # of distinct values
Date: 2011-01-20 02:36:30
Message-ID: AANLkTimLhW4nS-DrAM-=CvUnhqKqf=eXJFXt4j4kWH2a@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jan 19, 2011 at 5:13 PM, Tomas Vondra <tv(at)fuzzy(dot)cz> wrote:
>>> Regarding the crash scenario - if the commit fails, just throw away the
>>> local estimator copy, it's not needed. I'm not sure how to take care of
>>> the case when commit succeeds and the write of the merged estimator
>>> fails, but I think it might be possible to write the estimator to xlog
>>> or something like that. So it would be replayed during recovery etc. Or
>>> is it a stupid idea?
>>
>> It's not stupid, in the sense that that is what you'd need to do if
>> you want to avoid ever having to rescan the table.  But it is another
>> thing that I think is going to be way too expensive.
>
> Way too expensive? All you need to put into the logfile is a copy of the
> estimator, which is a few kBs. How is that 'way too expensive'?

At this point, this is all a matter of religion, right? Neither of us
has a working implementation we've benchmarked. But yes, I believe
you're going to find that implementing some kind of streaming
estimator is going to impose a... <pulls number out of rear end> 6%
performance penalty, even after you've optimized the living daylights
out of it. Now you might say... big deal, it improves my problem
queries by 100x. OK, but if you could get the same benefit by doing
an occasional full table scan during off hours, you could have the
same performance with a *0%* performance penalty. Even better, the
code changes would be confined to ANALYZE rather than spread out all
over the system, which has positive implications for robustness and
likelihood of commit.
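
(For what it's worth, the kind of thing I mean by "streaming estimator" is
sketched below. This is purely illustrative, not patch code, and not tied to
any PostgreSQL internals; the names StreamEstimator, estimator_add, and
estimator_merge are made up, and the register layout is just one HyperLogLog-
style possibility. The point is that every single insert pays for a hash plus
a register update, and that merging a per-backend copy at commit, as discussed
above, is a max() over a few kB of registers.)

    /*
     * Hypothetical sketch of a streaming ndistinct estimator using
     * HyperLogLog-style registers.  Illustrative only, not PostgreSQL code.
     */
    #include <stdint.h>

    #define HLL_BITS    10                  /* 2^10 = 1024 registers, ~1 kB */
    #define HLL_REGS    (1 << HLL_BITS)

    typedef struct StreamEstimator
    {
        uint8_t reg[HLL_REGS];
    } StreamEstimator;

    /* Position of the first set bit in the non-index part of the hash. */
    static int
    rank_of(uint64_t hash)
    {
        int r = 1;

        while ((hash & 1) == 0 && r <= 64 - HLL_BITS)
        {
            r++;
            hash >>= 1;
        }
        return r;
    }

    /* Called once per inserted value: this is the run-time cost at issue. */
    static void
    estimator_add(StreamEstimator *est, uint64_t hash)
    {
        uint32_t idx = (uint32_t) (hash >> (64 - HLL_BITS));  /* register */
        int      r = rank_of(hash);

        if (r > est->reg[idx])
            est->reg[idx] = (uint8_t) r;
    }

    /* Merging a local copy into the shared one: max() per register. */
    static void
    estimator_merge(StreamEstimator *dst, const StreamEstimator *src)
    {
        for (int i = 0; i < HLL_REGS; i++)
            if (src->reg[i] > dst->reg[i])
                dst->reg[i] = src->reg[i];
    }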

I'm not trying to argue you out of working on this. It's obviously
your time to spend, and if it works better than I think it will, great!
I'm merely offering you an opinion on what will probably happen if you
go this route - namely, it'll carry an unpalatable run-time penalty.
That opinion may be worth no more than what you paid for it, but there
you have it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
