Re: estimating # of distinct values

From: tv(at)fuzzy(dot)cz
To: "Jim Nasby" <jim(at)nasby(dot)net>
Cc: "Tomas Vondra" <tv(at)fuzzy(dot)cz>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: estimating # of distinct values
Date: 2011-01-18 09:53:54
Message-ID: f597bf00a7301a6b2e251caed26fc3d4.squirrel@sq.gransy.com
Lists: pgsql-hackers

> On Jan 17, 2011, at 6:36 PM, Tomas Vondra wrote:
>> 1) Forks are 'per relation' but the distinct estimators are 'per
>> column' (or 'per group of columns') so I'm not sure whether the file
>> should contain all the estimators for the table, or if there should
>> be one fork for each estimator. The former is a bit difficult to
>> manage, the latter somehow breaks the current fork naming convention.
>
> Yeah, when I looked at the fork stuff I was disappointed to find out
> there's essentially no support for dynamically adding forks. There's two
> other possible uses for that I can think of:
>
> - Forks are very possibly a more efficient way to deal with TOAST than
> having separate tables. There's a fair amount of overhead we pay for the
> current setup.
> - Dynamic forks would make it possible to do a column-store database, or
> at least something approximating one.
>
> Without some research, there's no way to know if either of the above makes
> sense; but without dynamic forks we're pretty much dead in the water.
>
> So I wonder what it would take to support dynamically adding forks...

Interesting ideas, but a bit out of scope. I think I'll go with one fork
containing all the estimators for now, although it might be inconvenient
in some cases. I was thinking about rebuilding a single estimator with
increased precision - in that case the size changes so that all the other
data has to be shifted. But this won't be very common (usually all the
estimators will be rebuilt at the same time), and it's actually doable.
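
To illustrate the shifting problem, here is a minimal sketch in Python (not PostgreSQL code). It assumes a hypothetical layout where the fork stores the estimators back to back and a small directory maps each column to an (offset, length) pair. The names EstimatorFork, add, read and rebuild are made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class EstimatorFork:
    """Toy model of a single fork holding all estimators back to back."""

    data: bytearray = field(default_factory=bytearray)
    # Hypothetical directory: column name -> (offset, length) within data.
    directory: dict = field(default_factory=dict)

    def add(self, column: str, payload: bytes) -> None:
        """Append a new estimator at the end of the fork."""
        if not payload:
            raise ValueError("estimator payload must not be empty")
        self.directory[column] = (len(self.data), len(payload))
        self.data += payload

    def read(self, column: str) -> bytes:
        offset, length = self.directory[column]
        return bytes(self.data[offset:offset + length])

    def rebuild(self, column: str, payload: bytes) -> None:
        """Replace one estimator, possibly with a different size."""
        if not payload:
            raise ValueError("estimator payload must not be empty")
        offset, old_len = self.directory[column]
        delta = len(payload) - old_len
        # Slice assignment moves everything stored after this estimator.
        self.data[offset:offset + old_len] = payload
        self.directory[column] = (offset, len(payload))
        # Every estimator stored after the rebuilt one gets a new offset.
        for name, (off, length) in list(self.directory.items()):
            if off > offset:
                self.directory[name] = (off + delta, length)


fork = EstimatorFork()
fork.add("a", b"AAAA")
fork.add("b", b"BB")
fork.add("c", b"CCC")
fork.rebuild("a", b"AAAAAAAA")  # higher precision, twice the size
```

After the rebuild, the estimators for "b" and "c" are still readable, but both had to move by four bytes. That is the cost I mean when I say all the other data has to be shifted.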

So the most important question is how to intercept the new/updated rows,
and where to store them. I think each backend should maintain its own
private list of new records and forward them only in case of commit. Does
that sound reasonable?
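
Roughly what I have in mind, as a sketch in Python rather than C. The classes SharedEstimator and Backend are hypothetical, and an exact set stands in for the real (approximate) distinct estimator.

```python
class SharedEstimator:
    """Stand-in for the shared estimator; an exact set, for illustration only."""

    def __init__(self):
        self._seen = set()

    def add_many(self, values):
        self._seen.update(values)

    def ndistinct(self):
        return len(self._seen)


class Backend:
    """Toy backend that buffers new values privately until commit."""

    def __init__(self, estimator):
        self._estimator = estimator
        self._pending = []  # private list of new/updated values

    def insert(self, value):
        # Nothing reaches the shared estimator before commit.
        self._pending.append(value)

    def commit(self):
        # Forward the buffered values, then start a fresh transaction.
        self._estimator.add_many(self._pending)
        self._pending.clear()

    def abort(self):
        # Rolled-back rows must never affect the estimate.
        self._pending.clear()


est = SharedEstimator()
b1, b2 = Backend(est), Backend(est)
b1.insert(1)
b1.insert(2)
b2.insert(2)
b2.insert(3)
before_commit = est.ndistinct()
b1.commit()
b2.abort()
after = est.ndistinct()
```

The point of the design is that an aborted transaction leaves no trace in the estimator, at the price of keeping the pending values in backend-local memory until the transaction ends.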

regards
Tomas
