Re: estimating # of distinct values

From: Jim Nasby <jim(at)nasby(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tomas Vondra <tv(at)fuzzy(dot)cz>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: estimating # of distinct values
Date: 2011-01-18 17:32:54
Message-ID: 31BC2A43-8CCE-4358-B188-0F930CC0E2E5@nasby.net
Lists: pgsql-hackers

On Jan 18, 2011, at 11:24 AM, Robert Haas wrote:
> On Tue, Jan 18, 2011 at 12:23 PM, Jim Nasby <jim(at)nasby(dot)net> wrote:
>> On Jan 17, 2011, at 8:11 PM, Robert Haas wrote:
>>> On Mon, Jan 17, 2011 at 7:56 PM, Jim Nasby <jim(at)nasby(dot)net> wrote:
>>>> - Forks are very possibly a more efficient way to deal with TOAST than having separate tables. There's a fair amount of overhead we pay for the current setup.
>>>
>>> That seems like an interesting idea, but I actually don't see why it
>>> would be any more efficient, and it seems like you'd end up
>>> reinventing things like vacuum and free space map management.
>>
>> The FSM would take some effort, but I don't think vacuum would be that hard to deal with; you'd just have to free up the space in any referenced toast forks at the same time that you vacuumed the heap.
>
> How's that different from what vacuum does on a TOAST table now?

TOAST vacuum is currently an entirely separate vacuum. It might run at the same time as the main table vacuum, but it still does all the work involved in vacuuming an ordinary table that has a toast table's definition. In fact, at one point vacuuming toast took two passes: the first deleted the toast rows that were no longer needed, and then you had to go back and vacuum those deleted rows.

>>>> - Dynamic forks would make it possible to do a column-store database, or at least something approximating one.
>>>
>>> I've been wondering whether we could do something like this by
>>> treating a table t with columns pk, a1, a2, a3, b1, b2, b3 as two
>>> tables t1 and t2, one with columns pk, a1, a2, a3 and the other with
>>> columns pk, b1, b2, b3. SELECT * FROM t would be translated into
>>> SELECT * FROM t1, t2 WHERE t1.pk = t2.pk.
>>
>> Possibly, but you'd be paying tuple overhead twice, which is what I was looking to avoid with forks.
>
> What exactly do you mean by "tuple overhead"?

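The per-tuple header: typedef struct HeapTupleHeaderData. Every table you split the row across stores its own copy of that header for each row. With only two tables it might not be that bad, depending on the fields. Beyond two tables it's almost certainly a loser.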
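
To put rough numbers on that, here is a back-of-the-envelope sketch in C. It assumes a 64-bit build with 8-byte maximum alignment, a 23-byte fixed tuple header with no null bitmap, and a 4-byte line pointer per tuple. The function names and the key width passed in are hypothetical, chosen only for illustration, and alignment padding of the key itself is ignored.

```c
#include <stddef.h>

/* Assumed sizes, not read from any PostgreSQL header. */
#define TUPLE_HEADER_BYTES 23   /* fixed part of HeapTupleHeaderData */
#define LINE_POINTER_BYTES 4    /* one item pointer per tuple on the page */
#define MAX_ALIGN_BYTES    8    /* typical maximum alignment on 64-bit */

/* Round n up to the next multiple of MAX_ALIGN_BYTES. */
static size_t
max_align(size_t n)
{
    return (n + MAX_ALIGN_BYTES - 1) & ~((size_t) MAX_ALIGN_BYTES - 1);
}

/*
 * Approximate per-row overhead when one logical row is split across
 * `pieces` tables joined on a key of `key_bytes` bytes. Each piece pays
 * for an aligned header and a line pointer; each piece beyond the first
 * also stores another copy of the key.
 */
static size_t
split_row_overhead(unsigned pieces, size_t key_bytes)
{
    size_t per_piece = max_align(TUPLE_HEADER_BYTES) + LINE_POINTER_BYTES;

    if (pieces == 0)
        return 0;
    return pieces * per_piece + (pieces - 1) * key_bytes;
}
```

Under those assumptions, split_row_overhead(1, 8) comes to 28 bytes per row, two tables cost 64, and three cost 100, so the overhead grows with every additional table the row is split across.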
--
Jim C. Nasby, Database Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net
