Re: [HACKERS] Index/Function organized table layout (from Re:

From: Hannu Krosing <hannu(at)tm(dot)ee>
To: Christopher Browne <cbbrowne(at)libertyrms(dot)info>
Cc: pgsql-performance(at)postgresql(dot)org, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Index/Function organized table layout (from Re:
Date: 2003-10-04 09:00:04
Message-ID: 1065258004.2746.30.camel@fuji.krosing.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

Christopher Browne kirjutas R, 03.10.2003 kell 00:57:
> jllachan(at)nsd(dot)ca (Jean-Luc Lachance) writes:
> > That's one of the draw back of MVCC.
> > I once suggested that the transaction number and other house keeping
> > info be included in the index, but was told to forget it...
> > It would solve once and for all the issue of seq_scan vs index_scan.
> > It would simplify the aggregate problem.
>
> It would only simplify _one_ case, namely the case where someone cares
> about the cardinality of a relation, and it would do that at
> _considerable_ cost.
>
> A while back I outlined how this would have to be done, and for it to
> be done efficiently, it would be anything BUT simple.

Could this be made a TODO item, perhaps with your attack plan.
Of course as strictly optional feature useful only for special situations
(see below)

I cross-post this to [HACKERS] as it seem relevant to a problem recently
discussed there.

> It would be very hairy to implement it correctly, and all this would
> cover is the single case of "SELECT COUNT(*) FROM SOME_TABLE;"

Not really. Just yesterday there was a discussion on [HACKERS] about
implementing btree-organized tables, which would be much less needed if
the visibility info were kept in indexes.

> If you had a single WHERE clause attached, you would have to revert to
> walking through the tuples looking for the ones that are live and
> committed, which is true for any DBMS.

If the WHERE clause could use the same index (or any index with
visibility info) then there would be no need for "walking through the
tuples" in data relation.

the typical usecase cited on [HACKERS] was time series data, where
inserts are roughly in (timestamp,id)order but queries in (id,timestamp)
order. Now if the index would include all relevant fields
(id,timestamp,data1,data2,...,dataN) then the query could run on index
only touching just a few pages and thus vastly improving performance. I
agree that this is not something everybody needs, but when it is needed
it is needed bad.

> And it still begs the same question, of why the result of this query
> would be particularly meaningful to anyone. I don't see the
> usefulness; I don't see the value of going to the considerable effort
> of "fixing" this purported problem.

Being able to do fast count(*) is just a side benefit.

----------------
Hannu

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Christopher Kings-Lynne 2003-10-04 09:35:18 Re: Quick question
Previous Message Hannu Krosing 2003-10-04 08:20:09 Re: Using backslash in query

Browse pgsql-performance by date

  From Date Subject
Next Message Christopher Kings-Lynne 2003-10-04 09:37:32 Re: count(*) slow on large tables
Previous Message Tom Lane 2003-10-04 03:58:14 Re: Tuning/performance issue...