Re: TODO item: Allow data to be pulled directly from indexes

From: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
To: "Karl Schnaitter" <karlsch(at)soe(dot)ucsc(dot)edu>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: TODO item: Allow data to be pulled directly from indexes
Date: 2008-06-30 07:33:36
Message-ID: 48688C50.2090702@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Karl Schnaitter wrote:
> The main difference between (1) & (4) is that (1) will sometimes require
> heap lookups and (4) never will. Moreover, the heap lookups in (1) will
> be difficult for the optimizer to estimate, unless some special
> statistics can be maintained for this purpose.

Yeah, we certainly should maintain a statistic for it.

> (2) & (3) can work for any index, and they are quite elegant in the way
> that the overhead does not change with the number of indexes. The TODO
> also notes the benefit of (2) for efficient vacuuming. Thus, I think
> that (2) is a great idea in general, but it does not serve the intended
> purpose of this TODO item. Once a page gets marked as requiring
> visibility checks, it cannot be unmarked until the next VACUUM. The
> whole point of this feature is that we are willing to be more proactive
> during updates in order to make index access more efficient.

In some cases we can mark a page earlier, as soon as we see that the
condition is true. Most importantly, when new tuples are inserted, we
can mark the page as soon as the inserting transaction is visible to all.

Also, the visibility map ought to make vacuums cheaper, as you only need
to scan the parts of the table that have beem modified since last
vacuum. You still need to scan all indexes, though. But assuming that
you somehow solve the correctness issues in the "add visibility fields
to index tuples" approach, we can use the same solution to perform
retail vacuums, which would bring vacuuming and the visibility map
approach on par with that approach anyway.

> Please let me know if you agree/disagree with anything here. I need to get this feature implemented for my research, but I would also love to contribute it to the community so your opinions matter a lot.

Well, I think the visibility map is a much better approach. This has
been discussed many times before, so I don't really have anything new to
add.

I've been working adding support for so-called "relation forks"
(http://archives.postgresql.org/message-id/48651722.8020600@enterprisedb.com),
to allow attaching metadata to relations, like the visibility map. I'm
going to use the facility for a new FSM implementation, which I'm
working on at the moment, but after that's done I'm going to start
working on the visibility map. And after that's done and working for
VACUUM, I'm going to work on using it for index-only-scans.

I'm not sure I have enough time to get all that done for 8.4, it's
looking bad at the moment, so help would be much appreciated. If you
don't agree with taking the visibility map approach, I would suggest
working on the indexam API changes first, to allow returning index
tuples from an index. I believe that part is the same regardless of how
we check the visibility.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Richard Huxton 2008-06-30 08:38:08 Re: Does anything dump per-database config settings? (was Re: ALTER DATABASE vs pg_dump)
Previous Message Simon Riggs 2008-06-30 06:58:39 Re: Join Removal/ Vertical Partitioning