Re: Index-only-scans, indexam API changes

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: Index-only-scans, indexam API changes
Date: 2009-07-21 17:10:57
Message-ID: 603c8f070907211010v266bcf2w744bfd533272a7a0@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jul 13, 2009 at 11:32 AM, Heikki
Linnakangas<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> Tom Lane wrote:
>> One thought here is that an AM call isn't really free, and doing two of
>> them instead of one mightn't be such a good idea.  I would suggest
>> either having a separate AM entry point to get both bits of data
>> ("amgettupledata"?) or adding an optional parameter to amgettuple.
>
> I'm thinking of adding a new flag to IndexScanDesc, "needIndexTuple".
> When that is set, amgettuple stores a pointer to the IndexTuple in
> another new field in IndexScanDesc, xs_itup. (In case of b-tree at
> least, the IndexTuple is a palloc'd copy of the tuple on disk)
>
>> [ thinks a bit ... ]  At least for GIST, it is possible that whether
>> data can be regurgitated will vary depending on the selected opclass.
>> Some opclasses use the STORAGE modifier and some don't.  I am not sure
>> how hard we want to work to support flexibility there.  Would it be
>> sufficient to hard-code the check as "pgam says the AM can do it,
>> and the opclass does not have a STORAGE property"?  Or do we need
>> additional intelligence about GIST opclasses?
>
> Well, the way I have it implemented is that the am is not required to
> return the index tuple, even if requested. I implemented the B-tree
> changes similar to how we implement "kill_prior_tuple": in btgettuple,
> lock the index page and see if the tuple is still at the same position
> that we remembered in the scan opaque struct. If not (because of
> concurrent changes to the page), we give up and don't return the index
> tuple. The executor will then perform a heap fetch as before.
>
> Alternatively, we could copy all the matching index tuples to private
> memory when we step to a new index page, but that seems pretty expensive
> if we're only going to use the first few matching tuples (LIMIT), and I
> fear the memory management gets complicated. But in any case, the GiST
> issue would still be there.
>
> Since we're discussing it, I'm attaching the prototype patch I have for
> returning tuples from b-tree and using them to filter rows before heap
> fetches. I was going to post it in the morning along with description
> about the planner and executor changes, but here goes. It applies on top
> of the indexam API patch I started this thread with.

I'm not sure where we are on this patch for reviewing purposes.
Heikki, are you planning to provide an updated patch? Or what should
we be doing here from an RRR standpoint?

...Robert

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2009-07-21 17:20:17 Re: Non-blocking communication between a frontend and a backend (pqcomm)
Previous Message Tom Lane 2009-07-21 16:58:11 Re: full join qualifications on 8.3.1 vs. 8.3.6