Re: WIP: SP-GiST, Space-Partitioned GiST

From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: SP-GiST, Space-Partitioned GiST
Date: 2011-12-13 16:34:41
Message-ID: 4EE77EA1.6030503@sigaev.ru
Lists: pgsql-hackers

> I wrote:
>> ... the leaf tuple datatype is hard-wired to be
> After another day's worth of hacking, I now understand the reason for
> the above: when an index is less than a page and an incoming value would
> still fit on the root page, the incoming value is simply dumped into a
> leaf tuple without ever calling any opclass-specific function at all.
Exactly.
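
To make the small-index case concrete, here is a toy model (not the patch's
code). The page capacity, the class, and the `on_overflow` callback are all
hypothetical stand-ins; the only point is that the first few values reach the
root page without any opclass function being called, so the leaf datum can
only have the indexed column's type.

```python
PAGE_CAPACITY = 4  # hypothetical: number of leaf tuples that fit on the root page


class ToyIndex:
    """Toy model of an index whose root page starts out as a leaf page."""

    def __init__(self, on_overflow):
        self.on_overflow = on_overflow  # stand-in for opclass support functions
        self.root_leaves = []           # values stored directly on the root page
        self.opclass_calls = 0

    def insert(self, value):
        if len(self.root_leaves) < PAGE_CAPACITY:
            # Fast path: the value is stored as-is and no opclass code runs.
            self.root_leaves.append(value)
            return "stored on root page"
        # Slow path: only now is opclass-specific code consulted.
        self.opclass_calls += 1
        self.on_overflow(value)
        return "handed to opclass"


index = ToyIndex(on_overflow=lambda value: None)
for v in ["a", "b", "c", "d"]:
    index.insert(v)
# At this point four values are indexed and opclass_calls is still 0.
```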

> To allow the leaf datatype to be different from the indexed column,
> we'd need at least one more opclass support function, and it's not clear
> that the potential gain is worth any extra complexity.
Agreed: every opclass I can imagine for SP-GiST uses the same type for both.
Without a clear example I don't like the idea of adding one more support
function, and it could easily be added later as an optional support function,
as was already done for the distance function in GiST.

> However, I now have another question: what is the point of the
> allTheSame mechanism? It seems to add quite a great deal of complexity,
I considered two options: (1) a separate code path in core to handle a lot of
identical values, needing only minimal help from the support functions, or
(2) moving all the logic for this case into the support functions. The second
option is demonstrated in the k-d-tree implementation, where the split axis is
contained in each half-plane. Maybe that is the simpler solution, although it
moves the responsibility to opclass developers.
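
A small sketch of the second option, as I read it (this is an illustration in
one dimension, not the k-d-tree code from the patch; both function names are
hypothetical). Values equal to the split coordinate may land in either half,
so the split stays balanced even when every value is the same, and a search
for a value lying on the split coordinate has to descend into both halves.

```python
def split_at_median(coords):
    """Split sorted coordinates at the median position.

    Values equal to the split coordinate may end up on either side.
    """
    ordered = sorted(coords)
    mid = len(ordered) // 2
    return ordered[mid], ordered[:mid], ordered[mid:]


def sides_to_visit(split, query):
    """Equality search: the split coordinate belongs to both halves."""
    if query < split:
        return ["left"]
    if query > split:
        return ["right"]
    return ["left", "right"]


split, left, right = split_at_median([5, 5, 5, 5])
# All values are identical, yet each half receives two of them;
# a search for 5 must then look at both halves.
```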

> one thing, it's giving me fits while attempting to fix the limitation
> on storing long indexed values. There's no reason why a suffix tree
> representation shouldn't work for long strings, but you have to be
> willing to cap the length of any given inner tuple's prefix to something
I don't see a clear interface for now. Suppose we have an empty index and need
to insert a long string (longer than even several pages). Then we need a
support function that splits the input value into several pieces. I felt that
SP-GiST is already complex enough, for a first step, without adding support
for this not very useful case.
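
Just to show what such a splitting function would have to do, here is a
minimal sketch. The name `split_value` and the cap `MAX_PIECE_LEN` are
assumptions for illustration only; no such support function exists in the
patch, and a real one would be bounded by page size rather than a fixed
character count.

```python
MAX_PIECE_LEN = 8  # hypothetical cap on the length of one stored piece


def split_value(value, max_len=MAX_PIECE_LEN):
    """Cut a long string into consecutive pieces of at most max_len characters."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    return [value[i:i + max_len] for i in range(0, len(value), max_len)]


pieces = split_value("abcdefghij", max_len=4)
# Joining the pieces in order gives back the original string.
```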

Of course, in the future we plan to add support for at least long values,
NULLs/IS NULL, and KNN search.

> I'm also still wondering what your thoughts are on storing null values
> versus full-index-scan capability. I'm leaning towards getting rid of
> the dead code, but if you have an idea how to remove the limitation,
> maybe we should do that instead.

I didn't plan to support NULLs in the first stage, because it's not clear to
me how and where to store them. It seems to me that they should be kept fully
separate from the normal path, e.g. as a linked list of pages holding only
ItemPointer data (similar to leaf data pages in GIN).
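
Roughly the structure I have in mind, as a toy model only. The names
`NullPage`, `NullList`, and the per-page capacity are hypothetical; an item
pointer is modelled as a (block, offset) pair.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple

ItemPointer = Tuple[int, int]  # (heap block number, offset within the block)
POINTERS_PER_PAGE = 3          # hypothetical capacity of one page


@dataclass
class NullPage:
    pointers: List[ItemPointer] = field(default_factory=list)
    next: Optional["NullPage"] = None


class NullList:
    """Chain of pages holding only item pointers of rows whose key is NULL."""

    def __init__(self) -> None:
        self.head = NullPage()
        self.tail = self.head

    def add(self, tid: ItemPointer) -> None:
        if len(self.tail.pointers) >= POINTERS_PER_PAGE:
            # Current page is full: link a new page at the end of the chain.
            new_page = NullPage()
            self.tail.next = new_page
            self.tail = new_page
        self.tail.pointers.append(tid)

    def scan(self) -> Iterator[ItemPointer]:
        # An IS NULL search would simply walk the chain page by page.
        page = self.head
        while page is not None:
            yield from page.pointers
            page = page.next
```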

I missed that the planner will not create a qual-free scan; I thought it was
still possible with NOT NULL columns. If it is not, this code can be removed,
commented out, or ifdef'ed out.

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/
