Re: WIP: SP-GiST, Space-Partitioned GiST

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: SP-GiST, Space-Partitioned GiST
Date: 2011-10-02 23:21:48
Message-ID: 3231.1317597708@sss.pgh.pa.us
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> On 06.09.2011 20:34, Oleg Bartunov wrote:
>> Here is the latest spgist patch, which has all planned features as well as
>> all overhead, introduced by concurrency and recovery, so performance
>> measurement should be realistic now.

> I'm ignoring the text suffix-tree part of this for now, because of the
> issue with non-C locales that Alexander pointed out.

It seems to me that SP-GiST simply cannot work for full text comparisons
in non-C locales, because it's critically dependent on the assumption
that comparisons of strings are consistent with comparisons of prefixes
of those strings ... an assumption that's just plain false for most
non-C locales.
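
To make the prefix assumption concrete, here is a minimal sketch in Python. It is not taken from the patch. The `toy_locale_key` function is a hypothetical stand-in for a non-C collation that ignores punctuation at the primary level, which is roughly how many real locales behave. `c_key` models C-locale (bytewise) comparison. The check asks whether the ordering of two n-character prefixes agrees with the ordering of the full strings.

```python
def c_key(s: str) -> bytes:
    # C-locale model: plain bytewise comparison.
    return s.encode("utf-8")


def toy_locale_key(s: str):
    # Hypothetical collation (an assumption for illustration only):
    # punctuation is ignored at the primary level, and the raw string
    # is used only as a tie-breaker.
    primary = "".join(ch for ch in s if ch.isalnum()).lower()
    return (primary, s)


def prefix_consistent(key, a: str, b: str, n: int) -> bool:
    # True if the n-character prefixes sort the same way as the full
    # strings do. Prefixes that compare equal decide nothing, so they
    # are treated as consistent.
    pa, pb = a[:n], b[:n]
    if key(pa) == key(pb):
        return True
    return (key(pa) < key(pb)) == (key(a) < key(b))


a, b = "a-z", "ab"
print("C order consistent:  ", prefix_consistent(c_key, a, b, 2))
print("toy order consistent:", prefix_consistent(toy_locale_key, a, b, 2))
```

Under bytewise comparison the prefixes "a-" and "ab" sort the same way as "a-z" and "ab". Under the toy collation the prefix "a-" sorts before "ab", but the full string "a-z" sorts after "ab", so a tree that routes a search by prefix would descend into the wrong subtree.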

We can dodge that problem in the same way that we did in the btree
pattern_ops opclasses, namely implement the opclass only for the =
operator and the special operators ~<~ etc. I think I favor doing this
for the first round, because it's a simple matter of removing code
that's currently present in the patch. Even with only = support
the opclass would be extremely useful.

Something we could consider later is a way to use the index for the
regular text comparison operators (< etc), but only when the operator
is using C collation. This is not so much a matter for the index
implementation as it is about teaching the planner to optionally
consider collation when matching an operator call to the index. It's
probably going to tie into the unfinished business of marking which
operators are collation sensitive and which are not.
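
As a rough illustration of that planner-side idea, the following Python sketch shows the kind of decision involved. Everything here is hypothetical: `index_can_serve`, the two operator sets, and the string "C" as a collation name are made up for illustration and are not planner code or catalog contents. The sketch assumes that = and the ~<~ family compare bytewise, so they never depend on collation.

```python
# Hypothetical operator classification (assumption, not catalog data).
COLLATION_INSENSITIVE_OPS = {"=", "~<~", "~<=~", "~>=~", "~>~"}
COLLATION_SENSITIVE_OPS = {"<", "<=", ">=", ">"}


def index_can_serve(op: str, collation: str) -> bool:
    # Operators that compare bytewise match the index under any collation.
    if op in COLLATION_INSENSITIVE_OPS:
        return True
    # Regular comparison operators match only when they run under C
    # collation, where string order agrees with prefix order.
    if op in COLLATION_SENSITIVE_OPS:
        return collation == "C"
    # Anything else is not supported by this opclass at all.
    return False


for op, coll in [("=", "en_US"), ("~<~", "en_US"), ("<", "en_US"), ("<", "C")]:
    print(op, coll, index_can_serve(op, coll))
```

The point of the sketch is only that the match depends on the pair (operator, collation) rather than on the operator alone, which is why it needs to know which operators are collation sensitive.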

In other news, I looked at the patch briefly, but I don't think I want
to review it fully without some documentation. The absolute minimum
requirement IMO is documentation comparable to what we have for GIN,
i.e., a specification for the support methods and some indication of when
you'd use this index type in preference to others. I'd be willing to
help copy-edit and SGML-ize such documentation, but I do not care to
reverse-engineer it from the code.

regards, tom lane
