Re: Designing an extension for feature-space similarity search

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Jay Levitt <jay(dot)levitt(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Designing an extension for feature-space similarity search
Date: 2012-02-17 19:13:33
Message-ID: CAPpHfdvjNKow_0Ump2Xvt1APTf4ieqHn0X9FsMrPaUey4Qoqww@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Feb 17, 2012 at 11:00 PM, Jay Levitt <jay(dot)levitt(at)gmail(dot)com> wrote:

> Tom Lane wrote:
>
>> Jay Levitt<jay(dot)levitt(at)gmail(dot)com> writes:
>>
>>> - Does KNN-GiST run into problems when<-> returns values that don't
>>> "make
>>>
>>> sense" in the physical world?
>>>
>>
>> If the indexed entities are records, it would be
>> entirely your own business how you handled individual fields being NULL.
>>
>
> This turns out to be a bit challenging. Let's say I'm building a
> nullable_point type that allows the Y axis to be NULL (or any sentinel
> value for "missing data"), where the semantics are "NULL is infinitely far
> from the query". I'll need my GiST functions to return useful results
> with NULL - not just correct results, but results that help partition the
> tree nicely.
>
> At first I thought this posed a challenge for union; if I have these
> points:
>
> (1,2)
> (2,1)
> (1,NULL)
>
> what's the union? I think the answer is to treat NULL box coordinates like
> LL = -infinity, UR = infinity, or (equivalently, I think) to store a
> saw_nulls bit in addition to LL and UR.
>
> The real challenge is probably in picksplit and penalty - where in the
> tree should I stick (1,NULL)? - at which point you say "Yes, algorithms for
> efficient indexes are hard work and computer-science-y" and point me at
> surrogate splitters.
>
> Just thinking out loud, I guess; if other GiST types have addressed this
> problem, I'd love to hear about it.

Similar problem appears at GiST indexing of ranges, because range can be
empty. There additional "contain empty" flag was introduced. This "contain
empty" flag indicates that underlying value can be empty. So, this flag is
set when union with empty range or other range with this flag set. It's
likely you need similar flag for each dimension.

------
With best regards,
Alexander Korotkov.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Christopher Browne 2012-02-17 19:28:21 Re: MySQL search query is not executing in Postgres DB
Previous Message Robert Haas 2012-02-17 19:09:56 Re: Simulating Clog Contention