Re: Designing an extension for feature-space similarity search

From:	Jay Levitt <jay(dot)levitt(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Designing an extension for feature-space similarity search
Date:	2012-02-17 19:00:29
Message-ID:	4F3EA3CD.1070103@gmail.com
Lists:	pgsql-hackers

Tom Lane wrote:
> Jay Levitt <jay(dot)levitt(at)gmail(dot)com> writes:
>> - Does KNN-GiST run into problems when <-> returns values that don't "make
>> sense" in the physical world?
>
> If the indexed entities are records, it would be
> entirely your own business how you handled individual fields being NULL.

This turns out to be a bit challenging. Let's say I'm building a
nullable_point type that allows the Y axis to be NULL (or any sentinel value
for "missing data"), where the semantics are "NULL is infinitely far from
the query". I'll need my GiST functions to return useful results with NULL:
not just correct results, but results that help partition the tree nicely.
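
To make the "NULL is infinitely far from the query" semantics concrete, here
is a minimal sketch in Python rather than the C a real extension would use.
The names NullablePoint and distance are hypothetical, None stands in for a
NULL Y coordinate, and Euclidean distance is only an assumption about what
<-> would compute:

```python
import math
from typing import Optional, Tuple

# Hypothetical stand-in for nullable_point: X is always known, Y may be missing.
NullablePoint = Tuple[float, Optional[float]]


def distance(p: NullablePoint, query: Tuple[float, float]) -> float:
    """Distance from p to query; a missing Y counts as infinitely far."""
    x, y = p
    if y is None:
        return math.inf
    return math.hypot(x - query[0], y - query[1])


points = [(1.0, 2.0), (2.0, 1.0), (1.0, None)]

# Nearest-first ordering relative to the query point (1, 1).
ranked = sorted(points, key=lambda p: distance(p, (1.0, 1.0)))
```

With this definition any point that has a NULL Y sorts after every point
with a known Y, whatever its X value is.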

At first I thought this posed a challenge for union; if I have these points:

(1,2)
(2,1)
(1,NULL)

what's the union? I think the answer is to treat NULL box coordinates like
LL = -infinity, UR = infinity, or (equivalently, I think) to store a
saw_nulls bit in addition to LL and UR.
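
A rough sketch of both union variants, in Python rather than C and not tied
to the actual GiST support-function signatures. Box, FlaggedBox, union_inf,
union_flagged and widen are all hypothetical names, None stands in for NULL,
and the input is assumed to be non-empty:

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, Optional[float]]  # (x, y); y is None when missing


@dataclass(frozen=True)
class Box:
    ll: Tuple[float, float]  # lower-left corner
    ur: Tuple[float, float]  # upper-right corner


@dataclass(frozen=True)
class FlaggedBox:
    x_lo: float
    x_hi: float
    y_lo: Optional[float]  # None when no known Y was seen at all
    y_hi: Optional[float]
    saw_nulls: bool


def union_inf(points: List[Point]) -> Box:
    """Variant 1: a NULL Y stretches the box to -infinity and +infinity."""
    xs = [x for x, _ in points]
    y_los = [-math.inf if y is None else y for _, y in points]
    y_his = [math.inf if y is None else y for _, y in points]
    return Box((min(xs), min(y_los)), (max(xs), max(y_his)))


def union_flagged(points: List[Point]) -> FlaggedBox:
    """Variant 2: bound only the known Y values and remember that a NULL was seen."""
    xs = [x for x, _ in points]
    known = [y for _, y in points if y is not None]
    return FlaggedBox(
        min(xs), max(xs),
        min(known) if known else None,
        max(known) if known else None,
        saw_nulls=len(known) < len(points),
    )


def widen(fb: FlaggedBox) -> Box:
    """Map a flagged box onto the infinite-extent box it stands for."""
    if fb.saw_nulls or fb.y_lo is None or fb.y_hi is None:
        return Box((fb.x_lo, -math.inf), (fb.x_hi, math.inf))
    return Box((fb.x_lo, fb.y_lo), (fb.x_hi, fb.y_hi))


pts = [(1.0, 2.0), (2.0, 1.0), (1.0, None)]
inf_box = union_inf(pts)
flagged = union_flagged(pts)
```

For the three points above, both variants describe the same infinite Y
extent once the flag is expanded. The flagged form additionally keeps the
finite bounds of the known Y values (1 to 2 here), which the infinity form
discards.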

The real challenge is probably in picksplit and penalty - where in the tree
should I stick (1,NULL)? - at which point you say "Yes, algorithms for
efficient indexes are hard work and computer-science-y" and point me at
surrogate splitters.

Just thinking out loud, I guess; if other GiST types have addressed this
problem, I'd love to hear about it.

Jay

In response to

Re: Designing an extension for feature-space similarity search at 2012-02-15 22:29:41 from Tom Lane

Responses

Re: Designing an extension for feature-space similarity search at 2012-02-17 19:13:33 from Alexander Korotkov