Re: Understanding and implementing a GiST Index

From: Connor Wolf <wolf(at)imaginaryindustries(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Understanding and implementing a GiST Index
Date: 2014-10-10 03:09:07
Message-ID: 54374DD3.40308@imaginaryindustries.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

I had skimmed the presentation slides, but I hadn't looked that closely
because it appeared to be using cosine similarity metrics, and only
operated on matrices, neither of which are what I wanted.

On closer examination, I think I could explode my packed hash values to
a matrix. I'm not sure how the cosine similarity metric would work given
that the arrays would only contain the values either 0 or 1 (as my hash
value is fundamentally a integer of configurable length (you can alter
the hash size, I'm using 64 bits)).

I'll check out the code and poke it a bit.

I'll probably also move this discussion to the hackers mailing list (the
instructions say to ask here first, but Oleg suggested I go there, and I
generally agree).

Connor

On 10/9/2014 8:19 AM, Sergey Konoplev wrote:
> On Thu, Oct 9, 2014 at 12:09 AM, Connor Wolf
> <wolf(at)imaginaryindustries(dot)com> wrote:
>> I'm trying to implement a custom indexing system using the GiST index
>> framework, and I have a few questions.
>> Basically, I'm trying to implement a system that allows me to search across
>> a set of 64 bit integers by hamming distance. This is intended to be used in
>> searching for similar images, where the 64 bit integer is actually a phash
>> of an image, and similarity between two images is reflected in the hamming
>> distance between two integers.
> Have you seen the smlar extension?
>
> http://www.pgcon.org/2012/schedule/events/443.en.html
> http://sigaev.ru/git/gitweb.cgi?p=smlar.git;a=blob;hb=HEAD;f=README
> http://railsware.com/blog/2012/05/10/effective-similarity-search-in-postgresql/
>
>> Anyways, The appropriate approach here is to use something called a BK tree,
>> for which I've written some test code and I think I have a decent grip of
>> (my test code seems to work, in any event).
>>
>> That said:
>>
>> Is there a simple piece of example-code for implementing a GiST index for a
>> single index? I've been looking through the /contrib/ directory,
>> particularly the /contrib/btree_gist/ files, but it looks like 90% of the
>> complexity there is related to supporting all the column types Postgres has,
>> rather then actually tied to producing a functional index.
>> Once I have something compiling, how can I check to be sure that I'm
>> actually using the indexing module I created? I can enable my compiled
>> extension using `CREATE EXTENSION`, but is there an option for `CREATE INDEX
>> test_index ON testing USING gist (val);` that lets me specify *which* GiST
>> index is actually employed? Is this even a valid question?
>> This is probably something that's available in one of the system tables
>> somewhere, but I haven't had much luck with google finding out where.
>> Testing: What's the appropriate way to examine the generated tree structure
>> of the index? I certainly went through a few bugs with my test tree system
>> (and that was in python!). Is there any way to examine the index structure
>> for debugging purposes?
>> Also, it looks like `ereport()` is the proper way to emit debugging
>> information. Is this correct?
>> In that vein, is there any way to have information that is on the operations
>> of an entire query? Looking at the number of tree nodes touched for a scan
>> would be nice (and I would not be surprised if there is already a facility
>> for it).
>>
>> Project code is here if anyone is interested, any help would be great. I
>> have very little idea what I'm doing.
>>
>> Thanks,
>> Connor
>
>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Connor Wolf 2014-10-10 07:03:56 Re: [GENERAL] Understanding and implementing a GiST Index
Previous Message Adrian Klaver 2014-10-09 22:09:11 Re: PostgreSQL server won't start, corrupt?

Browse pgsql-hackers by date

  From Date Subject
Next Message Erik Rijkers 2014-10-10 04:34:34 Re: [HACKERS] schema-only -n option in pg_restore fails
Previous Message Peter Geoghegan 2014-10-10 01:35:33 Re: Last Commitfest patches waiting review