Re: Designing an extension for feature-space similarity search

From: Jay Levitt <jay(dot)levitt(at)gmail(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Designing an extension for feature-space similarity search
Date: 2012-02-16 17:12:25
Message-ID: 4F3D38F9.4090301@gmail.com
Lists: pgsql-hackers

Alexander Korotkov wrote:
> On Thu, Feb 16, 2012 at 12:34 AM, Jay Levitt <jay(dot)levitt(at)gmail(dot)com> wrote:
>
> > - But a dimension might be in any domain, not just floats
> > - The distance along each dimension is a domain-specific function
>
> What exact domains do you expect? Some domains could appear to be quite hard
> for index-based similarity search using GiST (for example, sets, strings etc.).

Oh, nothing nearly so complex, and (to Tom's point) no composite types,
either. Right now we have demographics like gender, geolocation, and
birthdate; I think any domain will be a type that's easily expressible in
linear terms. I was thinking in domains rather than types because there
isn't one distance function for "date" or "float"; me.birthdate <->
you.birthdate is normalized to a different curve than now() <->
posting_date, and raw_score <-> raw_score would differ from z_score <-> z_score.

It would have been elegant to express that distance with <->, but since
domains can't have operators, I can create distance(this, other) functions
for each domain. It just won't look as pretty!
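
To make the idea concrete, here is a rough sketch of per-domain distance functions, written in Python rather than SQL only to show the shape. Everything in it is an assumption for illustration: the function names, the choice of curves, and the scale constants (10 years, 7 days, a cap of 4) are hypothetical and not part of any existing extension or API. The point is just that two values of the same base type (two dates, two floats) get a different normalization depending on which domain they belong to.

```python
import math
from datetime import date, datetime


def birthdate_distance(a: date, b: date, scale_years: float = 10.0) -> float:
    """Age gap between two birthdates, squashed into [0, 1).

    Hypothetical curve: saturating exponential, so a gap of a few years
    matters a lot and a gap of 40 vs. 50 years barely differs.
    """
    years = abs((a - b).days) / 365.25
    return 1.0 - math.exp(-years / scale_years)


def recency_distance(now: datetime, posted: datetime,
                     half_life_days: float = 7.0) -> float:
    """Distance between "now" and a posting date, in [0, 1).

    Hypothetical curve: half-life decay, so a post that is exactly
    half_life_days old sits at distance 0.5.
    """
    days = abs((now - posted).total_seconds()) / 86400.0
    return 1.0 - 0.5 ** (days / half_life_days)


def z_score_distance(a: float, b: float, cap: float = 4.0) -> float:
    """Distance between two z-scores, in [0, 1].

    Hypothetical rule: linear in the difference, clamped at `cap`
    standard deviations.
    """
    return min(abs(a - b), cap) / cap
```

Both birthdate_distance and recency_distance take date-like values, but the same raw gap in days lands at a different point on each curve, which is why one distance function per base type isn't enough.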

Jay
