Re: efficient way to do "fuzzy" join

From: Rémi Cura <remi(dot)cura(at)gmail(dot)com>
To: Andy Colson <andy(at)squeakycode(dot)net>
Cc: PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: efficient way to do "fuzzy" join
Date: 2014-04-11 15:57:53
Message-ID: CAJvUf_trC9EaxqsMpboO+Qw4c-3qeFjxCpt5frTPYbE3yfU0zQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hey,
thanks for your answer.

I think you are right, range type with index could at least provide a fast
matching,
thus avoiding the numrow(A) * numrow(B) complexity .

Though I don't see how to use it to interpolate for more than 1st order.

Cheers,
Rémi-C

2014-04-11 17:09 GMT+02:00 Andy Colson <andy(at)squeakycode(dot)net>:

> On 4/11/2014 7:50 AM, Rémi Cura wrote:
>
>> Hey dear List,
>>
>> I'm looking for some advice about the best way to perform a "fuzzy"
>> join, that is joining two table based on approximate matching.
>>
>> It is about temporal matching
>> given a table A with rows containing data and a control_time (for
>> instance 1 ; 5; 6; .. sec, not necessarly rounded of evenly-spaced)
>>
>> given another table B with lines on no precise timing (eg control_time =
>> 2.3 ; 5.8 ; 6.2 for example)
>>
>> How to join every row of B to A based on
>> min(@(A.control_time-B.control_time))
>> (that is, for every row of B, get the row of A that is temporaly the
>> closest),
>> in an efficient way?
>> (to be explicit, 2.3 would match to 1, 5.8 to 6, 6.2 to 6)
>>
>> Optionnaly, how to get interpolation efficiently (meaning one has to get
>> the previous time and next time for 1 st order interpolation, 2 before
>> and 2 after for 2nd order interpolation, and so on)?
>> (to be explicit 5.8 would match to 5 and 6, the weight being 0.2 and 0.8
>> respectively)
>>
>>
>> Currently my data is spatial so I use Postgis function to interpolate a
>> point on a line, but is is far from efficient or general, and I don't
>> have control on interpolation (only the spatial values are interpolated).
>>
>>
>> Cheers,
>> Rémi-C
>>
>
>
> Have you seen the range type?
>
> http://www.postgresql.org/docs/9.3/static/rangetypes.html
>
> Not fuzzy, but is indexable.
>
> -Andy
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Jack.O'Sullivan 2014-04-11 15:59:35 Re: CLOB & BLOB limitations in PostgreSQL
Previous Message Andy Colson 2014-04-11 15:24:13 Re: CLOB & BLOB limitations in PostgreSQL