Re: ASOF join

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: David Fetter <david(at)fetter(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ASOF join
Date: 2017-06-22 01:30:06
Message-ID: CAEepm=2BoDCetwbpJV2KSamr6SWGTW6vjfDE5xuqdFDssYjwoQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jun 21, 2017 at 9:46 PM, Konstantin Knizhnik
<k(dot)knizhnik(at)postgrespro(dot)ru> wrote:
> Thank you for this idea. I agree that it is the best way of implementing
> ASOF join - just as optimization of standard SQL query.

Great. I think this part definitely has potential.

> But do you think that still it will be good idea to extend SQL syntax with
> ASOF JOIN ... USING ... clause? It will significantly simplify writing
> queries like above
> and IMHO doesn't introduce some confusions with standard SQL syntax. My
> primary idea of suggesting ASOF join for Postgres was not just building
> more efficient plan (using merge join instead of nested loop) but also
> simplifying writing of such queries. Or do you think that nobody will be
> interested in non-standard SQL extensions?

I can see the appeal, but I expect it to be difficult to convince the
project to accept a non-standard syntax for a niche use case that can
be expressed already. Q is super terse and designed for time series
data. SQL is neither of those things.

Some first reactions to the syntaxes you mentioned:

1. times LEFT ASOF JOIN ticks ON ticks.time <= times.time
2. times LEFT ASOF JOIN ticks USING (time)
3. times LEFT ASOF JOIN ticks USING (ticks.time, times.time)

The USING ideas don't seem to be general enough, because there is no
place to say whether to use a lower or higher value if there is no
match, or did I miss something? Relying on an ORDER BY clause in the
query to control the meaning of the join seems too weird, and making
it always (for example) <= would be an arbitrary limitation. The
first syntax at least has enough information: when you say one of <,
>, <=, >= you also imply the search order. I'm not sure if there are
any problems with that, perhaps when combined with other quals.

The equivalent nearly-standard syntax is definitely quite verbose, but
it has the merit of being absolutely explicit about which row from
'ticks' will be selected:

times LEFT JOIN LATERAL (SELECT * FROM ticks
WHERE ticks.time <= times.time
ORDER BY ticks.time DESC LIMIT 1) x ON true

--
Thomas Munro
http://www.enterprisedb.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Kuntal Ghosh 2017-06-22 02:06:20 Re: An attempt to reduce WALWriteLock contention
Previous Message Andres Freund 2017-06-22 01:24:06 Re: An attempt to reduce WALWriteLock contention