Re: Use cases for lateral that do not involve a set returning function

From: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>
To: "AJ Welch *EXTERN*" <awelch0100(at)gmail(dot)com>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Use cases for lateral that do not involve a set returning function
Date: 2014-12-09 10:24:47
Message-ID: A737B7A37273E048B164557ADEF4A58B17DAA12C@ntex2010a.host.magwien.gv.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

AJ Welch wrote:
> http://blog.heapanalytics.com/postgresqls-powerful-new-join-type-lateral/
>
> I suspected some of the claims in the post may not have been accurate. This one in particular:
>
> "Without lateral joins, we would need to resort to PL/pgSQL to do this analysis. Or, if our data set
> were small, we could get away with complex, inefficient queries."
>
>
> The sum(1) and order by time limit 1 approach seemed less than ideal to me and I thought this analysis
> could be done with normal left joins instead of lateral left joins. So I came up with a proof of
> concept:
>
> https://github.com/ajw0100/snippets/tree/master/SQL/lateral
>
>
> Is my conclusion in the README correct? Does anything beyond select...from...where force a nested
> loop? In that case, is lateral really only useful with set returning functions as the docs suggest?
> Does anyone know of any use cases for lateral that do not involve a set returning function?

Only recently I used lateral joins to optimize a query.

This is a sample of how the query looked bfore:

SELECT ...
FROM people p
LEFT JOIN names n
ON (n.people_id = p.people_id
AND current_timestamp > n.validfrom
AND NOT EXISTS (SELECT 1 FROM names n2
WHERE n2.people_id = p.people_id
AND current_timestamp > n2.validfrom
AND n2.validfrom > n.validfrom)
)
WHERE p.id = ...

So basically it is supposed to find the latest valid name for a person.

This required two scans of the "names" table per "person" record.

I rewrote it as

SELECT ...
FROM people p
LEFT JOIN LATERAL (SELECT * FROM names n
WHERE n.people_id = p.people_id
AND current_timestamp > n.validfrom
ORDER BY n.validfrom DESC LIMIT 1) n
ON TRUE
WHERE p.id = ...

With the correct index this touched fewer blocks and worked faster.
Also, though this is of course a matter of taste, it is more readable.

Of course this forces a nested loop, but that is not bad as such.
In my case it was not problem (I tried to hint at that with the WHERE clause).

So yes, I think that LATERAL is useful even without set returning functions.

Yours,
Laurenz Albe

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Daniel Begin 2014-12-09 14:58:18 Re: Removing duplicate records from a bulk upload (rationale behind selecting a method)
Previous Message Anthony Nowocien 2014-12-09 07:55:14 Detecting corruption