Re: Awkward Join between generate_series and long table

From: Lincoln Swaine-Moore <lswainemoore(at)gmail(dot)com>
To: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
Cc: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Awkward Join between generate_series and long table
Date: 2023-11-09 02:19:34
Message-ID: CABcidk+1h0RidMehRcDL31XxO+o6e1fJTVyAEQmfcRc-tnoyQw@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-performance

> Maybe doesn't help overall but this can be equivalently written as:
s + '1 day'::interval as period_end

Ah, so I've glossed over a detail here which is that I'm relying on some
timezone specific behavior and not actually generate_series itself. If
you're curious, the details are here:
https://www.postgresql.org/message-id/2582288.1696428710%40sss.pgh.pa.us

I think that makes the window function necessary, or at least something a
little more sophisticated than addition of a day (though I'd be happy to be
wrong about that).

> LEFT JOIN LATERAL (SELECT

Oh wow, this seems to get the index used! That's wonderful news--thank you.

I'd be super curious if anyone has any intuition about why the planner is
so much more successful there--most of what I see online about LATERAL
JOINs is focused as you said on semantics not performance. But in terms of
solving my problem, this seems to do the trick.

Thanks again!

On Wed, Nov 8, 2023 at 5:45 PM David G. Johnston <david(dot)g(dot)johnston(at)gmail(dot)com>
wrote:

> On Wed, Nov 8, 2023 at 6:26 PM Lincoln Swaine-Moore <
> lswainemoore(at)gmail(dot)com> wrote:
>
>> SELECT
>>
> s at time zone 'utc' AS period_start,
>> LEAD(s) OVER (
>> ORDER BY
>> s
>> ) at time zone 'utc' AS period_end
>>
>
> Maybe doesn't help overall but this can be equivalently written as:
> s + '1 day'::interval as period_end
>
> Resorting to a window function here is expensive waste, the lead() value
> can be computed, not queried.
>
>
>> SELECT
>> p.period_start,
>> p.period_end,
>> COUNT (distinct d.id)
>> FROM
>> periods p
>> LEFT JOIN data d
>> ON
>> d.timestamp >= (p.period_start)
>> AND d."timestamp" < (p.period_end)
>> AND d.sn = 'BLAH'
>>
>
> This seems better written (semantically, not sure about execution
> dynamics) as:
>
> FROM periods AS p
> LEFT JOIN LATERAL (SELECT count(distinct? d.id) FROM data AS d WHERE
> d.timestamp >= p.period_start AND d.timestamp < p.period_end AND d.sn =
> 'BLAH') AS cnt_d
> -- NO grouping required at this query level
>
> David J.
>
>

--
Lincoln Swaine-Moore

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Philip Semanchuk 2023-11-09 13:54:36 Re: Awkward Join between generate_series and long table
Previous Message David G. Johnston 2023-11-09 01:44:53 Re: Awkward Join between generate_series and long table