Quick Links

Re: select from grouped data

From:	James Keener <jim(at)jimkeener(dot)com>
To:	pgsql-novice(at)lists(dot)postgresql(dot)org,Stephen Frost <sfrost(at)snowman(dot)net>,Roger Mason <rmason(at)mun(dot)ca>,"David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
Cc:	pgsql-novice <pgsql-novice(at)postgresql(dot)org>
Subject:	Re: select from grouped data
Date:	2021-08-25 21:37:42
Message-ID:	77511C44-068E-4047-A955-9C6C08BC1942@jimkeener.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-novice

I don't know that "generally better" is the correct description. There are a lot of things not covered by the SQL standard and it would be crazy not to make use of them, e.g. insert on conflict and distinct on, especially when they make the query more readable and more correct. Worrying about sql portability is almost always time not well spent and an invitation to do things not as well as they can be.

Jim

On August 25, 2021 4:42:56 PM EDT, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
>Greetings,
>
>* Roger Mason (rmason(at)mun(dot)ca) wrote:
>> I have written this function to process some multiline text data into
>a
>> table with both text & numeric columns:
>>
>> CREATE OR REPLACE FUNCTION get_final_energy ()
>> RETURNS TABLE (
>> jid text,
>> "timestamp" text,
>> scf integer,
>> energy double precision
>> )
>> AS $$
>> WITH a AS (
>> SELECT
>> jid,
>> regexp_split_to_table(totenergy_out, '\n') AS teo
>> FROM
>> results
>> ),
>> b AS (
>> SELECT
>> results.jid,
>> results. "timestamp",
>> cast(
>> CASE WHEN split_part(a.teo, ' ', 2) = '' THEN
>> '0'
>> ELSE
>> split_part(a.teo, ' ', 2)
>> END AS integer) AS scf,
>> cast(
>> CASE WHEN split_part(a.teo, ' ', 3) = '' THEN
>> '0.0'
>> ELSE
>> split_part(a.teo, ' ', 3)
>> END AS double precision) AS energy
>> FROM
>> results,
>> a
>> WHERE
>> results.jid = a.jid
>> GROUP BY
>> results.jid,
>> results. "timestamp",
>> a.teo
>> --HAVING
>> -- scf = max(scf)
>> ORDER BY
>> timestamp ASC,
>> scf DESC
>> )
>> SELECT
>> *
>> FROM
>> b;
>>
>> $$
>> LANGUAGE sql;
>>
>> The output looks like:
>>
>> jid | timestamp | scf | energy
>> ------------+-----------------+-----+----------------
>> 1250_1 | 20210805-114634 | 18 | -1316.43700819
>> 1250_1 | 20210805-114634 | 17 | -1316.43700825
>> 1250_1 | 20210805-114634 | 16 | -1316.4370097
>> 1250_1 | 20210805-114634 | 15 | -1316.43700991
>> 1250_1 | 20210805-114634 | 14 | -1316.43699775
>> 1250_1 | 20210805-114634 | 13 | -1316.43699117
>> 1250_1 | 20210805-114634 | 12 | -1316.43750771
>> 1250_1 | 20210805-114634 | 11 | -1316.43805358
>> 1250_1 | 20210805-114634 | 10 | -1316.43857192
>> 1250_1 | 20210805-114634 | 9 | -1316.43070942
>> 1251_1 | 20210806-062539 | 18 | -1316.43700819
>> 1251_1 | 20210806-062539 | 17 | -1316.43700826
>> ....
>>
>> What I want is to get (for each group) the energy corresponding to
>the
>> maximum value of scf.
>
>* David G. Johnston (david(dot)g(dot)johnston(at)gmail(dot)com) wrote:
>> >
>> >
>> > The output looks like:
>> >
>> > jid | timestamp | scf | energy
>> > ------------+-----------------+-----+----------------
>> > 1250_1 | 20210805-114634 | 18 | -1316.43700819
>> > 1250_1 | 20210805-114634 | 17 | -1316.43700825
>> > 1250_1 | 20210805-114634 | 16 | -1316.4370097
>> > 1250_1 | 20210805-114634 | 15 | -1316.43700991
>> > 1250_1 | 20210805-114634 | 14 | -1316.43699775
>> > 1250_1 | 20210805-114634 | 13 | -1316.43699117
>> > 1250_1 | 20210805-114634 | 12 | -1316.43750771
>> > 1250_1 | 20210805-114634 | 11 | -1316.43805358
>> > 1250_1 | 20210805-114634 | 10 | -1316.43857192
>> > 1250_1 | 20210805-114634 | 9 | -1316.43070942
>> > 1251_1 | 20210806-062539 | 18 | -1316.43700819
>> > 1251_1 | 20210806-062539 | 17 | -1316.43700826
>> > ....
>> >
>> > What I want is to get (for each group) the energy corresponding to
>the
>> > maximum value of scf.
>> >
>> >
>> SELECT DISTINCT ON (jid) jid, timestamp, scf, energy [...] ORDER BY
>jid,
>> scf DESC
>
>While this works, it's generally better to use a LATERAL join as that's
>part of the SQL standard while DISTINCT ON isn't. Using a LATERAL join
>also would allow you to have multiple rows (top-N) if you wanted.
>You'd
>do that using:
>
>WITH jids AS (SELECT jid FROM results GROUP BY jid)
>SELECT jids.jid, t.ts, t.scf, t.energy
>FROM jids CROSS JOIN LATERAL
> (SELECT ts, scf, energy FROM results
> WHERE results.jid = jids.jid
> ORDER BY scf DESC
> LIMIT 1) AS t
>;
>
>A couple of notes on this: don't name a column "timestamp" and when
>it's
>a timestamp, use the 'timestamptz' data type, not text. Your jids sure
>look like they should just be numbers instead of text too. If you
>already have a distinct set of jids somewhere (like in another table),
>you could use that as the source table instead of the CTE that I'm
>using
>above.
>
>Thanks,
>
>Stephen

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

In response to

Re: select from grouped data at 2021-08-25 20:42:56 from Stephen Frost

Responses

Re: select from grouped data at 2021-08-25 22:03:56 from Stephen Frost

Browse pgsql-novice by date

	From	Date	Subject
Next Message	Stephen Frost	2021-08-25 22:03:56	Re: select from grouped data
Previous Message	Stephen Frost	2021-08-25 20:42:56	Re: select from grouped data