Re: Parallelize correlated subqueries that execute within each worker

From: James Coleman <jtc331(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: vignesh C <vignesh21(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Parallelize correlated subqueries that execute within each worker
Date: 2023-01-19 02:34:21
Message-ID: CAAaqYe8=wzXDB5fpO8geK+DiTdz9HaDBRy-2JOiAPC-8vcnXDA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jan 18, 2023 at 2:09 PM Tomas Vondra
<tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>
> Hi,
>
> This patch hasn't been updated since September, and it got broken by
> 4a29eabd1d91c5484426bc5836e0a7143b064f5a, which changed the incremental
> sort stuff a little bit. But the breakage was rather limited, so I took a
> stab at fixing it - attached is the result, hopefully correct.

Thanks for fixing this up; the changes look correct to me.

> I also added a couple of minor comments about stuff I noticed while
> rebasing and skimming the patch; I kept those in separate commits.
> There are also a couple of pre-existing TODOs.

I started work on some of these, but wasn't able to finish this
evening, so I don't have an updated series yet.

> James, what's your plan with this patch? Do you intend to work on it for
> PG16, or are there some issues I missed in the thread?

I'd love to see it get into PG16. I don't know of any outstanding issues,
but review activity has been light. Robert had some concerns about my
original approach; I believe my updated approach resolves them, but it'd
be good to have his sign-off on that.

Beyond that I'm mostly looking for review and evaluation of the
approach I've taken; of note is my description of that in [1].

> One of the queries in incremental_sort changed plans a little bit:
>
> explain (costs off) select distinct
> unique1,
> (select t.unique1 from tenk1 where tenk1.unique1 = t.unique1)
> from tenk1 t, generate_series(1, 1000);
>
> switched from
>
> Unique  (cost=18582710.41..18747375.21 rows=10000 width=8)
>   ->  Gather Merge  (cost=18582710.41..18697375.21 rows=10000000 ...)
>         Workers Planned: 2
>         ->  Sort  (cost=18582710.39..18593127.06 rows=4166667 ...)
>               Sort Key: t.unique1, ((SubPlan 1))
>               ...
>
> to
>
> Unique  (cost=18582710.41..18614268.91 rows=10000 ...)
>   ->  Gather Merge  (cost=18582710.41..18614168.91 rows=20000 ...)
>         Workers Planned: 2
>         ->  Unique  (cost=18582710.39..18613960.39 rows=10000 ...)
>               ->  Sort  (cost=18582710.39..18593127.06 ...)
>                     Sort Key: t.unique1, ((SubPlan 1))
>                     ...
>
> which probably makes sense, as the cost estimate decreases a bit.

Off the cuff that seems fine. I'll read it over again when I send the
updated series.
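
In case it's useful for double-checking that plan change locally, here's
roughly how I'd reproduce it against the regression database (the GUC
settings below just coax the planner into choosing parallel plans, and may
not exactly match what the test file itself sets):

set max_parallel_workers_per_gather = 2;
set parallel_setup_cost = 0;
set parallel_tuple_cost = 0;
set min_parallel_table_scan_size = 0;

explain (costs off) select distinct
  unique1,
  (select t.unique1 from tenk1 where tenk1.unique1 = t.unique1)
from tenk1 t, generate_series(1, 1000);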

James Coleman

1: https://www.postgresql.org/message-id/CAAaqYe8m0DHUWk7gLKb_C4abTD4nMkU26ErE%3Dahow4zNMZbzPQ%40mail.gmail.com
