From: | Stephen Frost <sfrost(at)snowman(dot)net> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Marko Tiikkaja <marko(at)joh(dot)to>, Andres Freund <andres(at)anarazel(dot)de>, Andrew Fletcher <andy(at)prestigedigital(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: BUG #15324: Non-deterministic behaviour from parallelised sub-query |
Date: | 2018-08-15 11:10:07 |
Message-ID: | 20180815111006.GB3326@tamriel.snowman.net |
Lists: | pgsql-bugs |
Greetings,
* Amit Kapila (amit(dot)kapila16(at)gmail(dot)com) wrote:
> On Tue, Aug 14, 2018 at 9:14 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Marko Tiikkaja <marko(at)joh(dot)to> writes:
> >> Marking the function parallel safe doesn't seem wrong to me. The
> >> non-parallel-safe part is that the input gets fed to it in different order
> >> in different workers. And I don't really think that to be the function's
> >> fault.
> >
> > So that basically opens the question of whether *any* window function
> > calculation can safely be pushed down to parallel workers.
>
> I think we can consider it as a parallel-restricted operation. For
> the purpose of testing, I have marked row_number as
> parallel-restricted in pg_proc and I get the below plan:
>
> postgres=# Explain select count(*) from qwr where (a, b) in (select a,
> row_number() over() from qwr);
>                                QUERY PLAN
> --------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=46522.12..46522.13 rows=1 width=8)
>    ->  Hash Semi Join  (cost=24352.08..46362.12 rows=64001 width=0)
>          Hash Cond: ((qwr.a = qwr_1.a) AND (qwr.b = (row_number() OVER (?))))
>          ->  Gather  (cost=0.00..18926.01 rows=128002 width=8)
>                Workers Planned: 2
>                ->  Parallel Seq Scan on qwr  (cost=0.00..18926.01 rows=64001 width=8)
>          ->  Hash  (cost=21806.06..21806.06 rows=128002 width=12)
>                ->  WindowAgg  (cost=0.00..20526.04 rows=128002 width=12)
>                      ->  Gather  (cost=0.00..18926.01 rows=128002 width=4)
>                            Workers Planned: 2
>                            ->  Parallel Seq Scan on qwr qwr_1  (cost=0.00..18926.01 rows=64001 width=4)
> (11 rows)
>
> This seems okay, though the results of the above parallel execution
> are not the same as those of serial execution. I think the reason is
> that we don't get rows in a predictable order from the workers.
You wouldn't get them in a predictable order even without
parallelization, since OVER () specifies no ordering, so this hardly
seems like an issue.
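The ordering point can be made concrete with a toy Python simulation. It is not PostgreSQL code: the table values, the two-worker split, the explicit arrival `schedule`, and the helper names `gather`, `row_number_over_empty` and `semi_join_count` are all hypothetical. Each worker keeps its own row order, a Gather-like merge interleaves the streams, and numbering happens afterwards, as the WindowAgg above Gather does in the quoted plan. A serial scan would just be one more schedule, with no more guarantee than the others.

```python
from typing import List, Sequence, Set, Tuple


def gather(worker_outputs: Sequence[Sequence[int]], schedule: Sequence[int]) -> List[int]:
    """Merge per-worker row streams in the order given by `schedule`.

    Hypothetical model: `schedule` lists worker indices, each entry meaning
    "that worker's next row arrives now". It stands in for timing, which a
    real Gather node does not control. Each worker's own row order is kept.
    """
    positions = [0] * len(worker_outputs)
    merged: List[int] = []
    for w in schedule:
        merged.append(worker_outputs[w][positions[w]])
        positions[w] += 1
    if positions != [len(rows) for rows in worker_outputs]:
        raise ValueError("schedule must consume every row exactly once")
    return merged


def row_number_over_empty(rows: Sequence[int]) -> List[Tuple[int, int]]:
    """Model of row_number() OVER (): number rows in whatever order they arrive."""
    return [(a, n) for n, a in enumerate(rows, start=1)]


def semi_join_count(outer: Set[Tuple[int, int]], inner: Sequence[Tuple[int, int]]) -> int:
    """Model of count(*) ... WHERE (a, b) IN (subquery): count matching outer rows."""
    inner_set = set(inner)
    return sum(1 for row in outer if row in inner_set)


workers = [[10, 30], [20, 40]]                # hypothetical: two workers scan disjoint halves of qwr.a
outer = {(10, 1), (20, 2), (30, 3), (40, 4)}  # hypothetical (a, b) rows of the outer qwr

run_a = row_number_over_empty(gather(workers, [0, 0, 1, 1]))  # worker 0 finishes first
run_b = row_number_over_empty(gather(workers, [1, 0, 1, 0]))  # worker 1 gets ahead

print(run_a, semi_join_count(outer, run_a))  # [(10, 1), (30, 2), (20, 3), (40, 4)] 2
print(run_b, semi_join_count(outer, run_b))  # [(20, 1), (10, 2), (40, 3), (30, 4)] 0
```

Both runs number the same four values 1 to 4, yet the (a, row_number) pairs differ, so a semi join shaped like the one in the report can match a different number of rows from run to run.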
> > Somewhat like the LIMIT/OFFSET case, it seems to me that we could only
> > expect to do this safely if the row ordering induced by the WINDOW clause
> > can be proven to be fully deterministic. The planner has no such smarts
> > at the moment AFAIR. In principle you could do it if there were
> > partitioning/ordering by a primary key, but I'm not excited about the
> > prospects of that being true often enough in practice to justify making
> > the check.
>
> Yeah, I am also not sure if it is worth adding the additional checks.
> So, for now, we can treat any window function calculation as
> parallel-restricted and if later anybody has a reason strong enough to
> relax the restriction for some particular case, we will consider it.
Seems likely that we'll want this at some point, but certainly seems
like new work and not a small bit of it.
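Tom's condition, a window ordering that can be proven total (e.g. ordering by a primary key), can be illustrated with another toy Python sketch, again not planner code. It enumerates every possible arrival order of three hypothetical rows and counts how many distinct numberings row_number() OVER (ORDER BY ...) could produce. The column names `id` and `grp` and the helpers `row_number_ordered` and `distinct_results` are made up for the example.

```python
from itertools import permutations
from typing import Callable, Dict, List, Set, Tuple

Row = Dict[str, int]


def row_number_ordered(rows: List[Row], key: Callable[[Row], int]) -> List[Tuple[int, int]]:
    """Model of row_number() OVER (ORDER BY key): sort, then number.

    Returns (id, row_number) pairs. Python's sort is stable, so rows with
    equal keys keep their arrival order; in SQL the order of ties is simply
    unspecified, which is the source of nondeterminism either way.
    """
    ordered = sorted(rows, key=key)
    return [(r["id"], n) for n, r in enumerate(ordered, start=1)]


def distinct_results(rows: List[Row], key: Callable[[Row], int]) -> Set[Tuple[Tuple[int, int], ...]]:
    """Collect every numbering produced across all possible arrival orders."""
    return {tuple(sorted(row_number_ordered(list(p), key))) for p in permutations(rows)}


# Hypothetical rows: "id" is unique (think primary key), "grp" has duplicates.
rows = [{"id": 1, "grp": 7}, {"id": 2, "grp": 7}, {"id": 3, "grp": 9}]

by_pk = distinct_results(rows, key=lambda r: r["id"])
by_grp = distinct_results(rows, key=lambda r: r["grp"])
print(len(by_pk))   # 1: ordering by a unique key fixes the numbering
print(len(by_grp))  # 2: ids 1 and 2 tie on grp, so their numbers can swap
```

Only the unique-key ordering yields a single numbering regardless of arrival order. Proving that uniqueness for arbitrary queries is the planner check Tom describes as not existing today, which is why treating window functions as parallel-restricted is the simpler option on the table.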
Thanks!
Stephen