Re: enable_incremental_sort changes query behavior

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: James Coleman <jtc331(at)gmail(dot)com>
Cc: Jaime Casanova <jaime(dot)casanova(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: enable_incremental_sort changes query behavior
Date: 2020-10-02 20:56:20
Message-ID: 20201002205620.sbmfqdmkprjrlpgz@development
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Oct 02, 2020 at 04:12:11PM -0400, James Coleman wrote:
>On Fri, Oct 2, 2020 at 2:25 PM Tomas Vondra
><tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>> On Fri, Oct 02, 2020 at 10:55:14AM -0400, James Coleman wrote:
>> >On Fri, Oct 2, 2020 at 10:53 AM James Coleman <jtc331(at)gmail(dot)com> wrote:
>> >>
>> >> On Fri, Oct 2, 2020 at 10:32 AM Tomas Vondra
>> >> <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>> >> >
>> >> > On Fri, Oct 02, 2020 at 09:19:44AM -0400, James Coleman wrote:
>> >> > >
>> >> > > ...
>> >> > >
>> >> > >I've been able to confirm that the problem goes away if we stop adding
>> >> > >the gather merge paths in generate_useful_gather_paths().
>> >> > >
>> >> > >I'm not sure yet what conclusion that leads us to. It seems to be that
>> >> > >the biggest clue remains that all of this works correctly unless one
>> >> > >of the selected columns (which happens to be a pathkey at this point
>> >> > >because it's a DISTINCT query) contains a volatile expression.
>> >> > >
>> >> >
>> >> > Yeah. It seems to me this is a bug in get_useful_pathkeys_for_relation,
>> >> > which is calling find_em_expr_for_rel and is happy with anything it
>> >> > returns. But this was copied from postgres_fdw, which however does a bit
>> >> > more here:
>> >> >
>> >> > if (pathkey_ec->ec_has_volatile ||
>> >> > !(em_expr = find_em_expr_for_rel(pathkey_ec, rel)) ||
>> >> > !is_foreign_expr(root, rel, em_expr))
>> >> >
>> >> > So not only does it explicitly check volatility of the pathkey, it also
>> >> > calls is_foreign_expr which checks the expression for mutable functions.
>> >> >
>> >> > The attached patch seems to fix this, but it only adds the check for
>> >> > mutable functions. Maybe it should check ec_has_volatile too ...
>> >>
>> >> We actually discussed the volatility check in that function back on
>> >> the original thread [1], and we'd concluded that was specifically
>> >> necessary for the fdw code because the function would execute on two
>> >> different servers (and thus produce different results), but that in a
>> >> local server only scenario it should be fine.
>> >>
>> >> My understanding (correct me if I'm wrong) is that the volatile
>> >> function should only be executed once (at the scan level?) to build
>> >> the tuple and from then on the datum isn't going to change, so I'm not
>> >> sure why the volatility would matter here.
>> >>
>> >> James
>> >>
>> >> 1:
>> >
>> >Oh, hmm, could what I said all be true, but there still be some rule
>> >that you shouldn't compare datums generated from volatile expressions
>> >in different backends (i.e., parallel query)?
>> >
>> I'm not sure it's all that related to parallel query - it's more likely
>> that when constructing the paths below the gather merge, this new code
>> fails to do something important.
>> I'm not 100% sure how the grouping and volatile functions work, so let
>> me think aloud here ...
>> The backtrace looks like this:
>> #0 get_sortgroupref_tle
>> #1 0x0000000000808ab9 in prepare_sort_from_pathkeys
>> #2 0x000000000080926c in make_sort_from_pathkeys
>> #3 0x0000000000801032 in create_sort_plan
>> #4 0x00000000007fe7e0 in create_plan_recurse
>> #5 0x0000000000800b2c in create_gather_merge_plan
>> #6 0x00000000007fe94d in create_plan_recurse
>> #7 0x0000000000805328 in create_nestloop_plan
>> #8 0x00000000007ff3c5 in create_join_plan
>> #9 0x00000000007fe5f8 in create_plan_recurse
>> #10 0x0000000000800d68 in create_projection_plan
>> #11 0x00000000007fe662 in create_plan_recurse
>> #12 0x0000000000801252 in create_upper_unique_plan
>> #13 0x00000000007fe760 in create_plan_recurse
>> #14 0x00000000007fe4f2 in create_plan
>> #15 0x000000000081082f in standard_planner
>> and the create_sort_plan works with lefttree that is IndexScan, so the
>> query we're constructing looks like this:
>> Distinct
>> -> Nestloop
>> -> Gather Merge
>> -> Sort
>> -> Index Scan
>> and it's the sort that expects to find the expression in the Index Scan
>> target list. Which seems rather bogus, because clearly the index scan
>> does not include the expression. (I wonder if it's somehow related that
>> indexes can't be built on volatile expressions ...)
>The index scan does include values not found in the index though,
>right? It returns the whole tuple -- though I'm not sure if it
>includes calculated values (BTW: is this what is meant by a
>"projection" being added? Or is that something else?)

AFAIK index scans are projection-capable - at least it seems like that
from is_projection_capable_path. But the volatility blocks that, I think
(see below).

>> Anyway, the index scan clearly does not include the expression the sort
>> references, hence the failure. And the index can can't compute it,
>> because we probably need to compute it on top of the join I think
>> (otherwise we might get duplicate values for volatile functions etc.)
>In this particular query there's no reason why we'd have to wait to
>compute it until after the join, right?
>For example, if we take away the parallelism but still force a Unique
>plan, we get this:
> Unique
> -> Sort
> Sort Key: ref_0.i, (CASE WHEN pg_rotate_logfile_old() THEN
>ref_0.t ELSE ref_0.t END)
> -> Nested Loop
> -> Seq Scan on ref_0
> -> Function Scan on generate_series ref_1
>And I don't see any reason why the CASE statement couldn't in theory
>(I don't know the internals enough to know when it actually happens)
>be done as part of the base relation scan (in this case, the seq
>scan). It's not dependent on any information from the join.

Imagine a query like this:

select, volatile_func() from t1 join t2 using (id);

and now assume t2 contains duplicate values. If the volatile_func gets
evaluated as part of the t1 scan, then we'll get multiple occurrences
in the results, due to the t2 duplicates. I belive volatile functions
are not expected to behave like that - the docs say:

A query using a volatile function will re-evaluate the function at
every row where its value is needed..

And I assume this references to rows at the level where the function is
used, i.e. after the join.

>Furthermore the same plan works correctly for a non-volatile
>expression, so that means (I think) that the relation scan is capable
>of producing that expression in the tuple it generates (either that or
>the sort is generating it?).

I do believe the reason is exactly that non-volatile expressions can be
pushed down, so that we actually find them in the target list of the
node below the sort.

See explains for the second set of queries, where I used a simple
non-volatile expression.

>> Looking at this from a slightly different angle, the root cause here
>> seems to be that generate_useful_gather_paths uses the pathkeys it gets
>> from get_useful_pathkeys_for_relation, which means root->query_pathkeys.
>> But all other create_gather_merge_calls use root->sort_pathkeys, so
>> maybe this is the actual problem and get_useful_pathkeys_for_relation
>> should use root->sort_pathkeys instead. That does fix the issue for me
>> too (and it passes all regression tests).
>So I'm not convinced this is the correct way forward. It seems we want
>to be able to apply this as broadly as possible, and using
>sort_pathkeys here means we wouldn't use it for the DISTINCT case at
>all, right?
>So far I think excluding volatile expressions (or mutable functions?)
>is the best way we've discussed so far, though I'm clear yet on why it
>is that that's a necessary restriction. If the answer is "there are
>cases where it would be unsafe for parallel query even though it isn't
>here...", I'm fine with that, but if so, we should make sure that's in
>the code comments/readmes somewhere.
>I also verified that the functions I've been playing with
>(pg_rotate_logfile_old and md5) are marked parallel safe in pg_proc.

More importanly, it does not actually fix the issue - it does fix that
particular query, but just replacing the DISTINCT with either ORDER BY
or GROUP BY make it fail again :-(

Attached is a simple script I used, demonstrating these issues for the
three cases that expect to have ressortgroupref != 0 (per the comment
before TargetEntry in plannodes.h).


Tomas Vondra
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
failure.sql application/sql 2.4 KB

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message James Coleman 2020-10-02 21:45:52 Re: enable_incremental_sort changes query behavior
Previous Message David G. Johnston 2020-10-02 20:42:47 Re: Prepared Statements