Re: Parallel append plan instability/randomness

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Parallel append plan instability/randomness
Date: 2018-01-08 04:40:38
Message-ID: CAA4eK1LegxN9of0sydNM5n4up=7oCj=-E94edKDOsYFKyb1R0Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Jan 7, 2018 at 5:44 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> According to buildfarm member silverfish,
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=silverfish&dt=2018-01-06%2008%3A53%3A38
>
> it's possible to sometimes get this failure in the regression tests:
>
> *** /mnt/buildfarm/buildroot/HEAD/pgsql.build/../pgsql/src/test/regress/expected/select_parallel.out Tue Dec 19 20:24:02 2017
> --- /mnt/buildfarm/buildroot/HEAD/pgsql.build/src/test/regress/results/select_parallel.out Sat Jan 6 09:21:39 2018
> ***************
> *** 75,84 ****
> Workers Planned: 3
> -> Partial Aggregate
> -> Parallel Append
> -> Seq Scan on d_star
> -> Seq Scan on f_star
> -> Seq Scan on e_star
> - -> Seq Scan on b_star
> -> Seq Scan on c_star
> -> Seq Scan on a_star
> (11 rows)
> --- 75,84 ----
> Workers Planned: 3
> -> Partial Aggregate
> -> Parallel Append
> + -> Seq Scan on b_star
> -> Seq Scan on d_star
> -> Seq Scan on f_star
> -> Seq Scan on e_star
> -> Seq Scan on c_star
> -> Seq Scan on a_star
> (11 rows)
>
> Irreproducible failures in the regression tests are not very acceptable.
> Furthermore, considering that the query being tested is
>
> explain (costs off)
> select round(avg(aa)), sum(aa) from a_star;
>
> it seems to me that the "expected" order of the sub-scans is mighty
> random to begin with.
>

I think order of sub-scans can be random if the number of rows in
child relations can vary across runs. For the above case, the
subpaths (non-partial-paths) are always in the descending order of
their cost and I can see that by running it locally. On my local m/c,
output is as below:

regression=# explain select round(avg(aa)), sum(aa) from a_star;
QUERY PLAN
-------------------------------------------------------------------------------
Finalize Aggregate (cost=2.30..2.31 rows=1 width=40)
-> Gather (cost=2.28..2.29 rows=3 width=40)
Workers Planned: 3
-> Partial Aggregate (cost=2.28..2.29 rows=1 width=40)
-> Parallel Append (cost=0.00..2.20 rows=15 width=4)
-> Seq Scan on d_star (cost=0.00..1.16 rows=16 width=4)
-> Seq Scan on f_star (cost=0.00..1.16 rows=16 width=4)
-> Seq Scan on e_star (cost=0.00..1.07 rows=7 width=4)
-> Seq Scan on b_star (cost=0.00..1.04 rows=4 width=4)
-> Seq Scan on c_star (cost=0.00..1.04 rows=4 width=4)
-> Seq Scan on a_star (cost=0.00..1.03 rows=3 width=4)
(11 rows)

The above indicates that paths are listed in the order as expected.
What makes you think that the order of sub-scans can be random? Is it
possible that the number of rows in child relations can vary across
runs?

One theory that can explain above failure is that the costs of
scanning some of the sub-paths is very close due to which sometimes
the results can vary. If that is the case, then probably using
fuzz_factor in costs comparison (as is done in attached patch) can
improve the situation, may be we have to consider some other factors
like number of rows in each subpath. However, it might be better to
first somehow reproduce this case and see what is going wrong, any
ideas?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
fix_pa_cost_comp_v1.patch application/octet-stream 1.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2018-01-08 05:24:54 Re: heads up: Fix for intel hardware bug will lead to performance regressions
Previous Message Thomas Munro 2018-01-08 04:32:27 Re: heads up: Fix for intel hardware bug will lead to performance regressions