Re: BUG #15577: Query returns different results when executed multiple times

From: Bartosz Polnik <bartoszpolnik(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #15577: Query returns different results when executed multiple times
Date: 2019-01-07 21:39:17
Message-ID: CAM37ZevSHt0pdKhZ_Bu7nDoMMWSDsJaaPUPR_ApowoFida4fcA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

For the following sql:

*SET force_parallel_mode = on;*
explain (costs false, analyze true)
SELECT ta.id AS table_a_id,
tc.id as table_c_id,
tba.id AS table_b_id
FROM test.table_b_active tba
INNER JOIN test.table_c tc ON tba.target_id = tc.id
INNER JOIN test.table_d td ON tc.table_d_id = td.id
LEFT JOIN test.table_a ta ON ta.table_c_id = tc.id AND ta.date =
'2018-08-31' :: DATE
WHERE tba.date BETWEEN '2018-08-10' :: DATE AND '2018-09-01' :: DATE
AND td.group = 'A'
AND tc.table_e_id = 4
AND (
(tba.target_id = tc.id AND tba.group_type = 'A')
OR tba.source_id = tc.id
);

I got:

QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop Left Join (actual time=145.151..284.612 rows=26 loops=1)
-> Gather (actual time=145.092..304.166 rows=26 loops=1)
Workers Planned: 1
Workers Launched: 1
-> Nested Loop (actual time=153.289..247.768 rows=13 loops=2)
-> Hash Join (actual time=0.221..15.028 rows=945 loops=2)
Hash Cond: (tc.table_d_id = td.id)
-> Parallel Seq Scan on table_c tc (actual
time=0.085..14.442 rows=1751 loops=2)
Filter: (table_e_id = 4)
Rows Removed by Filter: 49117
-> Hash (actual time=0.044..0.045 rows=8 loops=2)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on table_d td (actual
time=0.028..0.033 rows=8 loops=2)
Filter: (group = 'A'::test.group)
Rows Removed by Filter: 55
-> Index Scan using table_b_idx_target_id on table_b
(actual time=0.242..0.246 rows=0 loops=1890)
Index Cond: (target_id = tc.id)
Filter: ((date >= '2018-08-10'::date) AND (date <=
'2018-09-01'::date) AND (((target_id = tc.id) AND (group_type =
'A'::test.group_type)) OR (source_id = tc.id)))
Rows Removed by Filter: 26
-> Index Scan using table_a_uq_001 on table_a ta (actual
time=0.016..0.016 rows=1 loops=26)
Index Cond: ((table_c_id = tc.id) AND (date = '2018-08-31'::date))
Planning Time: 5.024 ms
Execution Time: 304.858 ms

On Second execution:

QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop Left Join (actual time=228.405..279.383 rows=19 loops=1)
-> Gather (actual time=228.376..288.807 rows=19 loops=1)
Workers Planned: 1
Workers Launched: 1
-> Nested Loop (actual time=150.785..242.440 rows=10 loops=2)
-> Hash Join (actual time=0.212..14.353 rows=945 loops=2)
Hash Cond: (tc.table_d_id = td.id)
-> Parallel Seq Scan on table_c tc (actual
time=0.075..13.794 rows=1751 loops=2)
Filter: (table_e_id = 4)
Rows Removed by Filter: 49117
-> Hash (actual time=0.050..0.050 rows=8 loops=2)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on table_d_id td (actual
time=0.031..0.037 rows=8 loops=2)
Filter: (group = 'A'::test.group)
Rows Removed by Filter: 55
-> Index Scan using table_b_idx_target_id on table_b
(actual time=0.237..0.241 rows=0 loops=1890)
Index Cond: (target_id = tc.id)
Filter: ((date >= '2018-08-10'::date) AND (date <=
'2018-09-01'::date) AND (((target_id = tc.id) AND (group_type =
'A'::test.group_type)) OR (source_id = tc.id)))
Rows Removed by Filter: 26
-> Index Scan using table_a_uq_001 on table_a ta (actual
time=0.021..0.021 rows=1 loops=19)
Index Cond: ((table_c_id = tc.id) AND (date = '2018-08-31'::date))
Planning Time: 4.816 ms
Execution Time: 289.503 ms

I executed it a couple of times and saw in output 26 rows (as above), 21
rows (also above), 28 rows, 28 rows, 29 rows etc.
The changing parts (nr of rows) were: Nested Loop Left Join and Nested Loop.

For the following sql:

*SET parallel_leader_participation = on;*
explain (costs false, analyze true)
SELECT ta.id AS table_a_id,
tc.id as table_c_id,
tba.id AS table_b_id
FROM test.table_b_active tba
INNER JOIN test.table_c tc ON tba.target_id = tc.id
INNER JOIN test.table_d td ON tc.table_d_id = td.id
LEFT JOIN test.table_a ta ON ta.table_c_id = tc.id AND ta.date =
'2018-08-31' :: DATE
WHERE tba.date BETWEEN '2018-08-10' :: DATE AND '2018-09-01' :: DATE
AND td.group = 'A'
AND tc.table_e_id = 4
AND (
(tba.target_id = tc.id AND tba.group_type = 'A')
OR tba.source_id = tc.id
);

I received:

QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop Left Join (actual time=143.805..279.354 rows=29 loops=1)
-> Gather (actual time=143.735..299.066 rows=29 loops=1)
Workers Planned: 1
Workers Launched: 1
-> Nested Loop (actual time=149.334..242.108 rows=15 loops=2)
-> Hash Join (actual time=0.203..14.536 rows=945 loops=2)
Hash Cond: (tc.table_d_id = td.id)
-> Parallel Seq Scan on table_c tc (actual
time=0.072..13.971 rows=1751 loops=2)
Filter: (table_e_id = 4)
Rows Removed by Filter: 49117
-> Hash (actual time=0.043..0.043 rows=8 loops=2)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on table_d td (actual
time=0.028..0.034 rows=8 loops=2)
Filter: (group = 'A'::test.group)
Rows Removed by Filter: 55
-> Index Scan using table_b_idx_target_id on table_b
(actual time=0.237..0.240 rows=0 loops=1890)
Index Cond: (target_id = tc.id)
Filter: ((date >= '2018-08-10'::date) AND (date <=
'2018-09-01'::date) AND (((target_id = tc.id) AND (group_type =
'A'::test.group_type)) OR (source_id = tc.id)))
Rows Removed by Filter: 26
-> Index Scan using table_a_uq_001 on table_a ta (actual
time=0.018..0.018 rows=1 loops=29)
Index Cond: ((table_c_id = tc.id) AND (date = '2018-08-31'::date))
Planning Time: 5.070 ms
Execution Time: 299.849 ms

QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop Left Join (actual time=231.714..276.770 rows=26 loops=1)
-> Gather (actual time=231.654..295.971 rows=26 loops=1)
Workers Planned: 1
Workers Launched: 1
-> Nested Loop (actual time=151.209..239.965 rows=13 loops=2)
-> Hash Join (actual time=0.214..14.467 rows=945 loops=2)
Hash Cond: (tc.table_d_id = td.id)
-> Parallel Seq Scan on table_c tc (actual
time=0.070..13.923 rows=1751 loops=2)
Filter: (table_e_id = 4)
Rows Removed by Filter: 49117
-> Hash (actual time=0.045..0.045 rows=8 loops=2)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on table_d td (actual
time=0.028..0.033 rows=8 loops=2)
Filter: (group = 'A'::test.group)
Rows Removed by Filter: 55
-> Index Scan using table_b_idx_target_id on table_b
(actual time=0.235..0.238 rows=0 loops=1890)
Index Cond: (target_id = tc.id)
Filter: ((date >= '2018-08-10'::date) AND (date <=
'2018-09-01'::date) AND (((target_id = tc.id) AND (group_type =
'A'::test.group_type)) OR (source_id = tc.id)))
Rows Removed by Filter: 26
-> Index Scan using table_a_uq_001 on table_a ta (actual
time=0.017..0.017 rows=1 loops=26)
Index Cond: ((table_c_id = tc.id) AND (date = '2018-08-31'::date))
Planning Time: 4.976 ms
Execution Time: 296.729 ms

I also made a test with:

*SET parallel_leader_participation = off;*
*SET force_parallel_mode = on;*
explain (costs false, analyze true)
SELECT ta.id AS table_a_id,
tc.id as table_c_id,
tba.id AS table_b_id
FROM test.table_b_active tba
INNER JOIN test.table_c tc ON tba.target_id = tc.id
INNER JOIN test.table_d td ON tc.table_d_id = td.id
LEFT JOIN test.table_a ta ON ta.table_c_id = tc.id AND ta.date =
'2018-08-31' :: DATE
WHERE tba.date BETWEEN '2018-08-10' :: DATE AND '2018-09-01' :: DATE
AND td.group = 'A'
AND tc.table_e_id = 4
AND (
(tba.target_id = tc.id AND tba.group_type = 'A')
OR tba.source_id = tc.id
);

The output is correct - 31 rows, but the plan is different:

QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Gather (actual time=263.880..507.668 rows=31 loops=1)
Workers Planned: 1
Workers Launched: 1
Single Copy: true
-> Nested Loop Left Join (actual time=193.246..415.020 rows=31 loops=1)
-> Nested Loop (actual time=192.997..414.336 rows=31 loops=1)
-> Hash Join (actual time=0.287..23.958 rows=1890 loops=1)
Hash Cond: (tc.table_d_id = td.id)
-> Seq Scan on table_c tc (actual time=0.128..23.044
rows=3501 loops=1)
Filter: (table_e_id = 4)
Rows Removed by Filter: 98233
-> Hash (actual time=0.064..0.069 rows=8 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on table_d td (actual
time=0.045..0.052 rows=8 loops=1)
Filter: (group = 'A'::test.group)
Rows Removed by Filter: 55
-> Index Scan using table_b_idx_target_id on table_b
(actual time=0.203..0.206 rows=0 loops=1890)
Index Cond: (target_id = tc.id)
Filter: ((date >= '2018-08-10'::date) AND (date <=
'2018-09-01'::date) AND (((target_id = tc.id) AND (group_type =
'A'::test.group_type)) OR (source_id = tc.id)))
Rows Removed by Filter: 26
-> Index Scan using table_a_uq_001 on table_a ta (actual
time=0.021..0.021 rows=1 loops=31)
Index Cond: ((table_c_id = tc.id) AND (date =
'2018-08-31'::date))
Planning Time: 5.051 ms
Execution Time: 507.858 ms

On Mon, Jan 7, 2019 at 8:59 PM Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
wrote:

> On Tue, Jan 8, 2019 at 1:56 AM Bartosz Polnik <bartoszpolnik(at)gmail(dot)com>
> wrote:
> > I've modified the script:
> > 1. I Added SET force_parallel_mode = on;:
>
> What if you try that with your original query in the database that
> shows the problem? Or alternatively SET parallel_leader_participation
> = on. What I'm wondering is whether the parallel worker is
> consistently getting the same result, but it's a different result from
> the leader, and the variation comes from the race between leader and
> worker, or whether the instability comes from somewhere else.
>
> --
> Thomas Munro
> http://www.enterprisedb.com
>

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Thomas Munro 2019-01-07 22:22:27 Re: BUG #15577: Query returns different results when executed multiple times
Previous Message Andrew Dunstan 2019-01-07 21:22:31 Re: BUG #15579: Adding a column with default from configuration parameter fails on 11.1