Re: BUG #15577: Query returns different results when executed multiple times

From: Bartosz Polnik <bartoszpolnik(at)gmail(dot)com>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #15577: Query returns different results when executed multiple times
Date: 2019-01-07 09:17:47
Message-ID: CAM37Zev2xUezKc0XZFPfi=BtH+of+fqTAYUCJJuf+4Pc6u4G7g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

> Can you confirm if there are concurrent changes being made to table_b
> perhaps while the query is running?

I confirm. There are no concurrent changes being made and I'm the only
client connecting to that db.

> Do you still get the variation with an otherwise idle database with no
> open transactions that's just received a complete VACUUM?

I executed VACUUM (FULL, ANALYZE) on all tables from the query, but
I'm still getting different results.

With your script, I'm always getting the same output, so it must be
something else.

I thought about sharing db dump, but I can't do that due to the company
policy. What I could share though, is anonymized db dump. Do you know if
there is any good tool for performing anonymization?

On Mon, Jan 7, 2019 at 2:57 AM David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
wrote:

> On Mon, 7 Jan 2019 at 05:46, PG Bug reporting form
> <noreply(at)postgresql(dot)org> wrote:
> > A couple of days ago I stumbled upon a query that doesn't return all
> rows it
> > should.
>
> > -> Nested Loop (actual time=167.978..281.540 rows=11 loops=2)
>
> > -> Nested Loop (actual time=188.718..306.289 rows=14 loops=2)
>
> Above is the first place where the actual row counts diverge. This
> makes it appear that it's the join to table_b that's causing the
> variation. The parallel worker is meant to be running with the same
> snapshot as the main process so that it properly sees the same tuples
> as visible. If for some reason that was not working correctly then
> that might explain why you get differing results.
>
> Can you confirm if there are concurrent changes being made to table_b
> perhaps while the query is running?
>
> Do you still get the variation with an otherwise idle database with no
> open transactions that's just received a complete VACUUM?
>
> I'm able to produce the same plan as you're getting with the attached
> script, but unable to see any sort of row variation. My row counts
> don't match yours exactly. If I adjust the rows in table_b too much I
> get a completely different plan.
>
> I tested with PostgreSQL 11.1, compiled by Visual C++ build 1916, 64-bit.
>
> > I tried to create a clean schema with test data, but couldn't get the
> same
> > execution plan, so I can't include that.
>
> If you're removing columns to simplify the test script then you made
> need to add some columns back to pad the tuples out a bit in order to
> keep the tables around the same size. Variations in the table size
> will have an effect on the join costs and could change the join order.
> Also, a smaller table_c may not receive a parallel seq scan. In my
> mockup, I made table_b and table_a fairly large so as to keep the
> joins as parameterized nested loops. I also left out indexes on the
> "id" columns to reduce the chances of a Merge Join.
>
> --
> David Rowley http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message PG Bug reporting form 2019-01-07 12:14:17 BUG #15579: Adding a column with default from configuration parameter fails on 11.1
Previous Message Tom Lane 2019-01-07 05:12:28 Re: BUG #15578: Executing json_populate_recordset with an empty array causes a segmentation fault