Re: Sync scan & regression tests

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: David Rowley <dgrowleyml(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: Sync scan & regression tests
Date: 2023-08-29 10:35:44
Message-ID: 6f991389-ae22-d844-a9d8-9aceb7c01a9a@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

(noticed this thread just now)

On 07/08/2023 03:55, Tom Lane wrote:
> Having said that ... I just noticed that chipmunk, which I'd been
> ignoring because it had been having configuration-related failures
> ever since it came back to life about three months ago, has gotten
> past those problems

Yes, I finally got around to fix the configuration...

> and is now failing with what looks suspiciously like syncscan
> problems:
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=chipmunk&dt=2023-08-06%2008%3A20%3A21
>
> This is possibly explained by the fact that it uses (per its
> extra_config)
> 'shared_buffers = 10MB',
> although it's done that for a long time and portals.out hasn't changed
> since before chipmunk's latest success. Perhaps some change in an
> earlier test script affected this?

I changed the config yesterday to 'shared_buffers = 20MB', before seeing
this thread. If we expect the regression tests to work with
'shared_buffers=10MB' - and I think that would be nice - I'll change it
back. But let's see what happens with 'shared_buffers=20MB'. It will
take a few days for the tests to complete.

> I think it'd be worth running to ground exactly what is causing these
> failures and when they started.

I bisected it to this commit:

commit 7ae0ab0ad9704b10400a611a9af56c63304c27a5
Author: David Rowley <drowley(at)postgresql(dot)org>
Date: Fri Feb 3 16:20:43 2023 +1300

Reduce code duplication between heapgettup and heapgettup_pagemode

The code to get the next block number was exactly the same between
these
two functions, so let's just put it into a helper function and call
that
from both locations.

Author: Melanie Plageman
Reviewed-by: Andres Freund, David Rowley
Discussion:
https://postgr.es/m/CAAKRu_bvkhka0CZQun28KTqhuUh5ZqY=_T8QEqZqOL02rpi2bw@mail.gmail.com

From that description, that was supposed to be just code refactoring,
with no change in behavior.

Looking the new heapgettup_advance_block() function and the code that it
replaced, it's now skipping this ss_report_location() on the last call,
when it has reached the end of the scan:

>
> /*
> * Report our new scan position for synchronization purposes. We
> * don't do that when moving backwards, however. That would just
> * mess up any other forward-moving scanners.
> *
> * Note: we do this before checking for end of scan so that the
> * final state of the position hint is back at the start of the
> * rel. That's not strictly necessary, but otherwise when you run
> * the same query multiple times the starting position would shift
> * a little bit backwards on every invocation, which is confusing.
> * We don't guarantee any specific ordering in general, though.
> */
> if (scan->rs_base.rs_flags & SO_ALLOW_SYNC)
> ss_report_location(scan->rs_base.rs_rd, block);

The comment explicitly says that we do that before checking for
end-of-scan, but that commit moved it to after. I propose the attached
to move it back to before the end-of-scan checks.

Melanie, David, any thoughts?

> But assuming that they are triggered by syncscan, my first thought
> about dealing with it is to disable syncscan within the affected
> tests. However ... do we exercise the syncscan logic at all within
> the regression tests, normally? Is there a coverage issue to be
> dealt with?
Good question..

--
Heikki Linnakangas
Neon (https://neon.tech)

Attachment Content-Type Size
fix-syncscan-regress-failures.patch text/x-patch 1.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Richard Guo 2023-08-29 10:41:28 Re: Support run-time partition pruning for hash join
Previous Message Daniel Gustafsson 2023-08-29 10:02:48 Re: Prevent psql \watch from running queries that return no rows