Re: Sync scan & regression tests

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Sync scan & regression tests
Date: 2023-08-07 00:55:36
Message-ID: 367973.1691369736@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thomas Munro <thomas(dot)munro(at)gmail(dot)com> writes:
> On Mon, Aug 7, 2023 at 7:21 AM Konstantin Knizhnik <knizhnik(at)garret(dot)ru> wrote:
>> Two tests are failed because of sync scan - this tests cluster.sql and
>> portals.sql perform seqscan without explicit order by and expect that
>> data will be returned in particular order. But because of sync scan it
>> doesn't happen. Small shared buffers are needed to satisfy seqscan
>> criteria in heapam.c: `scan->rs_nblocks > NBuffers / 4` for tenk1 table.

> I wondered the same thing while working on the tests in commit
> 8ab0ebb9a84, which explicitly care about physical order, so they *say
> so* with ORDER BY ctid. But the problem seems quite widespread, so I
> didn't volunteer to try to do something like that everywhere, when Tom
> committed cbf4177f for 027_stream_regress.pl.

Our customary theory about that is explained in regress.sgml:

You might wonder why we don't order all the regression test queries explicitly
to get rid of this issue once and for all. The reason is that that would
make the regression tests less useful, not more, since they'd tend
to exercise query plan types that produce ordered results to the
exclusion of those that don't.

Having said that ... I just noticed that chipmunk, which I'd been
ignoring because it had been having configuration-related failures
ever since it came back to life about three months ago, has gotten
past those problems and is now failing with what looks suspiciously
like syncscan problems:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=chipmunk&dt=2023-08-06%2008%3A20%3A21

This is possibly explained by the fact that it uses (per its
extra_config)
'shared_buffers = 10MB',
although it's done that for a long time and portals.out hasn't changed
since before chipmunk's latest success. Perhaps some change in an
earlier test script affected this?

I think it'd be worth running to ground exactly what is causing these
failures and when they started. But assuming that they are triggered
by syncscan, my first thought about dealing with it is to disable
syncscan within the affected tests. However ... do we exercise the
syncscan logic at all within the regression tests, normally? Is there
a coverage issue to be dealt with?

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Yugo NAGATA 2023-08-07 02:02:48 Re: pgbench: allow to exit immediately when any client is aborted
Previous Message Peter Geoghegan 2023-08-07 00:38:26 Re: Use of additional index columns in rows filtering