Re: create_index test fails when synchronous_commit = off @ master

From: Andres Freund <andres(at)anarazel(dot)de>
To: Aleksander Alekseev <aleksander(at)timescale(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: create_index test fails when synchronous_commit = off @ master
Date: 2022-02-24 15:33:39
Message-ID: 20220224153339.pqn64kseb5gpgl74@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2022-02-24 16:47:25 +0300, Aleksander Alekseev wrote:
> - QUERY PLAN
> --------------------------------------------------------
> - Index Only Scan using tenk1_thous_tenthous on tenk1
> - Index Cond: (thousand < 2)
> - Filter: (tenthous = ANY ('{1001,3000}'::integer[]))
> -(3 rows)
> + QUERY PLAN
> +--------------------------------------------------------------------------------------
> + Sort
> + Sort Key: thousand
> + -> Index Only Scan using tenk1_thous_tenthous on tenk1
> + Index Cond: ((thousand < 2) AND (tenthous = ANY
> ('{1001,3000}'::integer[])))
> +(4 rows)

Heh. We've been having a lot of fights with exactly this plan change in the
AIO branch, before cc50080a82, and without synchronous_commit =
off. Interestingly near-exclusively with the regression run within
pg_upgrade's tests.

For aio we (David did a lot of that IIRC) finally hunted it down to be due
vacuum skipping pages due to inability to get a cleanup lock. If that happens
enough, pg_class.relallvisible changes enough to lead to the different plan.

I first was going to suggest that we should just use VACUUM FREEZE to prevent
the issue.

But in this instance the cause isn't cleanup locks, probably that we can't yet
set hint bits due to synchronous_commit=off? But I don't *fully* understand
how it leads to this.

I added the SELECT relpages, reltuples, relallvisible FROM pg_class WHERE oid = 'tenk1'::regclass;
just after the
VACUUM ANALYZE tenk1;

synchronous_commit=on
+ relpages | reltuples | relallvisible
+----------+-----------+---------------
+ 345 | 10000 | 345
+(1 row)

synchronous_commit=off
+ relpages | reltuples | relallvisible
+----------+-----------+---------------
+ 345 | 10000 | 0
+(1 row)

So it clearly is the explanation for the issue.

Obviously we can locally work around it by adding a
SET LOCAL synchronous_commit = local;
to the COPY. But I'd like to fully understand what's going on.

> I didn't investigate further. Do we assume that `make installcheck` suppose
> to pass with a different postgresql.conf options?

Depends on the option, I think... There's some where it's interesting to run
tests with different options and where the effort required is reasonable. And
some cases where it's not... synchronous_commit=off worked until recently, and
I think we should keep it working.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2022-02-24 15:46:28 Re: Add id's to various elements in protocol.sgml
Previous Message Tom Lane 2022-02-24 15:17:28 Re: convert libpq uri-regress tests to tap test