| From: | Álvaro Herrera <alvherre(at)kurilemu(dot)de> |
|---|---|
| To: | Michael Paquier <michael(at)paquier(dot)xyz> |
| Cc: | Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: BRIN autosummarization lacking a snapshot |
| Date: | 2025-11-04 16:49:31 |
| Message-ID: | 202511041648.nofajnuddmwk@alvherre.pgsql |
| Lists: | pgsql-hackers |
On 2025-Nov-04, Álvaro Herrera wrote:
> With my initial try of this test, just counting the number of BRIN
> tuples, I was _really_ surprised that the index did indeed contain the
> expected number of tuples, even when the error was being thrown. This
> turned out to be expected, because the way BRIN summarization works is
> that we insert a placeholder tuple first, then update it to the correct
> value, and the error only aborts the second part. That's why I needed
> to add a WHERE clause to only count non-placeholder tuples.
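(For reference, the non-placeholder check uses pageinspect, as the log below also shows; index name and page number are the ones from the test:)

```sql
-- Count only finalized BRIN tuples, skipping the placeholder tuple
-- that summarization inserts first; requires the pageinspect extension.
SELECT count(*)
FROM brin_page_items(get_raw_page('brin_wi_idx', 2), 'brin_wi_idx'::regclass)
WHERE NOT placeholder;
```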
I see that skink (buildfarm animal that runs under valgrind) has failed.
Gotta vacate the premises, will study later.
2025-11-04 16:44:46.271 CET [2118443][autovacuum worker][108/6:0] LOG: process 2118443 still waiting for ShareUpdateExclusiveLock on relation 16436 of database 5 after 1016.528 ms
2025-11-04 16:44:46.271 CET [2118443][autovacuum worker][108/6:0] DETAIL: Process holding the lock: 2118078. Wait queue: 2118443.
2025-11-04 16:44:46.271 CET [2118443][autovacuum worker][108/6:0] CONTEXT: waiting for ShareUpdateExclusiveLock on relation 16436 of database 5
2025-11-04 16:44:46.298 CET [2118078][autovacuum worker][103/9:766] ERROR: canceling autovacuum task
2025-11-04 16:44:46.298 CET [2118078][autovacuum worker][103/9:766] CONTEXT: automatic analyze of table "postgres.public.journal"
2025-11-04 16:44:46.382 CET [2118443][autovacuum worker][108/6:0] LOG: process 2118443 acquired ShareUpdateExclusiveLock on relation 16436 of database 5 after 1188.860 ms
2025-11-04 16:44:46.382 CET [2118443][autovacuum worker][108/6:0] CONTEXT: waiting for ShareUpdateExclusiveLock on relation 16436 of database 5
2025-11-04 16:44:46.975 CET [2118946][autovacuum worker][110/7:0] LOG: skipping analyze of "journal" --- lock not available
2025-11-04 16:44:47.490 CET [2118078][autovacuum worker][103/10:0] LOG: process 2118078 still waiting for ShareUpdateExclusiveLock on relation 16436 of database 5 after 1017.402 ms
2025-11-04 16:44:47.490 CET [2118078][autovacuum worker][103/10:0] DETAIL: Process holding the lock: 2118443. Wait queue: 2118078, 2118946.
2025-11-04 16:44:47.490 CET [2118078][autovacuum worker][103/10:0] CONTEXT: waiting for ShareUpdateExclusiveLock on relation 16436 of database 5
2025-11-04 16:44:47.792 CET [2118443][autovacuum worker][108/6:0] ERROR: canceling autovacuum task
2025-11-04 16:44:47.792 CET [2118443][autovacuum worker][108/6:0] CONTEXT: processing work entry for relation "postgres.public.brin_packdate_idx"
2025-11-04 16:44:47.810 CET [2118078][autovacuum worker][103/10:0] LOG: process 2118078 acquired ShareUpdateExclusiveLock on relation 16436 of database 5 after 1414.103 ms
2025-11-04 16:44:47.810 CET [2118078][autovacuum worker][103/10:0] CONTEXT: waiting for ShareUpdateExclusiveLock on relation 16436 of database 5
==2118443== VALGRINDERROR-BEGIN
==2118443== Invalid read of size 8
==2118443== at 0x4634F39: PopActiveSnapshot (snapmgr.c:777)
==2118443== by 0x43F693F: do_autovacuum (autovacuum.c:2561)
==2118443== by 0x43F6E2B: AutoVacWorkerMain (autovacuum.c:1604)
==2118443== by 0x43FA9C9: postmaster_child_launch (launch_backend.c:268)
==2118443== by 0x43FDD9E: StartChildProcess (postmaster.c:3991)
==2118443== by 0x43FE008: StartAutovacuumWorker (postmaster.c:4055)
==2118443== by 0x43FF078: process_pm_pmsignal (postmaster.c:3812)
==2118443== by 0x43FF93C: ServerLoop (postmaster.c:1706)
==2118443== by 0x4401080: PostmasterMain (postmaster.c:1403)
==2118443== by 0x432A55F: main (main.c:231)
==2118443== Address 0x10 is not stack'd, malloc'd or (recently) free'd
==2118443==
==2118443== VALGRINDERROR-END
{
<insert_a_suppression_name_here>
Memcheck:Addr8
fun:PopActiveSnapshot
fun:do_autovacuum
fun:AutoVacWorkerMain
fun:postmaster_child_launch
fun:StartChildProcess
fun:StartAutovacuumWorker
fun:process_pm_pmsignal
fun:ServerLoop
fun:PostmasterMain
fun:main
}
==2118443==
==2118443== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==2118443== Access not within mapped region at address 0x10
==2118443== at 0x4634F39: PopActiveSnapshot (snapmgr.c:777)
==2118443== by 0x43F693F: do_autovacuum (autovacuum.c:2561)
==2118443== by 0x43F6E2B: AutoVacWorkerMain (autovacuum.c:1604)
==2118443== by 0x43FA9C9: postmaster_child_launch (launch_backend.c:268)
==2118443== by 0x43FDD9E: StartChildProcess (postmaster.c:3991)
==2118443== by 0x43FE008: StartAutovacuumWorker (postmaster.c:4055)
==2118443== by 0x43FF078: process_pm_pmsignal (postmaster.c:3812)
==2118443== by 0x43FF93C: ServerLoop (postmaster.c:1706)
==2118443== by 0x4401080: PostmasterMain (postmaster.c:1403)
==2118443== by 0x432A55F: main (main.c:231)
==2118443== If you believe this happened as a result of a stack
==2118443== overflow in your program's main thread (unlikely but
==2118443== possible), you can try to increase the size of the
==2118443== main thread stack using the --main-stacksize= flag.
==2118443== The main thread stack size used in this run was 8388608.
2025-11-04 16:44:48.010 CET [2118946][autovacuum worker][110/8:0] LOG: process 2118946 still waiting for ShareUpdateExclusiveLock on relation 16436 of database 5 after 1013.910 ms
2025-11-04 16:44:48.010 CET [2118946][autovacuum worker][110/8:0] DETAIL: Process holding the lock: 2118078. Wait queue: 2118946.
2025-11-04 16:44:48.010 CET [2118946][autovacuum worker][110/8:0] CONTEXT: waiting for ShareUpdateExclusiveLock on relation 16436 of database 5
2025-11-04 16:44:48.098 CET [2118946][autovacuum worker][110/8:0] LOG: process 2118946 acquired ShareUpdateExclusiveLock on relation 16436 of database 5 after 1111.825 ms
2025-11-04 16:44:48.098 CET [2118946][autovacuum worker][110/8:0] CONTEXT: waiting for ShareUpdateExclusiveLock on relation 16436 of database 5
2025-11-04 16:44:48.482 CET [2119181][client backend][8/2:0] LOG: statement: select count(*) from brin_page_items(get_raw_page('brin_wi_idx', 2), 'brin_wi_idx'::regclass)
where not placeholder;
2025-11-04 16:44:48.799 CET [2109989][postmaster][:0] LOG: autovacuum worker (PID 2118443) was terminated by signal 11: Segmentation fault
2025-11-04 16:44:48.799 CET [2109989][postmaster][:0] DETAIL: Failed process was running: autovacuum: BRIN summarize public.brin_packdate_idx 1
--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/