| From: | Álvaro Herrera <alvherre(at)kurilemu(dot)de> |
|---|---|
| To: | Michael Paquier <michael(at)paquier(dot)xyz> |
| Cc: | Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: BRIN autosummarization lacking a snapshot |
| Date: | 2025-11-04 16:49:31 |
| Message-ID: | 202511041648.nofajnuddmwk@alvherre.pgsql |
| Lists: | pgsql-hackers |
On 2025-Nov-04, Álvaro Herrera wrote:
> With my initial try of this test, just counting the number of BRIN
> tuples, I was _really_ surprised that the index did indeed contain the
> expected number of tuples, even when the error was being thrown. This
> turned out to be expected, because the way BRIN summarization works is
> that we insert a placeholder tuple first, then update it to the correct
> value, and the error only aborts the second part. That's why I needed
> to add a WHERE clause to only count non-placeholder tuples.
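(For reference, the non-placeholder check uses pageinspect, as the log below also shows; index name and page number are the ones from the test:)

```sql
-- Count only finalized BRIN tuples, skipping the placeholder tuple
-- that summarization inserts first; requires the pageinspect extension.
SELECT count(*)
FROM brin_page_items(get_raw_page('brin_wi_idx', 2), 'brin_wi_idx'::regclass)
WHERE NOT placeholder;
```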
I see that skink (buildfarm animal that runs under valgrind) has failed.
Gotta vacate the premises, will study later.
2025-11-04 16:44:46.271 CET [2118443][autovacuum worker][108/6:0] LOG: process 2118443 still waiting for ShareUpdateExclusiveLock on relation 16436 of database 5 after 1016.528 ms
2025-11-04 16:44:46.271 CET [2118443][autovacuum worker][108/6:0] DETAIL: Process holding the lock: 2118078. Wait queue: 2118443.
2025-11-04 16:44:46.271 CET [2118443][autovacuum worker][108/6:0] CONTEXT: waiting for ShareUpdateExclusiveLock on relation 16436 of database 5
2025-11-04 16:44:46.298 CET [2118078][autovacuum worker][103/9:766] ERROR: canceling autovacuum task
2025-11-04 16:44:46.298 CET [2118078][autovacuum worker][103/9:766] CONTEXT: automatic analyze of table "postgres.public.journal"
2025-11-04 16:44:46.382 CET [2118443][autovacuum worker][108/6:0] LOG: process 2118443 acquired ShareUpdateExclusiveLock on relation 16436 of database 5 after 1188.860 ms
2025-11-04 16:44:46.382 CET [2118443][autovacuum worker][108/6:0] CONTEXT: waiting for ShareUpdateExclusiveLock on relation 16436 of database 5
2025-11-04 16:44:46.975 CET [2118946][autovacuum worker][110/7:0] LOG: skipping analyze of "journal" --- lock not available
2025-11-04 16:44:47.490 CET [2118078][autovacuum worker][103/10:0] LOG: process 2118078 still waiting for ShareUpdateExclusiveLock on relation 16436 of database 5 after 1017.402 ms
2025-11-04 16:44:47.490 CET [2118078][autovacuum worker][103/10:0] DETAIL: Process holding the lock: 2118443. Wait queue: 2118078, 2118946.
2025-11-04 16:44:47.490 CET [2118078][autovacuum worker][103/10:0] CONTEXT: waiting for ShareUpdateExclusiveLock on relation 16436 of database 5
2025-11-04 16:44:47.792 CET [2118443][autovacuum worker][108/6:0] ERROR: canceling autovacuum task
2025-11-04 16:44:47.792 CET [2118443][autovacuum worker][108/6:0] CONTEXT: processing work entry for relation "postgres.public.brin_packdate_idx"
2025-11-04 16:44:47.810 CET [2118078][autovacuum worker][103/10:0] LOG: process 2118078 acquired ShareUpdateExclusiveLock on relation 16436 of database 5 after 1414.103 ms
2025-11-04 16:44:47.810 CET [2118078][autovacuum worker][103/10:0] CONTEXT: waiting for ShareUpdateExclusiveLock on relation 16436 of database 5
==2118443== VALGRINDERROR-BEGIN
==2118443== Invalid read of size 8
==2118443== at 0x4634F39: PopActiveSnapshot (snapmgr.c:777)
==2118443== by 0x43F693F: do_autovacuum (autovacuum.c:2561)
==2118443== by 0x43F6E2B: AutoVacWorkerMain (autovacuum.c:1604)
==2118443== by 0x43FA9C9: postmaster_child_launch (launch_backend.c:268)
==2118443== by 0x43FDD9E: StartChildProcess (postmaster.c:3991)
==2118443== by 0x43FE008: StartAutovacuumWorker (postmaster.c:4055)
==2118443== by 0x43FF078: process_pm_pmsignal (postmaster.c:3812)
==2118443== by 0x43FF93C: ServerLoop (postmaster.c:1706)
==2118443== by 0x4401080: PostmasterMain (postmaster.c:1403)
==2118443== by 0x432A55F: main (main.c:231)
==2118443== Address 0x10 is not stack'd, malloc'd or (recently) free'd
==2118443==
==2118443== VALGRINDERROR-END
{
<insert_a_suppression_name_here>
Memcheck:Addr8
fun:PopActiveSnapshot
fun:do_autovacuum
fun:AutoVacWorkerMain
fun:postmaster_child_launch
fun:StartChildProcess
fun:StartAutovacuumWorker
fun:process_pm_pmsignal
fun:ServerLoop
fun:PostmasterMain
fun:main
}
==2118443==
==2118443== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==2118443== Access not within mapped region at address 0x10
==2118443== at 0x4634F39: PopActiveSnapshot (snapmgr.c:777)
==2118443== by 0x43F693F: do_autovacuum (autovacuum.c:2561)
==2118443== by 0x43F6E2B: AutoVacWorkerMain (autovacuum.c:1604)
==2118443== by 0x43FA9C9: postmaster_child_launch (launch_backend.c:268)
==2118443== by 0x43FDD9E: StartChildProcess (postmaster.c:3991)
==2118443== by 0x43FE008: StartAutovacuumWorker (postmaster.c:4055)
==2118443== by 0x43FF078: process_pm_pmsignal (postmaster.c:3812)
==2118443== by 0x43FF93C: ServerLoop (postmaster.c:1706)
==2118443== by 0x4401080: PostmasterMain (postmaster.c:1403)
==2118443== by 0x432A55F: main (main.c:231)
==2118443== If you believe this happened as a result of a stack
==2118443== overflow in your program's main thread (unlikely but
==2118443== possible), you can try to increase the size of the
==2118443== main thread stack using the --main-stacksize= flag.
==2118443== The main thread stack size used in this run was 8388608.
2025-11-04 16:44:48.010 CET [2118946][autovacuum worker][110/8:0] LOG: process 2118946 still waiting for ShareUpdateExclusiveLock on relation 16436 of database 5 after 1013.910 ms
2025-11-04 16:44:48.010 CET [2118946][autovacuum worker][110/8:0] DETAIL: Process holding the lock: 2118078. Wait queue: 2118946.
2025-11-04 16:44:48.010 CET [2118946][autovacuum worker][110/8:0] CONTEXT: waiting for ShareUpdateExclusiveLock on relation 16436 of database 5
2025-11-04 16:44:48.098 CET [2118946][autovacuum worker][110/8:0] LOG: process 2118946 acquired ShareUpdateExclusiveLock on relation 16436 of database 5 after 1111.825 ms
2025-11-04 16:44:48.098 CET [2118946][autovacuum worker][110/8:0] CONTEXT: waiting for ShareUpdateExclusiveLock on relation 16436 of database 5
2025-11-04 16:44:48.482 CET [2119181][client backend][8/2:0] LOG: statement: select count(*) from brin_page_items(get_raw_page('brin_wi_idx', 2), 'brin_wi_idx'::regclass)
where not placeholder;
2025-11-04 16:44:48.799 CET [2109989][postmaster][:0] LOG: autovacuum worker (PID 2118443) was terminated by signal 11: Segmentation fault
2025-11-04 16:44:48.799 CET [2109989][postmaster][:0] DETAIL: Failed process was running: autovacuum: BRIN summarize public.brin_packdate_idx 1
--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/