Re: [CLOBBER_CACHE]Server crashed with segfault 11 while executing clusterdb

From: Amul Sul <sulamul(at)gmail(dot)com>
To: Neha Sharma <neha(dot)sharma(at)enterprisedb(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [CLOBBER_CACHE]Server crashed with segfault 11 while executing clusterdb
Date: 2021-03-22 08:25:23
Message-ID: CAAJ_b968A1YPiKvh7pakjHLLbc8ZV2Mpry+OvwSFRdyk6fEe7g@mail.gmail.com
Lists: pgsql-hackers

In heapam_relation_copy_for_cluster(), begin_heap_rewrite() sets
rwstate->rs_new_rel->rd_smgr correctly, but the very next call,
tuplesort_begin_cluster(), triggers system cache invalidation. With the CCA
(CLOBBER_CACHE_ALWAYS) setting, that invalidation wipes out
rwstate->rs_new_rel->rd_smgr, which is never restored for the subsequent
operations. end_heap_rewrite() then passes a NULL smgr handle down to
smgrimmedsync(), and the backend crashes with a segmentation fault.

Calling RelationOpenSmgr() before smgrimmedsync() in end_heap_rewrite()
fixes the failure; the attached patch does exactly that.

Regards,
Amul

On Mon, Mar 22, 2021 at 11:53 AM Neha Sharma
<neha(dot)sharma(at)enterprisedb(dot)com> wrote:
>
> Hello,
>
> While executing the test case below, the server crashed with segfault 11 on the master branch.
> I have enabled CLOBBER_CACHE_ALWAYS in src/include/pg_config_manual.h.
>
> The issue reproduces only on the master branch.
>
> Test Case:
> CREATE TABLE sm_5_323_table (col1 numeric);
> CREATE INDEX sm_5_323_idx ON sm_5_323_table(col1);
>
> CLUSTER sm_5_323_table USING sm_5_323_idx;
>
> \! /PGClobber_build/postgresql/inst/bin/clusterdb -t sm_5_323_table -U edb -h localhost -p 5432 -d postgres
>
> Test case output:
> edb(at)edb:~/PGClobber_build/postgresql/inst/bin$ ./psql postgres
> psql (14devel)
> Type "help" for help.
>
> postgres=# CREATE TABLE sm_5_323_table (col1 numeric);
> CREATE TABLE
> postgres=# CREATE INDEX sm_5_323_idx ON sm_5_323_table(col1);
> CREATE INDEX
> postgres=# CLUSTER sm_5_323_table USING sm_5_323_idx;
> CLUSTER
> postgres=# \! /PGClobber_build/postgresql/inst/bin/clusterdb -t sm_5_323_table -U edb -h localhost -p 5432 -d postgres
> clusterdb: error: clustering of table "sm_5_323_table" in database "postgres" failed: server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
>
> Stack Trace:
> Core was generated by `postgres: edb postgres 127.0.0.1(50978) CLUSTER '.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0 0x000055e5c85ea0b4 in mdopenfork (reln=0x0, forknum=MAIN_FORKNUM, behavior=1) at md.c:485
> 485 if (reln->md_num_open_segs[forknum] > 0)
> (gdb) bt
> #0 0x000055e5c85ea0b4 in mdopenfork (reln=0x0, forknum=MAIN_FORKNUM, behavior=1) at md.c:485
> #1 0x000055e5c85eb2f0 in mdnblocks (reln=0x0, forknum=MAIN_FORKNUM) at md.c:768
> #2 0x000055e5c85eb61b in mdimmedsync (reln=0x0, forknum=forknum(at)entry=MAIN_FORKNUM) at md.c:930
> #3 0x000055e5c85ec6e5 in smgrimmedsync (reln=<optimized out>, forknum=forknum(at)entry=MAIN_FORKNUM) at smgr.c:662
> #4 0x000055e5c81ae28b in end_heap_rewrite (state=state(at)entry=0x55e5ca5d1d70) at rewriteheap.c:342
> #5 0x000055e5c81a32ea in heapam_relation_copy_for_cluster (OldHeap=0x7f212ce41ba0, NewHeap=0x7f212ce41058, OldIndex=<optimized out>, use_sort=<optimized out>, OldestXmin=<optimized out>,
> xid_cutoff=<optimized out>, multi_cutoff=0x7ffcba6ebe64, num_tuples=0x7ffcba6ebe68, tups_vacuumed=0x7ffcba6ebe70, tups_recently_dead=0x7ffcba6ebe78) at heapam_handler.c:984
> #6 0x000055e5c82f218a in table_relation_copy_for_cluster (tups_recently_dead=0x7ffcba6ebe78, tups_vacuumed=0x7ffcba6ebe70, num_tuples=0x7ffcba6ebe68, multi_cutoff=0x7ffcba6ebe64,
> xid_cutoff=0x7ffcba6ebe60, OldestXmin=<optimized out>, use_sort=<optimized out>, OldIndex=0x7f212ce40670, NewTable=0x7f212ce41058, OldTable=0x7f212ce41ba0)
> at ../../../src/include/access/tableam.h:1656
> #7 copy_table_data (pCutoffMulti=<synthetic pointer>, pFreezeXid=<synthetic pointer>, pSwapToastByContent=<synthetic pointer>, verbose=<optimized out>, OIDOldIndex=<optimized out>,
> OIDOldHeap=16384, OIDNewHeap=<optimized out>) at cluster.c:908
> #8 rebuild_relation (verbose=<optimized out>, indexOid=<optimized out>, OldHeap=<optimized out>) at cluster.c:604
> #9 cluster_rel (tableOid=<optimized out>, indexOid=<optimized out>, params=<optimized out>) at cluster.c:427
> #10 0x000055e5c82f2b7f in cluster (pstate=pstate(at)entry=0x55e5ca5315c0, stmt=stmt(at)entry=0x55e5ca510368, isTopLevel=isTopLevel(at)entry=true) at cluster.c:195
> #11 0x000055e5c85fcbc6 in standard_ProcessUtility (pstmt=0x55e5ca510430, queryString=0x55e5ca50f850 "CLUSTER public.sm_5_323_table;", context=PROCESS_UTILITY_TOPLEVEL, params=0x0,
> queryEnv=0x0, dest=0x55e5ca510710, qc=0x7ffcba6ec340) at utility.c:822
> #12 0x000055e5c85fd436 in ProcessUtility (pstmt=pstmt(at)entry=0x55e5ca510430, queryString=<optimized out>, context=context(at)entry=PROCESS_UTILITY_TOPLEVEL, params=<optimized out>,
> queryEnv=<optimized out>, dest=dest(at)entry=0x55e5ca510710, qc=0x7ffcba6ec340) at utility.c:525
> #13 0x000055e5c85f6148 in PortalRunUtility (portal=portal(at)entry=0x55e5ca570d70, pstmt=pstmt(at)entry=0x55e5ca510430, isTopLevel=isTopLevel(at)entry=true,
> setHoldSnapshot=setHoldSnapshot(at)entry=false, dest=dest(at)entry=0x55e5ca510710, qc=qc(at)entry=0x7ffcba6ec340) at pquery.c:1159
> #14 0x000055e5c85f71a4 in PortalRunMulti (portal=portal(at)entry=0x55e5ca570d70, isTopLevel=isTopLevel(at)entry=true, setHoldSnapshot=setHoldSnapshot(at)entry=false,
> dest=dest(at)entry=0x55e5ca510710, altdest=altdest(at)entry=0x55e5ca510710, qc=qc(at)entry=0x7ffcba6ec340) at pquery.c:1305
> #15 0x000055e5c85f8823 in PortalRun (portal=portal(at)entry=0x55e5ca570d70, count=count(at)entry=9223372036854775807, isTopLevel=isTopLevel(at)entry=true, run_once=run_once(at)entry=true,
> dest=dest(at)entry=0x55e5ca510710, altdest=altdest(at)entry=0x55e5ca510710, qc=0x7ffcba6ec340) at pquery.c:779
> #16 0x000055e5c85f389e in exec_simple_query (query_string=0x55e5ca50f850 "CLUSTER public.sm_5_323_table;") at postgres.c:1185
> #17 0x000055e5c85f51cf in PostgresMain (argc=argc(at)entry=1, argv=argv(at)entry=0x7ffcba6ec670, dbname=<optimized out>, username=<optimized out>) at postgres.c:4415
> #18 0x000055e5c8522240 in BackendRun (port=<optimized out>, port=<optimized out>) at postmaster.c:4470
> #19 BackendStartup (port=<optimized out>) at postmaster.c:4192
> #20 ServerLoop () at postmaster.c:1737
> #21 0x000055e5c85237ec in PostmasterMain (argc=<optimized out>, argv=0x55e5ca508fe0) at postmaster.c:1409
> #22 0x000055e5c811a2cf in main (argc=3, argv=0x55e5ca508fe0) at main.c:209
>
> Thanks.
> --
> Regards,
> Neha Sharma

Attachment Content-Type Size
fix_failure_for_cca.patch application/x-patch 505 bytes
