Re: [sqlsmith] Failed assertion in _hash_kill_items/MarkBufferDirtyHint

From: Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>
To: Andreas Seltenreich <seltenreich(at)gmx(dot)de>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [sqlsmith] Failed assertion in _hash_kill_items/MarkBufferDirtyHint
Date: 2017-03-27 09:39:25
Message-ID: CAE9k0P=V2LhtyeMXd295fhisp=NWUhRVJ9EZQCDowWiY9rSohQ@mail.gmail.com
Lists: pgsql-hackers

Hi,

> testing with master as of cf366e97ff, sqlsmith occasionally triggers the
> following assertion:
>
> TRAP: FailedAssertion("!(LWLockHeldByMe(((LWLock*) (&(bufHdr)->content_lock))))", File: "bufmgr.c", Line: 3397)
>
> Backtraces always look like the one below. It is reproducible on a
> cluster once it happens. I could provide a tarball if needed.
>
> regards,
> Andreas
>
> #2 0x00000000008324b1 in ExceptionalCondition (conditionName=conditionName@entry=0x9e4e28 "!(LWLockHeldByMe(((LWLock*) (&(bufHdr)->content_lock))))", errorType=errorType@entry=0x87b03d "FailedAssertion", fileName=fileName@entry=0x9e5856 "bufmgr.c", lineNumber=lineNumber@entry=3397) at assert.c:54
> #3 0x0000000000706971 in MarkBufferDirtyHint (buffer=2844, buffer_std=buffer_std@entry=1 '\001') at bufmgr.c:3397
> #4 0x00000000004b3ecd in _hash_kill_items (scan=scan@entry=0x66dcf70) at hashutil.c:514
> #5 0x00000000004a9c1b in hashendscan (scan=0x66dcf70) at hash.c:512
> #6 0x00000000004cf17a in index_endscan (scan=0x66dcf70) at indexam.c:353
> #7 0x000000000061fa51 in ExecEndIndexScan (node=0x3093f30) at nodeIndexscan.c:852
> #8 0x0000000000608e59 in ExecEndNode (node=<optimized out>) at execProcnode.c:715
> #9 0x00000000006045b8 in ExecEndPlan (estate=0x3064000, planstate=<optimized out>) at execMain.c:1540
> #10 standard_ExecutorEnd (queryDesc=0x30cb880) at execMain.c:487
> #11 0x00000000005c87b0 in PortalCleanup (portal=0x1a60060) at portalcmds.c:302
> #12 0x000000000085cbb3 in PortalDrop (portal=0x1a60060, isTopCommit=<optimized out>) at portalmem.c:489
> #13 0x0000000000736ed2 in exec_simple_query (query_string=0x315b7a0 "...") at postgres.c:1111
> #14 0x0000000000738b51 in PostgresMain (argc=<optimized out>, argv=argv@entry=0x1a6c6c8, dbname=<optimized out>, username=<optimized out>) at postgres.c:4071
> #15 0x0000000000475fef in BackendRun (port=0x1a65b90) at postmaster.c:4317
> #16 BackendStartup (port=0x1a65b90) at postmaster.c:3989
> #17 ServerLoop () at postmaster.c:1729
> #18 0x00000000006c8662 in PostmasterMain (argc=argc@entry=4, argv=argv@entry=0x1a3f540) at postmaster.c:1337
> #19 0x000000000047729d in main (argc=4, argv=0x1a3f540) at main.c:228
>

Hi,

Thanks for reporting this problem. Could you please let me know how
long you ran sqlsmith before hitting this crash? In any case, I have
found the reason for it. It happens when tuples are retrieved through
a cursor. The current hash index scan works tuple-at-a-time: once it
finds a tuple on a page, it releases the lock on the page but keeps
the pin, and then returns the tuple. So once the requested number of
tuples has been processed, the page being scanned is pinned but not
locked. When the cursor is closed at the end of the scan, if any
killed tuples have been identified, we first try to mark those items
as dead using _hash_kill_items(). But since we hold only a pin on the
page, the LWLockHeldByMe() assertion in MarkBufferDirtyHint() fails.

When scanning tuples with a normal SELECT * statement, we deal with
all the killed items before moving to the next page in the bucket, and
we do so without releasing the lock and pin on the current page.
Hence the crash does not show up with plain SELECT queries.
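
To make the difference between the two paths concrete, here is a small
stand-alone C model of the pin/lock states described above. It is only a
sketch: ToyBuffer and every toy_* function are hypothetical names invented
for illustration, not PostgreSQL's buffer manager API, and the "reacquire"
branch is just one assumed way to avoid the assertion, not necessarily what
any patch does. toy_mark_dirty_hint() returns false where the real
MarkBufferDirtyHint() would trip its assertion.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a shared buffer; NOT PostgreSQL's BufferDesc. */
typedef struct ToyBuffer
{
    int  pin_count;     /* pins held by this backend */
    bool content_lock;  /* true while this backend holds the content lock */
    bool dirty_hint;    /* set once a hint change has been recorded */
} ToyBuffer;

static void toy_pin(ToyBuffer *b)    { b->pin_count++; }
static void toy_unpin(ToyBuffer *b)  { assert(b->pin_count > 0); b->pin_count--; }
static void toy_lock(ToyBuffer *b)   { assert(b->pin_count > 0 && !b->content_lock); b->content_lock = true; }
static void toy_unlock(ToyBuffer *b) { assert(b->content_lock); b->content_lock = false; }

/* Models the precondition: the caller must hold the content lock. */
static bool toy_mark_dirty_hint(ToyBuffer *b)
{
    if (!b->content_lock)
        return false;           /* the real code would hit the assertion here */
    b->dirty_hint = true;
    return true;
}

/*
 * Models marking killed items dead. With reacquire = true (an assumed fix),
 * the lock is taken first when only a pin is held, and released afterwards.
 */
static bool toy_kill_items(ToyBuffer *b, bool reacquire)
{
    bool took_lock = false;
    bool ok;

    if (!b->content_lock && reacquire)
    {
        toy_lock(b);
        took_lock = true;
    }
    ok = toy_mark_dirty_hint(b);
    if (took_lock)
        toy_unlock(b);
    return ok;
}

/* Cursor path: a tuple is returned with the pin kept but the lock released,
 * then the scan is ended and killed items are processed. */
static bool toy_cursor_endscan(bool reacquire)
{
    ToyBuffer b = {0, false, false};
    bool ok;

    toy_pin(&b);
    toy_lock(&b);
    toy_unlock(&b);             /* tuple-at-a-time: drop lock, keep pin */
    ok = toy_kill_items(&b, reacquire);
    toy_unpin(&b);
    return ok;
}

/* Plain SELECT path: killed items are handled before leaving the page,
 * while both lock and pin are still held. */
static bool toy_select_next_page(void)
{
    ToyBuffer b = {0, false, false};
    bool ok;

    toy_pin(&b);
    toy_lock(&b);
    ok = toy_kill_items(&b, false);
    toy_unlock(&b);
    toy_unpin(&b);
    return ok;
}
```

In this model toy_cursor_endscan(false) reports the failure, while
toy_select_next_page() and toy_cursor_endscan(true) both succeed, which
mirrors why only the cursor case trips the assertion.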

The attached patch fixes this. Please note, however, that all these
changes will be removed by the patch for page scan mode [1].

[1] - https://www.postgresql.org/message-id/CA%2BTgmobYTvexcjqMhXoNCyEUHChzmdC_2xVGgj7eqaYVgoJA%2Bg%40mail.gmail.com

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com

Attachment Content-Type Size
reacquire_lock_hashkillitems_if_required.patch application/x-patch 3.5 KB
