HEAD crashes with assertion and LWLOCK_STATS enabled

From: Yuto HAYAMIZU <y(dot)hayamizu(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: HEAD crashes with assertion and LWLOCK_STATS enabled
Date: 2014-05-20 08:02:54
Message-ID: 537B0C2E.6090706@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi hackers,

I found a bug that causes a crash when assertion is enabled and LWLOCK_STATS is defined.
I've tested with Debian 7.5 (3.2.0-4-amd64) on VMware fusion 6, but this bug seems to be platform-independent and should reproduce in other environments.
A patch to fix the bug is also attached.

## Reproduing a crash

You can reproduce a crash by this way:

git co a0841ecd2518d4505b96132b764b918ab5d21ad4
git clean -dfx
./configure --enable-cassert CFLAGS='-DLWLOCK_STATS'
make check

In my environment, the following messages appeared.

( omit... )
../../../src/test/regress/pg_regress --inputdir=. --temp-install=./tmp_check --top-builddir=../../.. --dlpath=. --schedule=./parallel_schedule
============== creating temporary installation ==============
============== initializing database system ==============

pg_regress: initdb failed

and initdb.log contained the following messages.

reating directory /tmp/pghead/src/test/regress/./tmp_check/data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
creating template1 database in /tmp/pghead/src/test/regress/./tmp_check/data/base/1 ... PID 48239 lwlock main 142: shacq 0 exacq 1 blk 0 spindelay 0
( omit... )
PID 48247 lwlock main 33058: shacq 0 exacq 1 blk 0 spindelay 0
PID 48247 lwlock main 33005: shacq 0 exacq 48 blk 0 spindelay 0
ok
loading system objects' descriptions ... TRAP: FailedAssertion("!(CritSectionCount == 0 || (context) == ErrorContext || (MyAuxProcType == CheckpointerProcess))", File: "mcxt.c", Line: 594)
Aborted (core dumped)
child process exited with exit code 134
initdb: data directory "/tmp/pghead/src/test/regress/./tmp_check/data" not removed at user's request

## The cause of crash

The failing assertion is for prohibiting memory allocation in a critical section, which is introduced by commit 4a170ee9 on 2014-04-04.

In my understanding, the root cause of the assertion failure is on-demand allocation of lwlock_stats entry. For each LWLock, a lwlock_stats entry is created at the first invocation of LWLockAcquire using MemoryContextAlloc. If the first invocation is in a critical section, the assertion fails.

For 'initdb' case I mentioned above, WALWriteLock locking in XLogFlush function was the problem.
I also confirmed the assertion failure on starting postgres on a correctly initialized database. In this case, locking CheckpointerCommLock in AbsorbFsyncRequests function was the problem.

## A solution

In order to avoid memory allocation during critical sections, lwlock_stats hash table should be populated at the initialization of each process.
The attached patch populate lwlock_stats entries of MainLWLockArray at the end of CreateLWLocks, InitProcess and InitAuxiliaryProcess.

With this patch, all regression tests can be passed so far, but I think this patch is not perfect because it does not cover LWLocks outside of MainLWLockArray. I'm not sure where is the right place to initialize lwlock_stats entries for that locks. So I feel it needs some refinements by you hackers.

Attachment Content-Type Size
0001-pre-populating-lwlock_stats-entries.patch text/plain 2.4 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2014-05-20 09:44:38 Re: Allowing join removals for more join types
Previous Message Rohit Goyal 2014-05-20 08:01:35 Re: Error in running DBT2