Postgres 7.4.7 hang in async_notify

From: pgsql-bugs(at)counterstorm(dot)com
To: pgsql-bugs(at)postgresql(dot)org
Subject: Postgres 7.4.7 hang in async_notify
Date: 2005-06-02 17:47:02
Message-ID: 200506021747.j52Hl2aj027031@startide.sysdetect.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs


We saw the problem with async_notify again (See thread with subject
"Postgres 7.4.6 hang in async_notify" in first message to this list
dated "Mon, 25 Apr 2005 15:42:35 -0400") in a production setting.
Since our last instance, we converted to compiling postgres with
debugging, so we have a stack trace. Looking at it, the problem
appears at first blush like it might be pretty obvious: an ill-timed
signal which arrives during a malloc while malloc has some
data-structure locked, and one of the extensive operations that
Async_NotifyHandler did probably involved getting the same lock. That
doesn't explain everything, since the stack trace looks like it was
munged a little bit and it is not completely obvious to me how we got
from tag_hash to posix_memalign.

For the record, while this postgres should be (of two) generating
notifies out of triggers, we do not believe it should be listening for
any, and indeed examination of pg_listener suggests it does not. If
it matters, the notify should probably have been on the "wwww_aaaaa"
relname.

This was on a 64bit Opteron multiprocessor running Fedora Core 3. As
you see from the stack trace, we have SSL enabled for the connection
from the postgres client.

We have a coredump, binary, and associated build tree if anyone wants
it.

-Seth Robertson
----------------------------------------------------------------------
sysd=> select * from pg_locks;
relation | database | transaction | pid | mode | granted
----------+----------+-------------+-------+-----------------+---------
| | 1452163 | 5043 | ExclusiveLock | t
| | 1408041 | 29980 | ExclusiveLock | t
16759 | 17212 | | 5043 | AccessShareLock | t
----------------------------------------------------------------------

5043 is the above psql. 29980 is the postgres stuck in async_notify:

----------------------------------------------------------------------
sysd=> select * from pg_listener ;
relname | listenerpid | notification
------------------------------+-------------+--------------
kludge_28041 | 28055 | 0
antura_sysd_jobs | 25754 | 0
antura_sysd_files | 25754 | 0
wwww_ssssssss_eeeeeeee | 30219 | 0
sensor_rrrrrrrr_policy | 30219 | 0
kludge_30217 | 30219 | 0
antura_sysd_job_cron_actions | 25754 | 0
wwww_aaaaa | 30219 | 0
Pending | 28055 | 0
(9 rows)
----------------------------------------------------------------------

----------------------------------------------------------------------
29980 ? S 158:06 postgres: sysd sysd 127.0.0.1 async_notify
----------------------------------------------------------------------

----------------------------------------------------------------------
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...Using host libthread_db library "/lib64/tls/libthread_db.so.1".

Attaching to program: /usr/counterstorm/postgres/bin/postgres, process 29980
Reading symbols from /lib64/libssl.so.4...done.
Loaded symbols for /lib64/libssl.so.4
Reading symbols from /lib64/libcrypto.so.4...done.
Loaded symbols for /lib64/libcrypto.so.4
Reading symbols from /usr/lib64/libz.so.1...done.
Loaded symbols for /usr/lib64/libz.so.1
Reading symbols from /usr/lib64/libreadline.so.4...done.
Loaded symbols for /usr/lib64/libreadline.so.4
Reading symbols from /lib64/libtermcap.so.2...done.
Loaded symbols for /lib64/libtermcap.so.2
Reading symbols from /lib64/libcrypt.so.1...done.
Loaded symbols for /lib64/libcrypt.so.1
Reading symbols from /lib64/libresolv.so.2...done.
Loaded symbols for /lib64/libresolv.so.2
Reading symbols from /lib64/libnsl.so.1...done.
Loaded symbols for /lib64/libnsl.so.1
Reading symbols from /lib64/libdl.so.2...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/tls/libm.so.6...done.
Loaded symbols for /lib64/tls/libm.so.6
Reading symbols from /lib64/tls/libc.so.6...done.
Loaded symbols for /lib64/tls/libc.so.6
Reading symbols from /usr/lib64/libgssapi_krb5.so.2...done.
Loaded symbols for /usr/lib64/libgssapi_krb5.so.2
Reading symbols from /usr/lib64/libkrb5.so.3...done.
Loaded symbols for /usr/lib64/libkrb5.so.3
Reading symbols from /lib64/libcom_err.so.2...done.
Loaded symbols for /lib64/libcom_err.so.2
Reading symbols from /usr/lib64/libk5crypto.so.3...done.
Loaded symbols for /usr/lib64/libk5crypto.so.3
Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib64/libnss_files.so.2...done.
Loaded symbols for /lib64/libnss_files.so.2
Reading symbols from /usr/counterstorm/postgres/lib/plpgsql.so...done.
Loaded symbols for /usr/counterstorm/postgres/lib/plpgsql.so
Reading symbols from /usr/counterstorm/postgres/lib/array_sz.so...done.
Loaded symbols for /usr/counterstorm/postgres/lib/array_sz.so
Reading symbols from /usr/counterstorm/postgres/lib/hostinfo.so...done.
Loaded symbols for /usr/counterstorm/postgres/lib/hostinfo.so
Reading symbols from /usr/counterstorm/postgres/lib/pending.so...done.
Loaded symbols for /usr/counterstorm/postgres/lib/pending.so
0x00000036dacd09ab in __lll_mutex_lock_wait () from /lib64/tls/libc.so.6
(gdb) where
#0 0x00000036dacd09ab in __lll_mutex_lock_wait () from /lib64/tls/libc.so.6
#1 0x00000000185d6840 in ?? ()
#2 0x000000000000000d in ?? ()
#3 0x00000036dac6d181 in posix_memalign () from /lib64/tls/libc.so.6
#4 0x0000000000000001 in ?? ()
#5 0x00000000006cd449 in tag_hash (key=0x36dae2e620, keysize=0) at hashfn.c:40
#6 0x00000000006cd076 in hash_search (hashp=0x2aa221da00, keyPtr=0x7fbfffc6a0, action=20, foundPtr=0x6ccf5c "\205u'A#t$\034\230fff\220ff\220H\213-6\"") at dynahash.c:647
#7 0x00000000005ce55e in mdclose (reln=0x1e0) at md.c:384
#8 0x00000000005d0eec in smgrclose (which=-6624, reln=0x2aa221da00) at smgr.c:292
#9 0x00000000006b9dcc in RelationIdInvalidateRelationCacheByRelationId (relationId=2002738) at relcache.c:1748
#10 0x00000000006b5628 in LocalExecuteInvalidationMessage (msg=0x7fbfffc6e0) at inval.c:440
#11 0x00000000005c0388 in ReceiveSharedInvalidMessages (invalFunction=0x6b5550 <LocalExecuteInvalidationMessage>, resetFunction=0x6b5670 <InvalidateSystemCaches>) at sinval.c:125
#12 0x00000000004684d6 in StartTransactionCommand () at xact.c:446
#13 0x00000000004d3a84 in Async_NotifyHandler (postgres_signal_arg=-622664160) at async.c:814
#14 <signal handler called>
#15 0x00000036dac69094 in _int_malloc () from /lib64/tls/libc.so.6
#16 0x00000036dac6ab12 in malloc () from /lib64/tls/libc.so.6
#17 0x00000036dcb4d05b in CRYPTO_malloc () from /lib64/libcrypto.so.4
#18 0x00000036dcb99dc1 in EVP_MD_CTX_copy_ex () from /lib64/libcrypto.so.4
#19 0x00000036dcb585cc in HMAC_Final () from /lib64/libcrypto.so.4
#20 0x00000036db91fe10 in tls1_mac () from /lib64/libssl.so.4
#21 0x00000036db91bcbf in ssl3_read_bytes () from /lib64/libssl.so.4
#22 0x00000036db919a9f in ssl3_renegotiate_check () from /lib64/libssl.so.4
#23 0x000000000053492c in secure_read (port=0x929e80, ptr=0x8a7740, len=8192) at be-secure.c:261
#24 0x000000000053ba47 in pq_getbyte () at pqcomm.c:685
#25 0x00000000005d759d in PostgresMain (argc=9992288, argv=0x52, username=0x7fbfffd660 "") at postgres.c:275
#26 0x00000000005911e4 in PostmasterMain (argc=9470864, argv=0x924a80) at postmaster.c:2561
#27 0x000000000053e0ef in main (argc=4, argv=0x7fbffff6b8) at main.c:214
----------------------------------------------------------------------

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2005-06-02 18:21:25 Re: Postgres 7.4.7 hang in async_notify
Previous Message Tom Lane 2005-06-02 15:46:28 Re: More graceful handling of invalid log_*_stats conf in 7.4