Re: 'COPY ... FROM' inserts to btree, blocks on buffer writeout

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Michael Wildpaner <mike(at)rainbow(dot)studorg(dot)tuwien(dot)ac(dot)at>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: 'COPY ... FROM' inserts to btree, blocks on buffer writeout
Date: 2004-12-31 19:52:33
Message-ID: 11701.1104522753@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Michael Wildpaner <mike(at)rainbow(dot)studorg(dot)tuwien(dot)ac(dot)at> writes:
> two concurrent clients try to 'COPY ... FROM ...' to the same table,
> "feature_link".

> The second one (pid 17983) is waiting for an ExclusiveLock on the table's
> primary key index, "key__idpk__flink".

> The first one (pid 17980) is inserting into the index's btree, waiting
> for a buffer lock. This locking state seems to persist.

After staring at this for a little bit I have a theory. The stack trace
for 17980 indicates that it is trying to flush a dirty buffer so that it
can take over the buffer for a new index page. It's blocked trying to
acquire a shared buffer lock on that buffer, which means that someone
else must have an exclusive buffer lock, which the code is not expecting
because the buffer just came off the free list and therefore was not
pinned by anyone.

However ... FlushBuffer releases the BufMgrLock before trying to acquire
the per-buffer lock. If 17980 lost its time slice during that interval,
it'd be possible for someone else to come in and re-pin the chosen
victim page and then lock the buffer before 17980 could.

Normally this wouldn't be any big problem, but if the someone else later
blocked on some lock held by 17980, you'd have a deadlock. I think
that's exactly what we're seeing here. The victim buffer page must be
the same one that 17983 is trying to split; so it's got exclusive lock
(and pin) on that page, and is now stuck waiting for the lock that would
give it the privilege to extend the index.

A possible fix for this is to reorder the operations in FlushBuffer
so that we share-lock the buffer before releasing BufMgrLock ... but
I'm not sure that doesn't introduce other deadlock risks. It needs
some thought.

If this is the correct explanation, the bug has been around for a good
while; but it's got to be a very low-probability scenario. Congrats
to Michael for taking the time to dig into it when he saw it.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message lsunley 2004-12-31 19:55:26 Re: sysv_shmem potential problem
Previous Message Andrew Dunstan 2004-12-31 19:51:48 Re: contrib regression on old versions