Re: Btree runtime recovery. Stuck spins.

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Vadim Mikheev" <vmikheev(at)sectorbase(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Btree runtime recovery. Stuck spins.
Date: 2001-02-09 18:05:18
Message-ID: 20428.981741918@sss.pgh.pa.us
Lists: pgsql-hackers

I wrote:
> "Vadim Mikheev" <vmikheev(at)sectorbase(dot)com> writes:
>> Btree uses spins to lock buffers (as all other access methods) and so
>> I could use only spins in new code. And though tree recovery locks buffers
>> for longer time than normal insert operations it's possible to get
>> "stuck" spins when using concurrent buffers locks *everywhere* under
>> heavy load (especially with WAL which requires holding buffer locks
>> for duration of logging).

> Hm. It was OK to use spinlocks to control buffer access when the max
> delay was just the time to read or write one disk page. But it sounds
> like we've pushed the code way past what it was designed to do. I think
> this needs some careful thought, not just a quick hack like increasing
> the timeout interval.

After thinking more about this, simply increasing S_MAX_BUSY is clearly
NOT a good answer. If you are under heavy load then processes that are
spinning are making things worse, not better, because they are sucking
CPU cycles that would be better spent on the processes that are holding
the locks.

It would not be very difficult to replace the per-disk-buffer spinlocks
with regular lockmanager locks. Advantages:
* Processes waiting for a buffer lock aren't sucking CPU cycles.
* Deadlocks will be detected and handled reasonably. (The more stuff
  that WAL does while holding a buffer lock, the bigger the chances
  of deadlock. I think this is a significant concern now.)
Of course the major disadvantage is:
* The lock setup/teardown overhead is much greater than for a
  spinlock, and the overhead is just wasted when there's no contention.

A reasonable alternative would be to stick with the spinlock mechanism,
but use a different locking routine (maybe call it S_SLOW_LOCK) that is
designed to deal with locks that may be held for a long time. It would
use much longer delay intervals than the regular S_LOCK code, and would
have either a longer time till ultimate timeout, or no timeout at all.
The main problem with this idea is choosing an appropriate timeout
behavior. As I said, I am concerned about the possibility of deadlocks
in WAL-recovery scenarios, so I am not very happy with the thought of
no timeout at all. But it's hard to see what a reasonable timeout would
be if a minute or more isn't enough in your test cases; that suggests
that for very large indexes, you might need a *long* time.
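
A minimal sketch of what an S_SLOW_LOCK-style routine could look like, to make the trade-off concrete. This is not PostgreSQL's s_lock code: the names slow_lock_t, slow_lock_acquire and slow_lock_release are hypothetical, the C11 atomic_flag stands in for the platform test-and-set, and the capped doubling of the delay is an assumption rather than something proposed above. A negative timeout models the "no timeout at all" variant.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical lock type; stands in for a per-buffer spinlock. */
typedef atomic_flag slow_lock_t;
#define SLOW_LOCK_INIT ATOMIC_FLAG_INIT

/* Sleep for roughly the given number of milliseconds. */
static void sleep_ms(long ms)
{
    struct timespec ts;

    ts.tv_sec = ms / 1000;
    ts.tv_nsec = (ms % 1000) * 1000000L;
    nanosleep(&ts, NULL);
}

/*
 * Acquire a lock that may be held for a long time.  Instead of
 * busy-spinning, sleep between attempts, doubling the delay up to
 * max_delay_ms (assumed policy).  timeout_ms < 0 means wait forever.
 * Only time spent sleeping is counted, so the timeout is approximate.
 * Returns true if the lock was acquired, false on timeout.
 */
static bool slow_lock_acquire(slow_lock_t *lock, long initial_delay_ms,
                              long max_delay_ms, long timeout_ms)
{
    long delay = initial_delay_ms;
    long waited = 0;

    assert(lock != NULL);
    assert(initial_delay_ms > 0 && max_delay_ms >= initial_delay_ms);

    while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
    {
        if (timeout_ms >= 0 && waited >= timeout_ms)
            return false;       /* caller decides how to report "stuck" */

        sleep_ms(delay);
        waited += delay;

        if (delay < max_delay_ms)
        {
            delay *= 2;
            if (delay > max_delay_ms)
                delay = max_delay_ms;
        }
    }
    return true;
}

/* Release the lock so that a sleeping waiter can take it on its next try. */
static void slow_lock_release(slow_lock_t *lock)
{
    atomic_flag_clear_explicit(lock, memory_order_release);
}
```

A waiter here gives up the CPU between attempts, which addresses the cycle-burning problem, but the sketch does nothing about deadlock detection: with timeout_ms < 0, two processes waiting on each other would sleep forever.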

Comments, preferences, better ideas?

regards, tom lane
