Re: Hung Vacuum in 8.3

From: Greg Stark <gsstark(at)mit(dot)edu>
To: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: Hung Vacuum in 8.3
Date: 2011-02-22 11:26:17
Message-ID: AANLkTiko6Sv5Z3bnOdxmTXixNcXmK9qbSq7fWyxAp9Gk@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Tue, Feb 22, 2011 at 7:14 AM, Mark Kirkwood
<mark(dot)kirkwood(at)catalyst(dot)net(dot)nz> wrote:
> On 22/02/11 19:47, Heikki Linnakangas wrote:
>>
>> A long query on the same table can block vacuum. Vacuum needs to take a
>> so-called "cleanup lock" on each page, which means that it has to wait until
>> no other backend holds a pin on the page. A long-running query can keep a
>> page pinned for a long time.
>>
>
> Ah, - this is LockBufferForCleanup? So we are waiting for an exclusive
> LockBuffer operation i.e a LWLock not a (heavyweight) lock, urg... no
> wonder.

Actually it's not waiting for the LockBuffer LWLock. it's waiting
until your query unpins the buffer it wants. Vacuum tries to get an
exclusive lock on the buffer, if it gets it then it checks if anyone
is using that buffer. If someone is then it unlocks the buffer and
waits until nobody has it pinned.

Only certain plan types will keep a buffer pinned for a long time.
Things like the outer table of a nested loop join where the inner side
is another large table or slow subquery for example. This isn't
terribly common, usually it's caused by a large mistaken cartesian
join or something. but occasionally Postgres will generate a plan that
could do it. For instance joining a single row against a large table
will sometimes do a nested loop from the single row to the large
table. It's also possible Vacuum has been making progress but the
query keeps getting in its way and stopping it on new blocks.

It's also possible there's a bug of course. If someone was using that
buffer and somehow failed to notify the vacuum that they were done it
would wait for a very long time (forever?). However if vacuum
eventually continued when the query was canceled then it seems likely
it was working as intended.

--
greg

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Greg Stark 2011-02-22 11:30:52 Re: 8.3.5: Types with typnamespace pointing at non-existent pg_namespace oid
Previous Message Daniel Farina 2011-02-22 08:48:57 Re: 8.3.5: Types with typnamespace pointing at non-existent pg_namespace oid