Re: Puzzling full database lock

From: Christopher Opena <counterveil(at)gmail(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Puzzling full database lock
Date: 2012-02-03 20:55:37
Message-ID: CAFOrgqf8ipN-bRNURhOWSm33v12mTdzbR0w0z6g626nRjcPJcw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Merlin, thanks for the response. My comments below, but firstly, does
anyone know if autovacuum is affected by setting a statement_timeout?
There was a long thread here from 2007'ish:

http://thread.gmane.org/gmane.comp.db.postgresql.devel.general/80044/focus=93847

But it's unclear to me which way that ended up going. We're thinking of
setting statement_timeout to something fairy high (that will catch these
queries we've been seeing) so that we can further troubleshoot over time.
We're worried, however, that autovacuum might be affected.

Random thoughts/suggestions:
> *) Look for some correlation between non-idle process count and
> locking situation. You are running a lot of processes and if I was in
> your shoes I would be strongly looking at pgbouncer to handle
> connection pooling. You could be binding in the database or (worse)
> the kernel
>

We're definitely looking at our options with pgbouncer right now; issue
being that we'd have to have a bouncer per database, and our architecture
right now calls for "many databases, many connections" so we're trying to
limit that by having our application pooler limit the amount of active
connections one can have to the application itself (and thereby to the
database, by proxy). This is still an option, however, so we're doing some
research here.

> *) Try logging checkpoints to see if there is any correlation with your
> locks.
>

We've been logging checkpoints for several days now with no hard
correlation that we can find. Thanks for the suggestion though!

> *) An strace of both the 'high cpu' process and one of the blocked
> process might give some clues -- in particular if you are being
> blocked on a system call
>

We have yet to try this; definitely next in line.

> *) Given enough time, do your high cpu queries ever complete? Are they
> writing or reading?
>

The queries are reading in this case; we haven't allowed them to run their
course because of the effect it has on our entire user base. Right now
we've patched our application to catch these cases and handle them outright
by notifying the end user that there is a potentially damaging query that
is being cancelled. Short term solution, but for now it's something we
need to do until we can replicate and solve the problem on a non-production
system.

> *) What happens to overall system load if you lower shared_buffers to,
> say, 4gb?
>

We're going to be trying this as well once we have an appropriate
maintenance window. It seems to be a general consensus that this is
something we should at least try.

Thanks,
-Chris.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Gary Chambers 2012-02-03 21:27:53 Warning: you don't own a lock of type ExclusiveLock
Previous Message John R Pierce 2012-02-03 18:18:33 Re: restart server on Lion