Re: Postgres Query Plan Live Lock

From: "Pweaver (Paul Weaver)" <pweaver(at)panjiva(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>, Michael Vezza <michael(at)panjiva(dot)com>
Subject: Re: Postgres Query Plan Live Lock
Date: 2014-02-05 19:47:53
Message-ID: CAFTGa=kU2aTRQVdW2Wwg4XUYxzU0_7LCWEYfhGBAm3n5+7Yyyw@mail.gmail.com
Lists: pgsql-performance

On Wed, Feb 5, 2014 at 9:52 AM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:

> On Monday, February 3, 2014, Pweaver (Paul Weaver) <pweaver(at)panjiva(dot)com>
> wrote:
>
>> We have been running into a (live lock?) issue on our production Postgres
>> instance causing queries referencing a particular table to become extremely
>> slow and our application to lock up.
>>
>> This tends to occur on a particular table that gets a lot of queries
>> against it after a large number of deletes. When this happens, the
>> following symptoms occur when queries referencing that table are run (even
>> if we stop the deleting):
>>
>
> What do you mean by "stop the deleting"? Are you pausing the delete but
> without either committing or rolling back the transaction, but just holding
> it open? Are you stopping it cleanly, between transactions?
>

We are repeatedly running delete commands in their own transactions. We
stop issuing new deletes and let them finish cleanly.
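
For context, the delete pattern is roughly the following shape, run in a loop until no rows remain (table and column names here are illustrative, not our actual schema):

```sql
-- Each batch is its own autocommit transaction, so no single
-- long-lived transaction holds locks across batches.
DELETE FROM table_name
WHERE id IN (
    SELECT id
    FROM table_name
    WHERE created_at < '2014-01-01'  -- illustrative predicate
    LIMIT 10000
);
-- Repeat until the DELETE reports 0 rows affected.
```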

>
> Also, how many queries are happening concurrently? Perhaps you need a
> connection pooler.
>
Usually between 1 and 20. When it gets locked up, closer to 100-200.
We should add a connection pooler. Could the number of active queries on
the table be causing the issue?
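
If we do add a pooler, PgBouncer in transaction mode seems like the usual approach; a minimal sketch of the config (the values below are guesses, not tuned for our workload):

```ini
; pgbouncer.ini -- illustrative values only
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_port = 6432
pool_mode = transaction     ; server connection released at commit/rollback
max_client_conn = 600       ; clients may still open many connections...
default_pool_size = 20      ; ...but only ~20 run against Postgres at once
```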

>
> Is the CPU time user time or system time? What kernel version do you have?
>
Real time. Kernel version is 3.2.0-26.

>
>
>> SELECT * FROM table_name LIMIT 10; -- takes ~45 seconds to complete
>> EXPLAIN SELECT * FROM table_name LIMIT 10; -- takes ~45 seconds to
>> complete the explain query, the query plan looks reasonable
>>
>
> This sounds like the problem we heard quite a bit about recently, where
> processes spend a lot of time fighting over the proclock while they try to
> check the commit status of open transactions. But I don't see how
> deletes could trigger that behavior. If the delete has not committed, the
> tuples are still visible and the LIMIT 10 is quickly satisfied. If the
> delete has committed, the tuples quickly get hinted, and so the next query
> along should be faster.
>
> I also don't see why the explain would be slow. A similar problem was
> tracked down to digging through in-doubt tuples while trying to use an
> index to find the true min or max while estimating the cost of a merge
> join. But I don't think a simple table query should lead to that, unless
> table_name is a view. And I don't see how deletes, rather than uncommitted
> inserts, could trigger it either.
>
>
> max_connections | 600 |
>> configuration file
>>
>
> That is quite extreme. If a temporary load spike (like from the deletes
> and the hinting needed after them) slows down the select queries and you
> start more and more of them, soon you could tip the system over into kernel
> scheduler insanity with high system time. Once in this mode, it will stay
> there until the incoming stream of queries stops and the existing ones
> clear out. But, if that is what is occurring, I don't know why queries on
> other tables would still be fast.
>
We probably want a connection pooler anyway, but in this particular case,
the load average is fairly low on the machine running Postgres.

>
> Cheers,
>
> Jeff
>

--
Thank You,
Pweaver (pweaver(at)panjiva(dot)com)
