Re: odd intermittent query hanging issue

From: Steve Crawford <scrawford(at)pinpointresearch(dot)com>
To: Aaron Burnett <aburnett(at)bzzagent(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: odd intermittent query hanging issue
Date: 2012-05-21 16:08:01
Message-ID: 4FBA6861.7080200@pinpointresearch.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 05/20/2012 07:45 AM, Aaron Burnett wrote:
> Hey Steve,
>
> Just wanted to send a note of thanks for pointing me in a few new
> directions on this.
>
> Turns out that the query would swap but not all the time. When it swapped,
> it wouldn't finish, if it didn't swap it would finish in the expected
> time. Not sure why this isn't consistent...
>
> The work_mem was set at 100MB so as a test I increased it to 200MB on the
> session and so far it has consistently finished every time it has run. I'm
> not sure why this has reared it's head now as we run far more complex and
> data intensive queries than this one and they don't swap... so I guess I
> have some forensics to do.

You're welcome. Assuming that changing work_mem is a confirmed solution,
it sounds like the query is running right on the edge of running out of
work_mem and when nudged over that threshold starts to swap. PostgreSQL,
like most other things in computers, does not degrade linearly. Things
work great till you run out of some resource then degrade dramatically.

The work_mem setting specifies memory available for internal sorts and
hash operations. Looking at the original explain plan you posted I see
lots of such operations - perhaps your other queries don't need them:

explain plan:
------------------------------
Unique (cost=237234.66..239267.33 rows=406533 width=4) (actual
time=4790.823..4922.621 rows=418843 loops=1)
-> Sort (cost=237234.66..238251.00 rows=406533 width=4) (actual
time=4790.818..4826.201 rows=418879 loops=1)
Sort Key: member.id
Sort Method: quicksort Memory: 31923kB
-> Hash Join (cost=167890.31..199359.99 rows=406533 width=4)
(actual time=3357.406..4532.952 rows=418879 loops=1)
Hash Cond: (xrcm.member_id = member.id)
-> Seq Scan on xrefchannelmember xrcm
(cost=0.00..19273.69 rows=813066 width=4) (actual time=0.015..219.259 row
s=814421 loops=1)
Filter: (channel_id = 1)
-> Hash (cost=162586.21..162586.21 rows=424328 width=4)
(actual time=3357.001..3357.001 rows=444626 loops=1)
Buckets: 65536 Batches: 1 Memory Usage: 15632kB
-> Seq Scan on member (cost=66114.02..162586.21
rows=424328 width=4) (actual time=2357.280..3216.076 rows
=444626 loops=1)
Filter: (NOT (hashed SubPlan 1))
SubPlan 1
-> Seq Scan on xrefcampaignmember
(cost=0.00..57931.82 rows=3272882 width=4) (actual time=0.021..
670.086 rows=3272870 loops=1)
Total runtime: 4963.134 ms

I'm still bothered by a couple things but I don't have a enough
understanding of internals to provide definitive advice:

1. Why does reindexing seem to temporarily solve the problem?...

2...especially when I don't see indexes being used in the query

I would definitely run explain on some of the alternative versions of
the query posted earlier to see if they do a better job of using
available indexes or reducing the amount of internal hashing/sorting. If
they do, you will have some "ammo" to take back to your developer.

Cheers,
Steve

Browse pgsql-general by date

  From Date Subject
Next Message Merlin Moncure 2012-05-21 16:14:03 Re: Libpq question
Previous Message Cédric Villemain 2012-05-21 15:12:08 Re: Global Named Prepared Statements