From: | Andres Freund <andres(at)2ndquadrant(dot)com> |
---|---|
To: | Merlin Moncure <mmoncure(at)gmail(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Geoghegan <pg(at)heroku(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: hung backends stuck in spinlock heavy endless loop |
Date: | 2015-01-14 15:30:23 |
Message-ID: | 20150114153023.GS5245@awork2.anarazel.de |
Lists: | pgsql-hackers |
On 2015-01-14 09:22:45 -0600, Merlin Moncure wrote:
> On Wed, Jan 14, 2015 at 9:11 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> > On 2015-01-14 10:05:01 -0500, Tom Lane wrote:
> >> Merlin Moncure <mmoncure(at)gmail(dot)com> writes:
> >> > On Wed, Jan 14, 2015 at 8:41 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> >> What are the autovac processes doing (according to pg_stat_activity)?
> >>
> >> > pid,running,waiting,query
> >> > 7105,00:28:40.789221,f,autovacuum: VACUUM ANALYZE pg_catalog.pg_class
> >
> > It'd be interesting to know whether that vacuum gets very frequent
> > semaphore wakeups. Could you strace it for a second or three?
>
> for 30 seconds+ it just looks like this:
> mmoncure(at)mernix2 ~ $ sudo strace -p 7105
> Process 7105 attached
> semop(5701638, {{4, -1, 0}}, 1
Ok. So that explains why it's not interruptible.
> all of the other processes are yielding out of the spinlock, for example:
> select(0, NULL, NULL, NULL, {0, 1408}) = 0 (Timeout)
Note that the semop() above isn't the spinlock, it's the process's semaphore. It'll
only be signaled if the refcount ever indicates that nobody but autovac is
holding the lock.
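The distinction is that the autovacuum process is not spinning: it is blocked on its own semaphore and is only woken once the reference count says it is the sole remaining holder. Below is a toy model of that wait/wake pattern, offered as a sketch under stated assumptions. It is not PostgreSQL's code. The names `shared_obj`, `obj_release()`, `obj_wait_sole_holder()` and `releaser()` are hypothetical, and a POSIX unnamed semaphore (`sem_t`) guarded by a pthread mutex stands in for the SysV semaphore (`semop()`) visible in the strace output.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical shared object with a reference count. */
typedef struct {
    pthread_mutex_t mu;  /* protects the fields below */
    int refcount;        /* number of holders, including the waiter */
    int waiter_waiting;  /* nonzero while the waiter wants a wakeup */
    sem_t *waiter_sem;   /* the waiter's private semaphore */
} shared_obj;

/* Called by every other holder when it is done with the object. */
static void obj_release(shared_obj *o)
{
    pthread_mutex_lock(&o->mu);
    assert(o->refcount > 0);
    o->refcount--;
    /* Wake the waiter only once it is the sole remaining holder. */
    if (o->waiter_waiting && o->refcount == 1) {
        o->waiter_waiting = 0;
        sem_post(o->waiter_sem);
    }
    pthread_mutex_unlock(&o->mu);
}

/* Sleep on our own semaphore until nobody but us holds the object. */
static void obj_wait_sole_holder(shared_obj *o, sem_t *my_sem)
{
    for (;;) {
        pthread_mutex_lock(&o->mu);
        if (o->refcount == 1) {
            pthread_mutex_unlock(&o->mu);
            return;
        }
        o->waiter_sem = my_sem;
        o->waiter_waiting = 1;
        pthread_mutex_unlock(&o->mu);
        /* No polling here: if the count never drops to 1, we sleep forever. */
        sem_wait(my_sem);
    }
}

/* Demo helper: another holder that releases the object after ~10 ms. */
static void *releaser(void *arg)
{
    struct timespec ts = {0, 10 * 1000 * 1000};
    nanosleep(&ts, NULL);
    obj_release((shared_obj *) arg);
    return NULL;
}
```

If any other holder never calls `obj_release()`, the waiter stays parked in `sem_wait()` with no further system calls, the same kind of silent, non-spinning wait the strace shows, whereas a spinlock waiter would keep waking up to retry.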
> > How did this perform < 9.4?
> This is a new project. However, I can run it against an earlier version.
>
> > Can you guess how many times these dynamic
> > statements are planned? How many different relations are accessed in the
> > dynamically planned queries?
>
> only once or twice, and only a couple of tables.
Hm. Odd. The first -g profile seemed to indicate a hell of a lot of time
was spent in LWLockRelease(), indicating that there's actually
progress. Later profiles/backtraces were less clear.
If you attach gdb and type 'fin' a couple of times, waiting each time for
the current function to finish, is there actually any progress? I'm
wondering whether it's just many catalog accesses plus contention, or some
other problem. Alternatively, set a breakpoint on ScanPgRelation() or so
and see how often it's hit.
> I can send the code off-list if you guys think it'd help.
Might be interesting.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services