From: | Robert Voinea <rvoinea(at)gmail(dot)com> |
---|---|
To: | Kevin Grittner <kgrittn(at)ymail(dot)com> |
Cc: | "pgsql-admin(at)postgresql(dot)org" <pgsql-admin(at)postgresql(dot)org> |
Subject: | Re: Occasional spike in query response time (jumps from 200ms to 45 seconds) |
Date: | 2013-11-08 15:43:22 |
Message-ID: | 13209807.nWtyJFh2Fe@shu |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-admin |
Hi
On Thursday 07 November 2013 06:40:00 Kevin Grittner wrote:
> Robert Voinea <rvoinea(at)gmail(dot)com> wrote:
> > From time to time (1-2 months) one of the production systems I
> > manage starts acting crazy... and this is starting to become a
> > problem.
> >
> > Every query I send to the server has a very long running time
> > (sometimes it reaches 45+ seconds). Even the simples queries like
> > "SELECT NOW()" run in 4-5 seconds... or more.
> >
> > This goes on for a few minutes. After that, everything is back to
> > normal.
>
> A likely cause for this is transparent huge page defrag. If you
> run `vmstat 1` during an episode and see a lot of system CPU time,
> that tends to confirm this. If you run `perf top` during an
> episode and see the kernel spending a lot of time in spinlock
> functions, that pretty much nails it. You might want to turn off
> transparent huge page support.
I need to read about this first.
I forgot to mention that this is a 32 bit system... and due to the known
limitations we are forced to use PAE on the kernel...
Unfortunately we cannot use a 64 bit operating system.
> > I'm guessing that, somehow, there is heavy I/O usage at that time
>
> Which might be another possibility if the `vmstat 1` shows a lot of
> I/O wait time.
Another colleague of mine managed to somehow reproduce the issue.
It seems that the disk + network interface could be the bottle neck (iostat &
iotop shows average wait times very high) and the peak throughput of the disk
at 70MB/s.
This is not the actual production system, but a similar one (as hardware).
We'll try to re-do the tests with the clients and the server connected via a
Gigabit network and see if this helps.
> > I've seen that postgres 9.1.8 Changelog contains some bug fixes
> > related to performance of autovaccum:
> If there is high I/O and you see autovacuum running wraparound
> prevention vacuums or vacuuming tables which have recently had a
> large number of rows deleted, you might solve the problem by
> updating. I would strongly recommend updating anyway, because
> otherwise you are running with known bugs and security
> vulnerabilities, including one very serious one.
>
> Why knowingly put yourself through dealing with bugs that others
> have already encountered, reported, diagnosed, and fixed?
I agree with you on this, but the update requires downtime and has to be
scheduled in advance.
Thank you for replying.
I will investigate this even further and see what I may come up with.
I'll keep you informed with what I find.
--
Robert Voinea
Software Engineer
+4 0740 467 262
Don't take life too seriously. You'll never get out of it alive.
(Elbert Hubbard)
From | Date | Subject | |
---|---|---|---|
Next Message | Nagaraj Shindagi | 2013-11-10 14:37:41 | during the maintenance facing error |
Previous Message | bricklen | 2013-11-08 13:33:44 | Re: Occasional spike in query response time (jumps from 200ms to 45 seconds) |