From: | Craig Ringer <craig(at)postnewspapers(dot)com(dot)au> |
---|---|
To: | Andras Fabian <Fabian(at)atrada(dot)net> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: PG_DUMP very slow because of STDOUT ?? |
Date: | 2010-07-13 03:03:23 |
Message-ID: | 4C3BD77B.4030200@postnewspapers.com.au |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
On 12/07/10 21:03, Andras Fabian wrote:
> This STDOU issue gets even weirder. Now I have set up our two new servers (identical hw/sw) as I would have needed to do so anyways. After having PG running, I also set up the same test scenario as I have it on our problematic servers, and started the COPY-to-STDOUT experiment. And you know what? Both new servers are performing well. No hanging, and the 3 GByte test dump was written in around 3 minutes (as expected). To make things even more complicated ... I went back to our production servers. Now, the first one - which I froze up with oprofile this morning and needed a REBOOT - is performing well too! It needed 3 minutes for the test case ... WTF? BUT, the second production server, which did not have a reboot, is still behaving badly.
> Now I tried to dig deeper (without killing a production server again) ... and came to comparing the outputs of PS (with '-fax' parameter then, '-axl'). Now I have found something interesting:
> - all fast servers show the COPY process as being in the state Rs ("runnable (on run queue)")
> - on the still slow server, this process is in 9 out of 10 samples in Ds ("uninterruptible sleep (usually IO)")
>
> Now, this "Ds" state seems to be something unhealthy - especially if it is there almost all the time - as far as my first reeds on google show (and although it points to IO, there is seemingly only very little IO, and IO-wait is minimal too). I have also done "-axl" with PS, which brings the following line for our process:
> F UID PID PPID PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND
> 1 5551 2819 4201 20 0 5941068 201192 conges Ds ? 2:05 postgres: postgres musicload_cache [local] COPY"
Your wchan column isn't wide enough to show the full function name, but
I'd say it's related to some form of throttling or congestion control.
Get a wider view of that column to find out what the full function name
is. Grepping the kernel source for it can then tell you a lot about
where in the kernel it is and what might be going on.
Try:
ps ax -O wchan:40
to get a decently wide view of that col.
--
Craig Ringer
Tech-related writing: http://soapyfrogs.blogspot.com/
From | Date | Subject | |
---|---|---|---|
Next Message | Craig Ringer | 2010-07-13 03:11:37 | Re: Testing 9.0beta3 and pg_upgrade |
Previous Message | Bruce Momjian | 2010-07-13 02:13:01 | Re: PostgreSQL 9.0 beta 3 release announcement |