Re: PG_DUMP very slow because of STDOUT ??

From: Andras Fabian <Fabian(at)atrada(dot)net>
To: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: PG_DUMP very slow because of STDOUT ??
Date: 2010-07-13 09:18:47
Message-ID: B1A1AD14D5F9D647BD2A00988C53B8220ACA3176@atradaex03.nbg.atrada.net
Lists: pgsql-general

Hi Craig,

Yes, a first look at /proc/pid/stack shows something that smells like memory management ... at least up to the point where congestion_wait is called.
------------------------------------------------------------------
[<ffffffff8110d750>] congestion_wait+0x70/0x90
[<ffffffff81101cb7>] shrink_inactive_list+0x667/0x7e0
[<ffffffff81101ec1>] shrink_list+0x91/0xf0
[<ffffffff811020b7>] shrink_zone+0x197/0x240
[<ffffffff81102886>] __zone_reclaim+0x146/0x260
[<ffffffff81102ab7>] zone_reclaim+0x117/0x150
[<ffffffff810f8fd4>] get_page_from_freelist+0x544/0x6c0
[<ffffffff810f98c9>] __alloc_pages_nodemask+0xd9/0x180
[<ffffffff81131822>] kmalloc_large_node+0x62/0xb0
[<ffffffff81135df9>] __kmalloc_node_track_caller+0x109/0x160
[<ffffffff814665f0>] __alloc_skb+0x80/0x190
[<ffffffff81462504>] sock_alloc_send_pskb+0x1c4/0x320
[<ffffffff81462675>] sock_alloc_send_skb+0x15/0x20
[<ffffffff814f54d5>] unix_stream_sendmsg+0x275/0x3e0
[<ffffffff8145f5ab>] sock_sendmsg+0x10b/0x140
[<ffffffff8145f765>] sys_sendto+0x125/0x180
[<ffffffff810131b2>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
------------------------------------------------------------------

I also looked at what's happening with strace, where you see traffic like:
------------------------------------------------------------------
....
[ 7fc153d925b2] sendto(8, "\tt\tt\nd\0\0\0\0378241003\t974369\tt\t\\N\t10"..., 8192, 0, NULL, 0) = 8192
[ 7fc153d83460] read(3, "\0\0\0\0\0\0\0\0\0\0\4\0\260\1\260\t\0 \4 \0\0\0\0\310\237d\0\220\237d\0"..., 8192) = 8192
[ 7fc153d83460] read(3, "\0\0\0\0\0\0\0\0\0\0\4\0\260\1\260\t\0 \4 \0\0\0\0\310\237d\0\220\237d\0"..., 8192) = 8192
[ 7fc153d925b2] sendto(8, "0\tt\tt\nd\0\0\0\0378241003\t830278\tt\t\\N\t1"..., 8192, 0, NULL, 0) = 8192
[ 7fc153d83460] read(3, "\0\0\0\0\0\0\0\0\0\0\4\0\260\1\260\t\0 \4 \0\0\0\0\310\237d\0\220\237d\0"..., 8192) = 8192
[ 7fc153d83460] read(3, "\0\0\0\0\0\0\0\0\0\0\4\0\260\1\260\t\0 \4 \0\0\0\0\310\237d\0\220\237d\0"..., 8192) = 8192
[ 7fc153d925b2] sendto(8, "\tt\tt\nd\0\0\0 8241006\t1114684\tt\t\\N\t1"..., 8192, 0, NULL, 0) = 8192
[ 7fc153d83460] read(3, "\0\0\0\0\0\0\0\0\0\0\4\0\260\1\260\t\0 \4 \0\0\0\0\310\237d\0\220\237d\0"..., 8192) = 8192
[ 7fc153d83460] read(3, "\0\0\0\0\0\0\0\0\0\0\4\0\260\1\260\t\0 \4 \0\0\0\0\310\237d\0\220\237d\0"..., 8192) = 8192
[ 7fc153d83460] read(3, "\0\0\0\0\0\0\0\0\0\0\4\0\260\1\260\t\0 \4 \0\0\0\0\310\237d\0\220\237d\0"..., 8192) = 8192
....
------------------------------------------------------------------

But still no definitive clue about the cause. What is also quite interesting is that when I start my COPY-to-STDOUT experiment, it runs quite fast in the beginning, sometimes for up to 400 MB, sometimes for up to 1.4 GB (I couldn't find anything this correlates with) ... and then, suddenly, it begins to stall. From there on, it only advances slowly, with all the congestion_wait going on ... Hmm, maybe it really has something to do with the state of the memory ... (this would fit the fact that a freshly rebooted server does not have the problem in the beginning).
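
To pin down where the stall begins (the 400 MB vs. 1.4 GB point above), one option is a small pass-through meter placed after the COPY-to-STDOUT output. This is only a rough sketch: the function name `meter`, the 8192-byte chunk size (chosen to match the sendto size in the strace output) and the one-second logging interval are assumptions, not something taken from the actual test setup.

```python
import sys
import time


def meter(src, dst, chunk=8192, interval=1.0, log=sys.stderr):
    """Copy src to dst, logging cumulative bytes and recent rate.

    A line is written to `log` roughly every `interval` seconds, so a
    sudden drop in MB/s shows how much data had passed before the stall.
    Returns the total number of bytes copied.
    """
    total = 0    # bytes copied so far
    window = 0   # bytes copied since the last log line
    start = last = time.monotonic()
    while True:
        data = src.read(chunk)
        if not data:  # EOF on the input stream
            break
        dst.write(data)
        total += len(data)
        window += len(data)
        now = time.monotonic()
        if now - last >= interval:
            rate = window / (now - last) / 1e6
            print(f"{now - start:8.1f}s  {total / 1e6:10.1f} MB  "
                  f"{rate:8.1f} MB/s", file=log)
            window, last = 0, now
    return total
```

Used as a filter, it would be called as `meter(sys.stdin.buffer, sys.stdout.buffer)` with the COPY output piped in and stdout redirected to the target file or /dev/null.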

Ahh, well, and one thing that would speak against the file system / drive argument: in my initial tests, where I did a COPY-to-FILE, I never had problems (only COPY-to-STDOUT gives them).

Well, I will try to gather some more information with your other hints (quite a lot of stuff - and one learns a lot of new tricks in the process :-) ...

Andras Fabian

-----Original Message-----
From: Craig Ringer [mailto:craig(at)postnewspapers(dot)com(dot)au]
Sent: Tuesday, 13 July 2010 11:01
To: Andras Fabian
Cc: Tom Lane; pgsql-general(at)postgresql(dot)org
Subject: Re: AW: [GENERAL] PG_DUMP very slow because of STDOUT ??

On 13/07/2010 4:05 PM, Andras Fabian wrote:
> Craig, thanks for that PS tip (you think you have used PS for such a long time, but it still has some new tricks available).

> So, obviously, for some reason we are waiting too much for a backing_device ... whichever it is at the moment. Because, as I just wrote to Scott Marlowe, the disk system is almost idling (I have seen disk utilization below 1% on the drive I write to).

A quick search suggests that most calls into congestion_wait are in the
memory management subsystem, and are involved in situations where the
system is struggling for memory. However, that includes memory consumed
by I/O buffers, writeback for I/O, etc, so it'd also be consistent with
very slow I/O causing write throttling as the system tried to write
already buffered data to disk.

Most other calls are in file system drivers.

At this point I'd be taking a closer look at "vmstat 1" and "iostat 1"
output, plus "top", to see if any interesting clues about system-wide
issues turned up.

I'd also be trying to perform each step of the problem operation in
isolation as much as possible, so as to see if I could find out what
particular part was causing the slowdown. Comparing "\copy" to "COPY ...
TO STDOUT", invoking "COPY ... TO STDOUT" with a standalone backend
writing output to an on disk file and to /dev/null, etc.

> So, the question seems to be, why and where this "idling happens".

You can potentially find out more by getting a trace of the kernel
function call stack for the backend process. The kernel call stack of a
process at any given time can be obtained by reading /proc/$pid/stack .

This will tell you not only which call it's waiting in inside the kernel,
but what function(s) called it, and even the code offset within each function.
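
A single read of /proc/$pid/stack is only a snapshot, so it can help to sample it repeatedly and count which kernel function the process is most often blocked in. The following is a minimal sketch under a few assumptions: the helper names `parse_stack` and `sample_stack` are made up for illustration, the line format is the one shown in the dump earlier in this thread, and reading the file usually requires root.

```python
import re
import time
from collections import Counter

# Matches lines such as "[<ffffffff8110d750>] congestion_wait+0x70/0x90".
# The trailing "[<ffffffffffffffff>] 0xffffffffffffffff" marker has no
# symbol+offset part and is therefore skipped.
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+(\S+?)\+0x[0-9a-f]+/0x[0-9a-f]+")


def parse_stack(text):
    """Return the kernel function names in a stack dump, innermost first."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append(match.group(1))
    return frames


def sample_stack(pid, samples=20, interval=0.5):
    """Sample /proc/<pid>/stack and count the innermost function seen."""
    counts = Counter()
    for _ in range(samples):
        with open(f"/proc/{pid}/stack") as f:
            frames = parse_stack(f.read())
        if frames:
            counts[frames[0]] += 1
        time.sleep(interval)
    return counts
```

Calling `sample_stack(pid).most_common()` for the backend's PID would then show whether the process really spends most of its time in one place, or only occasionally.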

> Just as a test, I have tried a very simple piping example (which should go through STDOUT too ... or am I wrong?).
> - "dd if=/dev/zero of=file_10GB bs=1024 count=10M" created a 10 GB test file on source drive (sdb)
> - "time cat file_10GB> /var/tmp/test2.dump" ... pipe the file to target drive (/var/tmp is on sda)

Isn't your issue suspected to be with network transfers over unix
sockets and/or tcp/ip, rather than with pipes?

Try "socat" if you want to test unix socket performance and/or tcp/ip
socket performance. It's an amazing sysadmin/network swiss army knife.
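
If socat is not at hand, a rough equivalent of a unix socket throughput test can be sketched in Python. Note the assumptions: this uses an anonymous `socket.socketpair()` inside one process rather than the real PostgreSQL socket, the function name `unix_socket_throughput` is invented for this example, and the 8192-byte send size is simply chosen to mirror the sendto calls in the strace output.

```python
import socket
import threading
import time


def unix_socket_throughput(total_bytes=64 * 1024 * 1024, chunk=8192):
    """Push total_bytes through an AF_UNIX stream socket pair.

    Returns (bytes_received, elapsed_seconds).
    """
    sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    received = 0

    def drain():
        # Read until the sender closes its end (recv returns b"").
        nonlocal received
        while True:
            data = receiver.recv(65536)
            if not data:
                break
            received += len(data)

    reader = threading.Thread(target=drain)
    reader.start()

    payload = b"\0" * chunk
    sent = 0
    start = time.monotonic()
    while sent < total_bytes:
        n = min(chunk, total_bytes - sent)
        sender.sendall(payload[:n])
        sent += n
    sender.close()  # signals EOF to the reader thread
    reader.join()
    elapsed = time.monotonic() - start
    receiver.close()
    return received, elapsed
```

Running it with a total well beyond the point where the stall appears (several GB) would show whether plain unix socket traffic slows down in the same way, independent of PostgreSQL.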

--
Craig Ringer
