Skip site navigation (1) Skip section navigation (2)

Re: Concurrent COPY commands

From: "Phillip Sitbon" <phillip(at)sitbon(dot)net>
To: pgsql-novice(at)postgresql(dot)org
Subject: Re: Concurrent COPY commands
Date: 2008-07-09 16:35:15
Message-ID: 536685ea0807090935i2e191146y7f3acf5bba4ca0ea@mail.gmail.com (view raw or flat)
Thread:
Lists: pgsql-novice
Sorry about the late reply.

I only have two fast SATA drives on software RAID, but that really isn't the
issue- while the copy commands are going, disk activity is relatively low.
By relatively I mean that I have seen it a lot higher under certain
circumstances, and I know for sure the disks aren't holding anything back. I
know it's a bad comparison, but the process generating this huge amount of
data can write directly to the disk very fast and still be CPU-bound, while
it eventually ends up waiting for postgres when I try to pipe it into the
database. I figured some overhead was to be expected and that's why I tried
the parallel setup in the first place.

What I see is that after some buffering (not sure it is buffering, but after
it gets some data), one postgres process will ramp up to 100% CPU (on one
core) for some time, thus blocking its input FIFO. That is when the hard
drive activity goes up a bit, but whatever it is doing is definitely
CPU-bound on that core.

No more than one worker process does this at a time. And no matter what kind
of FIFO buffers and select() calls I use, the calling process eventually
gets blocked because the postgres processes don't appear to be working in
parallel as well as they could be; hence, postgres doesn't take in any more
data for a while. I'm really curious about why going parallel x6 is so much
slower than one process when the disks aren't being pushed that hard
compared to their capabilities.

I am suspecting something wrong with my config, but I can't be sure. Is 1-2
GB for work_mem ok? Would that hurt it?

On a positive note, I let the single-process version run to completion and I
now have a solid TB of data that I can access and use at lightning speed :)

Cheers,

  Phillip

On Wed, Jul 2, 2008 at 10:02 AM, Alan Hodgson <ahodgson(at)simkin(dot)ca> wrote:

> On Wednesday 02 July 2008, Phillip Sitbon <phillip(at)sitbon(dot)net> wrote:
> > Hello,
> >
> > I am running some queries that use multiple connections to issue COPY
> > commands which bring data into the same table via different files (FIFOs
> > to be precise). This is being done on a SMP machine and I am noticing
> > that none of the postgres worker processes operate in parallel, even
> > though there is data available to all of them. The performance is nearly
> > exactly the same as it is for issuing a single COPY command.
> > Is this
> > normal behavior, even with all of the separate transactions still in
> > progress? Would I be better off doing multithreaded bulk inserts from my
> > C program rather than sending the data to FIFOs?
>
> Sounds like you're I/O bound - I doubt any other concurrency mechanism will
> change that much.
>
> >
> > The machine I am using has 16GB of memory and 8 cores, so I've tried to
> > optimize the configuration accordingly but I am a little lost in some
> > places.
>
> Ah, but what does your RAID controller and drives look like?
>
>
> --
> Alan
>
> --
> Sent via pgsql-novice mailing list (pgsql-novice(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-novice
>

In response to

Responses

pgsql-novice by date

Next:From: Alan HodgsonDate: 2008-07-09 16:45:37
Subject: Re: Concurrent COPY commands
Previous:From: Emil ObermayrDate: 2008-07-09 07:12:50
Subject: Re: how to get dependancies of a table?

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group