Re: COPY Performance

From: "Scott Marlowe" <scott(dot)marlowe(at)gmail(dot)com>
To: "Hans Zaunere" <lists(at)zaunere(dot)com>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-general(at)postgresql(dot)org
Subject: Re: COPY Performance
Date: 2008-05-05 15:01:21
Message-ID: dcc563d10805050801q11f1c5d3taf3204af3daad957@mail.gmail.com
Lists: pgsql-general

On Mon, May 5, 2008 at 6:18 AM, Hans Zaunere <lists(at)zaunere(dot)com> wrote:
> > > We're using a statement like this to dump between 500K and >5 million
> > > rows.
> >
> > > COPY(SELECT SomeID FROM SomeTable WHERE SomeColumn > '0')
> > > TO '/dev/shm/SomeFile.csv'
> >
> > > Upon first run, this operation can take several minutes. Upon second
> > > run, it will generally complete in well under a minute.
> >
> > Hmmm ... define "first" versus "second". What do you do to return it
> > to the slow state?
>
> Interesting that you ask. I haven't found a very reliable way to reproduce
> this.
>
> Typically, just waiting a while before running the same query again will
> reproduce this behavior. I restarted PostgreSQL and it was reproduced as
> well. However, I can't find a way to flush buffers/etc. to reproduce the
> slow state on demand.

What happens if you do something like:

select count(*) from (select ...) as x;

i.e. skip writing the .csv file each time. How does the performance compare
without writing the CSV versus with it?
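
For instance, a minimal sketch of the comparison, using the placeholder names
from the quoted query (SomeTable, SomeID, SomeColumn) and timing each run with
psql's \timing:

    -- run 1: same scan, writing the CSV (server-side file, needs superuser)
    COPY (SELECT SomeID FROM SomeTable WHERE SomeColumn > '0')
      TO '/dev/shm/SomeFile.csv';

    -- run 2: same scan, but nothing written to disk
    SELECT count(*)
    FROM (SELECT SomeID FROM SomeTable WHERE SomeColumn > '0') AS sub;

If the count(*) version is just as slow on a cold cache, the time is going
into reading the table rather than into writing the file.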
