Re: Simple (hopefully) throughput question?

From: Samuel Gendler <sgendler(at)ideasculptor(dot)com>
To: Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Simple (hopefully) throughput question?
Date: 2010-11-05 19:29:48
Message-ID: AANLkTine9evHx1tTGvZXvA0wTMmarYgbtD=t0AiAP7gS@mail.gmail.com
Lists: pgsql-performance

On Fri, Nov 5, 2010 at 12:23 PM, Samuel Gendler
<sgendler(at)ideasculptor(dot)com> wrote:

> On Thu, Nov 4, 2010 at 8:07 AM, Vitalii Tymchyshyn <tivv00(at)gmail(dot)com> wrote:
>
>> 04.11.10 16:31, Nick Matheson wrote:
>>
>>> Heikki-
>>>
>>>>
>>>> Try COPY, i.e. "COPY bulk_performance.counts TO STDOUT BINARY".
>>>>
>>> Thanks for the suggestion. A preliminary test shows an improvement
>>> closer to our expected 35 MB/s.
>>>
>>> Are you familiar with any Java libraries for decoding the COPY format?
>>> The spec is clear and we could clearly write our own, but figured I would
>>> ask. ;)
>>>
>> The JDBC driver has some COPY support, but I don't remember the details.
>> You'd better ask on the JDBC list.
>>
>>
>>
> The JDBC driver support works fine. You can pass a Reader or an InputStream
> (if I recall correctly, the InputStream path is more efficient, or maybe
> the Reader path was buggy). Regardless, I wound up passing an InputStream to
> the driver, which I then wrap in a Reader in order to read it line by line.
>
> You can write a COPY statement that sends standard CSV; take a look at
> the postgres docs for the COPY statement to see the full syntax. I then
> have a subclass of BufferedReader which parses each line of CSV and does
> something interesting with it. It has been working very reliably for many
> months now, processing about 500 million rows per day. (I'm actually COPYing
> out rather than in, but the concept is the same regardless: my
> OutputStream is wrapped in a Writer, which reformats data on the fly.)
>
>
>
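On Nick's question about decoding the binary COPY format in Java: below is a minimal sketch of a reader for the layout documented for COPY BINARY (11-byte signature, 32-bit flags, 32-bit header-extension length, then per row a 16-bit field count followed by 32-bit length-prefixed fields, with -1 meaning NULL, and a 16-bit -1 trailer). The class name BinaryCopyReader and its helpers are hypothetical, it deliberately rejects WITH OIDS streams, and it only decodes int4 and text; every other type has its own binary send format that would need its own decoder.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical sketch: reads rows from a PostgreSQL binary COPY stream. */
final class BinaryCopyReader {
    // Every binary COPY stream starts with this 11-byte signature: "PGCOPY\n\377\r\n\0".
    private static final byte[] SIGNATURE = {
        'P', 'G', 'C', 'O', 'P', 'Y', '\n', (byte) 0xFF, '\r', '\n', 0
    };

    // DataInputStream reads big-endian (network order), which is what the format uses.
    private final DataInputStream in;

    BinaryCopyReader(InputStream raw) throws IOException {
        in = new DataInputStream(raw);
        byte[] sig = new byte[SIGNATURE.length];
        in.readFully(sig);
        if (!Arrays.equals(sig, SIGNATURE)) {
            throw new IOException("not a binary COPY stream");
        }
        int flags = in.readInt();
        if ((flags & (1 << 16)) != 0) { // bit 16 set: every row carries an extra OID
            throw new IOException("WITH OIDS streams are not handled by this sketch");
        }
        int extensionLength = in.readInt();
        in.readFully(new byte[extensionLength]); // header extension area: skipped
    }

    /** Returns the next row as raw field bytes (null entries are SQL NULLs), or null at the trailer. */
    byte[][] nextRow() throws IOException {
        short fieldCount = in.readShort();
        if (fieldCount == -1) {
            return null; // file trailer reached
        }
        byte[][] row = new byte[fieldCount][];
        for (int i = 0; i < fieldCount; i++) {
            int length = in.readInt();
            if (length == -1) {
                row[i] = null; // SQL NULL: no data bytes follow
            } else {
                row[i] = new byte[length];
                in.readFully(row[i]);
            }
        }
        return row;
    }

    /** Decodes an int4 field (4 bytes, big-endian). */
    static int int4(byte[] field) {
        return ByteBuffer.wrap(field).getInt();
    }

    /** Decodes a text field, assuming the client encoding is UTF-8. */
    static String text(byte[] field) {
        return new String(field, StandardCharsets.UTF_8);
    }
}
```

Usage would be a loop of `while ((row = reader.nextRow()) != null)`, with the InputStream coming from whatever delivers the output of `COPY ... TO STDOUT BINARY`. Keep in mind that the PostgreSQL docs describe the binary format as less portable across machine architectures and PostgreSQL versions than text or CSV.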
I should mention that I found basically no documentation of the COPY API in
the JDBC driver in 8.4. I have no idea whether that has changed with 9.x. I
had to figure it out by reading the source code. Fortunately, it is very
simple:

return ((PGConnection) con).getCopyAPI().copyIn(sql, this.fis);

where this.fis is an InputStream. There's an alternative copyIn
overload that takes a Reader instead, and I expect the copyOut methods
follow the same pattern (taking an OutputStream or a Writer).
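Because copyOut pushes rows into an OutputStream, reading them line by line through a Reader means bridging the two. A minimal sketch, assuming a pipe plus one helper thread is acceptable: the hypothetical StreamProducer stands in for the real copyOut call, and CopyOutLines is an illustrative name, not a driver class.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

/** Hypothetical helper: turns an OutputStream-based producer into line-by-line reads. */
final class CopyOutLines {
    /** Anything that writes a COPY stream into an OutputStream (e.g. a copyOut call). */
    @FunctionalInterface
    interface StreamProducer {
        void writeTo(OutputStream out) throws Exception;
    }

    /** Runs the producer on a helper thread and hands each line to the handler; returns the line count. */
    static long forEachLine(StreamProducer producer, Consumer<String> handler)
            throws IOException, InterruptedException {
        PipedInputStream pipeIn = new PipedInputStream(64 * 1024);
        PipedOutputStream pipeOut = new PipedOutputStream(pipeIn);
        Throwable[] failure = new Throwable[1];

        Thread writer = new Thread(() -> {
            // Closing the pipe on exit signals EOF to the reading side.
            try (OutputStream out = pipeOut) {
                producer.writeTo(out);
            } catch (Throwable t) {
                failure[0] = t; // read by the caller after join()
            }
        }, "copy-out-writer");
        writer.start();

        long lines = 0;
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(pipeIn, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                handler.accept(line);
                lines++;
            }
        }
        writer.join();
        if (failure[0] != null) {
            throw new IOException("producer failed", failure[0]);
        }
        return lines;
    }
}
```

With the driver, the producer would be something like `out -> copyManager.copyOut("COPY t TO STDOUT WITH CSV", out)`, where `copyManager` is the object returned by `((PGConnection) con).getCopyAPI()` and `t` is a placeholder table. Later driver releases also ship `org.postgresql.copy.PGCopyInputStream`, which exposes COPY OUT as an InputStream without a helper thread; check whether your driver version has it.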

Note: my earlier email was confusing. copyIn copies data into the db and
takes an InputStream that supplies the data as it is read. copyOut copies
data out of the db and takes an OutputStream that receives the data. I
inverted those in my earlier email.
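For the copyIn direction, the InputStream just has to deliver data in the format the COPY statement names. A minimal sketch for `COPY ... FROM STDIN WITH CSV`, assuming the default CSV options (comma delimiter, `"` quote, NULL written as an unquoted empty field) and a UTF-8 client encoding; CsvCopyEncoder is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Hypothetical sketch: encodes rows as PostgreSQL CSV-mode COPY input. */
final class CsvCopyEncoder {
    /** Encodes one field; null becomes an unquoted empty field (the CSV-mode NULL). */
    static String field(String value) {
        if (value == null) {
            return "";
        }
        // Quote empty strings (so they are not read as NULL), the end-of-data
        // marker \. and anything containing a delimiter, quote or line break.
        boolean mustQuote = value.isEmpty()
                || value.equals("\\.")
                || value.indexOf(',') >= 0
                || value.indexOf('"') >= 0
                || value.indexOf('\n') >= 0
                || value.indexOf('\r') >= 0;
        if (!mustQuote) {
            return value;
        }
        // Inside a quoted field, a literal quote is written as two quotes.
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    /** Encodes one row as a newline-terminated CSV line. */
    static String row(List<String> values) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(field(values.get(i)));
        }
        return sb.append('\n').toString();
    }

    /** Builds the whole payload in memory and exposes it as an InputStream for copyIn. */
    static InputStream stream(List<List<String>> rows) {
        StringBuilder sb = new StringBuilder();
        for (List<String> r : rows) {
            sb.append(row(r));
        }
        return new ByteArrayInputStream(sb.toString().getBytes(StandardCharsets.UTF_8));
    }
}
```

The result would be passed as the second argument to `copyIn("COPY t FROM STDIN WITH CSV", ...)`, with `t` again a placeholder. Buffering the whole payload keeps the sketch short; at hundreds of millions of rows per day you would instead generate rows lazily inside a custom InputStream or feed copyIn through a pipe.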

You can look at the source code of the copy API (the CopyManager class
that getCopyAPI() returns) to learn more about the mechanism.
