Re: populating a table via the COPY command using C code.

From: "Mak, Jason" <jason(dot)mak(at)ngc(dot)com>
To: <pgsql-general(at)postgresql(dot)org>
Subject: Re: populating a table via the COPY command using C code.
Date: 2005-04-27 20:48:45
Message-ID: 521ABD2E7DC4254D9633A530906212D5026A99@xcgny105.northgrum.com
Lists: pgsql-general

> What example are you looking at and what don't you understand about it?

Some of the examples that I looked over are either from the internet or from the Postgres Manual. The API I'm referring to is PQputCopyData. However, with the explanation given below, I'm starting to understand.

> libpq provides the primitives that you could use to implement such
> an API: it would be a trivial matter to write a function that opens
> the indicated file, reads its contents, and sends them to the
> database. As the documentation indicates, you'd use PQexec() or
> its ilk to send a COPY FROM STDIN command (see the COPY documentation
> for the exact syntax), then PQputCopyData() or PQputline() to send
> the data (probably in a loop), then PQputCopyEnd() or PQendcopy()
> to indicate that you're finished. Add the necessary file I/O
> statements and there's your function.

So basically, in C, I would open the file with fopen and then, in a loop, read a line into a buffer, note its byte count, and send it to the database using PQputCopyData. Is this correct?

> Do you have a reason for using an intermediate file? Instead of
> writing data to the file and then reading it back, you could use
> PQputCopyData() or PQputline() to send the data directly to the
> database.

For the project I'm working on, we basically set up a Postgres data warehouse. We have a large set of binary data that needs to be parsed and translated into something meaningful. We intend to load this processed data into 3 tables using the quickest means possible. I've already tried parsing and doing inserts, but this proved to be very slow. So I figured on a 2-step automated process: the first step would be to parse the data and create 3 separate files, and the second would be to load each file into the warehouse. I never considered using PQputCopyData in real time. I'm not sure how this would work given 3 different tables that hold different data, or how fast it's going to be. But I have tried the last approach. It works fairly well. The only problem is the lack of insight into where it is during the load processing.

What are your thoughts? Which approach would be the fastest?
1) 2 step process.
2) realtime PQputCopyData - not sure how this would work with 3 different tables.
3) COPY tablename FROM 'filename'
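
For option 2, each parsed record would have to be turned into a line of COPY data before it is sent. Below is a minimal sketch of the escaping step, assuming the default COPY text format (tab-separated fields, backslash escapes). The function name copy_escape_field and the fixed-size output buffer are made up for illustration and are not part of libpq. Note also that a connection can be in only one COPY at a time, so feeding 3 tables would mean either 3 connections or 3 COPY commands run one after another.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not a libpq function): copy src into dst,
 * escaping the characters that are special in COPY text format:
 * backslash, tab, newline and carriage return.
 *
 * Returns the number of bytes written, not counting the terminating
 * NUL, or -1 if dst is too small to hold the escaped field.
 */
int copy_escape_field(const char *src, char *dst, size_t dstsize)
{
    size_t o = 0;

    assert(src != NULL && dst != NULL);

    for (; *src != '\0'; src++) {
        char esc = 0;

        switch (*src) {
        case '\\': esc = '\\'; break;
        case '\t': esc = 't';  break;
        case '\n': esc = 'n';  break;
        case '\r': esc = 'r';  break;
        }

        /* Always leave room for the terminating NUL. */
        if (o + (esc ? 2 : 1) >= dstsize)
            return -1;

        if (esc) {
            dst[o++] = '\\';
            dst[o++] = esc;
        } else {
            dst[o++] = *src;
        }
    }

    if (o >= dstsize)
        return -1;
    dst[o] = '\0';
    return (int) o;
}
```

A row would then be built by joining the escaped fields with tab characters and ending it with a newline before handing it to PQputCopyData.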

thanks,
jason.

-----Original Message-----
From: Michael Fuhr [mailto:mike(at)fuhr(dot)org]
Sent: Wednesday, April 27, 2005 3:46 PM
To: Mak, Jason
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: [GENERAL] populating a table via the COPY command using C code.

[Please copy the mailing list on replies so others can contribute
to and learn from the discussion.]

On Wed, Apr 27, 2005 at 02:34:26PM -0400, Mak, Jason wrote:
>
> Yes, my application is a client application that uses libpq api, ie.
> PQexec, etc... I have looked at the "Functions Associated with the COPY
> Command". But I still don't understand. what I really need is an
> example of how those api's(PQputCopyData) are used, other than the
> "simple" example that's provided.

What example are you looking at and what don't you understand about it?

> This "dataload" should be relatively simple. I already have a flat
> file created. I should be able to use some api and say here is the
> pointer to my db connection and here is a pointer to the flat file.
> now do your thing. Perhaps you can explain this to me.

libpq provides the primitives that you could use to implement such
an API: it would be a trivial matter to write a function that opens
the indicated file, reads its contents, and sends them to the
database. As the documentation indicates, you'd use PQexec() or
its ilk to send a COPY FROM STDIN command (see the COPY documentation
for the exact syntax), then PQputCopyData() or PQputline() to send
the data (probably in a loop), then PQputCopyEnd() or PQendcopy()
to indicate that you're finished. Add the necessary file I/O
statements and there's your function.
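
The file I/O part of the steps above could be sketched roughly as follows. To keep the sketch free of any database dependency, the send step is a callback: copy_sink, sink_stats, count_sink and stream_file are all names made up for illustration, not libpq functions. In a real client the callback would wrap PQputCopyData(conn, buf, len), which returns 1 on success and -1 on error. The data is read in fixed-size chunks rather than line by line, on the understanding that buffer boundaries have no meaning when sending COPY data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical callback type: receives one chunk of COPY data and
 * returns 0 on success, nonzero on failure. In a libpq client this
 * would wrap PQputCopyData(conn, buf, len).
 */
typedef int (*copy_sink)(void *ctx, const char *buf, int len);

/* Stand-in sink for illustration: it only counts what it is given. */
struct sink_stats {
    long chunks;
    long bytes;
};

int count_sink(void *ctx, const char *buf, int len)
{
    struct sink_stats *st = ctx;

    (void) buf;             /* contents are ignored by this stand-in */
    st->chunks++;
    st->bytes += len;       /* running total, usable for progress output */
    return 0;
}

/*
 * Read fp in fixed-size chunks and hand each chunk to the sink.
 * Returns 0 on success, -1 on a read error or a sink failure.
 */
int stream_file(FILE *fp, copy_sink sink, void *ctx)
{
    char   buf[8192];
    size_t n;

    assert(fp != NULL && sink != NULL);

    while ((n = fread(buf, 1, sizeof buf, fp)) > 0) {
        if (sink(ctx, buf, (int) n) != 0)
            return -1;
    }
    return ferror(fp) ? -1 : 0;
}
```

Around this loop, a libpq client would first send COPY tablename FROM STDIN with PQexec() and check that PQresultStatus() reports PGRES_COPY_IN, and afterwards call PQputCopyEnd(conn, NULL) followed by PQgetResult() to learn whether the load succeeded. Because the sink sees every chunk, it is also a natural place to report how far the load has progressed.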

Do you have a reason for using an intermediate file? Instead of
writing data to the file and then reading it back, you could use
PQputCopyData() or PQputline() to send the data directly to the
database.

Another possibility: if the file resides somewhere the backend can
read, and if you can connect to the database as a superuser, then
you could use COPY tablename FROM 'filename'.

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/
