Re: gsoc ideas

From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: longlong <asfnuts(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: gsoc ideas
Date: 2008-03-11 05:00:17
Message-ID: Pine.GSO.4.64.0803110020300.18872@westnet.com
Lists: pgsql-general

On Mon, 10 Mar 2008, longlong wrote:

> 1.release8.2 make COPY TO can copy the output of an arbitrary SELECT
> statement. so i think maybe COPY FROM can get data from output and 'insert
> into' some column that designated. the format of the command will be
> discussed.

This would be a nice feature. Right now there are often applications
where there is a data loading or staging table that ends up being merged
with a larger table after some cleanup. Moving that data from the
preparation area into the final table right now is most easily done with
INSERT INTO X (SELECT A,B FROM C) type actions. This is slow because
INSERT takes much longer than COPY. Adding support for COPY X FROM
(SELECT A,B FROM C) would make this problem go away.

It is possible to do this right now with some clever use of STDIN/STDOUT,
as in the psql pipeline under your third point below, but having a pure SQL
solution would be more widely applicable. The overhead of having to pass
everything through the client (as STDIN/STDOUT does) is certainly not zero.
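
Going back to the staging-table merge itself (this is not the STDIN/STDOUT
trick, just the plain INSERT ... SELECT step), here is a minimal sketch of
the pattern. It assumes Python's sqlite3 module as a stand-in so it runs
without a PostgreSQL server, and the table and column names (staging,
target, a, b) are made up for illustration. It says nothing about the
relative speed of INSERT and COPY.

```python
import sqlite3

# Assumption: sqlite3 stands in for PostgreSQL so the sketch runs anywhere.
# The table and column names below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging (a INTEGER, b TEXT)")
conn.execute("CREATE TABLE target (a INTEGER, b TEXT)")

# Load raw rows into the staging table, including some that need cleanup.
conn.executemany(
    "INSERT INTO staging VALUES (?, ?)",
    [(1, " alpha "), (2, "beta"), (None, "orphan"), (3, " gamma")],
)

# Cleanup step: drop rows with no key and trim stray whitespace.
conn.execute("DELETE FROM staging WHERE a IS NULL")
conn.execute("UPDATE staging SET b = trim(b)")

# Merge step: the INSERT INTO X (SELECT A,B FROM C) pattern.
conn.execute("INSERT INTO target (a, b) SELECT a, b FROM staging")
conn.commit()

merged = conn.execute("SELECT a, b FROM target ORDER BY a").fetchall()
print(merged)
```

The proposed COPY X FROM (SELECT A,B FROM C) would replace only the merge
statement; the load and cleanup steps would stay the same.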

> 2.this come from TODO list: COPY always behaviors like a unit of work thar
> consists of some insert commands, if any error, it rollback. but sometimes
> we only care the data should be inserted. in that situation, i used to use
> "try....catch...." insert row by row to skip the error, because it will take
> much time to examine every row. so:
> Allow COPY to report error lines and continue. this is a good idea.

This is a long-standing request and many people would be happy to see it
implemented. You do want to make sure the implementation easily allows
pushing all the lines that didn't commit into what's commonly called a
"reject file".
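
To show what a reject file looks like in practice, here is a rough sketch of
the row-by-row workaround you describe, with the failed lines collected
instead of discarded. It is not a proposal for how COPY should do it. It
assumes Python's sqlite3 module as a stand-in so it runs without a server;
the function name load_with_rejects, the items table, and the reject line
layout (line number, error, original line) are all made up for illustration.
In PostgreSQL a failed INSERT aborts the surrounding transaction, so a real
row-by-row version would also need a savepoint per row.

```python
import io
import sqlite3


def load_with_rejects(conn, lines, reject_out):
    """Insert tab-separated 'id<TAB>name' lines one at a time.

    Lines that fail to parse or violate a constraint are written to
    reject_out and loading continues. Returns the number of rows loaded.
    """
    loaded = 0
    for lineno, line in enumerate(lines, start=1):
        raw = line.rstrip("\n")
        fields = raw.split("\t")
        try:
            if len(fields) != 2:
                raise ValueError("expected 2 fields, got %d" % len(fields))
            conn.execute(
                "INSERT INTO items (id, name) VALUES (?, ?)",
                (int(fields[0]), fields[1]),
            )
            loaded += 1
        except (ValueError, sqlite3.Error) as exc:
            # Keep the original line so it can be fixed and reloaded later.
            reject_out.write("%d\t%s\t%s\n" % (lineno, exc, raw))
    conn.commit()
    return loaded


# Hypothetical input: one bad integer, one duplicate key, one short line.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
data = ["1\tapple\n", "two\tbanana\n", "1\tduplicate\n", "3\tcherry\n", "4\n"]

rejects = io.StringIO()  # stands in for an open reject file
loaded = load_with_rejects(conn, data, rejects)
print(loaded)
print(rejects.getvalue(), end="")
```

Keeping the untouched input line in each reject record is the important
part: it lets someone correct the bad rows and feed just those back in.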

> 3.sometimes, i want to copy data from one database to another. i think using
> COPY will simple the code. i want the content from COPY TO not store in the
> file, but in the memory, and i can COPY FROM the memory(i don't kown COPY
> with STDIN and STDOUT can do this or not.).

It can:

create table x(a int);
insert into x(select generate_series(1,10));
create table y(b int);

psql -c "copy x to stdout" | psql -c "copy y from stdin"

Try it out; table y will have the same contents as x when it's all done.

I think you've got the basics of some useful features to add here. What
you probably want to do is write a slightly longer description of your
plan and submit it to the pgsql-hackers list, where the developers are,
to get feedback on the feasibility of doing this as a GSOC project. From
your message, I get the impression that English writing is tough for you.
That will make it a little harder for you to get through the process of
getting a patch designed and then accepted, as this community likes to
talk through that sort of thing. If you've got another language you're
more comfortable with, you might also want to see if there's an existing
community member who speaks it you might work with to make that easier.

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD
