Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Daniel Farina <drfarina(at)gmail(dot)com>, Hannu Krosing <hannu(at)krosing(dot)net>, Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-25 10:32:17
Message-ID: 162867790911250232u75985e29mba6310358d4b2911@mail.gmail.com
Lists: pgsql-hackers

2009/11/25 Jeff Davis <pgsql(at)j-davis(dot)com>:
> On Wed, 2009-11-25 at 09:23 +0100, Pavel Stehule wrote:
>> > If SRFs use a tuplestore in that situation, it sounds like that should
>> > be fixed. Why do we need to provide alternate syntax involving COPY?
>>
>> It isn't problem of SRF function design. It allow both mode - row and
>> tuplestor.
>
> select * from generate_series(1,1000000000) limit 1;
>
> That statement takes a long time, which indicates to me that it's
> materializing the result of the SRF. And there's no insert there.

This is a missing optimization. If I understand correctly, PostgreSQL waits for
the complete result set, so in this case materialization is necessary. In
your query, PostgreSQL materializes too early.

postgres=# select * from generate_series(1,100000) limit 1;
generate_series
─────────────────
1
(1 row)

Time: 59,540 ms
postgres=# select generate_series(1,100000) limit 1;
generate_series
─────────────────
1
(1 row)

Time: 1,107 ms

But usually we process all rows from an SRF, so the problem with
LIMIT isn't significant.

I am testing:

1.
postgres=# select count(*) from generate_series(1,1000000);
count
─────────
1000000
(1 row)

Time: 930,720 ms

2.
postgres=# select count(*) from (select generate_series(1,1000000)) x;
count
─────────
1000000
(1 row)

Time: 276,511 ms

Case 2 is significantly faster than case 1 (there is no SRF materialization
when the SRF is in the target list rather than in FROM).

postgres=# create table foo(a integer);
CREATE TABLE

postgres=# insert into foo select generate_series(1,1000000);
INSERT 0 1000000
Time: 4274,869 ms

postgres=# insert into foo select * from generate_series(1,1000000);
INSERT 0 1000000
Time: 4814,756 ms

postgres=# copy foo to '/tmp/xxx';
COPY 1000000
Time: 1033,078 ms

postgres=# set synchronous_commit to off;
SET

postgres=# copy foo from '/tmp/xxx';
COPY 1000000
Time: 2749,277 ms

postgres=# insert into foo select generate_series(1,1000000);
INSERT 0 1000000
Time: 3948,734 ms

generate_series is fast and simple, but COPY is still about 30% faster.

>
>> This is problem of INSERT statement, resp. INSERT INTO
>> SELECT implementation.
>
> If "tmp" is a new table, and "zero" is a table with a million zeros in
> it, then:
>  insert into tmp select 1/i from zero;
> fails instantly. That tells me that it's not materializing the result of
> the select; rather, it's feeding the rows in one at a time.
>

I think materialization happens every time you run an SQL
statement without a cursor.
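For illustration, a cursor fetches rows incrementally instead of collecting
the whole result first (a sketch against the foo table created above;
timings would vary):

postgres=# begin;
BEGIN
postgres=# declare c cursor for select * from foo;
DECLARE CURSOR
postgres=# fetch 10 from c;
postgres=# close c;
CLOSE CURSOR
postgres=# commit;
COMMIT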

> Can show me in more detail what you mean? I'm having difficulty
> understanding your short replies.

I think COPY tab FROM/TO func() should be used for very large
imports and exports, where INSERT ... SELECT needs at least one
materialization.
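To illustrate the idea (the function names here are hypothetical; the exact
grammar is whatever the patch defines), the proposed syntax would look
something like:

COPY foo TO FUNCTION some_consumer_func;
COPY foo FROM FUNCTION some_producer_func;

The point is that rows stream directly between COPY and the function,
with no intermediate materialization.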

p.s. I am sorry, I am not a native speaker, so I speak in short replies.

Pavel

>
> Regards,
>        Jeff Davis
>
>
