Re: COPY FROM WHEN condition

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Surafel Temesgen <surafel3000(at)gmail(dot)com>
Cc: berlin(dot)ab(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: COPY FROM WHEN condition
Date: 2018-11-24 02:09:25
Message-ID: 8c9edfdf-6fad-7c68-8969-e1e315baa7a4@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On 11/23/18 12:14 PM, Surafel Temesgen wrote:
>
>
> On Sun, Nov 11, 2018 at 11:59 PM Tomas Vondra
> <tomas(dot)vondra(at)2ndquadrant(dot)com <mailto:tomas(dot)vondra(at)2ndquadrant(dot)com>> wrote:
>
>
> So, what about using FILTER here? We already use it for aggregates when
> filtering rows to process.
>
> i think its good idea and describe its purpose more. Attache is a
> patch that use FILTER instead

Thanks, looks good to me. A couple of minor points:

1) While comparing this to the FILTER clause we already have for
aggregates, I've noticed the aggregate version is

FILTER '(' WHERE a_expr ')'

while here we have

FILTER '(' a_expr ')'

For a while I was thinking that maybe we should use the same syntax
here, but I don't think so. The WHERE bit seems rather unnecessary and
we probably implemented it only because it's required by SQL standard,
which does not apply to COPY. So I think this is fine.

2) The various parser checks emit errors like this:

case EXPR_KIND_COPY_FILTER:
err = _("cannot use subquery in copy from FILTER condition");
break;

I think the "copy from" should be capitalized, to make it clear that it
refers to a COPY FROM command and not a copy of something.

3) I think there should be regression tests for these prohibited things,
i.e. for a set-returning function, for a non-existent column, etc.

4) What might be somewhat confusing for users is that the filter uses a
single snapshot to evaluate the conditions for all rows. That is, one
might do this

create or replace function f() returns int as $$
select count(*)::int from t;
$$ language sql;

and hope that

copy t from '/...' filter (f() <= 100);

only ever imports the first 100 rows - but that's not true, of course,
because f() uses the snapshot acquired at the very beginning. For
example INSERT SELECT does behave differently:

test=# copy t from '/home/user/t.data' filter (f() < 100);
COPY 81
test=# insert into t select * from t where f() < 100;
INSERT 0 19

Obviously, this is not an issue when the filter clause references only
the input row (which I assume will be the primary use case).

Not sure if this is expected / appropriate behavior, and if the patch
needs to do something else here.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Anne Marie Harm 2018-11-24 02:31:23 could not connect to server, in order to operate pgAdmin/PostgreSQL
Previous Message Michael Paquier 2018-11-24 01:35:03 Re: pgsql: Add WL_EXIT_ON_PM_DEATH pseudo-event.