Re: COPY FROM WHEN condition

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: surafel3000(at)gmail(dot)com
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: COPY FROM WHEN condition
Date: 2018-10-31 07:51:09
Message-ID: CAD21AoCHM6FU0aErWUq+WhWQqx89ppZmbJAR3B2w_HLmaiPLMA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Oct 30, 2018 at 11:47 PM Surafel Temesgen <surafel3000(at)gmail(dot)com> wrote:
>
>
> Hi,
> Thank you for looking at it .
> On Sun, Oct 28, 2018 at 7:19 PM Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>>
>>
>> 1) I think this deserves at least some regression tests. Plenty of tests
>> already use COPY, but there's no coverage for the new piece. So let's
>> add a new test suite, or maybe add a couple of tests into copy2.sql.
>>
>>
>> 2) In copy.sqml, the new item is defined like this
>>
>> <term><literal>WHEN Clause</literal></term>
>>
>> I suggest we use just <term><literal>WHEN</literal></term>, that's what
>> the other items do (see ENCODING).
>>
>> The other thing is that this does not say what expressions are allowed
>> in the WHEN clause. It seems pretty close to WHEN clause for triggers,
>> which e.g. mentions that subselects are not allowed. I'm pretty sure
>> that's true here too, because it fails like this (118 = T_SubLink):
>>
>> test=# copy t(a,b,c) from '/tmp/t.data' when ((select 1) < 10);
>> ERROR: unrecognized node type: 118
>>
>> So, the patch needs to detect this, produce a reasonable error message
>> and document the limitations in copy.sqml, just like we do for CREATE
>> TRIGGER.
>
> fixed
>>
>>
>> 3) For COPY TO, the WHEN clause is accepted but ignored, leading to
>> confusing cases like this:
>>
>> test=# copy t(a,b,c) to '/tmp/t.data' when ((select 100) < 10);
>> COPY 151690
>>
>> So, it contains subselect, but unlike COPY FROM it does not fail
>> (because we never execute it). The fun part is that the expression is
>> logically false, so a user might expect it to filter rows, yet we copy
>> everything.
>>
>> IMHO we need to either error-out in these cases, complaining about WHEN
>> not being supported for COPY TO, or make it work (effectively treating
>> it as a simpler alternative to COPY (subselect) TO).
>
> English is not my first language but I chose error-out because WHEN condition for COPY TO seems to me semantically incorrect
>>
>>
>> AFAICS we could just get rid of the extra when_cluase variable and mess
>> with the cstate->whenClause directly, depending on how (3) gets fixed.
>
> I did it this way because CopyState structure memory allocate and initialize in BeginCopyFrom but the analysis done before it
>>
>>
>> 5) As I mentioned, the CREATE TRIGGER already has WHEN clause, but it
>> requires it to be 'WHEN (expr)'. I suggest we do the same thing here,
>> requiring the parentheses.
>>
>>
>> 6) The skip logic in CopyFrom() seems to be slightly wrong. It does
>> work, but the next_record label is defined after CHECK_FOR_INTERRUPTS()
>> so a COPY will not respond to Ctrl-C unless it finds a row matching the
>> WHEN condition. If you have a highly selective condition, that's a bit
>> inconvenient.
>>
>> It also skips
>>
>> MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
>>
>> so I wonder what the heap_form_tuple() right after the next_record label
>> will use for tuples right after a skipped one. I'd bet it'll use the
>> oldcontext (essentially the long-lived context), essentially making it
>> a memory leak.
>>
>> So I suggest to get rid of the next_record label, and use 'continue'
>> instead of the 'goto next_record'.
>>
> fixed
>>

I've looked at this patch and tested.

When I use a function returning string in WHEN clause I got the following error:

=# copy test from '/tmp/aaa.csv' (format 'csv') when (lower(t) = 'hello');
ERROR: could not determine which collation to use for lower() function
HINT: Use the COLLATE clause to set the collation explicitly.
CONTEXT: COPY hoge, line 1: "1,hoge,2018-01-01"

And then although I specified COLLATE I got an another error (127 =
T_CollateExpr):

=# copy test from '/tmp/aaa.csv' (format 'csv') when (lower(t) collate
"en_US" = 'hello');
ERROR: unrecognized node type: 127

This error doesn't happen if I put the similar condition in WHEN
clause for triggers. I think the patch needs to produce a reasonable
error message.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2018-10-31 07:52:35 Re: syntax error: VACUUM ANALYZE VERBOSE (PostgreSQL 11 regression)
Previous Message Pavel Stehule 2018-10-31 07:41:45 Re: ToDo: show size of partitioned table