Re: COPY FROM WHEN condition

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Surafel Temsgen <surafel3000(at)gmail(dot)com>, berlin(dot)ab(at)gmail(dot)com, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: COPY FROM WHEN condition
Date: 2018-12-08 13:36:38
Message-ID: 0360690c-090c-3bd5-3b56-75379f14c419@2ndquadrant.com
Lists: pgsql-hackers

On 12/6/18 4:52 PM, Robert Haas wrote:
> On Wed, Nov 28, 2018 at 6:17 PM Tomas Vondra
> <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>>> Compared with the overhead of setting a snapshot before evaluating every
>>> row, and considering that this kind of usage is not frequent, it seems
>>> to me the behavior is acceptable
>>
>> I'm not really buying the argument that this behavior is acceptable
>> simply because using the feature like this will be uncommon. That seems
>> like a rather weak reason to accept it.
>>
>> I however agree we don't want to make COPY less efficient, at least not
>> unless absolutely necessary. But I think we can handle this simply by
>> restricting what's allowed to appear in the FILTER clause.
>>
>> It should be fine to allow IMMUTABLE and STABLE functions, but not
>> VOLATILE ones. That should fix the example I shared, because f() would
>> not be allowed.
>
> I don't think that's a very good solution. It's perfectly sensible
> for someone to want to do WHERE/FILTER random() < 0.01 to load a
> smattering of rows, and this would rule that out for no very good
> reason.
>

Good point. I agree that's a much more plausible use case for this
feature, and forbidding volatile functions would break it.

> I think it would be fine to just document that if the filter condition
> examines the state of the database, it will not see the results of the
> COPY which is in progress. I'm pretty sure there are other cases -
> for example with triggers - where you can get code to run that can't
> see the results of the command currently in progress, so I don't
> really buy the idea that having a feature which works that way is
> categorically unacceptable.
>
> I agree that we can't justify flagrantly wrong behavior on the grounds
> that correct behavior is expensive or on the grounds that the
> incorrect cases will be rare. However, when there's more than one
> sensible behavior, it's not unreasonable to pick one that is easier to
> implement or cheaper or whatever, and that's the category into which I
> would put this decision.
>

OK, makes sense. I withdraw my objections to the original behavior, and
agree it's acceptable if it's reasonably documented.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services