Re: COPY FROM WHEN condition

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: "corey(dot)huinker" <corey(dot)huinker(at)gmail(dot)com>
Cc: David Fetter <david(at)fetter(dot)org>, nasbyj(at)amazon(dot)com, Surafel Temsgen <surafel3000(at)gmail(dot)com>, Christoph Moench-Tegeder <cmt(at)burggraben(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: COPY FROM WHEN condition
Date: 2018-11-02 05:33:42
Message-ID: CAFj8pRCgNX9TUBD2SeoErMHs7bxb6xmxTpTFk1L_SqbEnxfinQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

pá 2. 11. 2018 v 3:57 odesílatel Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
napsal:

> > Are you thinking something like having a COPY command that provides
>> > results in such a way that they could be referenced in a FROM clause
>> > (perhaps a COPY that defines a cursor…)?
>>
>> That would also be nice, but what I was thinking of was that some
>> highly restricted subset of cases of SQL in general could lend
>> themselves to levels of optimization that would be impractical in
>> other contexts.
>>
>
> If COPY (or a syntactical equivalent) can return a result set, then the
> whole of SQL is available to filter and aggregate the results and we don't
> have to invent new syntax, or endure confusion whenCOPY-WHEN syntax behaves
> subtly different from a similar FROM-WHERE.
>
> Also, what would we be saving computationally? The whole file (or program
> output) has to be consumed no matter what, the columns have to be parsed no
> matter what. At least some of the columns have to be converted to their
> assigned datatypes enough to know whether or not to filter the row, but we
> might be able push that logic inside a copy. I'm thinking of something like
> this:
>
> SELECT x.a, sum(x.b)
> FROM ( COPY INLINE '/path/to/foo.txt' FORMAT CSV ) as x( a integer, b
> numeric, c text, d date, e json) )
> WHERE x.d >= '2018-11-01'
>
>
Without some special feature this example is not extra useful. It is based
on copy on server that can use only super user with full rights.

What should be benefit of this feature?

Regards

Pavel

> In this case, there is the *opportunity* to see the following
> optimizations:
> - columns c and e are never referenced, and need never be turned into a
> datum (though we might do so just to confirm that they conform to the data
> type)
> - if column d is converted first, we can filter on it and avoid converting
> columns a,b
> - whatever optimizations we can infer from knowing that the two surviving
> columns will go directly into an aggregate
>
> If we go this route, we can train the planner to notice other
> optimizations and add those mechanisms at that time, and then existing code
> gets faster.
>
> If we go the COPY-WHEN route, then we have to make up new syntax for every
> possible future optimization.
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Erik Rijkers 2018-11-02 06:55:05 Re: row filtering for logical replication
Previous Message Michael Paquier 2018-11-02 05:27:15 Re: Getting ERROR: could not open file "base/13164/t3_16388" with partition table with ON COMMIT