Re: Let file_fdw access COPY FROM PROGRAM

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Cc: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Let file_fdw access COPY FROM PROGRAM
Date: 2016-09-29 17:51:08
Message-ID: 31698.1475171468@sss.pgh.pa.us
Lists: pgsql-hackers

Corey Huinker <corey(dot)huinker(at)gmail(dot)com> writes:
> [ file_fdw_program_v3.diff ]

Pushed with cosmetic adjustments, mostly more work on the comments and
documentation.

I did not push the proposed test case; it's unportable. The upthread
suggestion to add a TAP test would have been all right, because
--enable-tap-tests requires Perl, but the basic regression tests must
not. I'm a bit dubious that it'd be worth the work to create such a
test anyway, when COPY FROM PROGRAM itself hasn't got one.

What *would* be worth some effort is allowing ANALYZE on a file_fdw
table reading from a program. I concur that that probably can't be
the default behavior, but always falling back to the 10-block default
with no pg_stats stats is a really horrid prospect.
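
To illustrate why that fallback is so poor, here is a rough sketch of the arithmetic, not the actual file_fdw code. It assumes 8 kB blocks and a per-row header overhead of about 24 bytes; the function name and constants are hypothetical. The planner's row guess depends only on the assumed size and row width, never on what the program really emits.

```c
#include <assert.h>
#include <stddef.h>

#define ASSUMED_BLOCK_SIZE     8192  /* assumption: default 8 kB block size */
#define ASSUMED_TUPLE_OVERHEAD 24    /* assumption: approximate per-row header bytes */

/*
 * Hypothetical helper: guess a row count from an assumed table size
 * (in blocks) and an average row width (in bytes). Without ANALYZE
 * statistics there is nothing better to go on.
 */
double
estimate_rows(size_t blocks, size_t avg_row_width)
{
    double bytes;
    double rows;

    assert(blocks > 0);

    bytes = (double) blocks * ASSUMED_BLOCK_SIZE;
    rows = bytes / (double) (avg_row_width + ASSUMED_TUPLE_OVERHEAD);

    /* Never report fewer than one row. */
    return rows < 1.0 ? 1.0 : rows;
}
```

With 10 blocks and 40-byte rows this gives the same estimate whether the program prints a dozen lines or many millions.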

One idea is to invent another table-level FDW option "analyze".
If we could make that default to true for files and false for programs,
it'd preserve the desired default behavior, but it would add a feature
for plain files too: if they're too unstable to be worth analyzing,
you could turn it off.
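
A minimal sketch of how such a default could be resolved, assuming a tri-state option so that "unset" can be told apart from an explicit choice. The struct, enum, and function names are hypothetical and do not come from the committed patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tri-state: the user may leave the option unset. */
typedef enum { OPT_UNSET, OPT_FALSE, OPT_TRUE } OptBool;

/* Hypothetical, simplified view of a file_fdw table's options. */
typedef struct
{
    bool    is_program;  /* true if the table reads from a program, not a file */
    OptBool analyze;     /* the proposed table-level "analyze" option */
} FileFdwTableOptions;

/* Decide whether ANALYZE should sample this table. */
bool
file_fdw_should_analyze(const FileFdwTableOptions *opts)
{
    assert(opts != NULL);

    /* An explicit setting always wins. */
    if (opts->analyze != OPT_UNSET)
        return opts->analyze == OPT_TRUE;

    /* Otherwise default to true for files and false for programs. */
    return !opts->is_program;
}
```

An unstable plain file would set the option to false, and a cheap, repeatable program would set it to true.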

Another thought is that maybe manual ANALYZE should go through in any
case, and the FDW option would only be needed to control auto-analyze.
Although I'm not sure what to think about scripted cases like
vacuumdb --analyze. Maybe we'd need two flags, one permitting explicit
ANALYZE and one for autoanalyze, which could have different defaults.
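
The two-flag variant might look like the sketch below. All names are hypothetical, and the defaults shown (files allow both, programs allow only explicit ANALYZE) are an assumption chosen for illustration, since the right defaults are still an open question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Who asked for the ANALYZE: a user command or the autovacuum machinery. */
typedef enum { ANALYZE_MANUAL, ANALYZE_AUTO } AnalyzeSource;

/* Hypothetical pair of table-level options. */
typedef struct
{
    bool allow_manual_analyze;
    bool allow_autoanalyze;
} AnalyzePolicy;

/* Assumed defaults, for illustration only. */
AnalyzePolicy
default_analyze_policy(bool is_program)
{
    AnalyzePolicy policy;

    policy.allow_manual_analyze = true;         /* explicit ANALYZE goes through */
    policy.allow_autoanalyze = !is_program;     /* never re-run a program unasked */
    return policy;
}

/* Check whether a given kind of ANALYZE request is permitted. */
bool
analyze_permitted(const AnalyzePolicy *policy, AnalyzeSource source)
{
    assert(policy != NULL);

    return source == ANALYZE_MANUAL ? policy->allow_manual_analyze
                                    : policy->allow_autoanalyze;
}
```

Note that a scripted run such as vacuumdb --analyze issues ordinary commands, so under this sketch it would count as ANALYZE_MANUAL; that is the case the two flags do not obviously resolve.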

Another thing that felt a little unfinished was the cost estimation
behavior. Again, it's not clear how to do any better by default,
but is it worth the trouble to provide an FDW option to let the user
set the cost estimate for reading the table? I'm not sure honestly.
Since there's only one path for the FDW itself, the cost estimate
doesn't matter in simple cases, and I'm not sure how much it matters
even in more complicated ones. It definitely sucks that we don't
have a rows estimate that has anything to do with reality, but allowing
ANALYZE would be enough to handle that.

regards, tom lane
