|From:||Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>|
|To:||Shigeru HANADA <shigeru(dot)hanada(at)gmail(dot)com>|
|Cc:||Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Thom Brown <thom(at)linux(dot)com>, "pgsql-hackers(at)postgreSQL(dot)org" <pgsql-hackers(at)postgresql(dot)org>|
|Subject:||Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)|
> On 2015/04/09 10:48, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com> wrote:
> * merge_fpinfo()
> >>> It seems to me fpinfo->rows should be joinrel->rows, and
> >>> fpinfo->width also should be joinrel->width.
> >>> No need to have special intelligence here, is there?
> >> Oops. They are a vestige of my struggle, which disabled the SELECT clause optimization
> >> (omitting unused columns). Now width and rows are inherited from joinrel. Besides
> >> that, fdw_startup_cost and fdw_tuple_cost seemed wrong, so I fixed them to use
> >> the sum, not the average.
> > fpinfo->fdw_startup_cost represents the cost of opening a connection to the remote
> > PostgreSQL server, doesn't it?
> > postgres_fdw.c:1757 has the following comment:
> > /*
> > * Add some additional cost factors to account for connection overhead
> > * (fdw_startup_cost), transferring data across the network
> > * (fdw_tuple_cost per retrieved row), and local manipulation of the data
> > * (cpu_tuple_cost per retrieved row).
> > */
> > If so, does a ForeignScan that involves 100 underlying relations take 100
> > times as many heavy network operations on startup? Probably not.
> > I think the average is better than the sum, and the max of them would reflect
> > the cost more correctly.
> In my current opinion, no. Though I remember that I've written such comments
> before :P.
> Connection establishment occurs only once, on the very first access to the server,
> so in use cases with long-lived sessions (via psql, connection pooling, etc.),
> taking connection overhead into account *every time* seems too pessimistic.
> Instead, for practical cases, fdw_startup_cost should cover the overhead of query
> construction and of getting the first response (ideally excluding the retrieval of
> actual data). These overheads are on the order of milliseconds. I'm
> not sure what value is appropriate for the default, but 100 seems not so bad.
> Anyway, fdw_startup_cost is a per-server setting, the same as fdw_tuple_cost, and it
> should not be modified according to the width of the result, so using
> fpinfo_o->fdw_startup_cost would be ok.
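
To make the difference concrete, here is a minimal sketch that contrasts summing the startup cost over the underlying relations with inheriting the per-server value once. The function names and the constant are hypothetical and exist only for illustration; they are not taken from postgres_fdw.c. The value 100 is simply the figure mentioned above.

```c
#include <assert.h>

/* Illustrative per-server startup cost; 100 is the figure mentioned in the thread. */
#define DEMO_FDW_STARTUP_COST 100.0

/* Summing charges the startup overhead once per underlying relation. */
double
startup_cost_sum(int nrels)
{
    assert(nrels > 0);
    return DEMO_FDW_STARTUP_COST * nrels;
}

/* Inheriting the per-server value charges it once per remote query. */
double
startup_cost_inherited(int nrels)
{
    assert(nrels > 0);
    return DEMO_FDW_STARTUP_COST;
}
```

With 100 underlying relations, the summed estimate is 100 times the inherited one, although only one remote query is constructed and sent.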
Indeed, I forgot the connection cache mechanism. As long as we define
fdw_startup_cost as you mentioned, it seems to me your logic is heuristically
> > Also, fdw_tuple_cost represents the cost of data transfer over the network.
> > I think a weighted average is the best strategy, like:
> >   fpinfo->fdw_tuple_cost =
> >     (fpinfo_o->width / (fpinfo_o->width + fpinfo_i->width)) *
> >       fpinfo_o->fdw_tuple_cost +
> >     (fpinfo_i->width / (fpinfo_o->width + fpinfo_i->width)) *
> >       fpinfo_i->fdw_tuple_cost;
> > That's just my suggestion. Please apply whichever way you think is best.
> I can't agree with that strategy, because 1) width 0 causes per-tuple cost 0, and 2)
> fdw_tuple_cost never varies within a foreign server. Using fpinfo_o->fdw_tuple_cost
> (it must be identical to fpinfo_i->fdw_tuple_cost) seems reasonable. Thoughts?
OK, you are right.
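
The conclusion can be illustrated with a small sketch. DemoFpInfo is a hypothetical, simplified stand-in for the planner-side relation info, not the real structure in postgres_fdw.c, and both function names are made up for this example. It assumes floating-point division for the weights. When both sides belong to the same server, the weighted average just reproduces the common per-server value whenever the total width is positive, and it is undefined when both widths are 0.

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for the per-relation planner info. */
typedef struct DemoFpInfo
{
    int     width;            /* estimated row width in bytes */
    double  fdw_tuple_cost;   /* per-server setting */
} DemoFpInfo;

/* Width-weighted average, computed with floating-point weights. */
double
weighted_tuple_cost(const DemoFpInfo *outer, const DemoFpInfo *inner)
{
    double total = (double) outer->width + (double) inner->width;

    /* With both widths 0 the weights would be 0/0, so reject that case. */
    assert(total > 0.0);
    return (outer->width / total) * outer->fdw_tuple_cost +
           (inner->width / total) * inner->fdw_tuple_cost;
}

/* Inherit the per-server value from the outer side. */
double
inherited_tuple_cost(const DemoFpInfo *outer, const DemoFpInfo *inner)
{
    /* Both sides live on the same foreign server, so the settings match. */
    assert(outer->fdw_tuple_cost == inner->fdw_tuple_cost);
    return outer->fdw_tuple_cost;
}
```

Inheriting the value needs no special case for zero widths and gives the same answer as the weighted average in every case where the latter is defined.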
I think it is time to hand the patch review over to the committers.
So, let me mark it "ready for committers".
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>