Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

From: Shigeru HANADA <shigeru(dot)hanada(at)gmail(dot)com>
To: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Thom Brown <thom(at)linux(dot)com>, "pgsql-hackers(at)postgreSQL(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)
Date: 2015-04-10 06:00:58
Message-ID: F719904E-BD42-40F9-8376-0EF28E219CFC@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Kaigai-san,

Thanks for further review, but I found two bugs in v10 patch.
I’ve fixed them and wrapped up v11 patch here.

* Fix bug about illegal column order

Scan against a base relation returns columns in order of column definition, but its target list might require different order. This can be resolved by tuple projection in usual cases, but pushing down joins skips the step, so we need to treat it in remote query.

Before this fix, deparseProjectionSql() was called only for queries which have ctid or whole-row reference in its target list, but it was a too-much optimization. We always need to call it, because usual column list might require ordering conversion. Checking ordering is not impossible, but it seems useless effort.

Another way to resolve this issue is to reorder SELECT clause of a query for base relation if it was a source of a join, but it requires stepping back in planning, so the fix above was chosen.

"three tables join" test case is also changed to check this behavior.

* Fix bug of duplicate fdw_ps_tlist contents.

I coded that deparseSelectSql passes fdw_ps_tlist to deparseSelectSql for underlying RelOptInfo, but it causes redundant entries in fdw_ps_tlist in cases of joining more than two foreign tables. I changed to pass NULL as fdw_ps_tlist for recursive call of deparseSelectSql.

* Fix typos

Please review the v11 patch, and mark it as “ready for committer” if it’s ok.

In addition to essential features, I tried to implement relation listing in EXPLAIN output.

Attached explain_forein_join.patch adds capability to show join combination of a ForeignScan in EXPLAIN output as an additional item “Relations”. I thought that using array to list relations is a good way too, but I chose one string value because users would like to know order and type of joins too.

2015/04/09 21:22、Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com> のメール:

>> 2015/04/09 10:48、Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com> のメール:
>> * merge_fpinfo()
>>>>> It seems to me fpinfo->rows should be joinrel->rows, and
>>>>> fpinfo->width also should be joinrel->width.
>>>>> No need to have special intelligence here, isn't it?
>>>>
>>>>
>>>> Oops. They are vestige of my struggle which disabled SELECT clause optimization
>>>> (omit unused columns). Now width and rows are inherited from joinrel.
>> Besides
>>>> that, fdw_startup_cost and fdw_tuple_cost seem wrong, so I fixed them to use
>> simple
>>>> summary, not average.
>>>>
>>> Does fpinfo->fdw_startup_cost represent a cost to open connection to remote
>>> PostgreSQL, doesn't it?
>>>
>>> postgres_fdw.c:1757 says as follows:
>>>
>>> /*
>>> * Add some additional cost factors to account for connection overhead
>>> * (fdw_startup_cost), transferring data across the network
>>> * (fdw_tuple_cost per retrieved row), and local manipulation of the data
>>> * (cpu_tuple_cost per retrieved row).
>>> */
>>>
>>> If so, does a ForeignScan that involves 100 underlying relation takes 100
>>> times heavy network operations on startup? Probably, no.
>>> I think, average is better than sum, and max of them will reflect the cost
>>> more correctly.
>>
>> In my current opinion, no. Though I remember that I've written such comments
>> before :P.
>>
>> Connection establishment occurs only once for the very first access to the server,
>> so in the use cases with long-lived session (via psql, connection pooling, etc.),
>> taking connection overhead into account *every time* seems too pessimistic.
>>
>> Instead, for practical cases, fdw_startup_cost should consider overheads of query
>> construction and getting first response of it (hopefully it minus retrieving
>> actual data). These overheads are visible in the order of milliseconds. I’m
>> not sure how much is appropriate for the default, but 100 seems not so bad.
>>
>> Anyway fdw_startup_cost is per-server setting as same as fdw_tuple_cost, and it
>> should not be modified according to the width of the result, so using
>> fpinfo_o->fdw_startup_cost would be ok.
>>
> Indeed, I forgot the connection cache mechanism. As long as we define
> fdw_startup_cost as you mentioned, it seems to me your logic is heuristically
> reasonable.
>
>>> Also, fdw_tuple_cost introduce the cost of data transfer over the network.
>>> I thinks, weighted average is the best strategy, like:
>>> fpinfo->fdw_tuple_cost =
>>> (fpinfo_o->width / (fpinfo_o->width + fpinfo_i->width) *
>> fpinfo_o->fdw_tuple_cost +
>>> (fpinfo_i->width / (fpinfo_o->width + fpinfo_i->width) *
>> fpinfo_i->fdw_tuple_cost;
>>>
>>> That's just my suggestion. Please apply the best way you thought.
>>
>> I can’t agree that strategy, because 1) width 0 causes per-tuple cost 0, and 2)
>> fdw_tuple_cost never vary in a foreign server. Using fpinfo_o->fdw_tuple_cost
>> (it must be identical to fpinfo_i->fdw_tuple_cost) seems reasonable. Thoughts?
>>
> OK, you are right.
>
> I think it is time to hand over the patch reviewing to committers.
> So, let me mark it "ready for committers".
>
> Thanks,
> --
> NEC Business Creation Division / PG-Strom Project
> KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>

--
Shigeru HANADA
shigeru(dot)hanada(at)gmail(dot)com

Attachment Content-Type Size
foreign_join_v11.patch application/octet-stream 163.0 KB
explain_foreign_join.patch application/octet-stream 17.1 KB
unknown_filename text/plain 4 bytes

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Jan Urbański 2015-04-10 07:20:04 Re: libpq's multi-threaded SSL callback handling is busted
Previous Message Michael Paquier 2015-04-10 05:55:12 Re: SSL information view