Re: Getting sorted data from foreign server for merge join

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
Cc: Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Getting sorted data from foreign server for merge join
Date: 2015-12-18 20:47:51
Message-ID: CA+TgmoaWP05mn7Eo1EHnYn+qgJc_tSp-w_+O4O7707R+uTrV=w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Dec 17, 2015 at 3:32 AM, Ashutosh Bapat
<ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
> On Wed, Dec 9, 2015 at 12:14 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Wed, Dec 2, 2015 at 6:45 AM, Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>
>> wrote:
>> > Thanks Ashutosh.
>> >
>> > Re-reviewed and Re-verified the patch, pg_sort_all_pd_v5.patch
>> > looks good to me.
>>
>> This patch needs a rebase.
>
> Done.

Thanks.

>> It's not going to work to say this is a patch proposed for commit when
>> it's still got a TODO comment in it that obviously needs to be
>> changed. And the formatting of that long comment is pretty weird,
>> too, and not consistent with other functions in that same file (e.g.
>> get_remote_estimate, ec_member_matches_foreign, create_cursor).
>>
>
> The TODO was present in v4 but not in v5 and is not present in v6 attached
> here.. Formatted comment according estimate_path_cost_size(),
> convert_prep_stmt_params().

Hrm, I must have been looking at the wrong version somehow. Sorry about that.

>> Aside from that, I think before we commit this, somebody should do
>> some testing that demonstrates that this is actually a good idea. Not
>> as part of the test case set for this patch, but just in general.
>> Merge joins are typically going to be relevant for large tables, but
>> the examples in the regression tests are necessarily tiny. I'd like
>> to see some sample data and some sample queries that get appreciably
>> faster with this code. If we can't find any, we don't need the code.
>>
>
> I tested the patch on my laptop with two types of queries, a join between
> two foreign tables on different foreign servers (pointing to the same self
> server) and a join between one foreign and one local table. The foreign
> tables and servers are created using sort_pd_setup.sql attached. Foreign
> tables pointed to table with index useful for join clause. Both the joining
> tables had 10M rows. The execution time of query was measured for 100 runs
> and average and standard deviation were calculated (using function
> query_execution_stats() in script sort_pd.sql) and are presented below.

OK, cool.

I went over this patch in some detail today and did a lot of cosmetic
cleanup. The results are attached. I'm fairly happy with this
version, but let me know what you think. Of course, feedback from
others is more than welcome also.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
order-for-merge-pushdown-rmh.patch text/x-patch 20.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2015-12-18 20:50:18 Re: Using quicksort for every external sort run
Previous Message Tom Lane 2015-12-18 20:42:50 Re: Making tab-complete.c easier to maintain