Re: Getting sorted data from foreign server

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Getting sorted data from foreign server
Date: 2015-10-16 18:03:57
Message-ID: CA+TgmoYbO2TZ3JQVdrsLUXjc1YWeZEGETw3Q1cvn4GnbY81Ymw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Oct 15, 2015 at 6:28 AM, Ashutosh Bapat
<ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
> Attached is the patch which takes care of above comments.

I spent some time on this patch today. But it's still not right.

I've attached a new version which fixes a serious problem with your
last version - having postgresGetForeignPaths do the costing of the
sorted path itself instead of delegating that to
estimate_path_cost_size is wrong. In your version, 10% increment gets
applied to the network transmission costs as well as the cost of
generating the tupes - but only when use_remote_estimate == false. I
fixed this and did some cosmetic cleanup.

But you'll notice if you try this some of postgres_fdw's regression
tests fail. This is rather mysterious:

***************
*** 697,715 ****
Sort
Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
Sort Key: t1.c1
! -> Nested Loop Semi Join
Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
! Join Filter: (t1.c3 = t2.c3)
-> Foreign Scan on public.ft1 t1
Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8
FROM "S 1"."T 1" WHERE (("C 1" < 20))
! -> Materialize
Output: t2.c3
! -> Foreign Scan on public.ft2 t2
Output: t2.c3
! Filter: (date(t2.c4) = '01-17-1970'::date)
! Remote SQL: SELECT c3, c4 FROM "S 1"."T 1"
WHERE (("C 1" > 10))
! (15 rows)

EXECUTE st2(10, 20);
c1 | c2 | c3 | c4 | c5
| c6 | c7 | c8
--- 697,718 ----
Sort
Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
Sort Key: t1.c1
! -> Hash Join
Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
! Hash Cond: (t1.c3 = t2.c3)
-> Foreign Scan on public.ft1 t1
Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8
FROM "S 1"."T 1" WHERE (("C 1" < 20))
! -> Hash
Output: t2.c3
! -> HashAggregate
Output: t2.c3
! Group Key: t2.c3
! -> Foreign Scan on public.ft2 t2
! Output: t2.c3
! Filter: (date(t2.c4) = '01-17-1970'::date)
! Remote SQL: SELECT c3, c4 FROM "S 1"."T
1" WHERE (("C 1" > 10))
! (18 rows)

What I think is happening here is that the planner notices that
instead of doing a parameterized nestloop, it could pull down the data
already sorted from the remote side, cheaply unique-ify it by using
the ordering provided by the remote side, and then do a standard hash
join. That might well be a sensible approach, but the ORDER BY that
would make it correct doesn't show up in the Remote SQL. I don't know
why that's happening, but it's not good.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
pg_sort_pd_v4.patch application/x-patch 16.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2015-10-16 18:19:16 Re: buildfarm failures on crake and sittella
Previous Message Jinyu Zhang 2015-10-16 18:00:13 Re: Patch: Optimize memory allocation in function 'bringetbitmap'