Getting sorted data from foreign server

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Getting sorted data from foreign server
Date: 2015-10-06 10:46:12
Message-ID: CAFjFpRd4kdRuP3opeGyzzeUygkZyY-JUCg45u7itgW0Nu-0M_A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi All,
standard_qp_callback() sets root->query_pathkeys to pathkeys on which the
result of query_planner are expected be sorted upon (see the function for
more details). The patch checks if any prefix of these pathkeys can be
entirely evaluated using the foreign relation and at the foreign server
under consideration. If yes, it gets estimates of costs involved and adds
paths with those pathkeys. There can be multiple pathkeyless paths added
for a given base relation. For every such path one path with pathkeys is
added. If there is an index matching on the foreign server, getting the
data sorted from foreign server improves execution time as seen from the
results. The patch adds this functionality entirely in postgres_fdw.

For a postgres_fdw foreign table ft1(val int, val2 int), with the patch

EXPLAIN VERBOSE SELECT * FROM ft1 ORDER BY val; gives
QUERY PLAN
------------------------------------------------------------------------
Foreign Scan on public.ft1 (cost=100.29..6480.42 rows=100118 width=8)
Output: val, val2
Remote SQL: SELECT val, val2 FROM public.lt ORDER BY val ASC
(3 rows)

observe that the query sent to the foreign server has ORDER BY clause in
it. The test script attached has more examples of the same. The patch adds
a small test case.

Results
------------
Attached find the script used to measure the performance. The script
creates a foreign server and foreign table pointing to the local server and
local table resp. The test runs three different types of queries (simple
sort, group by, sorted result from inheritance hierarchy) multiple times
and calculates the average execution time for each query with and without
the patch. The performance is measured for foreign table (ft1 above)
populated with 100 rows (in-memory sorting) and with 100000 rows (external
sorting) resp. The output of the script with and without patch and with
different sizes of foreign table is attached here.

We can observe following
1. For large number of rows (when the memory is not enough to hold all the
data to be sorted) we see 20-25% reduction in the query execution time when
there is matching index on the foreign server.

2. For very small number of rows (when the memory is enough to hold all the
data to be sorted) there is not much performance gain and sometimes the
planner is not choosing the path with pathkeys for foreign scans.

3. In all the cases, the planning time increases owing to EXPLAIN queries
fired on the foreign server.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachment Content-Type Size
pg_sort_pd.patch text/x-patch 19.0 KB
sort_pd.sql text/x-sql 2.7 KB
sort_pd.without_patch_100_rows.out application/octet-stream 7.4 KB
sort_pd.without_patch_100000_rows.out application/octet-stream 7.9 KB
sort_pd.with_patch_100_rows.out application/octet-stream 7.2 KB
sort_pd.with_patch_100000_rows.out application/octet-stream 7.3 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Syed, Rahila 2015-10-06 10:47:17 Re: [PROPOSAL] VACUUM Progress Checker.
Previous Message Syed, Rahila 2015-10-06 09:34:44 Re: [PROPOSAL] VACUUM Progress Checker.