FDW does not push down LIMIT & ORDER BY with sharding (partitions)

From: Gert van Dijk <gertvdijk(at)gmail(dot)com>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: FDW does not push down LIMIT & ORDER BY with sharding (partitions)
Date: 2019-07-09 00:12:40
Message-ID: CAFT+aqL1Tt0qfYqjHH+shwPoW8qdFjpJ8vBR5ABoXJDUcHyN1w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Hi,

First of all I want to thank Etsuro Fujita for implementing the exact
feature I was missing in
FDW 11.4, but now available in 12: pushing down of LIMIT & ORDER BY to
foreign tables
(commit d50d172e51). Now that I'm using PostgreSQL 12-beta2 from the
official Docker
image I noticed an omission that I wanted to report here which may be
relevant for those
like me, using FDW in a typical sharding setup.

By querying purely foreign tables, I can confirm pushing down LIMIT &
ORDER BY is
working as expected on my installation.
However, when I use a typical sharding setup where the main table is
located on the FDW
node, with partitions of foreign tables, this seems not to activate
the new code path. I can
understand that pushing this down is not possible in cases where
*multiple* foreign tables
are to be scanned. However, it also does not work in the case where my
WHERE clause
condition causes to only connect to a *single* foreign table.

Short version of my situation below.

Table definition, typical 'shard by user':
CREATE TABLE my_big_table (
user_id bigint NOT NULL,
[ omitted other columns for brevity ]
) PARTITION BY HASH (user_id) ;

create foreign table my_big_table_mod4_s0 partition of my_big_table
FOR VALUES WITH (MODULUS 4, REMAINDER 0) server shardA
OPTIONS (table_name 'my_big_table_mod4_s0');

Running

EXPLAIN VERBOSE
SELECT * from my_big_table
WHERE
user_id = 12345 -- only 1 user --> single foreign table.
ORDER BY serial DESC
LIMIT 10;

yields

Limit (cost=927393.08..927395.58 rows=1000 width=32)
Output: [...]
-> Sort (cost=927393.08..931177.06 rows=1513592 width=32)
Output: [...]
Sort Key: my_big_table_mod4_s0.serial DESC
-> Foreign Scan on public.my_big_table_mod4_s0
(cost=5318.35..844404.46 rows=1513592 width=32)
Output: [...]
Remote SQL: SELECT [...] FROM
public.my_big_table_mod4_s0 WHERE ((user_id = 4560084))

As you can see this is sub-optimal compared to the case where I
directly query the foreign
table.

This started as a Question on DBA.SE, some more information included there:
https://dba.stackexchange.com/q/242358/13155

Full version string used:
PostgreSQL 12beta2 (Debian 12~beta2-1.pgdg100+1) on
x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit

Please let me know if I should provide more information or in what
other way I could
contribute. I'm very much willing to test patches.

(If this is already being worked on or discussed elsewhere on this or
another list, please
excuse me, it seems a bit hard to find relevant results searching the
mailing list archives,
and I'm fairly new to PostgreSQL in general too.)

Thanks,

Gert van Dijk

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Thomas Munro 2019-07-09 01:13:31 Re: BUG #15900: `executor could not find named tuplestore` in triggers with transition table and row locks
Previous Message Thomas Munro 2019-07-08 23:49:19 Re: BUG #15900: `executor could not find named tuplestore` in triggers with transition table and row locks