|From:||Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru>|
|To:||Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>|
|Cc:||PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>|
|Subject:||Re: Partitioning and postgres_fdw optimisations for multi-tenancy|
|Views:||Raw Message | Whole Thread | Download mbox | Resend email|
On 2020-07-14 15:27, Ashutosh Bapat wrote:
> On Tue, Jul 14, 2020 at 12:48 AM Alexey Kondratov
> <a(dot)kondratov(at)postgrespro(dot)ru> wrote:
>> I built a simple two node multi-tenant schema for tests, which can be
>> easily set up with attached scripts. It creates three tables
>> users, documents) distributed over two nodes. Everything can be found
>> this Gist  as well.
>> Some real-life test queries show, that all single-node queries aren't
>> pushed-down to the required node. For example:
>> INNER JOIN users ON documents.user_id = users.id
>> documents.company_id = 5
>> AND users.company_id = 5;
> There are a couple of things happening here
> 1. the clauses on company_id in WHERE clause are causing partition
> pruning. Partition-wise join is disabled with partition pruning before
> PG13. In PG13 we have added advanced partition matching algorithm
> which will allow partition-wise join with partition pruning.
I forgot to mention that I use a recent master (991c444e7a) for tests
enable_partitionwise_join = 'on'
enable_partitionwise_aggregate = 'on'
of course. I've also tried postgres_fdw.use_remote_estimate = true
followed by ANALYSE on both nodes (it is still used in setup.sh script).
BTW, can you, please, share a link to commit / thread about allowing
partition-wise join and partition pruning to work together in PG13?
> 2. the query has no equality condition on the partition key of the
> tables being joined. Partitionwise join is possible only when there's
> an equality condition on the partition keys (company_id) of the
> joining tables. PostgreSQL's optimizer is not smart enough to convert
> the equality conditions in WHERE clause into equality conditions on
> partition keys. So having those conditions just in WHERE clause does
> not help. Instead please add equality conditions on partition keys in
> JOIN .. ON clause or WHERE clause (only for INNER join).
With adding documents.company_id = users.company_id
INNER JOIN users ON (documents.company_id = users.company_id
AND documents.user_id = users.id)
documents.company_id = 5
AND users.company_id = 5;
query plan remains the same.
>> executed as following
>> QUERY PLAN
>> Nested Loop
>> Join Filter: (documents.user_id = users.id)
>> -> Foreign Scan on users_node2 users
>> -> Materialize
>> -> Foreign Scan on documents_node2 documents
>> i.e. it uses two foreign scans and does the final join locally.
>> once I specify target partitions explicitly, then the entire query is
>> pushed down to the foreign node:
>> QUERY PLAN
>> Foreign Scan
>> Relations: (documents_node2) INNER JOIN (users_node2)
>> Execution time is dropped significantly as well — by more than 3 times
>> even for this small test database. Situation for simple queries with
>> aggregates or joins and aggregates followed by the sharding key filter
>> is the same. Something similar was briefly discussed in this thread
>> IIUC, it means that push-down of queries through the postgres_fdw
>> perfectly well, the problem is with partition-wise operation detection
>> at the planning time. Currently, partition-wise aggregate routines,
>> e.g., looks for a GROUP BY and checks whether sharding key exists
>> or not. After that PARTITIONWISE_AGGREGATE_* flag is set. However, it
>> doesn't look for a content of WHERE clause, so frankly speaking it
>> a problem, this functionality is not yet implemented.
>> Actually, sometimes I was able to push down queries with aggregate
>> simply by adding an additional GROUP BY with sharding key, like this:
>> company_id = 5
>> GROUP BY company_id;
> This gets pushed down since GROUP BY clause is on the partition key.
Sure, but it only works *sometimes*, I've never seen most of such simple
queries with aggregates to be pushed down, e.g.:
company_id = 5
whether 'GROUP BY company_id' is used or not.
>> Although it seems that it will be easier to start with aggregates,
>> probably we should initially plan a more general solution? For
>> check that all involved tables are filtered by partitioning key and
>> down the entire query if all of them target the same foreign server.
>> Any thoughts?
> I think adding just equality conditions on the partition key will be
> enough. No need for any code change.
So, it hasn't helped. Maybe I could modify some costs to verify that
push-down of such joins is ever possible?
Anyway, what about aggregates? Partition-wise aggregates work fine for
but once I narrow it to a single partition with 'WHERE company_id = 5',
then it is being executed in a very inefficient way — takes all rows
from remote partition / node and performs aggregate locally. It doesn't
seem like a problem with query itself.
In my experience, both partition-wise joins and aggregates work well
with simple GROUP or JOIN by the partitioning key, which corresponds to
massive multi-partition OLAP queries. However, both stop working for a
single-partition queries with WHERE, when postgres_fdw and partitioning
are used. I'd be glad if you share any new guesses of how to make them
working without code modification.
Postgres Professional https://www.postgrespro.com
Russian Postgres Company
|Next Message||David G. Johnston||2020-07-14 16:01:24||Re: DROP relation IF EXISTS Docs and Tests - Bug Fix|
|Previous Message||Stephen Frost||2020-07-14 14:36:25||Re: recovering from "found xmin ... from before relfrozenxid ..."|