From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Mason S <masonlists(at)gmail(dot)com>, Oleg Bartunov <obartunov(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Horizontal scalability/sharding
Date: 2015-09-01 16:40:40
Message-ID: CA+TgmoYMFg5tA4Tr0giFWBYHswgTRrB40cK1Gva2CEFmCp6CTQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, Sep 1, 2015 at 6:55 AM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> I assumed these queries were going to be solved by sending data to the
> coordinator in as digested a form as possible, and having the coordinator complete
> any remaining processing. I think we are going to need to decide if
> such "sending data back to shards" is something we are ever going to
> implement. I can see FDWs _not_ working well for that use-case.

I do think we are going to want to support that. All the people I've
talked to about parallel and distributed query processing agree that
you need to do that sort of thing to get really good and scalable
performance. I think we could make a lot of headway over the status
quo just by implementing more pushdown optimizations than we have
today. Right now, SELECT COUNT(*) on a foreign table will suck
back the whole remote table and count the rows locally, and that's
stupid. We can fix that case with better pushdown logic. We can also
fix the case of N-way join nests where the joins are either on the
partitioning key or to replicated tables. But suppose you have a join
between two tables which are sharded across the cluster but not on the
partitioning key. There's no way to push the join down, so all the
work comes back to the coordinator, which is possibly OK if such
queries are rare, but not so hot if they are frequent.
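
To make the COUNT(*) case concrete, here is a sketch. The server,
table, and column names are hypothetical, and the remote SQL shown is
only an approximation of what postgres_fdw actually generates:

    -- Hypothetical postgres_fdw setup on the coordinator.
    CREATE EXTENSION IF NOT EXISTS postgres_fdw;
    CREATE SERVER shard1 FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'shard1.example.com', dbname 'app');
    CREATE USER MAPPING FOR CURRENT_USER SERVER shard1;
    CREATE FOREIGN TABLE measurements (
        id    bigint,
        value double precision
    ) SERVER shard1 OPTIONS (table_name 'measurements');

    -- The query in question:
    SELECT count(*) FROM measurements;

    -- Today the remote SQL is roughly
    --     SELECT id, value FROM public.measurements
    -- so every row crosses the wire and the coordinator counts them.
    -- With aggregate pushdown it could instead be
    --     SELECT count(*) FROM public.measurements
    -- shipping back a single row.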

To leverage the full CPU power of the cluster in such a case, you need
to be able to shuffle data around between the nodes. You pick one of
the two tables being joined, and based on the partitioning key of that
table, each node examines the other table and, for each row, sends it
to the machine where it MIGHT have one or more join partners. Then
each node can join its shard of the first table against the rows from
the second table that were sent to it.
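
A minimal sketch of that routing step, assuming hypothetical tables t1
and t2 joined on t1.a = t2.b, with t1 hash-partitioned on a across
four nodes. hashint8() is a real built-in hash for bigint; the
modulo-four placement is just an assumption for the example:

    -- Hypothetical local slice of the "other" table, t2, on one node.
    CREATE TABLE t2 (b bigint, payload text);

    -- For each local row of t2, compute which node's shard of t1
    -- could hold join partners, using the same hash/modulo scheme
    -- assumed to have placed t1.
    SELECT t2.*,
           abs(hashint8(t2.b)) % 4 AS target_node
    FROM t2;

    -- Each row is then shipped to its target_node, where the local
    -- join can run independently on every node:
    --     SELECT * FROM t1 JOIN t2_received ON t1.a = t2_received.b;

Once the rows land, that join needs no further cross-node traffic,
which is what lets the whole cluster's CPUs participate.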

Now the question is, where should the code that does all of this live?
postgres_fdw? Some new, sharding-specific FDW? In core? I don't
know for sure, but what I do know is that we could make a lot of
progress over where we are today by just improving postgres_fdw, and I
don't think those improvements are even all that difficult. If we
decide we need to implement something new, it's going to be a huge
project that will take years to complete, with uncertain results. I'd
rather have a postgres_fdw-based implementation that is imperfect and
can't handle some kinds of queries in 9.6 than a promise that by 9.9
we'll have something really great that handles MPP perfectly.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
