Re: Parallel Hash take II

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Peter Geoghegan <pg(at)bowt(dot)ie>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Oleg Golovanov <rentech(at)mail(dot)ru>
Subject: Re: Parallel Hash take II
Date: 2017-08-01 02:00:53
Message-ID: CA+TgmoYinb5M0f+mhbQw3DAXmnJYjNw5ZEiTO+XeUz=1RRzYhQ@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jul 31, 2017 at 9:11 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> - Echoing concerns from other threads (Robert: ping): I'm doubtful that
> it makes sense to size the number of parallel workers solely based on
> the parallel scan node's size. I don't think it's this patch's job to
> change that, but to me it seriously amplifies that - I'd bet there's a
> lot of cases with nontrivial joins where the benefit from parallelism
> on the join level is bigger than on the scan level itself. And the
> number of rows in the upper nodes might also be bigger than on the
> scan node level, making it more important to have higher number of
> nodes.

Well, I feel like a broken record here but ... yeah, I agree we need
to improve that. It's probably generally true that the more parallel
operators we add, the more potential benefit there is in doing
something about that problem. But, like you say, not in this patch.

http://postgr.es/m/CA+TgmoYL-SQZ2gRL2DpenAzOBd5+SW30QB=A4CseWtOgejz4aQ@mail.gmail.com

I think we could improve things significantly by generating multiple
partial paths with different numbers of parallel workers, instead of
just picking a number of workers based on the table size and going
with it. For that to work, though, you'd need something built into
the costing to discourage picking paths with too many workers. And
you'd need to be OK with planning taking a lot longer when parallelism
is involved, because you'd be carrying around more paths for longer.
There are other problems to solve, too.
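
To make the idea concrete, here is a rough sketch in Python of what
"multiple partial paths plus a cost penalty" could look like. This is
not planner code: the names (PartialPath, build_candidate_paths,
per_worker_penalty) and the toy cost formula are assumptions made up
for illustration. Each candidate divides the serial cost across its
workers and then adds a charge that grows with the worker count, so
the cheapest path is no longer automatically the one with the most
workers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartialPath:
    """Hypothetical stand-in for a partial path with a fixed worker count."""
    workers: int
    total_cost: float


def build_candidate_paths(serial_cost, max_workers, per_worker_penalty):
    """Generate one candidate path per worker count from 1 to max_workers.

    Toy cost model (an assumption, not PostgreSQL's costing):
    the work divides evenly across workers, and every worker adds a
    fixed penalty that discourages picking too many of them.
    """
    paths = []
    for workers in range(1, max_workers + 1):
        cost = serial_cost / workers + per_worker_penalty * workers
        paths.append(PartialPath(workers=workers, total_cost=cost))
    return paths


def cheapest_path(paths):
    """Pick the candidate with the lowest total cost."""
    return min(paths, key=lambda p: p.total_cost)


if __name__ == "__main__":
    candidates = build_candidate_paths(
        serial_cost=1000.0, max_workers=16, per_worker_penalty=10.0
    )
    best = cheapest_path(candidates)
    print(best.workers, best.total_cost)
```

With a penalty of zero this toy model always picks the maximum worker
count, which is the behavior the costing would need to discourage. It
also shows where the extra planning time comes from: every worker
count is a separate path that has to be kept and compared.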

I still think, though, that it's highly worthwhile to get at least a
few more parallel operators - and this one in particular - done before
we attack that problem in earnest. Even with a dumb calculation of
the number of workers, this helps a lot.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
