Re: [HACKERS] Parallel Hash take II

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>, Prabhat Sahu <prabhat(dot)sahu(at)enterprisedb(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Oleg Golovanov <rentech(at)mail(dot)ru>
Subject: Re: [HACKERS] Parallel Hash take II
Date: 2017-11-15 21:06:23
Message-ID: CAEepm=2cJBHjX1=ajOseqRmwu=g00f3QQpZ0E9K5=SHwZJTgkw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Nov 16, 2017 at 7:57 AM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> The contrast with the situation with Thomas and his hash join patch is
> interesting. Hash join is *much* more sensitive to the availability of
> memory than a sort operation is.
>
>> I don't really have a good answer to "but what should we otherwise do",
>> but I'm doubtful this is quite the right answer.
>
> I think that the work_mem model should be replaced by something that
> centrally budgets memory. It would make sense to be less generous with
> sorts and more generous with hash joins when memory is in short
> supply, for example, and a model like this can make that possible. The
> work_mem model has always forced users to be far too conservative.
> Workloads are very complicated, and always having users target the
> worst case leaves a lot to be desired.

In the old days, Oracle had only simple per-operation memory limits
too, and they applied to every operation in every thread just like our
work_mem. It's interesting that they had separate knobs for sort and
hash though, and defaulted to giving hash twice as much.

With a whole-plan memory target, our planner would probably begin to
plan join order differently to minimise the number of hash tables in
memory at once, like other RDBMSs. Not sure how the plan-wide target
should work though -- try many plans, giving different portions of
budget to different subplans? That should work fine if you like
O(please-melt-my-computer), especially if combined with a similar
approach to choosing worker numbers. Some kind of feedback system?
Seems like a different kind of planner, but I have no clue. If you
have ideas/papers/references, it'd be great to see a new thread on
that subject.
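
To put a rough number on the "try many plans" search space, here is a minimal sketch. It assumes the plan-wide budget is cut into fixed-size units and that every memory-hungry operator must receive at least one unit. The function names are hypothetical, and the count covers only budget splits for one fixed plan shape; join order and worker counts would multiply it further.

```python
from itertools import product
from math import comb


def count_budget_splits(units: int, operators: int) -> int:
    """Count ways to hand out `units` indivisible memory units to
    `operators` plan nodes, each getting at least one unit.

    This is the standard stars-and-bars result: C(units - 1, operators - 1).
    Returns 0 when there are no operators or too few units to go around.
    """
    if operators <= 0 or units < operators:
        return 0
    return comb(units - 1, operators - 1)


def enumerate_budget_splits(units: int, operators: int) -> list:
    """Brute-force enumeration of the same splits, only usable for tiny
    inputs; included to cross-check the closed-form count."""
    return [
        split
        for split in product(range(1, units + 1), repeat=operators)
        if sum(split) == units
    ]


if __name__ == "__main__":
    # Hypothetical setup: a budget divided into 64 units.
    for operators in (2, 4, 8):
        print(operators, count_budget_splits(64, operators))
```

Even before the planner considers alternative join orders, the number of candidate splits grows combinatorially with the number of operators, which is why exhaustive search over budgets looks unattractive and some kind of heuristic or feedback scheme seems more plausible.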

--
Thomas Munro
http://www.enterprisedb.com
