Re: BUG #15225: [XX000] ERROR: invalid DSA memory alloc request size 1073741824 / Where: parallel worker

From: Frits Jalvingh <jal(at)etc(dot)to>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #15225: [XX000] ERROR: invalid DSA memory alloc request size 1073741824 / Where: parallel worker
Date: 2018-06-06 10:15:25
Message-ID: CAKhTGFWcOeOatgjqdYZRr9qfxFQ8vauzdw-42rd+gm2S-PdZUg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Hi Thomas,

I know that the message is about running out of memory, I just meant to say
that it is a different message than before. Which led me to think that this
might be another issue. But of course OOM can occur at many places and that
might explain the different messages.

As far as that explain thing goes, the odd thing seems to be that /without/
the explain the database uses a non parallel plan. If I execute the exact
same statement without explain I see one postgres process running at 100%
for a long time, then it produces its output proper.

If I add the explain part to it I see three processes: the main process
handling the EXPLAIN and two parallel workers. So the issue is that the
explain plan actually uses another plan than the same statement without
explain(!). This also explains the OOM, because indeed I see all processes
gobble up memory like mad, growing to 12GB each and then it dies.

Reducing work_mem to 512MB makes the explain plan succeed, finally.

So there seem to be two conclusions:
- explain seems to somehow influence the plan that is being made.

- there is a big problem with postgres memory assignment, because with the
increase of parallelism having work_mem work per process means that you
cannot effectively control the amount of memory that postgres uses anymore.
This was quite bad before, but because there were only limited places where
work_mem had effect this was... workable... But with more and more
parallelism thrown into the mix the net effect is unstable execution as a
random set of queries fired to the database will abort the backends at
random with OOM. That, I think, is a Bad Thing. The only way to prevent
backends aborting with oom would be to have a lot of memory but set
work_mem small - meaning you waste all that memory about 99% of the time....

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2018-06-06 14:38:13 Re: BUG #15225: [XX000] ERROR: invalid DSA memory alloc request size 1073741824 / Where: parallel worker
Previous Message Andreas Seltenreich 2018-06-06 08:38:01 Unnecessarily imprecise stats for very small tables leading to bad plans