Re: Out of Memory errors are frustrating as heck!

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Gunther <raj(at)gusw(dot)net>, pgsql-performance(at)lists(dot)postgresql(dot)org, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Subject: Re: Out of Memory errors are frustrating as heck!
Date: 2019-04-21 16:15:25
Message-ID: 20190421161434.4hedytsadpbnglgk@development
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Sun, Apr 21, 2019 at 10:36:43AM -0400, Tom Lane wrote:
>Jeff Janes <jeff(dot)janes(at)gmail(dot)com> writes:
>> The growEnabled stuff only prevents infinite loops. It doesn't prevent
>> extreme silliness.
>
>> If a single 32 bit hash value has enough tuples by itself to not fit in
>> work_mem, then it will keep splitting until that value is in a batch by
>> itself before shutting off
>
>Right, that's the code's intention. If that's not good enough for this
>case, we'll need to understand the details a bit better before we can
>design a better(?) heuristic.
>

I think we only disable growing when there are no other values in the
batch, but that seems rather easy to defeat - all you need is a single
tuple with a hash that falls into the same batch, and it's over. Maybe
we should make this a bit less accurate - say, if less than 5% memory
gets freed, don't add more batches.

>I suspect, however, that we might be better off just taking the existence
>of the I/O buffers into account somehow while deciding whether it's worth
>growing further. That is, I'm imagining adding a second independent
>reason for shutting off growEnabled, along the lines of "increasing
>nbatch any further will require an unreasonable amount of buffer memory".
>The question then becomes how to define "unreasonable".
>

I think the question the code needs to be asking is "If we double the
number of batches, does the amount of memory we need drop?" And the
memory needs to account both for the buffers and per-batch data.

I don't think we can just stop increasing the number of batches when the
memory for BufFile exceeds work_mem, because that entirely ignores the
fact that by doing that we force the system to keep the per-batch stuff
in memory (and that can be almost arbitrary amount).

What I think we should be doing instead is instead make the threshold
dynamic - instead of just checking at work_mem, we need to increment the
number of batches when the total amount of memory exceeds

Max(work_mem, 3 * memory_for_buffiles)

This is based on the observation that by increasing the number of
batches, we double memory_for_buffiles and split the per-batch data in
half. By adding more batches, we'd actually increase the amount of
memory used.

Of course, this just stops enforcing work_mem at some point, but it at
least attempts to minimize the amount of memory used.

An alternative would be spilling the extra tuples into a special
overflow file, as I explained earlier. That would actually enforce
work_mem I think.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tomas Vondra 2019-04-21 16:18:24 Re: Backup and Restore (pg_dump & pg_restore)
Previous Message Tom Lane 2019-04-21 14:36:43 Re: Out of Memory errors are frustrating as heck!