Re: DBT-3 with SF=20 got failed

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: DBT-3 with SF=20 got failed
Date: 2015-08-20 20:12:53
Message-ID: 55D634C5.8030006@2ndquadrant.com
Lists: pgsql-hackers

Hi,

On 08/19/2015 03:53 PM, Tom Lane wrote:
>
> I don't see anything very wrong with constraining the initial
> allocation to 1GB, or even less. That will prevent consuming insane
> amounts of work_mem when the planner's rows estimate is too high
> rather than too low. And we do have the ability to increase the hash
> table size on the fly.

Perhaps. Limiting the initial allocation to 1GB might help lower memory
consumption in cases of over-estimation, and it's high enough that regular
queries won't run into it.

My initial thought was "if memory consumption is a problem, lower work_mem",
but after thinking about it for a while I don't see a reason to make people
limit memory consumption that way if we can avoid it.

But what is the impact on queries that actually need more than 1GB of
buckets? I assume we'd only limit the initial allocation and still allow
resizing based on the actual data (i.e. the 9.5 improvement), so those
queries would start with 1GB of buckets and then resize once the optimal
size is known (as done in 9.5). The resize is not very expensive, but it's
not free either, and with that many tuples (more than 1GB of buckets means
~130M tuples) it's probably just noise in the total query runtime. But it'd
be nice to see some proof of that ...
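
Just to illustrate the idea, a minimal sketch of the "cap the initial
allocation, resize later" approach - the constant, the function name and the
rounding below are mine, not what ExecChooseHashTableSize actually does:

    #include <stddef.h>

    /* hypothetical cap on the initial bucket array, per the discussion above */
    #define INITIAL_CAP_BYTES ((size_t) 1 << 30)    /* 1GB */

    static size_t
    initial_nbuckets(double est_rows, int ntup_per_bucket)
    {
        size_t  want = (size_t) (est_rows / ntup_per_bucket) + 1;
        size_t  cap = INITIAL_CAP_BYTES / sizeof(void *);  /* one pointer per bucket */
        size_t  nbuckets = 1;

        if (want > cap)
            want = cap;             /* don't trust a huge planner estimate up front */

        while (nbuckets * 2 <= want)
            nbuckets *= 2;          /* round down to a power of two */

        return nbuckets;            /* may still grow later, once the actual
                                     * tuple count is known (the 9.5 resize) */
    }

The point is that the cap only affects the first allocation; the resize path
would still take over when the estimate turns out to be too low.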

Also, why 1GB and not 512MB (which is effectively the current limit on 9.4,
because we can't allocate a full 1GB there, so we end up with half of that)?
Why not some small percentage of work_mem, e.g. 5%?
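
For comparison, the bucket counts implied by those caps, assuming 8 bytes
per bucket pointer and NTUP_PER_BUCKET=1 (illustrative arithmetic only; the
4GB work_mem is just an example value):

    #include <stdio.h>

    int
    main(void)
    {
        const double gb = 1024.0 * 1024.0 * 1024.0;
        const double caps[] = { 1.0 * gb, 0.5 * gb, 0.05 * 4.0 * gb };
        const char  *label[] = { "1GB", "512MB", "5% of 4GB work_mem" };

        /* print how many buckets (= tuples at NTUP_PER_BUCKET=1) fit each cap */
        for (int i = 0; i < 3; i++)
            printf("%-20s -> ~%.0fM buckets/tuples\n",
                   label[i], caps[i] / 8.0 / 1e6);
        return 0;
    }

That gives ~134M, ~67M and ~27M respectively - the first one being the
~130M tuples mentioned above.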

Most importantly, this is mostly orthogonal to the original bug report.
Even if we do this, we still have to fix the palloc failure after the resize.

So I'd say let's do a minimal bugfix of the actual issue, and then
perhaps improve the behavior in case of significant overestimates. The
bugfix should happen ASAP so that it gets into 9.5.0 (and backported).
We already have patches for that.

Limiting the initial allocation deserves more thorough discussion and
testing, and I'd argue it's 9.6 material at this point.

> The real problem here is the potential integer overflow in
> ExecChooseHashTableSize. Now that we know there is one, that should
> be fixed (and not only in HEAD/9.5).

I don't think there's any integer overflow. The problem is that we end up
with nbuckets so high that (nbuckets * 8) exceeds the 1GB-1 allocation limit.

There's been protection against integer overflow in place for a long time,
but there never was any protection against the 1GB limit, because with the
much smaller work_mem values we used to run with and NTUP_PER_BUCKET=10, we
could never actually reach it.
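
In other words, what's missing is roughly a check like this (a sketch, not
the actual patch; MaxAllocSize is PostgreSQL's 1GB-1 limit, the function
name is made up):

    #include <stdbool.h>
    #include <stddef.h>

    #define MAX_ALLOC_SIZE ((size_t) 0x3fffffff)    /* 1GB - 1, i.e. MaxAllocSize */

    /*
     * The bucket array is nbuckets pointers, so it must fit under the palloc
     * limit -- the existing guard only prevents nbuckets itself from
     * overflowing an int.
     */
    static bool
    bucket_array_fits(size_t nbuckets)
    {
        return nbuckets <= MAX_ALLOC_SIZE / sizeof(void *);
    }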

> But I believe Kaigai-san withdrew his initial proposed patch, and we
> don't have a replacement as far as I saw.

I believe the patch proposed by KaiGai-san is the right fix for the bug
discussed in this thread. My understanding is that KaiGai-san withdrew the
patch because he wants to extend it to also address the over-estimation issue.

I don't think we should do that - IMHO that's an unrelated improvement
and should be addressed in a separate patch.

kind regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
