Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash

From: James Coleman <jtc331(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash
Date: 2019-11-10 14:15:06
Message-ID: CAAaqYe9sMXm8e2+9n6KK2OTMdH3wBku06cpTcchu7k2wX2dAqA@mail.gmail.com
Lists: pgsql-bugs

On Sat, Nov 9, 2019 at 10:44 PM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
>
> On Sun, Nov 10, 2019 at 3:25 PM James Coleman <jtc331(at)gmail(dot)com> wrote:
> > So I should have run the earlier attached plan with VERBOSE, but
> > here's the interesting thing: the parallel hash node's seq scan node
> > outputs two columns: let's call them (from the redacted plan)
> > items.system_id and items.account_id. The first (system_id) is both
> > not null and unique; the second (account_id) definitely has massive
> > skew. I'm not very up-to-speed on how the hash building works, but I
> > would have (perhaps naïvely?) assumed that the first column being
> > unique would make the hash keys very likely not to collide in any
> > significantly skewed way. Am I missing something here?
>
> Hrm. So the compound key is unique then. I was assuming up until now
> that it had duplicates. The hashes of the individual keys are
> combined (see ExecHashGetHashValue()), so assuming there is nothing
> funky about the way citext gets hashed (and it's not jumping out at
> me), your unique keys should give you uniform hash values and thus
> partition size, and repartitioning should be an effective way of
> reducing hash table size. So now it sounds like you have a simple
> case of underestimation, but now I'm confused about how you got a
> 344MB hash table with work_mem = 150MB:

Looking at the source, citext's hash is a pretty standard call to
hash_any, so I don't see how that would lead to any oddities (barring
an intentional hash collision etc., but this is real data).
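For what it's worth, my (hedged) mental model of the combining step Thomas mentions — a 32-bit rotate-left-by-one followed by XOR per key, in the spirit of ExecHashGetHashValue, though the real code is C and calls datatype-specific hash functions — looks roughly like:

```python
def combine_hashes(key_hashes):
    # Hypothetical Python model: rotate the running hash left by one bit
    # (within 32 bits), then XOR in the next key's hash. This loosely
    # mirrors ExecHashGetHashValue's combining loop.
    h = 0
    for hk in key_hashes:
        h = ((h << 1) | (h >> 31)) & 0xFFFFFFFF  # 32-bit rotate left by 1
        h ^= hk
    return h
```

The relevant property is that if either key's hash is uniformly distributed, XOR keeps the combined value uniformly distributed too, which is why a unique first column should give uniform partition sizes.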

Do you have any theories about where the underestimation is happening?
It knows the number of rows reasonably well. The one thing I'm not
sure about yet is the row width = 16. The account_id is a bigint, so
we can assume it knows the size properly there. The system_id being
citext...I think the 8 bytes it has for that is probably a bit low on
average, but I'm not sure yet by how much (I'm going to run a query to
find out). Hmm. Query came back, and average length is just shy of
7.9, so the 16-byte row width is looking pretty good. So I'm at a loss
as to why/what it would be underestimating (does it know about
uniqueness/non-uniqueness? could that be a factor?).
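A back-of-envelope check of that width estimate (plain arithmetic, nothing PostgreSQL-specific):

```python
# Hedged sanity check of the plan's width = 16 estimate.
bigint_width = 8          # account_id is a fixed-width int8
avg_citext_width = 7.9    # measured average length of system_id
row_width = bigint_width + round(avg_citext_width)
# 8 + 8 = 16, matching the planner's reported width
```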

I also don't know why it seems to regularly fail on the primary, but
not on the sync, unless we adjust the work_mem up. I've double-checked
all GUCs and the only differences are things related to replication
that you'd expect to be different on primary/replica.

I know there are some things that execute differently on replicas, so
I assume it's something like that, but I don't know what specifically
would cause this here.

> Buckets: 4194304 (originally 4194304) Batches: 32768 (originally
> 4096) Memory Usage: 344448kB
>
> And I'm confused about what was different when it wanted the crazy
> number of batches.

I'm not quite sure how to find out; if you have any ideas, I'd love to
hear them. The one thing I can think of to try is to slowly increase
work_mem (since setting it to 500MB reproduced the error on the
replica) and see if the bucket info starts to trend up.
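One way I think about the batch numbers in that quoted plan: the executor doubles the batch count whenever the in-memory hash table overflows work_mem, on the assumption that each doubling roughly halves the per-batch size. A hedged toy model (hypothetical function name; the real logic lives in nodeHash.c and is considerably more involved):

```python
def repartition(nbatch, space_used_kb, work_mem_kb):
    # Toy model: keep doubling batches until the (assumed evenly split)
    # per-batch memory fits under work_mem.
    while space_used_kb > work_mem_kb:
        nbatch *= 2
        space_used_kb /= 2  # only holds if rows hash uniformly
    return nbatch, space_used_kb

# With skewed data, space_used does not actually halve per doubling,
# so the batch count can explode far past what this model predicts.
nbatch, space = repartition(4096, 500 * 1024, 150 * 1024)
```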

James
