Re: WIP: [[Parallel] Shared] Hash

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
Cc: Peter Geoghegan <pg(at)heroku(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: [[Parallel] Shared] Hash
Date: 2017-01-31 23:25:18
Message-ID: CAEepm=1PhZD0xhhmYKgSk7948K18JkRwCUy2uFtjJYiUzXjnWg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Feb 1, 2017 at 2:10 AM, Ashutosh Bapat
<ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
>>
>> 0003-hj-refactor-memory-accounting-v4.patch:
>> [...]
>>
> I looked at this patch. I agree that it accounts the memory usage more
> accurately. Here are few comments.

Thanks for the review!

> spaceUsed is defined with comment
> Size spaceUsed; /* memory space currently used by tuples */
>
> In ExecHashTableCreate(), although the space is allocated for buckets, no
> tuples are yet inserted, so no space is used by the tuples, so going strictly
> by the comment, spaceUsed should be 0 in that function. But I think the patch
> is accounting the spaceUsed more accurately. Without this patch, the actual
> allocation might cross spaceAllowed without being noticed. With this patch
> that's not possible. Probably we should change the comment to say memory space
> currently allocated.

Right, that comment is out of date. It is now the space used by the
bucket array and the tuples. I will fix that in the next version.

> However, ExecHashIncreaseNumBatches() may change the
> number of buckets; the patch does not seem to account for spaceUsed changes
> because of that.

That's what this hunk is intended to do:

@@ -795,6 +808,12 @@ ExecHashIncreaseNumBuckets(HashJoinTable hashtable)
TRACE_POSTGRESQL_HASH_INCREASE_BUCKETS(hashtable->nbuckets,

hashtable->nbuckets_optimal);

+ /* account for the increase in space that will be used by buckets */
+ hashtable->spaceUsed += sizeof(HashJoinTuple) *
+ (hashtable->nbuckets_optimal - hashtable->nbuckets);
+ if (hashtable->spaceUsed > hashtable->spacePeak)
+ hashtable->spacePeak = hashtable->spaceUsed;
+
hashtable->nbuckets = hashtable->nbuckets_optimal;
hashtable->log2_nbuckets = hashtable->log2_nbuckets_optimal;

It knows that spaceUsed already includes the old bucket array
(nbuckets), so it figures out how much bigger the new bucket array
will be (nbuckets_optimal - nbuckets) and adds that.

> Without this patch ExecHashTableInsert() used to account for the space used by
> a single tuple inserted. The patch moves this calculation in dense_alloc() and
> accounts for out-of-bound allocation for larger tuples. That's good.
>
> The change in ExecChooseHashTableSize() too looks fine.
>
> In ExecHashTableReset(), do we want to update spacePeak while setting
> spaceUsed.

I figured there was no way that the new spaceUsed value could be
bigger than spacePeak, because we threw out all chunks and have just
the bucket array, and we had that number of buckets before, so
spacePeak must at least have been set to a number >= this number
either when we expanded to this many buckets, or when we created the
hashtable in the first place. Perhaps I should
Assert(hashtable->spaceUsed <= hashtable->spacePeak).

> While this patch tracks space usage more accurately, I am afraid we might be
> overdoing it; a reason why we don't track space usage accurately now. But I
> think I will leave it to be judged by someone who is more familiar with the
> code and possibly has historical perspective.

Well it's not doing more work; it doesn't make any practical
difference whatsoever but it's technically doing less work than
master, by doing memory accounting only when acquiring a new 32KB
chunk. But if by overdoing it you mean that no one really cares about
the tiny increase in accuracy so the patch on its own is a bit of a
waste of time, you're probably right. Depending on tuple size, you
could imagine something like 64 bytes of header and unused space per
32KB chunk that we're not accounting for, and that's only 0.2%. So I
probably wouldn't propose this refactoring just on accuracy grounds
alone.

This refactoring is intended to pave the way for shared memory
accounting in the later patches, which would otherwise generate ugly
IPC if done for every time a tuple is allocated. I considered using
atomic add to count space per tuple, or maintaining per-backend local
subtotals and periodically summing. Then I realised that switching to
per-chunk accounting would fix the IPC problem AND be justifiable on
theoretical grounds. When we allocate a new 32KB chunk, we really are
using 32KB more of your memory.

--
Thomas Munro
http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2017-02-01 00:36:53 Re: Patch: Write Amplification Reduction Method (WARM)
Previous Message Andres Freund 2017-01-31 22:51:01 Re: Patch: Write Amplification Reduction Method (WARM)