Re: WIP: [[Parallel] Shared] Hash

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Peter Geoghegan <pg(at)heroku(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: [[Parallel] Shared] Hash
Date: 2017-02-01 04:43:40
Message-ID: CAFjFpReeHV+nyroU11M98yBRB5yzcsvs0ppr5mdJv6nbewTfsg@mail.gmail.com
Lists: pgsql-hackers

>
>> However, ExecHashIncreaseNumBatches() may change the
>> number of buckets; the patch does not seem to account for spaceUsed changes
>> because of that.
>
> That's what this hunk is intended to do:
>
> @@ -795,6 +808,12 @@ ExecHashIncreaseNumBuckets(HashJoinTable hashtable)
> TRACE_POSTGRESQL_HASH_INCREASE_BUCKETS(hashtable->nbuckets,
>
> hashtable->nbuckets_optimal);
>
> + /* account for the increase in space that will be used by buckets */
> + hashtable->spaceUsed += sizeof(HashJoinTuple) *
> + (hashtable->nbuckets_optimal - hashtable->nbuckets);
> + if (hashtable->spaceUsed > hashtable->spacePeak)
> + hashtable->spacePeak = hashtable->spaceUsed;
> +

Sorry, I missed that hunk. You are right, it's getting accounted for.

>>
>> In ExecHashTableReset(), do we want to update spacePeak while setting
>> spaceUsed.
>
> I figured there was no way that the new spaceUsed value could be
> bigger than spacePeak, because we threw out all chunks and have just
> the bucket array, and we had that number of buckets before, so
> spacePeak must at least have been set to a number >= this number
> either when we expanded to this many buckets, or when we created the
> hashtable in the first place. Perhaps I should
> Assert(hashtable->spaceUsed <= hashtable->spacePeak).

That would help; better still if you explain this with a comment before the Assert.

>
>> While this patch tracks space usage more accurately, I am afraid we might be
>> overdoing it; a reason why we don't track space usage accurately now. But I
>> think I will leave it to be judged by someone who is more familiar with the
>> code and possibly has historical perspective.
>
> Well it's not doing more work; it doesn't make any practical
> difference whatsoever but it's technically doing less work than
> master, by doing memory accounting only when acquiring a new 32KB
> chunk.

This patch does more work while counting the space used by buckets, I
guess. AFAIU, right now that happens only after the hash table is
built completely. But that's fine; I am not worried about whether it's
less work or more.

> But if by overdoing it you mean that no one really cares about
> the tiny increase in accuracy so the patch on its own is a bit of a
> waste of time, you're probably right.

This is what I meant by overdoing; you have spelled it out better.

> Depending on tuple size, you
> could imagine something like 64 bytes of header and unused space per
> 32KB chunk that we're not accounting for, and that's only 0.2%. So I
> probably wouldn't propose this refactoring just on accuracy grounds
> alone.
>
> This refactoring is intended to pave the way for shared memory
> accounting in the later patches, which would otherwise generate ugly
> IPC if done for every time a tuple is allocated. I considered using
> atomic add to count space per tuple, or maintaining per-backend local
> subtotals and periodically summing. Then I realised that switching to
> per-chunk accounting would fix the IPC problem AND be justifiable on
> theoretical grounds. When we allocate a new 32KB chunk, we really are
> using 32KB more of your memory.

+1.

Thanks for considering the comments.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
