Re: Proposed Patch to Improve Performance of Multi-BatchHash Join for Skewed Data Sets

From: "Bryce Cutt" <pandasuit(at)gmail(dot)com>
To: "Robert Haas" <robertmhaas(at)gmail(dot)com>
Cc: "Lawrence, Ramon" <ramon(dot)lawrence(at)ubc(dot)ca>, "Joshua Tolley" <eggyknap(at)gmail(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposed Patch to Improve Performance of Multi-BatchHash Join for Skewed Data Sets
Date: 2009-01-07 00:12:12
Message-ID: 1924d1180901061612s5adb40edwa0fe53b2d9ea12c5@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

The latest version of the patch is attached. The revision considerably
cleans up the code, especially variable naming consistency. We have
adopted the use of IM (in-memory) in variable names for the hash table
structures as suggested.

Two other implementations changes:

1) The overhead of the hash table has been reduced by allocating an
array of pointers instead of an array of structs and only allocating the
structs as they are needed to store MCVs. IM buckets are now frozen
by first removing all tuples then deleting the struct from memory. This
allows more memory to be freed as well as the removal of the frozen
field in the IM bucket struct which now makes that struct only 8 bytes
on a 32bit machine. If for some reason all IM buckets are frozen all
IM struct overhead is removed from memory to further reduce the memory
footprint.

2) This patch supports using a set percentage of work_mem (currently 2%)
to store the build tuples that join frequently with probe relation
tuples. The code only allocates MCVs up to the maximum amount and will
flush from the in-memory hash table if the memory is ever exceeded. The
code also ensures that the overall join memory used (the MCV hash table
and batch 0 in memory) does not exceed spaceAllocated as usual. If this 2%
of memory is not used by the MCV hash table then it can be used by batch 0.

These changes are mostly relate to style, although some of the cleanup
has made the code slightly faster.

We would really appreciate help on finalizing this patch, especially in
regard to style issues. Thank you for all the help.

- Dr. Ramon Lawrence and Bryce Cutt

On Sun, Jan 4, 2009 at 6:48 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> 1) Isn't ExecHashFreezeNextMCVPartition actually a most common TUPLE
>> partition, rather than a most common VALUE partition (and similarly for
>> ExecHashGetMCVPartition)?
>>
>> A partition stores all tuples that correspond to that MCV value. It is
>> usually one for foreign key joins but may be more than one. (Plus, it
>> may store other tuples that have the same hash value for the join
>> attribute as the MCV value.)
>
> I guess my point is - check that your variable/function/structure
> member naming is consistent between different parts of the code. The
> ExecHashGetMCVPartition function accesses structure members called
> nMostCommonTuplePartitionHashBuckets, nMostCommonTuplePartition, and
> mostCommonTuplePartition. It seems inconsistent that the function
> name uses MCVPartition and
> the structure members use mostCommonTuplePartition - aren't we talking
> about the same thing in both cases?
>
> And, more to the point, the terminology just seems wrong to me, the
> more I think about it. I mean, ExecHashGetMCVParitition is not
> finding a partition of the MCVs. It's finding a partition of an
> in-memory hash table which we plan to populate with MCVs. That's why
> I'm wondering if we should make it ExecHashGetIMPartition,
> nIMPartitionHashBuckets, etc.
>
>> 2) Have you done any performance testing on the impact of this change?
>>
>> Yes, the ability to use MCVs for more than sequential scans
>> significantly improves performance in multi-join cases. The allocation
>> of a percentage of memory of only 1% will not affect any performance
>> results as all our testing was done with the MCV value of 10 or 100
>> which is significantly below a 1% allocation of work_mem. If anything,
>> performance would be improved when using more MCVs.
>
> That is a very good thing.
>
>> Finally, any help you can provide on style concerns to make this easier
>> to commit would be appreciated. We will put all the effort required
>> over the next few days to get this into 8.4.
>
> If I have time, I might be willing to make a style run over the next
> version of the patch after you post it to the list, and just correct
> anything I see and repost. This might be faster than sending comments
> back and forth, if you are OK with it. I have a day job so this would
> probably need to be Tuesday or Wednesday night. My main advice is
> "read the diff before you post it". Sometimes things will just pop
> out at you that are less obvious when you are head-down in the code.
>
> Random stuff I notice in v4 patch: make sure all lines fit in 80
> columns (except for long error messages if any), missing space before
> closing comment delimiter in ExecHashGetMCVPartition, extraneous blank
> line added to nodeHash.c just before the comment that says "and remove
> from hash table", comment in ExecHashJoinGetMostCommonValues just
> after the get_attstatsslot call is formatted strangely, still extra
> curly braces around the calls to
> ExecScanHashMostCommonValuePartition/ExecScanHashBucket.
>
> ...Robert
>

Attachment Content-Type Size
histojoin_v5.patch application/octet-stream 24.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2009-01-07 00:13:24 Re: Re: [COMMITTERS] pgsql: This makes all the \dX commands (most importantly to most: \df)
Previous Message Joshua D. Drake 2009-01-07 00:08:57 Re: Re: [COMMITTERS] pgsql: This makes all the \dX commands (most importantly to most: \df)