Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets

From: "Robert Haas" <robertmhaas(at)gmail(dot)com>
To: "Joshua Tolley" <eggyknap(at)gmail(dot)com>
Cc: "Bryce Cutt" <pandasuit(at)gmail(dot)com>, "Lawrence, Ramon" <ramon(dot)lawrence(at)ubc(dot)ca>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets
Date: 2008-12-23 15:14:29
Message-ID: 603c8f070812230714k47a71309vc771413c50fe52ee@mail.gmail.com
Lists: pgsql-hackers

> It's equivalent to our assumption that distributions of values in
> columns in the same table are independent. Making that assumption in
> this case would probably result in occasional dramatic speed
> improvements similar to the ones we've seen in less complex joins,
> offset by just-as-occasional dramatic slowdowns of similar magnitude. In
> other words, it will increase the variance of our results.

Under what circumstances do you think that it would produce a dramatic
slowdown? I'm confused. I thought the penalty for picking a bad set
of values for the in-memory hash table was pretty small.

...Robert
