From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Number of buckets in a hash join
Date: 2013-01-28 16:58:05
Message-ID: 17849.1359392285@sss.pgh.pa.us
Lists: pgsql-hackers

Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> writes:
> The first question is, why do we aim at 10 tuples per bucket?

I see nothing particularly wrong with that.  The problem here is with
having 1000 tuples per bucket.
> Ideally, the planner would always make a good guess of the number of
> rows, but for the situations where it doesn't, it would be good if the
> hash table were enlarged if it becomes too full.
Yeah, possibly. The proposed test case actually doesn't behave very
badly if work_mem is small, because there is logic in there to adjust
the number of batches. You didn't say what work_mem you're testing at,
but it's clearly more than the default 1MB. I think the issue arises if
the initial estimate of hashtable size is a good bit less than work_mem,
so the number of buckets is set to something a good bit less than what
would be optimal if we're using more of work_mem. This seems a little
reminiscent of what we did recently in tuplesort to make better use of
work_mem --- in both cases we have to choose a pointer-array size that
will make best use of work_mem after the tuples themselves are added.
regards, tom lane