Re: Large Scale Aggregation (HashAgg Enhancement)

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Rod Taylor <pg(at)rbt(dot)ca>, PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Large Scale Aggregation (HashAgg Enhancement)
Date:	2006-01-17 01:02:53
Message-ID:	22908.1137459773@sss.pgh.pa.us
Lists:	pgsql-hackers

Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> Sure hash table is dynamic, but we read all inner rows to create the
> hash table (nodeHash) before we get the outer rows (nodeHJ).

But our idea of the number of batches needed can change during that
process, resulting in some inner tuples being initially assigned to the
wrong temp file. This would also be true for hashagg.
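
To illustrate the point about tuples landing in the wrong temp file, here is a minimal sketch. It assumes the batch number is taken from the low bits of the hash value and that the batch count is always a power of two. The names `batch_of` and `misplaced_after_doubling` are hypothetical and this is not the executor's actual code.

```python
def batch_of(hash_value: int, nbatch: int) -> int:
    """Batch number for a hash value (assumption: nbatch is a power of two)."""
    return hash_value & (nbatch - 1)


def misplaced_after_doubling(hashes, old_nbatch):
    """Return the hash values whose batch changes when nbatch doubles.

    Tuples with these hashes were written under the old batch count and
    now sit in a temp file that no longer matches their batch number.
    """
    new_nbatch = old_nbatch * 2
    return [h for h in hashes
            if batch_of(h, old_nbatch) != batch_of(h, new_nbatch)]


hashes = [0, 1, 2, 3, 4, 5, 6, 7]
moved = misplaced_after_doubling(hashes, old_nbatch=2)
print(moved)
```

Under this scheme a tuple in old batch b either stays in b or moves to b + old_nbatch, so a misplaced tuple only ever belongs to a later batch and can be forwarded when its original file is read back.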

> Why would we continue to dynamically build the hash table after the
> start of the outer scan?

The number of tuples written to a temp file might exceed what we want to
hold in memory; we won't detect this until the batch is read back in,
and in that case we have to split the batch at that time. For hashagg
this point would apply to the aggregate states not the input tuples, but
it's still a live problem (especially if the aggregate states aren't
fixed-size values ... consider a "concat" aggregate for instance).
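
A rough sketch of that late split for a string-concatenating aggregate follows. Everything in it is an assumption made for illustration: the `(hash, key, text)` tuple layout, the byte budget, the `max_nbatch` cap, and the choice to spill partial states as ordinary tuples (which only works because concatenation can resume from a partial result). It is not a proposal for how hashagg should do it.

```python
def batch_of(hash_value, nbatch):
    """Batch number from the low hash bits (assumption: nbatch is a power of two)."""
    return hash_value & (nbatch - 1)


def state_bytes(states):
    """Memory charged to the aggregate states: total length of the strings."""
    return sum(len(text) for _, text in states.values())


def aggregate_batch(tuples, batchno, nbatch, budget, max_nbatch=1024):
    """Aggregate one batch read back from its temp file.

    Returns (states, spilled, nbatch). When the states outgrow the budget,
    nbatch is doubled and every state that now belongs to another batch is
    evicted into `spilled` as a partial result.
    """
    states = {}   # key -> (hash, concatenated text so far)
    spilled = []  # tuples and partial states deferred to a later batch
    for h, key, text in tuples:
        if batch_of(h, nbatch) != batchno:
            # The batch was split earlier; this tuple no longer belongs here.
            spilled.append((h, key, text))
            continue
        _, so_far = states.get(key, (h, ""))
        states[key] = (h, so_far + text)
        # Split repeatedly until the states fit or splitting cannot help.
        while (state_bytes(states) > budget and len(states) > 1
               and nbatch < max_nbatch):
            nbatch *= 2
            evict = [k for k, (kh, _) in states.items()
                     if batch_of(kh, nbatch) != batchno]
            for k in evict:
                kh, partial = states.pop(k)
                spilled.append((kh, k, partial))
    return {k: t for k, (_, t) in states.items()}, spilled, nbatch


rows = [(0, "a", "xx"), (2, "b", "yyyy"), (0, "a", "zz"), (2, "b", "ww")]
states, spilled, nbatch = aggregate_batch(rows, batchno=0, nbatch=2, budget=5)
print(states, spilled, nbatch)
```

Here the state for key "b" pushes the total past the budget while the batch is being read, so the batch splits at that moment and "b" is deferred along with its later input.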

regards, tom lane

In response to

Re: Large Scale Aggregation (HashAgg Enhancement) at 2006-01-17 00:54:47 from Simon Riggs

Responses

Re: Large Scale Aggregation (HashAgg Enhancement) at 2006-01-17 05:01:38 from Greg Stark
Re: Large Scale Aggregation (HashAgg Enhancement) at 2006-01-17 07:05:29 from Simon Riggs
