From: | Jeff Davis <pgsql(at)j-davis(dot)com> |
---|---|
To: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
Cc: | Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Disk-based hash aggregate's cost model |
Date: | 2020-09-01 06:34:34 |
Message-ID: | 0d3a2b287401133e3c4c1404ea6b4f0d2209a3ee.camel@j-davis.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Sun, 2020-08-30 at 17:03 +0200, Tomas Vondra wrote:
> So I'm wondering if the hashagg is not ignoring similar non-I/O costs
> for the spilling case. In particular, the initial section computing
> startup_cost seems to ignore that we may need to so some of the stuff
> repeatedly - for example we'll repeat hash lookups for spilled
> tuples,
> and so on.
To fix that, we'd also need to change the cost of in-memory HashAgg,
right?
> The other thing is that sort seems to be doing only about half the
> physical I/O (as measured by iosnoop) compared to hashagg, even
> though
> the estimates of pages / input_bytes are exactly the same. For
> hashagg
> the iosnoop shows 5921MB reads and 7185MB writes, while sort only
> does
> 2895MB reads and 3655MB writes. Which kinda matches the observed
> sizes
> of temp files in the two cases, so the input_bytes for sort seems to
> be
> a bit overestimated.
Hmm, interesting.
How reasonable is it to be making these kinds of changes to the cost
model right now? I think your analysis is solid, but I'm worried about
making more intrusive changes very late in the cycle.
I had originally tried to limit the cost model changes to the new plans
I am introducing -- that is, HashAgg plans expected to require disk.
That's why I came up with a rather arbitrary penalty.
Regards,
Jeff Davis
From | Date | Subject | |
---|---|---|---|
Next Message | Bernd Helmle | 2020-09-01 08:54:52 | Re: Documentation patch for backup manifests in protocol.sgml |
Previous Message | Jeff Davis | 2020-09-01 06:18:39 | Reloptions for table access methods |