|From:||Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>|
|To:||Jeff Davis <pgsql(at)j-davis(dot)com>|
|Cc:||Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>|
|Subject:||Re: Disk-based hash aggregate's cost model|
|Views:||Raw Message | Whole Thread | Download mbox | Resend email|
On Mon, Aug 31, 2020 at 11:34:34PM -0700, Jeff Davis wrote:
>On Sun, 2020-08-30 at 17:03 +0200, Tomas Vondra wrote:
>> So I'm wondering if the hashagg is not ignoring similar non-I/O costs
>> for the spilling case. In particular, the initial section computing
>> startup_cost seems to ignore that we may need to so some of the stuff
>> repeatedly - for example we'll repeat hash lookups for spilled
>> and so on.
>To fix that, we'd also need to change the cost of in-memory HashAgg,
Why? I don't think we need to change costing of in-memory HashAgg. My
assumption was we'd only tweak startup_cost for cases with spilling by
adding something like (cpu_operator_cost * npartitions * ntuples).
>> The other thing is that sort seems to be doing only about half the
>> physical I/O (as measured by iosnoop) compared to hashagg, even
>> the estimates of pages / input_bytes are exactly the same. For
>> the iosnoop shows 5921MB reads and 7185MB writes, while sort only
>> 2895MB reads and 3655MB writes. Which kinda matches the observed
>> of temp files in the two cases, so the input_bytes for sort seems to
>> a bit overestimated.
FWIW I suspect some of this difference may be due to logical vs.
physical I/O. iosnoop only tracks physical I/O sent to the device, but
maybe we do much more logical I/O and it simply does not expire from
page cache for the sort. It might behave differently for larger data
set, longer query, ...
>How reasonable is it to be making these kinds of changes to the cost
>model right now? I think your analysis is solid, but I'm worried about
>making more intrusive changes very late in the cycle.
>I had originally tried to limit the cost model changes to the new plans
>I am introducing -- that is, HashAgg plans expected to require disk.
>That's why I came up with a rather arbitrary penalty.
I don't know. I certainly understand the desire not to change things
this late. OTOH I'm worried that we'll end up receiving a lot of poor
plans post release.
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
|Next Message||Junfeng Yang||2020-09-01 09:20:00||回复: Is it possible to set end-of-data marker for COPY statement.|
|Previous Message||Masahiko Sawada||2020-09-01 09:16:06||Re: recovering from "found xmin ... from before relfrozenxid ..."|