Re: Disk-based hash aggregate's cost model

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Disk-based hash aggregate's cost model
Date: 2020-09-04 18:31:36
Message-ID: 011877614fa1279c97ce6e897ea2f0dc90124483.camel@j-davis.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, 2020-09-04 at 14:56 +0200, Tomas Vondra wrote:
> Those charts show that the CP_SMALL_TLIST resulted in smaller temp
> files
> (per EXPLAIN ANALYZE the difference is ~25%) and also lower query
> durations (also in the ~25% range).

I was able to reproduce the problem, thank you.

Only two attributes are needed, so the CP_SMALL_TLIST projected schema
only needs a single-byte null bitmap.

But if just setting the attributes to NULL rather than projecting them,
the null bitmap size is based on all 16 attributes, bumping the bitmap
size to two bytes.

MAXALIGN(23 + 1) = 24
MAXALIGN(23 + 2) = 32

I think that explains it. It's not ideal, but projection has a cost as
well, so I don't think we necessarily need to do something here.

If we are motivated to improve this in v14, we could potentially have a
different schema for spilled tuples, and perform real projection at
spill time. But I don't know if that's worth the extra complexity.

Regards,
Jeff Davis

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2020-09-04 18:53:04 Re: Improving connection scalability: GetSnapshotData()
Previous Message Alexey Kondratov 2020-09-04 18:31:14 Re: Global snapshots