Re: Default setting for enable_hashagg_disk

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Default setting for enable_hashagg_disk
Date: 2020-06-25 23:11:54
Message-ID: 20200625231154.uknvrnqwmahv6rtu@development
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-docs pgsql-hackers

On Thu, Jun 25, 2020 at 02:28:02PM -0700, Jeff Davis wrote:
>On Thu, 2020-06-25 at 15:56 -0400, Bruce Momjian wrote:
>> It is my understanding that spill of sorts is mostly read
>> sequentially,
>> while hash reads are random. Is that right? Is that not being
>> costed
>> properly?
>
>I don't think there's a major problem with the cost model, but it could
>probably use some tweaking.
>
>Hash writes are random. The hash reads should be mostly sequential (for
>large partitions it will be 128-block extents, or 1MB). The cost model
>assumes 50% sequential and 50% random.
>

The important bit here is that while the logical writes are random,
those are effectively combined in page cache and the physical writes are
pretty sequential. So I think the cost model is fairly reasonable.

Note: Judging by iosnoop stats shared in the thread linked by Jeff.

>Sorts are written sequentially and read randomly, but there's
>prefetching to keep the reads from being too random. The cost model
>assumes 75% sequential and 25% random.
>
>Overall, the IO pattern is better for Sort, but not dramatically so.
>Tomas Vondra did some nice analysis here:
>
>
>https://www.postgresql.org/message-id/20200525021045.dilgcsmgiu4l5jpa@development
>
>That resulted in getting the prealloc and projection patches in.
>
>Regards,
> Jeff Davis
>
>

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Browse pgsql-docs by date

  From Date Subject
Next Message Tomas Vondra 2020-06-25 23:18:31 Re: Default setting for enable_hashagg_disk
Previous Message Alvaro Herrera 2020-06-25 22:58:53 Re: Default setting for enable_hashagg_disk

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2020-06-25 23:18:31 Re: Default setting for enable_hashagg_disk
Previous Message Alvaro Herrera 2020-06-25 22:58:53 Re: Default setting for enable_hashagg_disk