Re: pgcon unconference / impact of block size on performance

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Jakub Wartak <Jakub(dot)Wartak(at)tomtom(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: pgcon unconference / impact of block size on performance
Date: 2022-06-07 14:00:18
Message-ID: 31c3f2cd-5ce9-6130-4c06-2700fad0a970@enterprisedb.com
Lists: pgsql-hackers

On 6/7/22 15:48, Jakub Wartak wrote:
> Hi,
>
>> The really
>> puzzling thing is why the filesystem is so much slower for smaller pages.
>> I mean, why would writing 1K be 1/3 of writing 4K?
>> Why would a filesystem have such an effect?
>
> Ha! I don't care at this point, as 1 or 2kB seems too small to handle many real-world scenarios ;)
>

I think that's not quite true - a lot of OLTP workloads use fairly
narrow rows, and if they store more data, it's probably in TOAST, which
again splits it into smaller chunks. It's true smaller pages would
lower some of the limits (number of columns, index tuple size, ...) of
course, and that might be an issue.
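
FWIW, anyone who wants to see those limits in practice can build with a
non-default block size - the configure value is in kB, so something
like:

  ./configure --with-blocksize=1
  make && make install

With 1kB pages the btree limit on index tuple size (roughly a third of
a page) shrinks accordingly.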

Independently of that, it seems like an interesting behavior and it
might tell us something about how to optimize for larger pages.

>>> b) Another thing that you could also include in testing is that I've
>>> spotted a couple of times that single-threaded fio might be a limiting
>>> factor (numjobs=1 by default), so I've tried with
>>> numjobs=2,group_reporting=1 and got the output below on ext4 defaults,
>>> even while dropping caches (echo 3) on each loop iteration -
>>> something that I cannot explain (ext4 direct I/O caching effect? how's
>>> that even possible? reproduced several times, even with numjobs=1) -
>>> the point being 206643 1kB IOPS @ ext4 direct-io > 131783 1kB IOPS @
>>> raw, which smells like some caching effect, because for randwrite it
>>> does not happen. I've triple-checked with iostat -x... it cannot be
>>> any internal device cache, as with direct I/O that doesn't happen:
>>>
>>> [root(at)x libaio-ext4]# grep -r -e 'write:' -e 'read :' *
>>> nvme/randread/128/1k/1.txt: read : io=12108MB, bw=206644KB/s, iops=206643, runt= 60001msec [b]
>>> nvme/randread/128/2k/1.txt: read : io=18821MB, bw=321210KB/s, iops=160604, runt= 60001msec [b]
>>> nvme/randread/128/4k/1.txt: read : io=36985MB, bw=631208KB/s, iops=157802, runt= 60001msec [b]
>>> nvme/randread/128/8k/1.txt: read : io=57364MB, bw=976923KB/s, iops=122115, runt= 60128msec
>>> nvme/randwrite/128/1k/1.txt: write: io=1036.2MB, bw=17683KB/s, iops=17683, runt= 60001msec [a, as before]
>>> nvme/randwrite/128/2k/1.txt: write: io=2023.2MB, bw=34528KB/s, iops=17263, runt= 60001msec [a, as before]
>>> nvme/randwrite/128/4k/1.txt: write: io=16667MB, bw=282977KB/s, iops=70744, runt= 60311msec [reproduced benefit, as per earlier email]
>>> nvme/randwrite/128/8k/1.txt: write: io=22997MB, bw=391839KB/s, iops=48979, runt= 60099msec
>>>
>>
>> No idea what might be causing this. BTW, so you're not using direct I/O
>> to access the raw device? Or am I just misreading this?
>
> Both scenarios (raw and fs) had direct=1 set. I just cannot understand how having direct I/O enabled (which disables caching) achieves better read IOPS on ext4 than on the raw device... isn't that a contradiction?
>

Thanks for the clarification. Not sure what might be causing this. Did
you use the same parameters (e.g. iodepth) in both cases?
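
FWIW, a quick way to rule out config differences is to run both targets
with an identical command line - just a sketch, the device and mount
paths below are placeholders (libaio and iodepth=128 taken from your
output):

  # raw device
  fio --name=randread --ioengine=libaio --direct=1 --rw=randread \
      --bs=1k --iodepth=128 --numjobs=1 --runtime=60 --time_based \
      --filename=/dev/nvme0n1

  # file on ext4
  fio --name=randread --ioengine=libaio --direct=1 --rw=randread \
      --bs=1k --iodepth=128 --numjobs=1 --runtime=60 --time_based \
      --directory=/mnt/ext4 --size=16G

If the ext4 numbers still beat the raw device with everything else
equal, that'd be very strange indeed.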

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
