Re: many sessions waiting DataFileRead and extend

From: Frits Hoogland <frits(dot)hoogland(at)gmail(dot)com>
To: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, James Pang <jamespang886(at)gmail(dot)com>
Cc: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: many sessions waiting DataFileRead and extend
Date: 2025-06-25 14:27:29
Message-ID: F9A0F6D3-9365-4A32-BE14-C996E08C3A0D@gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-performance

> On 25 Jun 2025, at 07:59, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at> wrote:
>
> On Wed, 2025-06-25 at 11:15 +0800, James Pang wrote:
>> pgv14, RHEL8, xfs , we suddenly see tens of sessions waiting on "DataFileRead" and
>> "extend", it last about 2 seconds(based on pg_stat_activity query) , during the
>> waiting time, "%sys" cpu increased to 80% , but from "iostat" , no high iops and
>> io read/write latency increased either.
>
> Run "sar -P all 1" and see if "%iowait" is high.
I would (strongly) advise against the use of iowait as an indicator. It is a kernel approximation of time spent in IO from which cannot be use used in any sensible way other than possibly you're doing IO.
First of all, iowait is not a kernel state, and therefore it's taken from idle. This means that if there is no, or too little, idle time, iowait that should be there is gone.
Second, the calculation to transfer idle time to iowait is done for synchronous IO calls only. Which currently is not a problem for postgres because it uses exactly that, but in the future it might.
Very roughly put, what the kernel does is keep a counter of tasks currently in certain system IO calls, and then try to express that using iowait. The time in IO wait can't be used calculate any IO facts.

In that sense, it puts it in the same area as the load figure: indicative, but mostly useless because it doesn't give you any facts about what it is expressing.
>
> Check if you have transparent hugepages enabled:
>
> cat /sys/kernel/mm/transparent_hugepage/enabled
>
> If they are enabled, disable them and see if it makes a difference.
>
> I am only guessing here.
Absolutely. Anything that is using signficant amounts of memory and is not created to take advantage of transparent hugepages will probably experience more downsides from THP than it helps.
>
>> many sessions were running same "DELETE FROM xxxx" in parallel waiting on "extend"
>> and "DataFileRead", there are triggers in this table "After delete" to insert/delete
>> other tables in the tigger.
>
> One thing that almost certainly would improve your situation is to run fewer
> concurrent statements, for example by using a reasonably sized connection pool.
This is true if the limits of the IO device, or anything towards to IO device or devices are hit.
And in general, high "%sys", alias lots of time spent in kernel mode alias system time indicates lots of time spent in system calls, which is what the read and write calls in postgres are.
Therefore these figures suggest blocking for IO, for which Laurenz' advise to lower the amount of concurrent sessions doing IO in general makes sense.
A more nuanced analysis: if IO requests get queued, these will wait in 'D' state in linux, which by definition is off cpu, and thus do not spent cpu (system/kernel) time.

What sounds suspicious is that you indicate you indicate there is you see no signficant change in the amount of IO in iostat.

In order to understand this, you will have to first carefully find the actual IO physical IO devices that you are using for postgres IO.
In current linux this can be tricky, depending on how the hardware or virtual machine looks like, and how the disks are arranged in linux.
What you need to determine is which actual disk devices are used, and what their limits are.
Limits for any disk are IOPS (operations per second) and MBPS (megabytes per second -> bandwdith).

There is an additional thing to realize, which makes this really tricky: postgres for common IO uses buffered IO.
Buffered IO means any read or write will use the linux buffercache, and read or writes can be served from the buffercache if possible.

So in your case, if you managed to make the database perform identical read or write requests, this could result in a difference of amounts of read and write IOs served from the cache, which can make an enormous amounts of difference for how fast these requests are served. If somehow you managed to make the operating system choose to use the physical IO path, you will see significant amounts time spent on that, which will have IO related wait events.

Not a simple answer, but this is how it works.

So I would suggest checking the difference between the situation of when it's doing the same which is considered well performing versus badly performing.

>
> Yours,
> Laurenz Albe
>
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Pavel Stehule 2025-06-25 17:33:32 Re: proposal: schema variables
Previous Message James Pang 2025-06-25 09:10:55 Re: many sessions waiting DataFileRead and extend