From: James Pang <jamespang886(at)gmail(dot)com>
To: Frits Hoogland <frits(dot)hoogland(at)gmail(dot)com>
Cc: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: many sessions waiting DataFileRead and extend
Date: 2025-06-26 07:22:49
Message-ID: CAHgTRffvJuBjHjSaSwqKWNYospsfGsoF7D1GxhBudXvaide4sQ@mail.gmail.com
Lists: pgsql-performance
In addition to "DataFileRead", we actually have even more sessions waiting on
"extend", and we have log_lock_waits enabled; for example:
2025-06-24 18:00:11.368 xxxx:[1865315]:[4-1]:mbsLOG: process 1865315 still
waiting for ExclusiveLock on extension of relation 14658239 of database
16384 after 1000.161 ms
2025-06-24 18:00:11.368 xxxx:mbs(at)pgaaweb:[1865315]:[5-1]:mbsDETAIL:
Process holding the lock: 1748100. Wait queue: 232658, 606235, 1865315,
1939296, 1368901, 2472806, 2072266, 963893, 1304751, 1084094, 875968,
451652, 672314, 1240983, 2085501, 851401, 2043926, 2522611, 889816, 191663,
1090944, 536057, 1810868, 1240510, 1195564, 1067298.
We see both "DataFileRead" and "extend"; each episode lasts about 2 seconds
and recovers automatically. For this "extend" wait_event, in addition to IO,
could the OS page cache, the PG buffer cache, or the bgwriter/checkpointer
impact it too?
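
For reference, during the 2-second window we can sample pg_stat_activity
roughly like this (only a sketch; it assumes psql can connect locally as a
monitoring user):

  while true; do
    psql -X -qAt -c "SELECT now(), wait_event_type, wait_event, count(*)
                       FROM pg_stat_activity
                      WHERE state = 'active'
                      GROUP BY 1, 2, 3
                      ORDER BY 4 DESC;"
    sleep 1
  done
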
Thanks,
James
On Thu, 26 Jun 2025 at 14:47, James Pang <jamespang886(at)gmail(dot)com> wrote:
> We have faced this issue 3 times this week, each time lasting only 2
> seconds, so it is not easy to run perf during peak business time to capture
> it; anyway, I will try. Before that, I want to understand whether the OS
> page cache or the PG buffer cache can contribute to the wait_event time for
> "extend" and "DataFileRead", or whether the bgwriter or checkpointer
> flushing data to disk can impact that too. We have the bgwriter enabled,
> and we saw the checkpointer being triggered by "wal" during that time, so I
> just increased max_wal_size to make checkpoints happen less often.
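>
> For reference, a rough way to confirm that checkpoints are WAL-triggered
> (a sketch; it assumes access to pg_stat_bgwriter and permission to change
> settings):
>
>   psql -c "SELECT checkpoints_timed, checkpoints_req FROM pg_stat_bgwriter;"
>   psql -c "ALTER SYSTEM SET log_checkpoints = on;"
>   psql -c "SELECT pg_reload_conf();"
>
> If checkpoints_req grows much faster than checkpoints_timed, checkpoints
> are being requested by WAL volume rather than by checkpoint_timeout.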
>
> Thanks,
>
> James
>
>
> On Thu, 26 Jun 2025 at 14:40, Frits Hoogland <frits(dot)hoogland(at)gmail(dot)com> wrote:
>
>> Okay. So it's a situation that is reproducible.
>> And, as was mentioned, the system time (percentage) is very high.
>> Is this a physical machine or a virtual machine?
>>
>> The next thing to do is to use perf to record for about 20 seconds during
>> a period when you see this behavior (perf record -g, capturing the
>> backtraces with it).
>> This records (samples) the backtraces of on-CPU tasks, from which you can
>> then derive what they are doing; you should see lots of tasks in kernel
>> space, and perf report will show what that is.
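>>
>> For example, roughly (a sketch; it assumes perf and kernel symbols are
>> installed, and the exact options are a matter of taste):
>>
>>   # system-wide, with call graphs, for about 20 seconds during the stall
>>   perf record -a -g -F 99 -- sleep 20
>>   # then look at where the (kernel) time is being spent
>>   perf report --sort comm,dso,symbol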
>>
>> *Frits Hoogland*
>>
>>
>>
>>
>> On 26 Jun 2025, at 04:32, James Pang <jamespang886(at)gmail(dot)com> wrote:
>>
>> Thanks for your suggestions. We have iowait from the sar command too,
>> copied here; checking with the infra team did not turn up any abnormal IO
>> activity either.
>> 02:00:01 PM  CPU    %usr  %nice    %sys  %iowait   %irq  %soft  %steal  %guest  %gnice  %idle
>> 02:00:03 PM  all   15.92   0.00   43.02     0.65   0.76   2.56    0.00    0.00    0.00  37.09
>> 02:00:03 PM    0   17.59   0.00   46.73     1.01   0.50   0.50    0.00    0.00    0.00  33.67
>> 02:00:03 PM    1    9.50   0.00   61.50     0.50   0.50   1.00    0.00    0.00    0.00  27.00
>> 02:00:03 PM    2   20.71   0.00   44.44     1.01   0.51   0.51    0.00    0.00    0.00  32.83
>> 02:00:03 PM    3   14.00   0.00   51.50     2.00   1.00   1.00    0.00    0.00    0.00  30.50
>> 02:00:03 PM    4    6.57   0.00   52.53     0.51   0.51   3.54    0.00    0.00    0.00  36.36
>> 02:00:03 PM    5   10.20   0.00   49.49     1.02   1.53   0.00    0.00    0.00    0.00  37.76
>> 02:00:03 PM    6   27.64   0.00   41.21     0.50   0.50   0.50    0.00    0.00    0.00  29.65
>> 02:00:03 PM    7    9.05   0.00   50.75     0.50   1.01   0.50    0.00    0.00    0.00  38.19
>> 02:00:03 PM    8   12.18   0.00   49.75     0.51   0.51   0.51    0.00    0.00    0.00  36.55
>> 02:00:03 PM    9   13.00   0.00    9.50     0.50   1.50  15.50    0.00    0.00    0.00  60.00
>> 02:00:03 PM   10   15.58   0.00   46.23     0.00   0.50   0.50    0.00    0.00    0.00  37.19
>> 02:00:03 PM   11   20.71   0.00   10.10     0.00   1.01  14.14    0.00    0.00    0.00  54.04
>> 02:00:03 PM   12   21.00   0.00   37.00     0.50   1.00   1.00    0.00    0.00    0.00  39.50
>> 02:00:03 PM   13   13.57   0.00   45.73     1.01   1.01   1.01    0.00    0.00    0.00  37.69
>> 02:00:03 PM   14   18.18   0.00   39.39     1.01   0.51   0.51    0.00    0.00    0.00  40.40
>> 02:00:03 PM   15   14.00   0.00   49.50     0.50   0.50   3.50    0.00    0.00    0.00  32.00
>> 02:00:03 PM   16   19.39   0.00   39.80     1.02   1.53   0.51    0.00    0.00    0.00  37.76
>> 02:00:03 PM   17   16.75   0.00   45.18     1.52   1.02   2.54    0.00    0.00    0.00  32.99
>> 02:00:03 PM   18   12.63   0.00   50.00     0.00   1.01   0.00    0.00    0.00    0.00  36.36
>> 02:00:03 PM   19    5.56   0.00   82.32     0.00   0.00   0.00    0.00    0.00    0.00  12.12
>> 02:00:03 PM   20   15.08   0.00   48.24     0.50   0.50   3.52    0.00    0.00    0.00  32.16
>> 02:00:03 PM   21   17.68   0.00    9.09     0.51   1.52  13.64    0.00    0.00    0.00  57.58
>> 02:00:03 PM   22   13.13   0.00   43.94     0.51   0.51   0.51    0.00    0.00    0.00  41.41
>> 02:00:03 PM   23   14.07   0.00   42.71     0.50   0.50   0.50    0.00    0.00    0.00  41.71
>> 02:00:03 PM   24   13.13   0.00   41.92     1.01   0.51   0.51    0.00    0.00    0.00  42.93
>> 02:00:03 PM   25   16.58   0.00   47.74     0.50   1.01   0.50    0.00    0.00    0.00  33.67
>> 02:00:03 PM   26   16.58   0.00   46.73     0.50   1.01   0.50    0.00    0.00    0.00  34.67
>> 02:00:03 PM   27   45.50   0.00   54.50     0.00   0.00   0.00    0.00    0.00    0.00   0.00
>> 02:00:03 PM   28    6.06   0.00   32.32     0.00   0.51  13.13    0.00    0.00    0.00  47.98
>> 02:00:03 PM   29   13.93   0.00   44.78     1.00   1.00   0.50    0.00    0.00    0.00  38.81
>> 02:00:03 PM   30   11.56   0.00   57.79     0.00   0.50   1.01    0.00    0.00    0.00  29.15
>> 02:00:03 PM   31   33.85   0.00    9.23     0.51   1.54   0.51    0.00    0.00    0.00  54.36
>> 02:00:03 PM   32   30.15   0.00   41.71     0.50   0.50   1.51    0.00    0.00    0.00  25.63
>>
>> Thanks,
>>
>> James
>>
>> On Wed, 25 Jun 2025 at 22:27, Frits Hoogland <frits(dot)hoogland(at)gmail(dot)com> wrote:
>>
>>>
>>>
>>> > On 25 Jun 2025, at 07:59, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at> wrote:
>>> >
>>> > On Wed, 2025-06-25 at 11:15 +0800, James Pang wrote:
>>> >> pgv14, RHEL8, xfs: we suddenly see tens of sessions waiting on
>>> >> "DataFileRead" and "extend"; it lasts about 2 seconds (based on a
>>> >> pg_stat_activity query). During the waiting time, "%sys" CPU increased
>>> >> to 80%, but from "iostat" there were no high IOPS, and IO read/write
>>> >> latency did not increase either.
>>> >
>>> > Run "sar -P all 1" and see if "%iowait" is high.
>>> I would (strongly) advise against the use of iowait as an indicator. It
>>> is a kernel approximation of time spent in IO that cannot be used in any
>>> sensible way other than to suggest that you are possibly doing IO.
>>> First of all, iowait is not a kernel state, and therefore it's taken from
>>> idle. This means that if there is no, or too little, idle time, iowait
>>> that should be there is gone.
>>> Second, the calculation that transfers idle time to iowait is done for
>>> synchronous IO calls only. That is currently not a problem for postgres,
>>> because it uses exactly that, but in the future it might be.
>>> Very roughly put, the kernel keeps a counter of tasks currently in
>>> certain system IO calls and then tries to express that as iowait. The
>>> time in iowait can't be used to calculate any IO facts.
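>>>
>>> (For reference, the raw per-CPU counters that sar reads come from
>>> /proc/stat; the fifth value after "cpu" is the iowait tick count, which
>>> is exactly this approximation:
>>>
>>>   grep '^cpu ' /proc/stat
>>>   # fields: user nice system idle iowait irq softirq steal guest guest_nice
>>> )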
>>>
>>> In that sense, it is in the same category as the load figure: indicative,
>>> but mostly useless, because it doesn't give you any facts about what it
>>> is expressing.
>>> >
>>> > Check if you have transparent hugepages enabled:
>>> >
>>> > cat /sys/kernel/mm/transparent_hugepage/enabled
>>> >
>>> > If they are enabled, disable them and see if it makes a difference.
>>> >
>>> > I am only guessing here.
>>> Absolutely. Anything that uses significant amounts of memory and was not
>>> written to take advantage of transparent hugepages will probably see more
>>> downsides from THP than benefits.
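>>>
>>> If you want to rule THP out without a reboot, a quick, non-persistent way
>>> to switch it off is roughly:
>>>
>>>   echo never | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
>>>   echo never | sudo tee /sys/kernel/mm/transparent_hugepage/defrag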
>>> >
>>> >> many sessions were running the same "DELETE FROM xxxx" in parallel,
>>> >> waiting on "extend" and "DataFileRead"; there are "AFTER DELETE"
>>> >> triggers on this table that insert into/delete from other tables.
>>> >
>>> > One thing that almost certainly would improve your situation is to run
>>> > fewer concurrent statements, for example by using a reasonably sized
>>> > connection pool.
>>> This is true if the limits of the IO device, or of anything on the path
>>> to the IO device or devices, are hit.
>>> And in general, a high "%sys", i.e. lots of time spent in kernel mode
>>> (system time), indicates lots of time spent in system calls, which is
>>> what the read and write calls in postgres are.
>>> Therefore these figures suggest blocking for IO, for which Laurenz's
>>> advice to lower the number of concurrent sessions doing IO in general
>>> makes sense.
>>> A more nuanced analysis: if IO requests get queued, these will wait in
>>> 'D' state in linux, which by definition is off CPU, and thus do not spend
>>> CPU (system/kernel) time.
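>>>
>>> (A quick way to see that during the stall is to count the tasks in
>>> uninterruptible sleep, for example:
>>>
>>>   ps -eo state,pid,comm | awk '$1 ~ /^D/'
>>> )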
>>>
>>> What sounds suspicious is that you indicate you see no significant change
>>> in the amount of IO in iostat.
>>>
>>> In order to understand this, you will first have to carefully find the
>>> actual physical IO devices that you are using for postgres IO.
>>> In current linux this can be tricky, depending on what the hardware or
>>> virtual machine looks like, and how the disks are arranged in linux.
>>> What you need to determine is which actual disk devices are used, and
>>> what their limits are.
>>> The limits for any disk are IOPS (operations per second) and MBPS
>>> (megabytes per second -> bandwidth).
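>>>
>>> Extended iostat output on those devices shows both dimensions, for
>>> example:
>>>
>>>   iostat -x 1
>>>   # compare r/s + w/s against the device's IOPS limit,
>>>   # rkB/s + wkB/s against its bandwidth limit,
>>>   # and watch aqu-sz / %util for queueing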
>>>
>>> There is an additional thing to realize, which makes this really tricky:
>>> for common IO, postgres uses buffered IO.
>>> Buffered IO means any read or write will use the linux buffer cache, and
>>> reads or writes can be served from that cache if possible.
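>>>
>>> (One rough way to see whether writeback to the physical devices kicks in
>>> during the stall is to watch the dirty page counters, for example:
>>>
>>>   grep -E 'Dirty|Writeback' /proc/meminfo
>>> )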
>>>
>>> So in your case, even if you made the database perform identical read or
>>> write requests, this could still result in a different share of read and
>>> write IOs being served from the cache, which can make an enormous
>>> difference in how fast these requests are served. If somehow you made the
>>> operating system choose the physical IO path, you will see significant
>>> amounts of time spent on that, which will show up as IO-related wait
>>> events.
>>>
>>> Not a simple answer, but this is how it works.
>>>
>>> So I would suggest comparing the situation in which the same workload is
>>> considered to perform well with the one in which it performs badly.
>>>
>>>
>>> >
>>> > Yours,
>>> > Laurenz Albe
>>> >
>>> >
>>>
>>>
>>