Re: automatic analyze: readahead - add "IO read time" log message

From: Jakub Wartak <Jakub(dot)Wartak(at)tomtom(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: automatic analyze: readahead - add "IO read time" log message
Date: 2020-10-27 15:06:05
Message-ID: VI1PR0701MB69603772FDF522FE0B4BFD80F6160@VI1PR0701MB6960.eurprd07.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Stephen, hackers,

> The analyze is doing more-or-less random i/o since it's skipping through
> the table picking out select blocks, not doing regular sequential i/o.
VS
>> Breakpoint 1, heapam_scan_analyze_next_block (scan=0x10c8098, blockno=19890910, bstrategy=0x1102278) at heapam_handler.c:984
>> Breakpoint 1, heapam_scan_analyze_next_block (scan=0x10c8098, blockno=19890912, bstrategy=0x1102278) at heapam_handler.c:984
>> Breakpoint 1, heapam_scan_analyze_next_block (scan=0x10c8098, blockno=19890922, bstrategy=0x1102278) at heapam_handler.c:984
>Not really sure what's interesting here, but it does look like we're
>skipping through the table as expected.

Yes, but not randomly in this case. I wanted to point out that this is incrementing block number, therefore I've included this debug output which might trigger readahead heuristics.
Perhaps this depends on how the table was built / vacuumed ? (in this case, pure-INSERT-only; I would expect the same in time series DBs and DWHs).

> With all those 'readahead' calls it certainly makes one wonder if the
> Linux kernel is reading more than just the block we're looking for
> because it thinks we're doing a sequential read and will therefore want
> the next few blocks when, in reality, we're going to skip past them,
> meaning that any readahead the kernel is doing is likely just wasted
> I/O.

I've done some quick&dirty tests with blockdev --setra/setfra 0 after spending time looking at the smgr/md/fd API changes required to find shortcut, but I'm getting actually a little bit worse timings at least on "laptop DB tests". One thing that I've noticed is that needs to be only for automatic-analyze, but not for automatic-vacuum where apparently there is some boost due to readahead.

> That definitely seems like a useful thing to include and thanks for the
> patch!  Please be sure to register it in the commitfest app:
> https://commitfest.postgresql.org

Thank you! Thread is now registered.

> I would think that, ideally, we'd teach analyze.c to work in the same
> way that bitmap heap scans do- that is, use posix_fadvise to let the
> kernel know what pages we're going to want next instead of the kernel
> guessing (incorrectly) or not doing any pre-fetching.  I didn't spend a
> lot of time poking, but it doesn't look like analyze.c tries to do any
> prefetching today.  In a similar vein, I wonder if VACUUM should be
> doing prefetching too today, at least when it's skipping through the
> heap based on the visibility map and jumping over all-frozen pages.

My only idea would be that a lot of those blocks could be read asynchronously in batches (AIO) with POSIX_FADV_RANDOM issued on block-range before, so maybe the the optimization is possible, but not until we'll have AIO ;)

> Haven't looked too closely at this but in general +1 on the idea and
> this approach looks pretty reasonable to me.  Only thing I can think of
> off-hand is to check how it compares to other places where we report IO
> read time and make sure that it looks similar.

Ok, I've changed the output in 002 version to include "avg read rate" just like in the autovacuum case but still maintaining single line output, e.g:
automatic analyze of table "test.public.t1_default" avg read rate: 96.917 MB/s (read time: 2.52 s), system usage: CPU: user: 0.28 s, system: 0.26 s, elapsed: 2.94 s

-J.

Attachment Content-Type Size
0002-do_analyze_rel_ioreadtimings_patch.c text/plain 2.3 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message vignesh C 2020-10-27 15:22:07 Re: Parallel copy
Previous Message Dilip Kumar 2020-10-27 14:23:31 Re: parallel distinct union and aggregate support patch