Re: automatic analyze: readahead - add "IO read time" log message

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Jakub Wartak <Jakub(dot)Wartak(at)tomtom(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: automatic analyze: readahead - add "IO read time" log message
Date: 2021-03-12 00:41:04
Message-ID: 20210312004104.GU20766@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Greetings,

* Tomas Vondra (tomas(dot)vondra(at)enterprisedb(dot)com) wrote:
> On 3/12/21 1:11 AM, Stephen Frost wrote:
> > * Tomas Vondra (tomas(dot)vondra(at)enterprisedb(dot)com) wrote:
> >> On 3/8/21 8:42 PM, Stephen Frost wrote:
> >>> * Tomas Vondra (tomas(dot)vondra(at)enterprisedb(dot)com) wrote:
> >>>> On 2/10/21 11:10 PM, Stephen Frost wrote:
> >>>>> * Heikki Linnakangas (hlinnaka(at)iki(dot)fi) wrote:
> >>>>>> On 05/02/2021 23:22, Stephen Frost wrote:
> >>>>>>> Unless there's anything else on this, I'll commit these sometime next
> >>>>>>> week.
> >>>>>>
> >>>>>> One more thing: Instead of using 'effective_io_concurrency' GUC directly,
> >>>>>> should call get_tablespace_maintenance_io_concurrency().
> >>>>>
> >>>>> Ah, yeah, of course.
> >>>>>
> >>>>> Updated patch attached.
> >>>>
> >>>> A couple minor comments:
> >>>>
> >>>> 1) I think the patch should be split into two parts, one adding the
> >>>> track_io_timing, one adding the prefetching.
> >>>
> >>> This was already done..
> >>
> >> Not sure what you mean by "done"? I see the patch still does both
> >> changes related to track_io_timing and prefetching.
> >
> > They are two patches..
> >
> > ➜ ~ grep Subject analyze_prefetch_v8.patch
> > Subject: [PATCH 1/2] Improve logging of auto-vacuum and auto-analyze
> > Subject: [PATCH 2/2] Use pre-fetching for ANALYZE
> >
> > The first doesn't have any prefetch-related things, the second doesn't
> > have any track_io_timing things..
>
> Oh! I didn't realize a single file can contain multiple separate
> patches, so I saw a single file and assumed it's one patch. How do you
> produce a single file, and is that better than just generating multiple
> files using git format-patch?

I use my gfp alias, which sends to stdout:

gfp='git format-patch @{u} --stdout'

So I only have one file to move around and there's no question about the
ordering of applying them and such.. Not sure if everyone would agree
that it's better or worse but I do find it works a bit better for me
than dealing with multiple files.

> >>>> 3) Is there a way to reduce the amount of #ifdef in acquire_sample_rows?
> >>>
> >>> Perhaps..
> >>>
> >>>> This makes the code rather hard to read, IMHO. It seems to me we can
> >>>> move the code around a bit and merge some of the #ifdef blocks - see the
> >>>> attached patch. Most of this is fairly trivial, with the exception of
> >>>> moving PrefetchBuffer before table_scan_analyze_next_block - AFAIK this
> >>>> does not materially change the behavior, but perhaps I'm wrong.
> >>>
> >>> but I don't particularly like doing the prefetch right before we run
> >>> vacuum_delay_point() and potentially sleep.
> >>
> >> Why? Is that just a matter of personal preference (fair enough) or is
> >> there a reason why that would be wrong/harmful?
> >
> > Telling the kernel "hey, we're about to need this, please go get it" and
> > then immediately going to sleep just strikes me as a bit off. We should
> > be trying to minimize the time between prefetch and actual request for
> > the page. Of course, there's going to be some times that we will issue
> > a prefetch and then come around again and end up hitting the limit and
> > sleeping before we actually request the page and maybe it doesn't
> > ultimately matter much but it just seems better to sleep first before
> > issuing the prefetch to minimize the amount 'outstanding' when we do end
> > up sleeping. One thing about prefetching is that the kernel is
> > certainly within its rights to decide to drop the page before we
> > actually go to read it, if it's under pressure and we're just sleeping.
>
> I don't know, but this argument seems rather hand-wavy to me, TBH.

I agree it is, but then again, the only argument raised against doing it
this way is that it might reduce the number of #ifdef's and, at least
for my part, that doesn't seem like a terribly strong argument in this
case.

> Firstly, we're prefetching quite far ahead anyway, so there's almost
> always going to be a vacuum_delay_point() call between you issue the
> prefetch and actually using the page. Secondly, vacuum_delay_point won't
> really sleep for most of the calls (assuming the cost limit is not
> unreasonably small).
>
> The way I see it (a) it's unlikely a particular vacuum_delay_point()
> call will sleep, while at the same time (b) it's quite probable one of
> the calls between prefetching and using the page will sleep.

Doing the prefetch before the sleep sets us up in a position where
there'll always be one more page that's been prefetched while we're
sleeping, no matter when that sleep actually happens, than if we do the
prefetch after the sleep. Considering the default's only 10, that's 10%
of the pages prefetched.

> So I think the exact ordering of those two calls does not really matter,
> at least not for this particular reason.

I don't see it as a huge deal, but..

> That being said, I don't feel strongly about this - if you prefer it
> like this, so be it. I can live with two ifdefs instead of one.

Yeah, I do think I prefer it.

> >> I think e.g. prefetch_targblock could be moved to the next #ifdef, which
> >> will eliminate the one-line ifdef.
> >
> > Sure, done in the attached.
> >
> > Thanks for the review! Unless there's other comments, I'll plan to push
> > this over the weekend or early next week.
>
> +1

Thanks again for the review!

Stephen

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2021-03-12 00:42:30 Re: [PATCH] Covering SPGiST index
Previous Message David Rowley 2021-03-12 00:38:49 Re: Hybrid Hash/Nested Loop joins and caching results from subplans