Re: automatic analyze: readahead - add "IO read time" log message

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Jakub Wartak <Jakub(dot)Wartak(at)tomtom(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: automatic analyze: readahead - add "IO read time" log message
Date: 2021-03-12 00:29:46
Message-ID: c8d21a1b-63bb-1bb5-c6e8-b69f5d2fa7e5@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 3/12/21 1:11 AM, Stephen Frost wrote:
> Greetings,
>
> * Tomas Vondra (tomas(dot)vondra(at)enterprisedb(dot)com) wrote:
>> On 3/8/21 8:42 PM, Stephen Frost wrote:
>>> * Tomas Vondra (tomas(dot)vondra(at)enterprisedb(dot)com) wrote:
>>>> On 2/10/21 11:10 PM, Stephen Frost wrote:
>>>>> * Heikki Linnakangas (hlinnaka(at)iki(dot)fi) wrote:
>>>>>> On 05/02/2021 23:22, Stephen Frost wrote:
>>>>>>> Unless there's anything else on this, I'll commit these sometime next
>>>>>>> week.
>>>>>>
>>>>>> One more thing: Instead of using 'effective_io_concurrency' GUC directly,
>>>>>> should call get_tablespace_maintenance_io_concurrency().
>>>>>
>>>>> Ah, yeah, of course.
>>>>>
>>>>> Updated patch attached.
>>>>
>>>> A couple minor comments:
>>>>
>>>> 1) I think the patch should be split into two parts, one adding the
>>>> track_io_timing, one adding the prefetching.
>>>
>>> This was already done..
>>
>> Not sure what you mean by "done"? I see the patch still does both
>> changes related to track_io_timing and prefetching.
>
> They are two patches..
>
> ➜ ~ grep Subject analyze_prefetch_v8.patch
> Subject: [PATCH 1/2] Improve logging of auto-vacuum and auto-analyze
> Subject: [PATCH 2/2] Use pre-fetching for ANALYZE
>
> The first doesn't have any prefetch-related things, the second doesn't
> have any track_io_timing things..
>

Oh! I didn't realize a single file can contain multiple separate
patches, so I saw a single file and assumed it's one patch. How do you
produce a single file, and is that better than just generating multiple
files using git format-patch?

>>>> 3) Is there a way to reduce the amount of #ifdef in acquire_sample_rows?
>>>
>>> Perhaps..
>>>
>>>> This makes the code rather hard to read, IMHO. It seems to me we can
>>>> move the code around a bit and merge some of the #ifdef blocks - see the
>>>> attached patch. Most of this is fairly trivial, with the exception of
>>>> moving PrefetchBuffer before table_scan_analyze_next_block - AFAIK this
>>>> does not materially change the behavior, but perhaps I'm wrong.
>>>
>>> but I don't particularly like doing the prefetch right before we run
>>> vacuum_delay_point() and potentially sleep.
>>
>> Why? Is that just a matter of personal preference (fair enough) or is
>> there a reason why that would be wrong/harmful?
>
> Telling the kernel "hey, we're about to need this, please go get it" and
> then immediately going to sleep just strikes me as a bit off. We should
> be trying to minimize the time between prefetch and actual request for
> the page. Of course, there's going to be some times that we will issue
> a prefetch and then come around again and end up hitting the limit and
> sleeping before we actually request the page and maybe it doesn't
> ultimately matter much but it just seems better to sleep first before
> issuing the prefetch to minimize the amount 'outstanding' when we do end
> up sleeping. One thing about prefetching is that the kernel is
> certainly within its rights to decide to drop the page before we
> actually go to read it, if it's under pressure and we're just sleeping.
>

I don't know, but this argument seems rather hand-wavy to me, TBH.

Firstly, we're prefetching quite far ahead anyway, so there's almost
always going to be a vacuum_delay_point() call between you issue the
prefetch and actually using the page. Secondly, vacuum_delay_point won't
really sleep for most of the calls (assuming the cost limit is not
unreasonably small).

The way I see it (a) it's unlikely a particular vacuum_delay_point()
call will sleep, while at the same time (b) it's quite probable one of
the calls between prefetching and using the page will sleep.

So I think the exact ordering of those two calls does not really matter,
at least not for this particular reason.

That being said, I don't feel strongly about this - if you prefer it
like this, so be it. I can live with two ifdefs instead of one.

>> I think e.g. prefetch_targblock could be moved to the next #ifdef, which
>> will eliminate the one-line ifdef.
>
> Sure, done in the attached.
>
> Thanks for the review! Unless there's other comments, I'll plan to push
> this over the weekend or early next week.
>

+1

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2021-03-12 00:31:43 Re: Hybrid Hash/Nested Loop joins and caching results from subplans
Previous Message Kyotaro Horiguchi 2021-03-12 00:23:12 Re: shared-memory based stats collector