Re: ANALYZE command progress checker

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: vinayak <Pokale_Vinayak_q3(at)lab(dot)ntt(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, David Steele <david(at)pgmasters(dot)net>, David Fetter <david(at)fetter(dot)org>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ANALYZE command progress checker
Date: 2017-04-04 08:57:54
Message-ID: f4e56064-e969-1735-d257-2218b54763c7@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2017/04/04 15:30, Masahiko Sawada wrote:
>> We can report progress in terms of individual blocks only inside
>> acquire_sample_rows(), which seems undesirable when one thinks that we
>> will be resetting the target for every child table. We should have a
>> global target that considers all child tables in the inheritance
>> hierarchy, which maybe is possible if we count them beforehand in
>> acquire_inheritance_sample_rows(). But why not use target sample rows,
>> which remains the same for both when we're collecting sample rows from one
>> table and from the whole inheritance hierarchy. We can keep the count of
>> already collected rows in a struct that is used across calls for all the
>> child tables and increment upward from that count when we start collecting
>> from a new child table.
>
> An another option I came up with is that support new pgstat progress
> function, say pgstat_progress_incr_param, which increments index'th
> member in st_progress_param[]. That way we just need to report a delta
> using that function.

That's an interesting idea. It could be made to work and would not
require changing the interface of AcquireSampleRowsFunc, which seems very
desirable.

>>>> /*
>>>> * The first targrows sample rows are simply copied into the
>>>> * reservoir. Then we start replacing tuples in the sample
>>>> * until we reach the end of the relation. This algorithm is
>>>> * from Jeff Vitter's paper (see full citation below). It
>>>> * works by repeatedly computing the number of tuples to skip
>>>> * before selecting a tuple, which replaces a randomly chosen
>>>> * element of the reservoir (current set of tuples). At all
>>>> * times the reservoir is a true random sample of the tuples
>>>> * we've passed over so far, so when we fall off the end of
>>>> * the relation we're done.
>>>> */
>>
>> It seems that we could use samplerows instead of numrows to count the
>> progress (if we choose to count progress in terms of sample rows collected).
>>
>
> I guess it's hard to count progress in terms of sample rows collected
> even if we use samplerows instead, because samplerows can be
> incremented independently of the target number of sampling rows. The
> samplerows can be incremented up to the total number of rows of
> relation.

Hmm, you're right. It could be counted with a separate variable
initialized to 0 and incremented every time we decide to add a row to the
final set of sampled rows, although different implementations of
AcquireSampleRowsFunc have different ways of deciding if a given row will
be part of the final set of sampled rows.

On the other hand, if we decide to count progress in terms of blocks as
you suggested afraid, I'm afraid that FDWs won't be able to report the
progress.

Thanks,
Amit

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Etsuro Fujita 2017-04-04 09:01:36 Re: postgres_fdw bug in 9.6
Previous Message Antonin Houska 2017-04-04 08:41:57 WIP: Aggregation push-down