Re: Add important info about ANALYZE after create Functional Index

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>
Cc: Fabrízio Mello <fabriziomello(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Add important info about ANALYZE after create Functional Index
Date: 2020-10-28 19:07:52
Message-ID: CAKFQuwZWFTBxFfsdtfy09YzYUznN5c-u=1vWjVkfpE8A=bm0ag@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Oct 26, 2020 at 9:44 PM Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>
wrote:

> On Mon, Oct 26, 2020 at 7:03 PM David G. Johnston <
> david(dot)g(dot)johnston(at)gmail(dot)com> wrote:
>
>> On Monday, October 26, 2020, Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>
>> wrote:
>>>
>>> Although, this triggers a question – should ANALYZE be automated in,
>>> say, pg_restore as well?
>>>
>>
>> Independent concern.
>>
>
> It's the same class of issues – after we created some objects, we lack
> statistics and willing to automate its collection. If the approach is
> automated in one case, it should be automated in the others, for
> consistency.
>

I don't see a need to force consistency between something that will affect,
at most, one table, and something that will affect an entire database or
cluster. The other material difference is that the previous state of a
restore is "nothing" while in the create/reindex cases we are going from
live, populated, state to another.

I do observe that while the create/reindex analyze would run automatically
during the restore on object creation there would be no data present so it
would be close to a no-op in practice.

>
> And another question: how ANALYZE needs to be run? If it's under the
>>> user's control, there is an option to use vacuumdb --analyze and benefit
>>> from using -j to parallelize the work (and, in some cases, benefit from
>>> using --analyze-in-stages). If we had ANALYZE as a part of building indexes
>>> on expressions, should it be parallelized to the same extent as index
>>> creation (controlled by max_parallel_maintenance_workers)?
>>>
>>
>> None of that seems relevant here. The only relevant parameter I see is
>> what to specify for “table_and_columns”.
>>
>
> I'm not sure I follow.
>

Describe how parallelism within the session that is auto-analyzing is
supposed to work. vaccuumdb opens up multiple connections which shouldn't
happen here.

I suppose having the auto-analyze run three times with different targets
would work but I'm doubting that is a win. I may just be underestimating
how long an analyze on an extremely large table with high statistics takes.

David J.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2020-10-28 19:10:41 Deduplicate aggregates and transition functions in planner
Previous Message Tomas Vondra 2020-10-28 19:07:04 Re: list of extended statistics on psql