Re: autovac issue with large number of tables

From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Kasahara Tatsuhito <kasahara(dot)tatsuhito(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jim Nasby <nasbyj(at)amazon(dot)com>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: autovac issue with large number of tables
Date: 2020-11-30 11:59:03
Message-ID: 4a466c0f-b5f0-a702-86e7-fceb07a3d07a@oss.nttdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2020/11/30 10:43, Masahiko Sawada wrote:
> On Sun, Nov 29, 2020 at 10:34 PM Kasahara Tatsuhito
> <kasahara(dot)tatsuhito(at)gmail(dot)com> wrote:
>>
>> Hi, Thanks for you comments.
>>
>> On Fri, Nov 27, 2020 at 9:51 PM Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
>>>
>>>
>>>
>>> On 2020/11/27 18:38, Kasahara Tatsuhito wrote:
>>>> Hi,
>>>>
>>>> On Fri, Nov 27, 2020 at 1:43 AM Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 2020/11/26 10:41, Kasahara Tatsuhito wrote:
>>>>>> On Wed, Nov 25, 2020 at 8:46 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>>>>>>>
>>>>>>> On Wed, Nov 25, 2020 at 4:18 PM Kasahara Tatsuhito
>>>>>>> <kasahara(dot)tatsuhito(at)gmail(dot)com> wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> On Wed, Nov 25, 2020 at 2:17 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>>>>>>>>>
>>>>>>>>> On Fri, Sep 4, 2020 at 7:50 PM Kasahara Tatsuhito
>>>>>>>>> <kasahara(dot)tatsuhito(at)gmail(dot)com> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> On Wed, Sep 2, 2020 at 2:10 AM Kasahara Tatsuhito
>>>>>>>>>> <kasahara(dot)tatsuhito(at)gmail(dot)com> wrote:
>>>>>>>>>>>> I wonder if we could have table_recheck_autovac do two probes of the stats
>>>>>>>>>>>> data. First probe the existing stats data, and if it shows the table to
>>>>>>>>>>>> be already vacuumed, return immediately. If not, *then* force a stats
>>>>>>>>>>>> re-read, and check a second time.
>>>>>>>>>>> Does the above mean that the second and subsequent table_recheck_autovac()
>>>>>>>>>>> will be improved to first check using the previous refreshed statistics?
>>>>>>>>>>> I think that certainly works.
>>>>>>>>>>>
>>>>>>>>>>> If that's correct, I'll try to create a patch for the PoC
>>>>>>>>>>
>>>>>>>>>> I still don't know how to reproduce Jim's troubles, but I was able to reproduce
>>>>>>>>>> what was probably a very similar problem.
>>>>>>>>>>
>>>>>>>>>> This problem seems to be more likely to occur in cases where you have
>>>>>>>>>> a large number of tables,
>>>>>>>>>> i.e., a large amount of stats, and many small tables need VACUUM at
>>>>>>>>>> the same time.
>>>>>>>>>>
>>>>>>>>>> So I followed Tom's advice and created a patch for the PoC.
>>>>>>>>>> This patch will enable a flag in the table_recheck_autovac function to use
>>>>>>>>>> the existing stats next time if VACUUM (or ANALYZE) has already been done
>>>>>>>>>> by another worker on the check after the stats have been updated.
>>>>>>>>>> If the tables continue to require VACUUM after the refresh, then a refresh
>>>>>>>>>> will be required instead of using the existing statistics.
>>>>>>>>>>
>>>>>>>>>> I did simple test with HEAD and HEAD + this PoC patch.
>>>>>>>>>> The tests were conducted in two cases.
>>>>>>>>>> (I changed few configurations. see attached scripts)
>>>>>>>>>>
>>>>>>>>>> 1. Normal VACUUM case
>>>>>>>>>> - SET autovacuum = off
>>>>>>>>>> - CREATE tables with 100 rows
>>>>>>>>>> - DELETE 90 rows for each tables
>>>>>>>>>> - SET autovacuum = on and restart PostgreSQL
>>>>>>>>>> - Measure the time it takes for all tables to be VACUUMed
>>>>>>>>>>
>>>>>>>>>> 2. Anti wrap round VACUUM case
>>>>>>>>>> - CREATE brank tables
>>>>>>>>>> - SELECT all of these tables (for generate stats)
>>>>>>>>>> - SET autovacuum_freeze_max_age to low values and restart PostgreSQL
>>>>>>>>>> - Consumes a lot of XIDs by using txid_curent()
>>>>>>>>>> - Measure the time it takes for all tables to be VACUUMed
>>>>>>>>>>
>>>>>>>>>> For each test case, the following results were obtained by changing
>>>>>>>>>> autovacuum_max_workers parameters to 1, 2, 3(def) 5 and 10.
>>>>>>>>>> Also changing num of tables to 1000, 5000, 10000 and 20000.
>>>>>>>>>>
>>>>>>>>>> Due to the poor VM environment (2 VCPU/4 GB), the results are a little unstable,
>>>>>>>>>> but I think it's enough to ask for a trend.
>>>>>>>>>>
>>>>>>>>>> ===========================================================================
>>>>>>>>>> [1.Normal VACUUM case]
>>>>>>>>>> tables:1000
>>>>>>>>>> autovacuum_max_workers 1: (HEAD) 20 sec VS (with patch) 20 sec
>>>>>>>>>> autovacuum_max_workers 2: (HEAD) 18 sec VS (with patch) 16 sec
>>>>>>>>>> autovacuum_max_workers 3: (HEAD) 18 sec VS (with patch) 16 sec
>>>>>>>>>> autovacuum_max_workers 5: (HEAD) 19 sec VS (with patch) 17 sec
>>>>>>>>>> autovacuum_max_workers 10: (HEAD) 19 sec VS (with patch) 17 sec
>>>>>>>>>>
>>>>>>>>>> tables:5000
>>>>>>>>>> autovacuum_max_workers 1: (HEAD) 77 sec VS (with patch) 78 sec
>>>>>>>>>> autovacuum_max_workers 2: (HEAD) 61 sec VS (with patch) 43 sec
>>>>>>>>>> autovacuum_max_workers 3: (HEAD) 38 sec VS (with patch) 38 sec
>>>>>>>>>> autovacuum_max_workers 5: (HEAD) 45 sec VS (with patch) 37 sec
>>>>>>>>>> autovacuum_max_workers 10: (HEAD) 43 sec VS (with patch) 35 sec
>>>>>>>>>>
>>>>>>>>>> tables:10000
>>>>>>>>>> autovacuum_max_workers 1: (HEAD) 152 sec VS (with patch) 153 sec
>>>>>>>>>> autovacuum_max_workers 2: (HEAD) 119 sec VS (with patch) 98 sec
>>>>>>>>>> autovacuum_max_workers 3: (HEAD) 87 sec VS (with patch) 78 sec
>>>>>>>>>> autovacuum_max_workers 5: (HEAD) 100 sec VS (with patch) 66 sec
>>>>>>>>>> autovacuum_max_workers 10: (HEAD) 97 sec VS (with patch) 56 sec
>>>>>>>>>>
>>>>>>>>>> tables:20000
>>>>>>>>>> autovacuum_max_workers 1: (HEAD) 338 sec VS (with patch) 339 sec
>>>>>>>>>> autovacuum_max_workers 2: (HEAD) 231 sec VS (with patch) 229 sec
>>>>>>>>>> autovacuum_max_workers 3: (HEAD) 220 sec VS (with patch) 191 sec
>>>>>>>>>> autovacuum_max_workers 5: (HEAD) 234 sec VS (with patch) 147 sec
>>>>>>>>>> autovacuum_max_workers 10: (HEAD) 320 sec VS (with patch) 113 sec
>>>>>>>>>>
>>>>>>>>>> [2.Anti wrap round VACUUM case]
>>>>>>>>>> tables:1000
>>>>>>>>>> autovacuum_max_workers 1: (HEAD) 19 sec VS (with patch) 18 sec
>>>>>>>>>> autovacuum_max_workers 2: (HEAD) 14 sec VS (with patch) 15 sec
>>>>>>>>>> autovacuum_max_workers 3: (HEAD) 14 sec VS (with patch) 14 sec
>>>>>>>>>> autovacuum_max_workers 5: (HEAD) 14 sec VS (with patch) 16 sec
>>>>>>>>>> autovacuum_max_workers 10: (HEAD) 16 sec VS (with patch) 14 sec
>>>>>>>>>>
>>>>>>>>>> tables:5000
>>>>>>>>>> autovacuum_max_workers 1: (HEAD) 69 sec VS (with patch) 69 sec
>>>>>>>>>> autovacuum_max_workers 2: (HEAD) 66 sec VS (with patch) 47 sec
>>>>>>>>>> autovacuum_max_workers 3: (HEAD) 59 sec VS (with patch) 37 sec
>>>>>>>>>> autovacuum_max_workers 5: (HEAD) 39 sec VS (with patch) 28 sec
>>>>>>>>>> autovacuum_max_workers 10: (HEAD) 39 sec VS (with patch) 29 sec
>>>>>>>>>>
>>>>>>>>>> tables:10000
>>>>>>>>>> autovacuum_max_workers 1: (HEAD) 139 sec VS (with patch) 138 sec
>>>>>>>>>> autovacuum_max_workers 2: (HEAD) 130 sec VS (with patch) 86 sec
>>>>>>>>>> autovacuum_max_workers 3: (HEAD) 120 sec VS (with patch) 68 sec
>>>>>>>>>> autovacuum_max_workers 5: (HEAD) 96 sec VS (with patch) 41 sec
>>>>>>>>>> autovacuum_max_workers 10: (HEAD) 90 sec VS (with patch) 39 sec
>>>>>>>>>>
>>>>>>>>>> tables:20000
>>>>>>>>>> autovacuum_max_workers 1: (HEAD) 313 sec VS (with patch) 331 sec
>>>>>>>>>> autovacuum_max_workers 2: (HEAD) 209 sec VS (with patch) 201 sec
>>>>>>>>>> autovacuum_max_workers 3: (HEAD) 227 sec VS (with patch) 141 sec
>>>>>>>>>> autovacuum_max_workers 5: (HEAD) 236 sec VS (with patch) 88 sec
>>>>>>>>>> autovacuum_max_workers 10: (HEAD) 309 sec VS (with patch) 74 sec
>>>>>>>>>> ===========================================================================
>>>>>>>>>>
>>>>>>>>>> The cases without patch, the scalability of the worker has decreased
>>>>>>>>>> as the number of tables has increased.
>>>>>>>>>> In fact, the more workers there are, the longer it takes to complete
>>>>>>>>>> VACUUM to all tables.
>>>>>>>>>> The cases with patch, it shows good scalability with respect to the
>>>>>>>>>> number of workers.
>>>>>>>>>
>>>>>>>>> It seems a good performance improvement even without the patch of
>>>>>>>>> shared memory based stats collector.
>>>>>
>>>>> Sounds great!
>>>>>
>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Note that perf top results showed that hash_search_with_hash_value,
>>>>>>>>>> hash_seq_search and
>>>>>>>>>> pgstat_read_statsfiles are dominant during VACUUM in all patterns,
>>>>>>>>>> with or without the patch.
>>>>>>>>>>
>>>>>>>>>> Therefore, there is still a need to find ways to optimize the reading
>>>>>>>>>> of large amounts of stats.
>>>>>>>>>> However, this patch is effective in its own right, and since there are
>>>>>>>>>> only a few parts to modify,
>>>>>>>>>> I think it should be able to be applied to current (preferably
>>>>>>>>>> pre-v13) PostgreSQL.
>>>>>>>>>
>>>>>>>>> +1
>>>>>>>>>
>>>>>>>>> +
>>>>>>>>> + /* We might be better to refresh stats */
>>>>>>>>> + use_existing_stats = false;
>>>>>>>>> }
>>>>>>>>> + else
>>>>>>>>> + {
>>>>>>>>>
>>>>>>>>> - heap_freetuple(classTup);
>>>>>>>>> + heap_freetuple(classTup);
>>>>>>>>> + /* The relid has already vacuumed, so we might be better to
>>>>>>>>> use exiting stats */
>>>>>>>>> + use_existing_stats = true;
>>>>>>>>> + }
>>>>>>>>>
>>>>>>>>> With that patch, the autovacuum process refreshes the stats in the
>>>>>>>>> next check if it finds out that the table still needs to be vacuumed.
>>>>>>>>> But I guess it's not necessarily true because the next table might be
>>>>>>>>> vacuumed already. So I think we might want to always use the existing
>>>>>>>>> for the first check. What do you think?
>>>>>>>> Thanks for your comment.
>>>>>>>>
>>>>>>>> If we assume the case where some workers vacuum on large tables
>>>>>>>> and a single worker vacuum on small tables, the processing
>>>>>>>> performance of the single worker will be slightly lower if the
>>>>>>>> existing statistics are checked every time.
>>>>>>>>
>>>>>>>> In fact, at first I tried to check the existing stats every time,
>>>>>>>> but the performance was slightly worse in cases with a small number of workers.
>>>>>
>>>>> Do you have this benchmark result?
>>>>>
>>>>>
>>>>>>>> (Checking the existing stats is lightweight , but at high frequency,
>>>>>>>> it affects processing performance.)
>>>>>>>> Therefore, at after refresh statistics, determine whether autovac
>>>>>>>> should use the existing statistics.
>>>>>>>
>>>>>>> Yeah, since the test you used uses a lot of small tables, if there are
>>>>>>> a few workers, checking the existing stats is unlikely to return true
>>>>>>> (no need to vacuum). So the cost of existing stats check ends up being
>>>>>>> overhead. Not sure how slow always checking the existing stats was,
>>>>>>> but given that the shared memory based stats collector patch could
>>>>>>> improve the performance of refreshing stats, it might be better not to
>>>>>>> check the existing stats frequently like the patch does. Anyway, I
>>>>>>> think it’s better to evaluate the performance improvement with other
>>>>>>> cases too.
>>>>>> Yeah, I would like to see how much the performance changes in other cases.
>>>>>> In addition, if the shared-based-stats patch is applied, we won't need to reload
>>>>>> a huge stats file, so we will just have to check the stats on
>>>>>> shared-mem every time.
>>>>>> Perhaps the logic of table_recheck_autovac could be simpler.
>>>>>>
>>>>>>>> BTW, I found some typos in comments, so attache a fixed version.
>>>>>
>>>>> The patch adds some duplicated codes into table_recheck_autovac().
>>>>> It's better to make the common function performing them and make
>>>>> table_recheck_autovac() call that common function, to simplify the code.
>>>> Thanks for your comment.
>>>> Hmm.. I've cut out the duplicate part.
>>>> Attach the patch.
>>>> Could you confirm that it fits your expecting?
>>>
>>> Yes, thanks for updataing the patch! Here are another review comments.
>>>
>>> + shared = pgstat_fetch_stat_dbentry(InvalidOid);
>>> + dbentry = pgstat_fetch_stat_dbentry(MyDatabaseId);
>>>
>>> When using the existing stats, ISTM that these are not necessary and
>>> we can reuse "shared" and "dbentry" obtained before. Right?
>> Yeah, but unless autovac_refresh_stats() is called, these functions
>> read the information from the
>> local hash table without re-read the stats file, so the process is very light.
>> Therefore, I think, it is better to keep the current logic to keep the
>> code simple.
>>
>>>
>>> + /* We might be better to refresh stats */
>>> + use_existing_stats = false;
>>>
>>> I think that we should add more comments about why it's better to
>>> refresh the stats in this case.
>>>
>>> + /* The relid has already vacuumed, so we might be better to use existing stats */
>>> + use_existing_stats = true;
>>>
>>> I think that we should add more comments about why it's better to
>>> reuse the stats in this case.
>> I added comments.
>>
>> Attache the patch.
>>
>
> Thank you for updating the patch. Here are some small comments on the
> latest (v4) patch.
>
> + * So if the last time we checked a table that was already vacuumed after
> + * refres stats, check the current statistics before refreshing it.
> + */
>
> s/refres/refresh/
>
> -----
> +/* Counter to determine if statistics should be refreshed */
> +static bool use_existing_stats = false;
> +
>
> I think 'use_existing_stats' can be declared within table_recheck_autovac().
>
> -----
> While testing the performance, I realized that the statistics are
> reset every time vacuumed one table, leading to re-reading the stats
> file even if 'use_existing_stats' is true. Please refer that vacuum()
> eventually calls AtEOXact_PgStat() which calls to
> pgstat_clear_snapshot().

Good catch!

> I believe that's why the performance of the
> method of always checking the existing stats wasn’t good as expected.
> So if we save the statistics somewhere and use it for rechecking, the
> results of the performance benchmark will differ between these two
> methods.

Or it's simpler to make autovacuum worker skip calling
pgstat_clear_snapshot() in AtEOXact_PgStat()?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2020-11-30 12:24:29 Re: pgbench - test whether a variable exists
Previous Message Fabien COELHO 2020-11-30 11:53:20 Re: pgbench - test whether a variable exists