Re: autovacuum: change priority of the vacuumed tables

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Ildus Kurbangaliev <i(dot)kurbangaliev(at)postgrespro(dot)ru>
Cc: Grigory Smolkin <g(dot)smolkin(at)postgrespro(dot)ru>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: autovacuum: change priority of the vacuumed tables
Date: 2018-02-16 12:48:14
Message-ID: CAD21AoCHQhsga9GkJWqMqcsmG4eKbcKSncqronWRNxXATphFaQ@mail.gmail.com
Lists: pgsql-hackers

On Fri, Feb 16, 2018 at 7:50 PM, Ildus Kurbangaliev
<i(dot)kurbangaliev(at)postgrespro(dot)ru> wrote:
> On Fri, 16 Feb 2018 17:42:34 +0900
> Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
>> On Thu, Feb 15, 2018 at 10:16 PM, Grigory Smolkin
>> <g(dot)smolkin(at)postgrespro(dot)ru> wrote:
>> > On 02/15/2018 09:28 AM, Masahiko Sawada wrote:
>> >
>> >> Hi,
>> >>
>> >> On Thu, Feb 8, 2018 at 11:01 PM, Ildus Kurbangaliev
>> >> <i(dot)kurbangaliev(at)postgrespro(dot)ru> wrote:
>> >>>
>> >>> Hi,
>> >>>
>> >>> The attached patch adds 'autovacuum_table_priority' to the current
>> >>> list of automatic vacuuming settings. It is used to sort the
>> >>> tables to be vacuumed in an autovacuum worker before the actual
>> >>> vacuum.
>> >>>
>> >>> The idea is to let users prioritize their tables in the
>> >>> autovacuum process.
>> >>>
>> >> Hmm, I couldn't understand the benefit of this patch. Would you
>> >> elaborate on it a little more?
>> >>
>> >> Multiple autovacuum workers can work on one database. So even if a
>> >> table that you want to vacuum first is at the back of the list,
>> >> another worker would pick it up. If vacuuming the table gets
>> >> delayed because some big tables are in front of it, I think you
>> >> can deal with it by increasing the number of autovacuum
>> >> workers.
>> >>
>> >> Regards,
>> >>
>> >> --
>> >> Masahiko Sawada
>> >> NIPPON TELEGRAPH AND TELEPHONE CORPORATION
>> >> NTT Open Source Software Center
>> >>
>> >
>> > A database can contain thousands of tables, and updates/deletes
>> > often concentrate in only a handful of them.
>> > Going through thousands of less bloated tables can take ages.
>> > Currently autovacuum knows nothing about prioritizing its work with
>> > respect to the user's understanding of their data and application.
>>
>> Understood. I have a question; please imagine the following case.
>>
>> Suppose that there are 1000 tables in a database, and one of them
>> (table-A) has the highest priority while the other 999 tables have
>> the same priority. Most tables (say 800 tables) including table-A need
>> to get vacuumed at some point, so with your patch an AV worker lists
>> 800 tables and table-A will be at the head of the list. Table-A will
>> get vacuumed first, but this AV worker has to vacuum the other 799
>> tables even if table-A requires vacuuming again later.
>>
>> If another AV worker launches while table-A is being vacuumed, the
>> new AV worker would include table-A in its list but would not process
>> it because a concurrent AV worker is processing it. So it would vacuum
>> other tables instead. Similarly, this AV worker cannot get a new table
>> list until it finishes vacuuming all the other tables. (Note that it
>> might skip some tables if they have already been vacuumed by another
>> AV worker.) On the other hand, if another new AV worker launches after
>> table-A got vacuumed and requires vacuuming again, the new AV worker
>> puts table-A at the head of its list. It processes table-A first but,
>> again, it has to vacuum the other tables before getting its next
>> table list, which might include table-A.
>>
>> Is this the expected behavior? I'd rather expect postgres to vacuum
>> the highest-priority table before other lower-priority tables
>> whenever it requires vacuuming, but it wouldn't.
>
> Yes, this is the expected behavior. The patch is a way to give the
> user at least some control over the sorting; later it could be
> extended with something more sophisticated.
>

Since users don't know that each AV worker processes tables from its
own table list, which differs from the lists other workers hold, I
think this parameter would be hard for users to understand. I'd say
users would expect a high-priority table to get vacuumed at any time.

I think what you want to solve is to vacuum some tables preferentially
when many tables require vacuuming. Right? If so, I think prioritizing
tables only within each worker's list would not solve the fundamental
issue. In the example, table-A would still need to wait for the other
799 tables to get vacuumed, and it would keep bloating while those
tables are being vacuumed. To deal with that, I think we need something
like a per-database queue in shared memory to control the order of
tables waiting for vacuuming, and we need to use it with a smart
algorithm. Thoughts?
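
To make the difference concrete, here is a rough Python sketch. It is a
toy model, not PostgreSQL code and not a proposed implementation:
SharedVacuumQueue, run_private_list and run_shared_queue are
hypothetical names, a single worker is modelled, and the "smart
algorithm" is assumed to be nothing more than "highest priority first,
FIFO among equal priorities".

```python
import heapq


class SharedVacuumQueue:
    """Hypothetical per-database queue (it would live in shared memory)."""

    def __init__(self, priority):
        self._priority = priority   # table name -> priority (higher first)
        self._heap = []
        self._queued = set()        # avoid queueing the same table twice
        self._seq = 0               # tie-breaker: FIFO among equal priorities

    def push(self, table):
        if table in self._queued:
            return False
        heapq.heappush(self._heap, (-self._priority[table], self._seq, table))
        self._seq += 1
        self._queued.add(table)
        return True

    def pop(self):
        if not self._heap:
            return None
        _, _, table = heapq.heappop(self._heap)
        self._queued.discard(table)
        return table


def run_private_list(tables, priority, hot):
    """Worker sorts its own list once and must finish it before it can
    build a new list, so a second vacuum of `hot` comes last."""
    order = sorted(tables, key=lambda t: -priority[t])  # stable sort
    return order + [hot]


def run_shared_queue(tables, priority, hot, reappear_after):
    """Worker asks the shared queue for the next table every time.
    `hot` needs vacuuming again once `reappear_after` tables are done."""
    queue = SharedVacuumQueue(priority)
    for t in tables:
        queue.push(t)
    processed = []
    while (t := queue.pop()) is not None:
        processed.append(t)
        if len(processed) == reappear_after:
            queue.push(hot)
    return processed


tables = ["table-A", "t1", "t2", "t3", "t4", "t5"]
priority = {t: 0 for t in tables}
priority["table-A"] = 10

private_order = run_private_list(tables, priority, "table-A")
shared_order = run_shared_queue(tables, priority, "table-A", reappear_after=3)
print(private_order)
print(shared_order)
```

With the private list, the second vacuum of table-A happens only after
t1..t5 are done. With the shared queue it happens right after t2, i.e.
as soon as table-A is queued again. Locking, multiple workers and how
tables get into the queue are all left out here.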

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
