Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc: PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently
Date: 2018-02-26 09:00:28
Message-ID: CAD21AoC2h9RmJ-S-4UpB4rZqZSYNwusHceK6ijKKmzFnq4s0kA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Feb 20, 2018 at 5:04 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> On Fri, Feb 16, 2018 at 5:00 AM, Claudio Freire <klaussfreire(at)gmail(dot)com> wrote:
>> On Thu, Feb 15, 2018 at 4:47 PM, Claudio Freire <klaussfreire(at)gmail(dot)com> wrote:
>>> On Wed, Feb 14, 2018 at 3:59 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>>
>>>>> The final FSM vacuum pass isn't partial, to finally correct all those
>>>>> small inconsistencies.
>>>>
>>>> Yes, but the purpose of this patch is to prevent table bloating from
>>>> concurrent modifications?
>>>
>>> Yes, by *marking* unmarked space. If the slot is above the threshold,
>>> it means there's already marked free space on that branch, and search
>>> will go into it already, so no pressing need to refine the mark. The
>>> space already marked can serve while vacuum makes further progress.
>>> Ie: it can wait.
>>
>> Although... I think I see what you mean.
>>
>> If you have this:
>>
>> 255
>> . 0
>> . . 0
>> . . 255
>> . 0
>> . . 64
>> . . 64
>> . 0
>> . . 0
>> . . 64
>>
>>
>> A partial vacuum won't even recurse beyond the root node, so it won't
>> expose the free space 2 levels down.
>
> Yeah, that's what I meant. I think this might be able to happen
> slightly easily if a tables has fillfactor < 100. For example,
> updating tuples on pages that are almost packed except for fillfactor
> with the more bigger value
>
>>
>> This could be arrived at by having an almost fully packed table that
>> contains some free space near the end, and it gets deletions near the
>> beginning. Vacuum will mark the leaf levels at the beginning of the
>> table, and we'd like for that free space to be discoverable to avoid
>> packing more rows near the end where they prevent truncation.
>>
>> The patch as it is now will only vacuum that part of the FSM when the
>> root value drops, which usually requires all the free space on that
>> region of the heap to be exhausted.
>>
>> With the current recursive FSM vacuum method, however, that's
>> unavoidable. We can't detect that there's some free space below the
>> current level to be exposed without recursing into the tree, and
>> recursing is exactly the kind of work we want to prevent with tree
>> pruning.
>>
>> In all my trials, however, I did not run into that case. It must not
>> be easy to get into that situation, because the root node already has
>> ~4000 slots in it. Each of those 4000 slots should be in that
>> situation to keep the space at the end of the table as the sole
>> discoverable free space, which seems like a rather contorted worst
>> case.
>
> Okay, I guess that this patch cannot help in the case where the
> freespace of pages shown on fsm gets decreased by vacuum because the
> upper-level node doesn't reflect the value on the lower page. However
> it doesn't happen often as you said and it's the same as it is even
> though it happens. So I also think it would not become a problem.
>
> I'll review v4 patch.
>

Here is review comment for v4 patch.

@@ -1922,6 +1988,8 @@ count_nondeletable_pages(Relation onerel,
LVRelStats *vacrelstats)
* We don't insert a vacuum delay point here, because we have an
* exclusive lock on the table which we want to hold
for as short a
* time as possible. We still need to check for
interrupts however.
+ * We might have to acquire the autovacuum lock,
however, but that
+ * shouldn't pose a deadlock risk.
*/
CHECK_FOR_INTERRUPTS();

Is this change necessary?

----
+ /*
+ * If there are no indexes then we should periodically
vacuum the FSM
+ * on huge relations to make free space visible early.
+ */
+ if (nindexes == 0 &&
+ (vacuumed_pages - vacuumed_pages_at_fsm_vac) >
vacuum_fsm_every_pages)
+ {
+ /* Vacuum the Free Space Map */
+ FreeSpaceMapVacuum(onerel, max_freespace);
+ vacuumed_pages_at_fsm_vac = vacuumed_pages;
+ max_freespace = 0;
+ }

I think this code block should be executed before we check if the page
is whether new or empty and then do 'continue'. Otherwise we cannot
reach this code if the table has a lot of new or empty pages.

----
@@ -785,7 +789,7 @@ fsm_search(Relation rel, uint8 min_cat)
* Recursive guts of FreeSpaceMapVacuum
*/
static uint8
-fsm_vacuum_page(Relation rel, FSMAddress addr, bool *eof_p)
+fsm_vacuum_page(Relation rel, FSMAddress addr, uint8 threshold, bool *eof_p)
{
Buffer buf;
Page page;

I think the comment for 'threshold' is needed. Maybe for
FreeSpaceMapvacuum as well?

----
@@ -663,6 +663,8 @@ fsm_extend(Relation rel, BlockNumber fsm_nblocks)
* If minValue > 0, the updated page is also searched for a page with at
* least minValue of free space. If one is found, its slot number is
* returned, -1 otherwise.
+ *
+ * If minValue == 0, the value at the root node is returned.
*/
static int
fsm_set_and_search(Relation rel, FSMAddress addr, uint16 slot,
@@ -687,6 +689,8 @@ fsm_set_and_search(Relation rel, FSMAddress addr,
uint16 slot,

addr.level == FSM_BOTTOM_LEVEL,
true);
}
+ else
+ newslot = fsm_get_avail(page, 0);

I think it's not good idea to make fsm_set_and_search return either a
slot number or a category value according to the argument. Category
values is actually uint8 but this function returns it as int. Also we
can call fsm_set_and_search with minValue = 0 at
RecordAndGetPageWithFreeSpace() when search_cat is 0. With you patch,
fsm_set_and_search() then returns the root value but it's not correct.

Also, is this change necessary for this patch? ISTM this change is
used for the change in fsm_update_recursive() as follows but I guess
this change can be a separate patch.

+ new_cat = fsm_set_and_search(rel, parent, parentslot, new_cat, 0);
+
+ /*
+ * Bubble up, not the value we just set, but the one now in the root
+ * node of the just-updated page, which is the page's highest value.
+ */

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2018-02-26 09:19:21 Re: [bug fix] Cascaded standby cannot start after a clean shutdown
Previous Message Lætitia Avrot 2018-02-26 08:56:32 Re: VACUUM FULL name is very confusing to some people (or to most non expert people)