Re: [HACKERS] GUC for cleanup indexes threshold.

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, Peter Geoghegan <pg(at)bowt(dot)ie>, Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Ideriha, Takeshi" <ideriha(dot)takeshi(at)jp(dot)fujitsu(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, pgsql-hackers-owner(at)postgresql(dot)org
Subject: Re: [HACKERS] GUC for cleanup indexes threshold.
Date: 2018-04-03 15:42:52
Message-ID: CAD21AoBX0YD=furvXH05Db3LL9CFYd=ViV1rAi4GwipR629weA@mail.gmail.com
Lists: pgsql-hackers

Sorry for the late response.

On Tue, Mar 27, 2018 at 8:01 PM, Alexander Korotkov
<a(dot)korotkov(at)postgrespro(dot)ru> wrote:
> On Tue, Mar 27, 2018 at 11:55 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
> wrote:
>>
>> On Thu, Mar 22, 2018 at 9:24 PM, Alexander Korotkov
>> <a(dot)korotkov(at)postgrespro(dot)ru> wrote:
>> > However, I see that you are comparing relative change of num_heap_tuples
>> > before and after vacuum to vacuum_cleanup_index_scale_factor.
>> > The problem is that if vacuum is very aggressive (and that is likely for
>> > append only case if this patch is committed), then num_heap_tuples
>> > changes very slightly every vacuum cycle. So, cleanup would never
>> > happen and statistics could stall indefinitely.
>>
>> Good catch. I think we need to store the number of tuples at the time
>> the whole index is scanned (bulkdelete or cleanup), as your patch
>> does. So it would also need the page-upgrading code. Since that value
>> would be helpful for other types of indexes too, it might be better
>> to store it in a system catalog.
>>
>> >
>> > Another issue is that append only workloads are frequently combined
>> > with rare periodical bulk deletes of data. Assuming that in this patch
>> > you don't consider deleted pages to trigger index cleanup, on such
>> > workload large amounts of deleted pages may reside non-recycled until
>> > next bulk delete cycle.
>>
>> Perhaps using that new value we can trigger the cleanup if the current
>> number of tuples has increased or decreased by
>> vacuum_cleanup_index_scale_factor from n_tup_last_cleanup.
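(For illustration, the condition proposed above could be sketched as
follows; vacuum_cleanup_index_scale_factor and n_tup_last_cleanup come
from the discussion, but the helper function itself is hypothetical,
not actual PostgreSQL code:)

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Sketch of the proposed trigger: run the full cleanup index scan only
 * when the current heap tuple count has drifted from the count recorded
 * at the last cleanup (n_tup_last_cleanup) by more than the scale
 * factor. Hypothetical helper, for illustration only.
 */
static bool
cleanup_is_triggered(double n_tup_current, double n_tup_last_cleanup,
                     double cleanup_index_scale_factor)
{
    double  change;

    if (n_tup_last_cleanup <= 0)
        return true;            /* no previous cleanup recorded */

    change = (n_tup_current - n_tup_last_cleanup) / n_tup_last_cleanup;
    if (change < 0)
        change = -change;       /* increases and decreases both count */

    return change > cleanup_index_scale_factor;
}
```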
>
>
> Yes, that might work. However, a decreased number of tuples could be
> an inaccurate measure of the number of deleted pages. Depending on the
> distribution of tuples per page, the same number of deleted tuples can
> lead to very different numbers of deleted pages (in corner cases it can
> range from zero to very large amounts). If we want to skip scanning of
> deleted pages, then it would be better to store their exact count as
> known from the previous bulk delete or cleanup.
>
>> >
>> > So, despite I generally like idea of storing epoch of deleted xid in the
>> > page, I think my version of patch is closer to committable shape.
>> >
>>
>> Agreed, but as I mentioned before, I'm concerned that the approach of
>> your version of the patch will become a restriction on future
>> improvements. If the community decides this feature works only for
>> mostly append-only workloads, I think your version of the patch is
>> the best approach for now.
>
>
> At first I would like to note that all valid optimizations presented in
> this thread optimize the append case, so they do give a serious benefit
> on mostly append-only workloads. That is because the patches were about
> skipping or reducing the cleanup stage, which does serious work only in
> the append-only case.
>
> It could be possible in the future to also optimize cases where only a
> small fraction of the table's tuples were deleted. Those tuples could
> be deleted not by a full index scan, but by individual index lookups.
> But I think such an optimization is rather harder than everything
> mentioned in this thread, and it should be considered separately.

Agreed.

>
> The thing which could be improved in the future is making btree able
> to skip cleanup when some deleted pages are pending to be recycled.
> But I'm not sure that this problem deserves a lot of time investment
> right now, because I think that page deletion in our btree is
> unfortunately a quite rare situation. In real life our btree is more
> likely to become bloated with a bad fillfactor of pages than to have
> many pages deleted.

Agreed. Your version of the patch cleans up recyclable pages whenever
there might be some, whereas my version could leave recyclable pages
behind. What concerns me is that the patch would unnecessarily narrow
its own effect. That is, we might be able to update the btree meta page
even during bulkdelete.
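To make that idea concrete, here is a toy model in which both the
bulkdelete and cleanup entry points record the post-scan statistics in
the metapage, rather than cleanup alone. All names below are
hypothetical shims for illustration, not the actual PostgreSQL btree
code:

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of the metapage fields relevant to this discussion. */
typedef struct MetaShim
{
    double   last_cleanup_num_heap_tuples;  /* tuple count at last full scan */
    uint32_t oldest_btpo_xact;              /* oldest xid among deleted pages */
} MetaShim;

/* Shared helper: record the result of a whole-index scan. */
static void
record_cleanup_info(MetaShim *meta, double num_heap_tuples,
                    uint32_t oldest_btpo_xact)
{
    meta->last_cleanup_num_heap_tuples = num_heap_tuples;
    meta->oldest_btpo_xact = oldest_btpo_xact;
}

/* Both entry points scanned the whole index, so both could update it. */
static void
btbulkdelete_shim(MetaShim *meta, double num_heap_tuples,
                  uint32_t oldest_btpo_xact)
{
    /* ... per-page tuple vacuuming would happen here ... */
    record_cleanup_info(meta, num_heap_tuples, oldest_btpo_xact);
}

static void
btvacuumcleanup_shim(MetaShim *meta, double num_heap_tuples,
                     uint32_t oldest_btpo_xact)
{
    record_cleanup_info(meta, num_heap_tuples, oldest_btpo_xact);
}
```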

In your version of the patch we have to scan the whole index at least
twice (bulkdelete and cleanup) even if only one tuple is deleted. But if
the deletion of index tuples didn't lead to any index page deletions
(which is the common case), the extra scan is really unnecessary. So I
guess we could update the meta page only when we deleted pages during
the index vacuum scan. If no page deletion happened, we don't need to
update the oldest btpo.xact, so we neither update the meta page nor
WAL-log it. This could be a separate patch, though. At least the current
approach is safer.
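A minimal model of that rule might look like this (hypothetical names;
the real metapage update involves buffer locking and an XLogInsert()
call that are elided here):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of the metapage state relevant to this discussion. */
typedef struct BTMetaModel
{
    uint32_t oldest_btpo_xact;   /* oldest xid among deleted pages */
    int      wal_records;        /* stand-in for WAL records emitted */
} BTMetaModel;

/*
 * Update the metapage only when the vacuum scan actually deleted pages;
 * in the common no-deletion case, skip both the update and the WAL
 * record. A sketch for illustration, not the actual btree code.
 */
static bool
maybe_update_metapage(BTMetaModel *meta, uint64_t pages_deleted,
                      uint32_t oldest_btpo_xact)
{
    if (pages_deleted == 0)
        return false;           /* nothing new to record, no WAL */

    meta->oldest_btpo_xact = oldest_btpo_xact;
    meta->wal_records++;        /* would be a WAL insert in reality */
    return true;
}
```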

>
> So, I would like to clarify: why would my patch block future
> improvements in this area? For instance, if we decide to make btree
> able to skip cleanup when some deleted pages are pending for recycle,
> we can add that in the future.
>

Anyway, as for the approach of this feature, I agree with your version
of the patch, and it looks good to me as the first step of this feature.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
