Re: [HACKERS] GUC for cleanup indexes threshold.

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, Peter Geoghegan <pg(at)bowt(dot)ie>, Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Ideriha, Takeshi" <ideriha(dot)takeshi(at)jp(dot)fujitsu(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, pgsql-hackers-owner(at)postgresql(dot)org
Subject: Re: [HACKERS] GUC for cleanup indexes threshold.
Date: 2018-03-27 08:55:45
Message-ID: CAD21AoD4fRnPeOAjoLHGO-aijPt32Cye7ZpcrkJJvSBXvvcK1g@mail.gmail.com
Lists: pgsql-hackers

On Thu, Mar 22, 2018 at 9:24 PM, Alexander Korotkov
<a(dot)korotkov(at)postgrespro(dot)ru> wrote:
> On Mon, Mar 19, 2018 at 5:12 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
> wrote:
>>
>> On Wed, Mar 14, 2018 at 9:25 PM, Alexander Korotkov
>> <a(dot)korotkov(at)postgrespro(dot)ru> wrote:
>> > On Wed, Mar 14, 2018 at 7:40 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
>> > wrote:
>> >>
>> >> On Sat, Mar 10, 2018 at 3:40 AM, Alexander Korotkov
>> >> <a(dot)korotkov(at)postgrespro(dot)ru> wrote:
>> >> > On Fri, Mar 9, 2018 at 3:12 PM, Masahiko Sawada
>> >> > <sawada(dot)mshk(at)gmail(dot)com>
>> >> > wrote:
>> >> >>
>> >> >> On Fri, Mar 9, 2018 at 8:43 AM, Alexander Korotkov
>> >> >> <a(dot)korotkov(at)postgrespro(dot)ru> wrote:
>> >> >> > 2) These parameters are reset during btbulkdelete() and set during
>> >> >> > btvacuumcleanup().
>> >> >>
>> >> >> Can't we set these parameters even during btbulkdelete()? By keeping
>> >> >> them up to date, we would be able to avoid unnecessary cleanup vacuums
>> >> >> even after an index bulk-delete.
>> >> >
>> >> >
>> >> > We certainly can update the cleanup-related parameters during
>> >> > btbulkdelete().
>> >> > However, in that case we would update the B-tree meta-page during each
>> >> > VACUUM cycle. That may cause some overhead for non-append-only
>> >> > workloads. I don't think this overhead would be noticeable, because in
>> >> > non-append-only scenarios VACUUM typically writes much more information.
>> >> > But I would like this patch, which is oriented toward append-only
>> >> > workloads, to be as harmless as possible for other workloads.
>> >>
>> >> What overhead are you referring to here? I guess the only overhead is
>> >> calculating the oldest btpo.xact, and I think it would be harmless.
>> >
>> >
>> > I meant the overhead of setting last_cleanup_num_heap_tuples after every
>> > btbulkdelete(), with WAL-logging of the meta-page. I bet it would also be
>> > harmless, but I think that needs some testing.
>>
>> Agreed.
>>
>> After more thought, it might be too late, but we could consider the
>> other idea proposed by Peter. The attached patch addresses the original
>> index-cleanup issue by storing the epoch number of the page-deletion XID
>> into PageHeader->pd_prune_xid, which is a 4-byte field. Compared to the
>> currently proposed patch, this patch needs neither the page-upgrade code
>> nor extra WAL-logging. If we also want to address cases other than the
>> append-only case, we would need the bulk-delete method to scan the whole
>> index and to log WAL, which adds some extra overhead. With this patch we
>> no longer need to depend on a full scan of the b-tree index. This might
>> be useful in the future if we make b-tree bulk-delete not scan the whole
>> index.
>
>
> Storing the epoch in deleted pages is good by itself, because it lets us
> freely choose the moment to recycle those pages without risk of
> wraparound.
>
> However, I see that you are comparing the relative change of num_heap_tuples
> before and after vacuum to vacuum_cleanup_index_scale_factor.
> The problem is that if vacuum is very aggressive (and that is likely for
> the append-only case if this patch is committed), then num_heap_tuples
> changes only very slightly every vacuum cycle. So cleanup would never
> happen and statistics could stall indefinitely.

Good catch. I think we need to store the number of heap tuples at the
time the whole index was last scanned (by bulkdelete or cleanup), as
your patch does. So this approach would also need the page-upgrading
code. Since that value would be helpful for other types of indexes too,
it might be better to store it in a system catalog.
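(As an aside on the attached patch's pd_prune_xid idea, here is a rough
sketch of how a stored epoch removes the wraparound ambiguity when
deciding whether a deleted page is recyclable. The helper names below
are hypothetical illustrations, not the patch code or existing backend
functions.)

/*
 * Sketch only: combine the epoch stored in PageHeader->pd_prune_xid
 * with btpo.xact to form a 64-bit deletion XID, so recyclability of a
 * deleted B-tree page can be judged without wraparound risk.
 */
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

typedef uint32_t TransactionId;

/* Hypothetical helper: build a 64-bit XID from epoch + 32-bit xid. */
static uint64_t
MakeFullXid(uint32_t epoch, TransactionId xid)
{
    return ((uint64_t) epoch << 32) | xid;
}

/*
 * Hypothetical check: the page is recyclable once its deletion XID
 * precedes the oldest xmin, with both taken as 64-bit values.
 */
static bool
DeletedPageRecyclable(uint32_t page_epoch, TransactionId btpo_xact,
                      uint32_t xmin_epoch, TransactionId oldest_xmin)
{
    return MakeFullXid(page_epoch, btpo_xact) <
           MakeFullXid(xmin_epoch, oldest_xmin);
}

int
main(void)
{
    /*
     * Example: a page deleted just before an epoch boundary is still
     * correctly seen as older than a post-wraparound oldest xmin.
     */
    printf("%d\n", DeletedPageRecyclable(4, 0xFFFFFF00, 5, 100));
    return 0;
}

The point is that once the epoch is recorded at deletion time, the
decision about when to recycle can be deferred arbitrarily, as you
noted above.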

>
> Another issue is that append only workloads are frequently combined
> with rare periodical bulk deletes of data. Assuming that in this patch
> you don't consider deleted pages to trigger index cleanup, on such
> workload large amounts of deleted pages may reside non-recycled until
> next bulk delete cycle.

Perhaps, using that new value, we can trigger the cleanup when the
current number of tuples has increased or decreased by
vacuum_cleanup_index_scale_factor from n_tup_last_cleanup, something
like the check sketched below.
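(Just a sketch of the condition, not patch code; n_tup_last_cleanup is
the hypothetical stored value described above.)

/*
 * Sketch of the suggested trigger condition: perform a cleanup index
 * scan only when the heap tuple count has drifted from the value
 * recorded at the last whole-index scan by more than
 * vacuum_cleanup_index_scale_factor.
 */
#include <stdbool.h>
#include <math.h>
#include <stdio.h>

static bool
cleanup_scan_needed(double num_heap_tuples,     /* current reltuples */
                    double n_tup_last_cleanup,  /* tuples at last whole-index scan */
                    double scale_factor)        /* vacuum_cleanup_index_scale_factor */
{
    /* No whole-index scan recorded yet: do one now. */
    if (n_tup_last_cleanup < 0)
        return true;

    /* Trigger on growth *or* shrinkage beyond the configured fraction. */
    return fabs(num_heap_tuples - n_tup_last_cleanup) >
           scale_factor * n_tup_last_cleanup;
}

int
main(void)
{
    /* 10% growth against a 5% threshold -> cleanup scan needed. */
    printf("%d\n", cleanup_scan_needed(1100000.0, 1000000.0, 0.05));
    return 0;
}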

>
> So, despite I generally like idea of storing epoch of deleted xid in the
> page, I think my version of patch is closer to committable shape.
>

Agreed, but as I mentioned before, I'm concerned that the approach of
your version of the patch could become a restriction on future
improvements. If the community decides this feature only needs to work
for mostly append-only workloads, I think your version of the patch is
the best approach for now.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
