Re: [HACKERS] GUC for cleanup indexes threshold.

From: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, Peter Geoghegan <pg(at)bowt(dot)ie>, Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Ideriha, Takeshi" <ideriha(dot)takeshi(at)jp(dot)fujitsu(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, pgsql-hackers-owner(at)postgresql(dot)org
Subject: Re: [HACKERS] GUC for cleanup indexes threshold.
Date: 2018-03-27 11:01:21
Message-ID: CAPpHfdvFVOvWaLXEqeuq+jMQbxv2GT-QxZgXAiVG62+FHF=nCA@mail.gmail.com
Lists: pgsql-hackers

On Tue, Mar 27, 2018 at 11:55 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
wrote:

> On Thu, Mar 22, 2018 at 9:24 PM, Alexander Korotkov
> <a(dot)korotkov(at)postgrespro(dot)ru> wrote:
> > However, I see that you are comparing the relative change of num_heap_tuples
> > before and after vacuum to vacuum_cleanup_index_scale_factor.
> > The problem is that if vacuum is very aggressive (and that is likely for
> > the append-only case if this patch is committed), then num_heap_tuples
> > changes very slightly every vacuum cycle. So, cleanup would never
> > happen and the statistics could stall indefinitely.
>
> Good catch. I think we need to store the number of tuples as of when
> the whole index was last scanned (bulkdelete or cleanup), as your
> patch does. So it would also need the page-upgrading code. Since
> that value would be helpful for other types of indexes too, it might be
> better to store it in a system catalog.
>
> >
> > Another issue is that append-only workloads are frequently combined
> > with rare periodic bulk deletes of data. Assuming that in this patch
> > you don't consider deleted pages to trigger index cleanup, on such a
> > workload large numbers of deleted pages may remain non-recycled until
> > the next bulk delete cycle.
>
> Perhaps, using that new value, we can trigger the cleanup if the current
> number of tuples has increased or decreased by
> vacuum_index_scale_factor% from n_tup_last_cleanup.
>

Yes, that might work. However, the decrease in the number of tuples can be
an inaccurate measure of the number of deleted pages. Depending on the
distribution of tuples across pages, the same number of deleted tuples can
lead to very different numbers of deleted pages (in corner cases, anywhere
from zero to a very large number). If we want to skip scanning of deleted
pages, then it would be better to store their exact count as known by the
previous bulk delete or cleanup.
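
Just to make that concrete, here is a rough sketch of the kind of check
I have in mind (the function and parameter names are invented for
illustration and are not taken from either patch):

#include <math.h>
#include <stdbool.h>

/*
 * Illustrative sketch only -- not actual patch code.  The baseline tuple
 * count is the one remembered at the last full index scan (bulkdelete or
 * cleanup), not at the previous vacuum cycle, so frequent aggressive
 * vacuums cannot make the relative change look negligible forever.
 * Cleanup is also forced while deleted pages from a previous scan are
 * still pending recycling.
 */
static bool
index_cleanup_needed(double last_scan_num_heap_tuples, /* at last full scan */
                     double current_num_heap_tuples,   /* current estimate */
                     long pending_deleted_pages,       /* exact count from that scan */
                     double cleanup_scale_factor)      /* e.g. 0.1 */
{
    if (pending_deleted_pages > 0)
        return true;        /* recycle deleted pages promptly */

    if (last_scan_num_heap_tuples <= 0)
        return true;        /* no baseline yet, so do the cleanup */

    /* trigger once the table has grown or shrunk beyond the threshold */
    return fabs(current_num_heap_tuples - last_scan_num_heap_tuples) >
           cleanup_scale_factor * last_scan_num_heap_tuples;
}

For example, with a scale factor of 0.1 and 1,000,000 heap tuples remembered
at the last full index scan, cleanup would be triggered once the table grows
past 1,100,000 tuples or shrinks below 900,000, and immediately if any
deleted pages are still waiting to be recycled.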

>
> > So, although I generally like the idea of storing the epoch of the deleted
> > xid in the page, I think my version of the patch is closer to committable shape.
> >
>
> Agreed, but as I mentioned before, I'm concerned that the approach of your
> version of the patch will become a restriction on future improvements. If
> the community decides this feature works only for mostly append-only
> workloads, I think your version of the patch is the best approach for now.

First of all, I would like to note that all the valid optimizations presented
in this thread optimize the append case, so they do give a serious benefit on
mostly append-only workloads. That is because the patches were about skipping
or reducing the cleanup stage, which does serious work only in the
append-only case.

It might also be possible in the future to optimize cases where only a small
fraction of the table's tuples were deleted. Those tuples could be deleted
not by a full index scan, but by individual index lookups. But I think such
an optimization is considerably harder than everything mentioned in this
thread, and it should be considered separately.

The thing that could be improved in the future is making btree able to skip
cleanup even when some deleted pages are pending recycling. But I'm not sure
this problem deserves a lot of time investment right now, because page
deletion in our btree is unfortunately quite a rare situation. In real life
our btree is more likely to become bloated with a bad fillfactor of pages
than to have many pages deleted.

So, I would like to clarify: why would my patch block future improvements in
this area? For instance, if we decide to make btree able to skip cleanup when
some deleted pages are pending recycling, we can add that in the future.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
