|From:||Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>|
|Cc:||a(dot)korotkov(at)postgrespro(dot)ru, simon(at)2ndquadrant(dot)com, y(dot)sokolov(at)postgrespro(dot)ru, pg(at)bowt(dot)ie, andres(at)anarazel(dot)de, robertmhaas(at)gmail(dot)com, david(at)pgmasters(dot)net, amit(dot)kapila16(at)gmail(dot)com, ideriha(dot)takeshi(at)jp(dot)fujitsu(dot)com, pgsql-hackers(at)postgresql(dot)org, kuntalghosh(dot)2007(at)gmail(dot)com, pgsql-hackers-owner(at)postgresql(dot)org|
|Subject:||Re: [HACKERS] GUC for cleanup indexes threshold.|
|Views:||Raw Message | Whole Thread | Download mbox | Resend email|
At Mon, 19 Mar 2018 20:50:48 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoCfWXcX-po8Q1r779nyVGzs01pwpSLM=u7Sx3Hv+L+4gg(at)mail(dot)gmail(dot)com>
> On Mon, Mar 19, 2018 at 2:45 PM, Kyotaro HORIGUCHI
> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> > At Mon, 19 Mar 2018 11:12:58 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoAB8tQg9xwojupUJjKD=fMhtx6thDEPENDdhftVLWcR8A(at)mail(dot)gmail(dot)com>
> >> On Wed, Mar 14, 2018 at 9:25 PM, Alexander Korotkov
> >> <a(dot)korotkov(at)postgrespro(dot)ru> wrote:
> >> > On Wed, Mar 14, 2018 at 7:40 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
> >> > wrote:
> >> >>
> >> >> On Sat, Mar 10, 2018 at 3:40 AM, Alexander Korotkov
> >> >> <a(dot)korotkov(at)postgrespro(dot)ru> wrote:
> >> >> > On Fri, Mar 9, 2018 at 3:12 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
> >> >> > wrote:
> >> >> >>
> >> >> >> On Fri, Mar 9, 2018 at 8:43 AM, Alexander Korotkov
> >> >> >> <a(dot)korotkov(at)postgrespro(dot)ru> wrote:
> >> >> >> > 2) These parameters are reset during btbulkdelete() and set during
> >> >> >> > btvacuumcleanup().
> >> >> >>
> >> >> >> Can't we set these parameters even during btbulkdelete()? By keeping
> >> >> >> them up to date, we will able to avoid an unnecessary cleanup vacuums
> >> >> >> even after index bulk-delete.
> >> >> >
> >> >> >
> >> >> > We certainly can update cleanup-related parameters during
> >> >> > btbulkdelete().
> >> >> > However, in this case we would update B-tree meta-page during each
> >> >> > VACUUM cycle. That may cause some overhead for non append-only
> >> >> > workloads. I don't think this overhead would be sensible, because in
> >> >> > non append-only scenarios VACUUM typically writes much more of
> >> >> > information.
> >> >> > But I would like this oriented to append-only workload patch to be
> >> >> > as harmless as possible for other workloads.
> >> >>
> >> >> What overhead are you referring here? I guess the overhead is only the
> >> >> calculating the oldest btpo.xact. And I think it would be harmless.
> >> >
> >> >
> >> > I meant overhead of setting last_cleanup_num_heap_tuples after every
> >> > btbulkdelete with wal-logging of meta-page. I bet it also would be
> >> > harmless, but I think that needs some testing.
> >> Agreed.
> >> After more thought, it might be too late but we can consider the
> >> possibility of another idea proposed by Peter. Attached patch
> >> addresses the original issue of index cleanups by storing the epoch
> >> number of page deletion XID into PageHeader->pd_prune_xid which is
> >> 4byte field.
> > Mmm. It seems to me that the story is returning to the
> > beginning. Could I try retelling the story?
> > I understant that the initial problem was vacuum runs apparently
> > unnecessary full-scan on indexes many times. The reason for that
> > is the fact that a cleanup scan may leave some (or many under
> > certain condition) dead pages not-recycled but we don't know
> > whether a cleanup is needed or not. They will be staying left
> > forever unless we run additional cleanup-scans at the appropriate
> > timing.
> > (If I understand it correctly,) Sawada-san's latest proposal is
> > (fundamentally the same to the first one,) just skipping the
> > cleanup scan if the vacuum scan just before found that the number
> > of *live* tuples are increased. If there where many deletions and
> > insertions but no increase of total number of tuples, we don't
> > have a cleanup. Consequently it had a wraparound problem and it
> > is addressed in this version.
> No, it doesn't have a wraparound problem. The patch based on Peter's
> idea I proposed adds an epoch number of page deletion xid and compare
> them when we judge whether the page is recyclable or not. It's
> something like we create 64-bit xid of deletion xid. Also, if there is
> even one deletion the bulk-delete will be performed instead of the
> index cleanup. So with this patch we do the index cleanup only when
> the reltuple of table is increased by fraction of
> vacuum_index_cleanup_scale_factor from previous stats. It doesn't need
> to do the index cleanup by any xid thresholds.
Perhaps you took me wrong. I know the last patch doesn't have (or
at least intends to get rid of ) the problem, and I wrote that
the problem was introduced by your *first* patch.
> > (ditto.) Alexander proposed to record the oldest xid of
> > recyclable pages in metapage (and the number of tuples at the
> > last cleanup). This prevents needless cleanup scan and surely
> > runs cleanups to remove all recyclable pages.
> Yes, but the concerns we discussed are that we need extra WAL-logging
> for updating the metapage and it works only for append-only case. If
> we also want to support more cases we will need to update the metapage
> during bulk-delete. The overhead of WAL-logging would be harmless but
> should be tested as Alexander mentioned.
> > I think that we can accept Sawada-san's proposal if we accept the
> > fact that indexes can retain recyclable pages for a long
> > time. (Honestly I don't think so.)
> > If (as I might have mentioned as the same upthread for Yura's
> > patch,) we accept to hold the information on index meta page,
> > Alexander's way would be preferable. The difference betwen Yura's
> > and Alexander's is the former runs cleanup scan if a recyclable
> > page is present but the latter avoids that before any recyclable
> > pages are knwon to be removed.
> >> Comparing to the current proposed patch this patch
> >> doesn't need neither the page upgrade code nor extra WAL-logging. If
> > # By the way, my proposal was storing the information as Yura
> > # proposed into stats collector. The information maybe be
> > # available a bit lately, but it doesn't harm. This doesn't need
> > # extra WAL logging nor the upgrad code:p
> >> we also want to address cases other than append-only case we will
> > I'm afraid that "the problem for the other cases" is a new one
> > that this patch introduces, not an existing one.
> I meant that the current Alexandor's proposal works for append-only
> table. If we want to support other cases we have to update metapage
> during bulk-delete, which assumes that bulk-delete always scans whole
True. Currently no patches so far gets rid of the whole-cleanup-scan.
> >> require the bulk-delete method of scanning whole index and of logging
> >> WAL. But it leads some extra overhead. With this patch we no longer
> >> need to depend on the full scan on b-tree index. This might be useful
> >> for a future when we make the bulk-delete of b-tree index not scan
> >> whole index.
> > Perhaps I'm taking something incorrectly, but is it just the
> > result of skipping 'maybe needed' scans without condiering the
> > actual necessity?
> I meant to scan only index pages that are relevant with garbages TIDs
> on a table. The current b-tree index bulk-deletion is very slow and
> heavy because we always scans the whole index even if there is only 1
> dead tuples in a table. To address this problem I'm thinking a way to
> make bulk-delete not scan whole index if there is a few dead tuples in
> a table. That is, we do index scans to collect the stack of btree
> pages and reclaim garbage. Maybe we will full index scan if there are
> a lot of dead tuples, which would be same as what we're doing on
> planning access paths.
Yeah, that seems good! A possible problem of that is that the
pages we want to recycle in a cleanup scan can *not* be only them
that have found to be recyclable in the vacuum-scan just
before. When we leave some recyclable pages in a cleanup scan, we
should do whole-scan at the next chance if we don't have the TID
list (or in other smaller form, or just the number of recyclable
pages?) at the time.
NTT Open Source Software Center
|Next Message||Kyotaro HORIGUCHI||2018-03-20 05:03:56||Re: [HACKERS] GUC for cleanup indexes threshold.|
|Previous Message||Amit Langote||2018-03-20 04:39:26||Re: ON CONFLICT DO UPDATE for partitioned tables|