Re: Vacuum: allow usage of more than 1GB of work mem

From: Andres Freund <andres(at)anarazel(dot)de>
To: Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Anastasia Lubennikova <lubennikovaav(at)gmail(dot)com>, PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum: allow usage of more than 1GB of work mem
Date: 2017-04-08 00:56:29
Message-ID: 20170408005629.bhjzow3ittre7vqc@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2017-04-07 19:43:39 -0300, Claudio Freire wrote:
> On Fri, Apr 7, 2017 at 5:05 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > Hi,
> >
> > I've *not* read the history of this thread. So I really might be
> > missing some context.
> >
> >
> >> From e37d29c26210a0f23cd2e9fe18a264312fecd383 Mon Sep 17 00:00:00 2001
> >> From: Claudio Freire <klaussfreire(at)gmail(dot)com>
> >> Date: Mon, 12 Sep 2016 23:36:42 -0300
> >> Subject: [PATCH] Vacuum: allow using more than 1GB work mem
> >>
> >> Turn the dead_tuples array into a structure composed of several
> >> exponentially bigger arrays, to enable usage of more than 1GB
> >> of work mem during vacuum and thus reduce the number of full
> >> index scans necessary to remove all dead tids when the memory is
> >> available.
> >
> >> * We are willing to use at most maintenance_work_mem (or perhaps
> >> * autovacuum_work_mem) memory space to keep track of dead tuples. We
> >> - * initially allocate an array of TIDs of that size, with an upper limit that
> >> + * initially allocate an array of TIDs of 128MB, or an upper limit that
> >> * depends on table size (this limit ensures we don't allocate a huge area
> >> - * uselessly for vacuuming small tables). If the array threatens to overflow,
> >> - * we suspend the heap scan phase and perform a pass of index cleanup and page
> >> - * compaction, then resume the heap scan with an empty TID array.
> >> + * uselessly for vacuuming small tables). Additional arrays of increasingly
> >> + * large sizes are allocated as they become necessary.
> >> + *
> >> + * The TID array is thus represented as a list of multiple segments of
> >> + * varying size, beginning with the initial size of up to 128MB, and growing
> >> + * exponentially until the whole budget of (autovacuum_)maintenance_work_mem
> >> + * is used up.
> >
> > When the chunk size is 128MB, I'm a bit unconvinced that using
> > exponential growth is worth it. The allocator overhead can't be
> > meaningful in comparison to collecting 128MB dead tuples, the potential
> > waste is pretty big, and it increases memory fragmentation.
>
> The exponential strategy is mainly to improve lookup time (ie: to
> avoid large segment lists).
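
To put rough numbers on that concern: with fixed 128MB segments, a 16GB budget yields a 128-entry segment list, whereas doubling sizes keeps it at 8 entries. A simplified sketch of the arithmetic (hypothetical helper names, not code from the patch):

```c
#include <assert.h>

/* Hypothetical helpers (not from the patch): count how many segments
 * a given memory budget produces under each growth strategy. */

/* Fixed-size segments: list length grows linearly with the budget. */
static int
segs_fixed(long budget_mb, long seg_mb)
{
	return (int) ((budget_mb + seg_mb - 1) / seg_mb);
}

/* Doubling segments: list length grows logarithmically. */
static int
segs_doubling(long budget_mb, long first_mb)
{
	long		used = 0,
				size = first_mb;
	int			n = 0;

	while (used < budget_mb)
	{
		used += size;
		size *= 2;
		n++;
	}
	return n;
}
```

For a 1GB budget the difference is only 8 vs. 4 segments and hardly matters; the gap opens up with much larger budgets.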

Well, if we were to do binary search on the segment list, that'd not be
necessary.

> >> + if (seg->num_dead_tuples >= seg->max_dead_tuples)
> >> + {
> >> + /*
> >> + * The segment is overflowing, so we must allocate a new segment.
> >> + * We could have a preallocated segment descriptor already, in
> >> + * which case we just reinitialize it, or we may need to repalloc
> >> + * the vacrelstats->dead_tuples array. In that case, seg will no
> >> + * longer be valid, so we must be careful about that. In any case,
> >> + * we must update the last_dead_tuple copy in the overflowing
> >> + * segment descriptor.
> >> + */
> >> + Assert(seg->num_dead_tuples == seg->max_dead_tuples);
> >> + seg->last_dead_tuple = seg->dt_tids[seg->num_dead_tuples - 1];
> >> + if (vacrelstats->dead_tuples.last_seg + 1 >= vacrelstats->dead_tuples.num_segs)
> >> + {
> >> + int new_num_segs = vacrelstats->dead_tuples.num_segs * 2;
> >> +
> >> + vacrelstats->dead_tuples.dt_segments = (DeadTuplesSegment *) repalloc(
> >> + (void *) vacrelstats->dead_tuples.dt_segments,
> >> + new_num_segs * sizeof(DeadTuplesSegment));
> >
> > Might be worth breaking this into some sub-statements, it's quite hard
> > to read.
>
> Breaking what precisely? The comment?

No, the three-line statement computing the new value of
dead_tuples.dt_segments. I'd at least assign dead_tuples to a local
variable, to cut the length of the statement down.
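
I.e. something along these lines (a sketch with stand-in types, realloc in place of repalloc, just to show the shape):

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-ins for the patch's structures, not the real PostgreSQL
 * definitions; error handling omitted. */
typedef struct
{
	int			dummy;
} DeadTuplesSegment;

typedef struct
{
	DeadTuplesSegment *dt_segments;
	int			num_segs;
} DeadTuplesMultiArray;

typedef struct
{
	DeadTuplesMultiArray dead_tuples;
} LVRelStats;

/* Grow the segment list, using a local alias to keep the (re)alloc
 * statement readable. */
static void
grow_segment_list(LVRelStats *vacrelstats)
{
	DeadTuplesMultiArray *dt = &vacrelstats->dead_tuples;
	int			new_num_segs = dt->num_segs * 2;

	dt->dt_segments = (DeadTuplesSegment *)
		realloc(dt->dt_segments, new_num_segs * sizeof(DeadTuplesSegment));
	/* Simplified: the patch initializes new entries before counting them. */
	dt->num_segs = new_num_segs;
}
```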

> >> + while (vacrelstats->dead_tuples.num_segs < new_num_segs)
> >> + {
> >> + /* Initialize as "unallocated" */
> >> + DeadTuplesSegment *nseg = &(vacrelstats->dead_tuples.dt_segments[
> >> + vacrelstats->dead_tuples.num_segs]);
> >
> > dito.
>
> I don't really get what you're asking here.

Trying to simplify/shorten the statement.

> >> +/*
> >> * lazy_tid_reaped() -- is a particular tid deletable?
> >> *
> >> * This has the right signature to be an IndexBulkDeleteCallback.
> >> *
> >> - * Assumes dead_tuples array is in sorted order.
> >> + * Assumes the dead_tuples multiarray is in sorted order, both
> >> + * the segment list and each segment itself, and that all segments'
> >> + * last_dead_tuple fields are up to date
> >> */
> >> static bool
> >> lazy_tid_reaped(ItemPointer itemptr, void *state)
> >
> > Have you done performance evaluation about potential performance
> > regressions in big indexes here? IIRC this can be quite frequently
> > called?
>
> Yes, the benchmarks are upthread. The earlier runs were done on my
> laptop and made little sense, so I'd ignore them as inaccurate. The
> latest run[1] with a pgbench scale of 4000 gave an improvement in CPU
> time (ie: faster) of about 20%. Anastasia did another one[2] and saw
> improvements as well, roughly 30%, though it's not measuring CPU time
> but rather elapsed time.

I'd be more concerned about regressions in cases that already fit into
memory, not ones where we avoid doing another scan - and I think you
mostly measured the latter?

- Andres
