Re: Vacuum: allow usage of more than 1GB of work mem

From: Claudio Freire <klaussfreire(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Anastasia Lubennikova <lubennikovaav(at)gmail(dot)com>, PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum: allow usage of more than 1GB of work mem
Date: 2017-04-08 01:06:13
Message-ID: CAGTBQpayevKU-=cAh4UhD-0Ep3W1O4Xhaq+nz78BNWEQ1sFSSQ@mail.gmail.com
Lists: pgsql-hackers

On Fri, Apr 7, 2017 at 9:56 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> Hi,
>
>
> On 2017-04-07 19:43:39 -0300, Claudio Freire wrote:
>> On Fri, Apr 7, 2017 at 5:05 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>> > Hi,
>> >
>> > I've *not* read the history of this thread. So I really might be
>> > missing some context.
>> >
>> >
>> >> From e37d29c26210a0f23cd2e9fe18a264312fecd383 Mon Sep 17 00:00:00 2001
>> >> From: Claudio Freire <klaussfreire(at)gmail(dot)com>
>> >> Date: Mon, 12 Sep 2016 23:36:42 -0300
>> >> Subject: [PATCH] Vacuum: allow using more than 1GB work mem
>> >>
>> >> Turn the dead_tuples array into a structure composed of several
>> >> exponentially bigger arrays, to enable usage of more than 1GB
>> >> of work mem during vacuum and thus reduce the number of full
>> >> index scans necessary to remove all dead tids when the memory is
>> >> available.
>> >
>> >> * We are willing to use at most maintenance_work_mem (or perhaps
>> >> * autovacuum_work_mem) memory space to keep track of dead tuples. We
>> >> - * initially allocate an array of TIDs of that size, with an upper limit that
>> >> + * initially allocate an array of TIDs of 128MB, or an upper limit that
>> >> * depends on table size (this limit ensures we don't allocate a huge area
>> >> - * uselessly for vacuuming small tables). If the array threatens to overflow,
>> >> - * we suspend the heap scan phase and perform a pass of index cleanup and page
>> >> - * compaction, then resume the heap scan with an empty TID array.
>> >> + * uselessly for vacuuming small tables). Additional arrays of increasingly
>> >> + * large sizes are allocated as they become necessary.
>> >> + *
>> >> + * The TID array is thus represented as a list of multiple segments of
>> >> + * varying size, beginning with the initial size of up to 128MB, and growing
>> >> + * exponentially until the whole budget of (autovacuum_)maintenance_work_mem
>> >> + * is used up.
>> >
>> > When the chunk size is 128MB, I'm a bit unconvinced that using
>> > exponential growth is worth it. The allocator overhead can't be
>> > meaningful in comparison to collecting 128MB dead tuples, the potential
>> > waste is pretty big, and it increases memory fragmentation.
>>
>> The exponential strategy is mainly to improve lookup time (ie: to
>> avoid large segment lists).
>
> Well, if we were to do binary search on the segment list, that'd not be
> necessary.

True, but the initial lookup might be slower in the end, since the
array would be bigger and cache locality worse.
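
To illustrate what I mean, the lookup I have in mind is roughly this
two-level search (a sketch only, not the patch's literal code; the
DeadTuplesMultiArray name is my shorthand for the struct behind
vacrelstats->dead_tuples, and it assumes at least one segment exists):

/*
 * Sketch: binary search over the segment list, keyed on each
 * segment's last_dead_tuple, then bsearch within the matching
 * segment's sorted TID array.
 */
static bool
lazy_tid_reaped_sketch(ItemPointer itemptr, DeadTuplesMultiArray *dt)
{
    int         lo = 0;
    int         hi = dt->last_seg;
    DeadTuplesSegment *seg;

    /* find the first segment with last_dead_tuple >= itemptr */
    while (lo < hi)
    {
        int         mid = lo + (hi - lo) / 2;

        if (ItemPointerCompare(&dt->dt_segments[mid].last_dead_tuple,
                               itemptr) < 0)
            lo = mid + 1;
        else
            hi = mid;
    }

    /* then the usual bsearch within that segment */
    seg = &dt->dt_segments[lo];
    return bsearch((void *) itemptr,
                   (void *) seg->dt_tids,
                   seg->num_dead_tuples,
                   sizeof(ItemPointerData),
                   vac_cmp_itemptr) != NULL;
}

With exponential segment growth the outer loop stays at a handful of
iterations even for very large tables, which is the locality argument
above.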

Why do you say exponential growth fragments memory? AFAIK, all those
allocations are well beyond the point where malloc starts mmapping
memory, so each segment should be its own independently freeable mmap
region.

>> >> + if (seg->num_dead_tuples >= seg->max_dead_tuples)
>> >> + {
>> >> + /*
>> >> + * The segment is overflowing, so we must allocate a new segment.
>> >> + * We could have a preallocated segment descriptor already, in
>> >> + * which case we just reinitialize it, or we may need to repalloc
>> >> + * the vacrelstats->dead_tuples array. In that case, seg will no
>> >> + * longer be valid, so we must be careful about that. In any case,
>> >> + * we must update the last_dead_tuple copy in the overflowing
>> >> + * segment descriptor.
>> >> + */
>> >> + Assert(seg->num_dead_tuples == seg->max_dead_tuples);
>> >> + seg->last_dead_tuple = seg->dt_tids[seg->num_dead_tuples - 1];
>> >> + if (vacrelstats->dead_tuples.last_seg + 1 >= vacrelstats->dead_tuples.num_segs)
>> >> + {
>> >> + int new_num_segs = vacrelstats->dead_tuples.num_segs * 2;
>> >> +
>> >> + vacrelstats->dead_tuples.dt_segments = (DeadTuplesSegment *) repalloc(
>> >> + (void *) vacrelstats->dead_tuples.dt_segments,
>> >> + new_num_segs * sizeof(DeadTuplesSegment));
>> >
>> > Might be worth breaking this into some sub-statements, it's quite hard
>> > to read.
>>
>> Breaking what precisely? The comment?
>
> No, the three-line statement computing the new value of
> dead_tuples.dt_segments. I'd at least assign dead_tuples to a local
> variable, to cut the length of the statement down.

Ah, alright. Will try to do that.
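
Something along these lines, I imagine (untested; the num_segs bump
isn't visible in the hunk above, so that part is a guess):

DeadTuplesMultiArray *dt = &vacrelstats->dead_tuples;

if (dt->last_seg + 1 >= dt->num_segs)
{
    int         new_num_segs = dt->num_segs * 2;

    dt->dt_segments = (DeadTuplesSegment *)
        repalloc(dt->dt_segments,
                 new_num_segs * sizeof(DeadTuplesSegment));
    dt->num_segs = new_num_segs;
}

The local variable keeps each statement short enough to read at a
glance.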

>> >> +/*
>> >> * lazy_tid_reaped() -- is a particular tid deletable?
>> >> *
>> >> * This has the right signature to be an IndexBulkDeleteCallback.
>> >> *
>> >> - * Assumes dead_tuples array is in sorted order.
>> >> + * Assumes the dead_tuples multiarray is in sorted order, both
>> >> + * the segment list and each segment itself, and that all segments'
>> >> + * last_dead_tuple fields are up to date
>> >> */
>> >> static bool
>> >> lazy_tid_reaped(ItemPointer itemptr, void *state)
>> >
>> > Have you done performance evaluation about potential performance
>> > regressions in big indexes here? IIRC this can be quite frequently
>> > called?
>>
>> Yes, the benchmarks are upthread. The earlier runs were run on my
>> laptop and made little sense, so I'd ignore them as inaccurate. The
>> latest run[1] with a pgbench scale of 4000 gave an improvement in CPU
>> time (ie: faster) of about 20%. Anastasia did another one[2] and saw
>> improvements as well, roughly 30%, though it's not measuring CPU time
>> but rather elapsed time.
>
> I'd be more concerned about cases that'd already fit into memory, not ones
> where we avoid doing another scan - and I think you mostly measured that?
>
> - Andres

Well, scale 400 is pretty much as big as you can get under the old 1GB
limit, and it suffered no significant regression. Though, true, it
didn't significantly improve either.
