Quick Links

Re: Vacuum: allow usage of more than 1GB of work mem

From:	Claudio Freire <klaussfreire(at)gmail(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Anastasia Lubennikova <lubennikovaav(at)gmail(dot)com>, PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Vacuum: allow usage of more than 1GB of work mem
Date:	2017-07-10 23:04:42
Message-ID:	CAGTBQpbS2jDH1X8UYdqNpLUveyTsH8rioGhiGx1DhpiJh6jy8A@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Sorry for the delay, I had extended vacations that kept me away from
my test rigs, and afterward testing too, liteally, a few weeks.

I built a more thoroguh test script that produced some interesting
results. Will attach the results.

For now, to the review comments:

On Thu, Apr 27, 2017 at 4:25 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> I've read this patch again and here are some review comments.
>
> + * Lookup in that structure proceeds sequentially in the list of segments,
> + * and with a binary search within each segment. Since segment's size grows
> + * exponentially, this retains O(log N) lookup complexity (2 log N to be
> + * precise).
>
> IIUC we now do binary search even over the list of segments.

Right

>
> -----
>
> We often fetch a particular dead tuple segment. How about providing a
> macro for easier understanding?
> For example,
>
> #define GetDeadTuplsSegment(lvrelstats, seg) \
> (&(lvrelstats)->dead_tuples.dt_segments[(seg)])
>
> -----
>
> + if (vacrelstats->dead_tuples.num_segs == 0)
> + return;
> +
>
> + /* If uninitialized, we have no tuples to delete from the indexes */
> + if (vacrelstats->dead_tuples.num_segs == 0)
> + {
> + return;
> + }
>
> + if (vacrelstats->dead_tuples.num_segs == 0)
> + return false;
> +

> As I listed, there is code to check if dead tuple is initialized
> already in some places where doing actual vacuum.
> I guess that it should not happen that we attempt to vacuum a
> table/index page while not having any dead tuple. Is it better to have
> Assert or ereport instead?

I'm not sure. Having a non-empty dead tuples array is not necessary to
be able to honor the contract in the docstring. Most of those functions
clean up the heap/index of dead tuples given the array of dead tuples,
which is a no-op for an empty array.

The code that calls those functions doesn't bother calling if the array
is known empty, true, but there's no compelling reason to enforce that at the
interface. Doing so could cause subtle bugs rather than catch them
(in the form of unexpected assertion failures, if some caller forgot to
check the dead tuples array for emptiness).

If you're worried about the possibility that some bugs fails to record
dead tuples in the array, and thus makes VACUUM silently ineffective,
I instead added a test for that case. This should be a better approach,
since it's more likely to catch unexpected failure modes than an assert.

> @@ -1915,2 +2002,2 @@ count_nondeletable_pages(Relation onerel,
> LVRelStats *vacrelstats)
> - BlockNumber prefetchStart;
> - BlockNumber pblkno;
> + BlockNumber prefetchStart;
> + BlockNumber pblkno;
>
> I think that it's a unnecessary change.

Yep. But funnily that's how it's now in master.

>
> -----
>
> + /* Search for the segment likely to contain the item pointer */
> + iseg = vac_itemptr_binsrch(
> + (void *) itemptr,
> + (void *)
> &(vacrelstats->dead_tuples.dt_segments->last_dead_tuple),
> + vacrelstats->dead_tuples.last_seg + 1,
> + sizeof(DeadTuplesSegment));
> +
>
> I think that we can change the above to;
>
> + /* Search for the segment likely to contain the item pointer */
> + iseg = vac_itemptr_binsrch(
> + (void *) itemptr,
> + (void *) &(seg->last_dead_tuple),
> + vacrelstats->dead_tuples.last_seg + 1,
> + sizeof(DeadTuplesSegment));
>
> We set "seg = vacrelstats->dead_tuples.dt_segments" just before this.

Right

Attached is a current version of both patches, rebased since we're at it.

I'm also attaching the output from the latest benchmark runs, in raw
(tar.bz2) and digested (bench_report) forms, the script used to run
them (vacuumbench.sh) and to produce the reports
(vacuum_bench_report.sh).

Those are before the changes in the review. While I don't expect any
change, I'll re-run some of them just in case, and try to investigate
the slowdown. But that will take forever. Each run takes about a week
on my test rig, and I don't have enough hardware to parallelize the
tests. I will run a test on a snapshot of a particularly troublesome
production database we have, that should be interesting.

The benchmarks show a consistent improvement at scale 400, which may
be related to the search implementation being better somehow, and a
slowdown at scale 4000 in some variants. I believe this is due to
those variants having highly clustered indexes. While the "shuf"
(shuffled) variantes were intended to be the opposite of that, I
suspect I somehow failed to get the desired outcome, so I'll be
double-checking that.

In any case the slowdown is only materialized when vacuuming with a
large mwm setting, which is something that shouldn't happen
unintentionally.

Attachment	Content-Type	Size
0002-Vacuum-allow-using-more-than-1GB-work-mem-v11.patch	text/x-patch	25.8 KB
0003-Vacuum-free-dead-tuples-array-as-early-as-possible-v4.patch	text/x-patch	2.7 KB
vacuum_bench_report_unpatched.log	text/x-log	9.1 KB
vacuum_bench_report_v11.log	text/x-log	9.1 KB
vacuum_bench_output_unpatched.tar.bz2	application/x-bzip2	1023.9 KB
vacuum_bench_output_v11.tar.bz2	application/x-bzip2	1.0 MB
vacuum_bench_report.sh	application/x-sh	606 bytes
vacuumbench.sh	application/x-sh	2.9 KB

In response to

Re: Vacuum: allow usage of more than 1GB of work mem at 2017-04-27 07:25:11 from Masahiko Sawada

Responses

Fwd: Vacuum: allow usage of more than 1GB of work mem at 2017-07-11 16:51:03 from Claudio Freire

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Greg Stark	2017-07-10 23:08:54	Re: RFC: Key normalization for nbtree
Previous Message	David Fetter	2017-07-10 22:46:02	Re: New partitioning - some feedback