Re: Vacuum: allow usage of more than 1GB of work mem

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum: allow usage of more than 1GB of work mem
Date: 2016-11-17 17:34:50
Message-ID: CAD21AoBh8Ygfi3qryQu3mCxj4Z+xCn=uWv7Xx-BSYb0OhvH+LQ@mail.gmail.com
Lists: pgsql-hackers

On Thu, Oct 27, 2016 at 5:25 AM, Claudio Freire <klaussfreire(at)gmail(dot)com> wrote:
> On Thu, Sep 15, 2016 at 1:16 PM, Claudio Freire <klaussfreire(at)gmail(dot)com> wrote:
>> On Wed, Sep 14, 2016 at 12:24 PM, Claudio Freire <klaussfreire(at)gmail(dot)com> wrote:
>>> On Wed, Sep 14, 2016 at 12:17 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>>>
>>>> I am kind of doubtful about this whole line of investigation because
>>>> we're basically trying pretty hard to fix something that I'm not sure
>>>> is broken. I do agree that, all other things being equal, the TID
>>>> lookups will probably be faster with a bitmap than with a binary
>>>> search, but maybe not if the table is large and the number of dead
>>>> TIDs is small, because cache efficiency is pretty important. But even
>>>> if it's always faster, does TID lookup speed even really matter to
>>>> overall VACUUM performance? Claudio's early results suggest that it
>>>> might, but maybe that's just a question of some optimization that
>>>> hasn't been done yet.
>>>
>>> FYI, the reported impact was on CPU time, not runtime. There was no
>>> significant difference in runtime (real time), because my test is
>>> heavily I/O bound.
>>>
>>> I tested with a few small tables and there was no significant
>>> difference either, but small tables don't stress the array lookup
>>> anyway so that's expected.
>>>
>>> But assuming some systems may be CPU bound during vacuum
>>> (particularly those able to do more than 300-400MB/s sequential I/O),
>>> in those cases the increased or decreased cost of lazy_tid_reaped
>>> will directly correlate with runtime. It's just not the case on my
>>> systems, which all run on Amazon and are heavily bandwidth
>>> constrained (the fastest I/O subsystem I can get my hands on does
>>> 200MB/s).
>>
>> Attached is the patch with the multiarray version.
>>
>> The tests are weird. Comparing the best case over several runs, to
>> remove the impact of concurrent activity on this host (I couldn't
>> eliminate all background activity even when running the tests
>> overnight; the distro seems to add tons of cron jobs and background
>> cleanup tasks), there's only very mild CPU impact. I'd say
>> insignificant, as it's well below the mean variance.
>
> I reran the tests on a really dedicated system, and with a script that
> captured a lot more details about the runs.
>
> The system isn't impressive, an i5 with a single consumer HD and 8GB
> RAM, but it did the job.
>
> These tests make more sense, so I bet it was the previous tests that
> were spoiled by concurrent activity on the host.
>
> Attached is the raw output of the test, the script used to create it,
> and just in case the patch set used. I believe it's the same as the
> last one I posted, just rebased.
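
For readers following the lazy_tid_reaped() discussion above: on HEAD
that lookup is essentially a binary search over the sorted array of
dead-tuple TIDs, which is the cost being weighed against a bitmap
here. A simplified, self-contained sketch of that kind of lookup
(stand-in types only, not the actual vacuumlazy.c code) could look
like this:

#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* stand-in for ItemPointerData: block number plus line offset */
typedef struct
{
    uint32_t    block;
    uint16_t    offset;
} TidStub;

static int
tidstub_cmp(const void *a, const void *b)
{
    const TidStub *lhs = (const TidStub *) a;
    const TidStub *rhs = (const TidStub *) b;

    if (lhs->block != rhs->block)
        return (lhs->block < rhs->block) ? -1 : 1;
    if (lhs->offset != rhs->offset)
        return (lhs->offset < rhs->offset) ? -1 : 1;
    return 0;
}

/* returns true if "tid" appears in the sorted dead-tuple array */
static bool
tid_is_dead(const TidStub *tid, const TidStub *dead, size_t ndead)
{
    return bsearch(tid, dead, ndead, sizeof(TidStub), tidstub_cmp) != NULL;
}

Each callback from the index scan pays O(log n) over an array that can
hold up to roughly 179 million entries at the 1GB limit, which is why
the CPU-bound case Claudio mentions could notice the difference.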

I glanced at the patches, but neither of them follows the PostgreSQL
coding style. Please refer to [1].

[1] http://wiki.postgresql.org/wiki/Developer_FAQ#What.27s_the_formatting_style_used_in_PostgreSQL_source_code.3F

>
> In the results archive, the .vacuum prefix is the patched version with
> both patch 1 and 2, .git.ref is just patch 1 (without which the
> truncate takes unholily long).

Did you measure the performance benefit of the 0001 patch by comparing
HEAD with HEAD + 0001?

> Grepping the results a bit, picking an average run out of all runs on
> each scale:
>
> Timings:
>
> Patched:
>
> s100: CPU: user: 3.21 s, system: 1.54 s, elapsed: 18.95 s.
> s400: CPU: user: 14.03 s, system: 6.35 s, elapsed: 107.71 s.
> s4000: CPU: user: 228.17 s, system: 108.33 s, elapsed: 3017.30 s.
>
> Unpatched:
>
> s100: CPU: user: 3.39 s, system: 1.64 s, elapsed: 18.67 s.
> s400: CPU: user: 15.39 s, system: 7.03 s, elapsed: 114.91 s.
> s4000: CPU: user: 282.21 s, system: 105.95 s, elapsed: 3017.28 s.
>
> Total I/O (in MB)
>
> Patched:
>
> s100: R:2.4 - W:5862
> s400: R:1337.4 - W:29385.6
> s4000: R:318631 - W:370154
>
> Unpatched:
>
> s100: R:1412.4 - W:7644.6
> s400: R:3180.6 - W:36281.4
> s4000: R:330683 - W:370391
>
>
> So, in essence, CPU time didn't get adversely affected. If anything,
> it got improved by about 20% on the biggest case (scale 4000).

Also, this test case deletes all tuples in the relation and then
measures the duration of the vacuum; it would not be very
representative of practical use cases.

> While total runtime didn't change much, I believe this is only because
> the index is perfectly correlated (clustered?) since it's a pristine
> index, so index scans either remove or skip full pages, never leaving
> things half-way. A bloated index would probably show substantially
> different behavior; I'll try to get a script that reproduces that by
> running pgbench for a while before the vacuum, or something like
> that.
>
> However, the total I/O metric already shows remarkable improvement.
> This metric measures all the I/O, including the pgbench init, the
> initial vacuum pgbench init does, the delete, and the final vacuum. So
> it's not just the I/O for the vacuum itself, but the whole run. We can
> see the patched version reading a lot less (fewer scans over the
> indexes will do that), and in some cases writing less too (again,
> fewer index scans may mean fewer redundant writes when cleaning
> up/reclaiming index pages).

What value of maintenance_work_mem did you use for this test? Since
the DeadTuplesSegment struct still stores an array of ItemPointerData
(6 bytes each) representing dead tuples, I assumed that the frequency
of index vacuums would not change. But according to the test results,
the index vacuum is invoked once and removes 400000000 rows at a time,
which means the vacuum held about 2289 MB of memory during the heap
vacuum. On the other hand, the result of the test without the 0002
patch shows that an index vacuum removes 178956737 rows at a time,
which means 1GB of memory was used.
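
Just to spell out the arithmetic behind those figures (a throwaway
check, not code from either patch; the only assumption is the 6-byte
ItemPointerData size mentioned above):

#include <stdio.h>

int
main(void)
{
    const double tid_bytes = 6.0;           /* sizeof(ItemPointerData) */

    /* rows reaped in a single index vacuum pass, per the test output */
    const double rows_patched = 400000000.0;
    const double rows_head = 178956737.0;

    printf("patched:   %.0f MB\n", rows_patched * tid_bytes / (1024 * 1024));
    printf("unpatched: %.0f MB\n", rows_head * tid_bytes / (1024 * 1024));
    return 0;
}

This prints roughly 2289 MB and 1024 MB; the latter lines up with the
1GB cap on the dead-tuple array that this thread is trying to lift.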

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
