Re: Vacuum: allow usage of more than 1GB of work mem

From: Claudio Freire <klaussfreire(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum: allow usage of more than 1GB of work mem
Date: 2016-09-15 16:16:52
Views: Raw Message | Whole Thread | Download mbox
Lists: pgsql-hackers

On Thu, Sep 15, 2016 at 12:50 PM, Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> On 09/14/2016 07:57 PM, Tom Lane wrote:
>> Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com> writes:
>>> On Wed, Sep 14, 2016 at 10:53 PM, Alvaro Herrera
>>> <alvherre(at)2ndquadrant(dot)com>
>>> wrote:
>>>> One thing not quite clear to me is how do we create the bitmap
>>>> representation starting from the array representation in midflight
>>>> without using twice as much memory transiently. Are we going to write
>>>> the array to a temp file, free the array memory, then fill the bitmap by
>>>> reading the array from disk?
>>> We could do that.
>> People who are vacuuming because they are out of disk space will be very
>> very unhappy with that solution.
> The people are usually running out of space for data, while these files
> would be temporary files placed wherever temp_tablespaces points to. I'd
> argue if this is a source of problems, the people are already in deep
> trouble due to sorts, CREATE INDEX, ... as those commands may also generate
> a lot of temporary files.

One would not expect "CREATE INDEX" to succeed when space is tight,
but VACUUM is quite the opposite.

Still, temporary storage could be used if available, and gracefully
fall back to some other technique (like not using bitmaps) when not.

Not sure it's worth the trouble, though.

On Wed, Sep 14, 2016 at 12:24 PM, Claudio Freire <klaussfreire(at)gmail(dot)com> wrote:
> On Wed, Sep 14, 2016 at 12:17 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> I am kind of doubtful about this whole line of investigation because
>> we're basically trying pretty hard to fix something that I'm not sure
>> is broken. I do agree that, all other things being equal, the TID
>> lookups will probably be faster with a bitmap than with a binary
>> search, but maybe not if the table is large and the number of dead
>> TIDs is small, because cache efficiency is pretty important. But even
>> if it's always faster, does TID lookup speed even really matter to
>> overall VACUUM performance? Claudio's early results suggest that it
>> might, but maybe that's just a question of some optimization that
>> hasn't been done yet.
> FYI, the reported impact was on CPU time, not runtime. There was no
> significant difference in runtime (real time), because my test is
> heavily I/O bound.
> I tested with a few small tables and there was no significant
> difference either, but small tables don't stress the array lookup
> anyway so that's expected.
> But on the assumption that some systems may be CPU bound during vacuum
> (particularly those able to do more than 300-400MB/s sequential I/O),
> in those cases the increased or decreased cost of lazy_tid_reaped will
> directly correlate to runtime. It's just none of my systems, which all
> run on amazon and is heavily bandwidth constrained (fastest I/O
> subsystem I can get my hands on does 200MB/s).

Attached is the patch with the multiarray version.

The tests are weird. Best case comparison over several runs, to remove
the impact of concurrent activity on this host (I couldn't remove all
background activity even when running the tests overnight, the distro
adds tons of crons and background cleanup tasks it would seem),
there's only very mild CPU impact. I'd say insignificant, as it's well
below the mean variance.

Worst case:

DETAIL: CPU 9.90s/80.94u sec elapsed 1232.42 sec.

Best case:

DETAIL: CPU 12.10s/63.82u sec elapsed 832.79 sec.

There seems to be more variance with the multiarray approach than the
single array one, but I could not figure out why. Even I/O seems less

Worst case:

INFO: "pgbench_accounts": removed 400000000 row versions in 6557382 pages
DETAIL: CPU 64.31s/37.60u sec elapsed 2573.88 sec.

Best case:

INFO: "pgbench_accounts": removed 400000000 row versions in 6557378 pages
DETAIL: CPU 54.48s/31.78u sec elapsed 1552.18 sec.

Since this test takes several hours to complete, I could only run a
few runs of each version, so the statistical significance of the test
isn't very bright.

I'll try to compare with smaller pgbench scale numbers and more runs
over the weekend (gotta script that). It's certainly puzzling, I
cannot explain the increased variance, especially in I/O, since the
I/O should be exactly the same. I'm betting it's my system that's
unpredictable somehow. So I'm posting the patch in case someone gets
inspired and can spot the reason, and because there's been a lot of
talk about this very same approach, so I thought I'd better post the
code ;)

I'll also try to get a more predictable system to run the tests on.

Attachment Content-Type Size
0001-Vacuum-allow-using-more-than-1GB-work-mem-v2-multiarray.patch text/x-patch 18.9 KB

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2016-09-15 16:22:16 Re: Vacuum: allow usage of more than 1GB of work mem
Previous Message Tom Lane 2016-09-15 16:09:16 Re: File system operations.