Re: Vacuum: allow usage of more than 1GB of work mem

From: Arthur Silva <arthurprs(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Greg Stark <stark(at)mit(dot)edu>
Subject: Re: Vacuum: allow usage of more than 1GB of work mem
Date: 2016-09-14 15:59:16
Message-ID: CAO_YK0UXz=SVb5wt0emO0gp=g2Ef-pXigPdO=kveYb+AVqFo9Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sep 14, 2016 5:18 PM, "Robert Haas" <robertmhaas(at)gmail(dot)com> wrote:
>
> On Wed, Sep 14, 2016 at 8:16 AM, Pavan Deolasee
> <pavan(dot)deolasee(at)gmail(dot)com> wrote:
> > Ah, thanks. So MaxHeapTuplesPerPage sets the upper boundary for the per
page
> > bitmap size. Thats about 36 bytes for 8K page. IOW if on an average
there
> > are 6 or more dead tuples per page, bitmap will outperform the current
> > representation, assuming max allocation for bitmap. If we can use
additional
> > estimates to restrict the size to somewhat more conservative value and
then
> > keep overflow area, then probably the break-even happens even earlier
than
> > that. I hope this gives us a good starting point, but let me know if you
> > think it's still a wrong approach to pursue.
>
> Well, it's certainly a bigger change. I think the big concern is that
> the amount of memory now becomes fixed based on the table size. So
> one problem is that you have to figure out what you're going to do if
> the bitmap doesn't fit in maintenance_work_mem. A related problem is
> that it might fit but use more memory than before, which could cause
> problems for some people. Now on the other hand it could also use
> less memory for some people, and that would be good.
>
> I am kind of doubtful about this whole line of investigation because
> we're basically trying pretty hard to fix something that I'm not sure
> is broken. I do agree that, all other things being equal, the TID
> lookups will probably be faster with a bitmap than with a binary
> search, but maybe not if the table is large and the number of dead
> TIDs is small, because cache efficiency is pretty important. But even
> if it's always faster, does TID lookup speed even really matter to
> overall VACUUM performance? Claudio's early results suggest that it
> might, but maybe that's just a question of some optimization that
> hasn't been done yet.
>
> I'm fairly sure that our number one priority should be to minimize the
> number of cases where we need to do multiple scans of the indexes to
> stay within maintenance_work_mem. If we're satisfied we've met that
> goal, then within that we should try to make VACUUM as fast as
> possible with as little memory usage as possible. I'm not 100% sure I
> know how to get there, or how much work it's worth expending. In
> theory we could even start with the list of TIDs and switch to the
> bitmap if the TID list becomes larger than the bitmap would have been,
> but I don't know if it's worth the effort.
>
> /me thinks a bit.
>
> Actually, I think that probably *is* worthwhile, specifically because
> it might let us avoid multiple index scans in cases where we currently
> require them. Right now, our default maintenance_work_mem value is
> 64MB, which is enough to hold a little over ten million tuples. It's
> also large enough to hold a bitmap for a 14GB table. So right now if
> you deleted, say, 100 tuples per page you would end up with an index
> vacuum cycles for every ~100,000 pages = 800MB, whereas switching to
> the bitmap representation for such cases would require only one index
> vacuum cycle for every 14GB, more than an order of magnitude
> improvement!
>
> On the other hand, if we switch to the bitmap as the ONLY possible
> representation, we will lose badly when there are scattered updates -
> e.g. 1 deleted tuple every 10 pages. So it seems like we probably
> want to have both options. One tricky part is figuring out how we
> switch between them when memory gets tight; we have to avoid bursting
> above our memory limit while making the switch. And even if our
> memory limit is very high, we want to avoid using memory gratuitously;
> I think we should try to grow memory usage incrementally with either
> representation.
>
> For instance, one idea to grow memory usage incrementally would be to
> store dead tuple information separately for each 1GB segment of the
> relation. So we have an array of dead-tuple-representation objects,
> one for every 1GB of the relation. If there are no dead tuples in a
> given 1GB segment, then this pointer can just be NULL. Otherwise, it
> can point to either the bitmap representation (which will take ~4.5MB)
> or it can point to an array of TIDs (which will take 6 bytes/TID).
> That could handle an awfully wide variety of usage patterns
> efficiently; it's basically never worse than what we're doing today,
> and when the dead tuple density is high for any portion of the
> relation it's a lot better.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

I'd say it's an idea worth pursuing. It's the base idea behind roaring
bitmaps, arguably the best overall compressed bitmap implementation.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Dilip Kumar 2016-09-14 16:04:55 Re: Speed up Clog Access by increasing CLOG buffers
Previous Message Tom Lane 2016-09-14 15:54:45 Re: [BUGS] BUG #14244: wrong suffix for pg_size_pretty()