Re: Vacuum: allow usage of more than 1GB of work mem

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum: allow usage of more than 1GB of work mem
Date: 2016-09-13 18:51:59
Message-ID: CA+TgmoaYP32nX_ejVP8dfspTwd7PFt-n7Pnqpwdwn_cdogS-aw@mail.gmail.com
Lists: pgsql-hackers

On Fri, Sep 9, 2016 at 3:04 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> The attached PoC patch changes the representation of dead tuple
> locations to a hashmap holding a tuple bitmap.
> Each hashmap entry consists of a block number and the TID bitmap of
> the corresponding block, with the block number as the hash key.
> The current implementation of this patch is not smart yet, because
> each hashmap entry allocates the tuple bitmap with a fixed size
> (LAZY_ALLOC_TUPLES), so each hash entry can store up to
> LAZY_ALLOC_TUPLES (291 if the block size is 8kB) tuples.
> In cases where one block can hold only a few tens of tuples, most of
> the bits would be wasted.
>
> After improving this patch as you suggested, I will measure the
> performance benefit.
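
For concreteness, the entry being described is roughly the following
(a sketch with guessed names and field sizes, not the patch's actual
code):

#include "postgres.h"
#include "access/htup_details.h"        /* MaxHeapTuplesPerPage */
#include "storage/block.h"

/* Guessed sketch of the described hash entry: one per heap block that
 * has any dead tuples.  blkno is the hash key; the fixed-size bitmap
 * has one bit per possible line pointer on the page. */
#define LAZY_ALLOC_TUPLES   MaxHeapTuplesPerPage    /* 291 on 8kB pages */

typedef struct DeadTupleEntry
{
    BlockNumber blkno;                              /* hash key */
    uint8       dead[(LAZY_ALLOC_TUPLES + 7) / 8];  /* TID bitmap */
} DeadTupleEntry;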

We also need to consider the amount of memory that gets used. What I
proposed - replacing the array with an array of arrays - would not
increase memory utilization significantly. I don't think it would
have much impact on CPU utilization either. It would require
replacing the call to bsearch() in lazy_tid_reaped() with an
open-coded implementation of bsearch, or with one bsearch to find the
chunk and another to find the TID within the chunk, but that shouldn't
be very expensive. For one thing, if the array chunks are around the
size I proposed (64MB), you've got more than ten million tuples per
chunk, so you can't have very many chunks unless your table is both
really large and possessed of quite a bit of dead stuff.
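
To make that concrete, here is a sketch of the lookup (the names are
invented for illustration, not taken from any actual patch):

#include "postgres.h"
#include "storage/itemptr.h"

/* Dead TIDs live in fixed-size chunks, each sorted; the chunks
 * themselves are kept in ascending TID order. */
typedef struct DeadTidChunk
{
    int             ntids;          /* number of TIDs stored */
    ItemPointerData tids[FLEXIBLE_ARRAY_MEMBER];    /* sorted */
} DeadTidChunk;

typedef struct DeadTidStore
{
    int             nchunks;
    DeadTidChunk  **chunks;         /* ascending by tids[0] */
} DeadTidStore;

static int
cmp_itemptr(const void *a, const void *b)
{
    return ItemPointerCompare((ItemPointer) a, (ItemPointer) b);
}

static bool
dead_tid_exists(DeadTidStore *store, ItemPointer tid)
{
    int     lo = 0,
            hi = store->nchunks - 1;

    if (store->nchunks == 0)
        return false;

    /* First search: the rightmost chunk whose first TID is <= tid. */
    while (lo < hi)
    {
        int     mid = (lo + hi + 1) / 2;

        if (ItemPointerCompare(&store->chunks[mid]->tids[0], tid) <= 0)
            lo = mid;
        else
            hi = mid - 1;
    }

    /* Second search: plain bsearch() within that one chunk, exactly
     * what the current code does over the single flat array. */
    return bsearch(tid, store->chunks[lo]->tids, store->chunks[lo]->ntids,
                   sizeof(ItemPointerData), cmp_itemptr) != NULL;
}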

Now, if I'm reading it correctly, this patch allocates a 132-byte
structure for every page with at least one dead tuple. In the worst
case where there's just one dead tuple per page, that's a 20x
regression in memory usage. Actually, it's more like 40x, because
AllocSetAlloc rounds small allocation sizes up to the next-higher
power of two, which really stings for a 132-byte allocation, and then
adds a 16-byte header to each chunk. But even 20x is clearly not
good. There are going to be lots of real-world cases where this uses
way more memory to track the same number of dead tuples, and I'm
guessing that benchmarking is going to reveal that it's not faster,
either.
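
Spelling out that arithmetic (6 bytes being sizeof(ItemPointerData),
the per-TID cost of the current flat array):

    132 bytes/page  ->  256 bytes after power-of-two rounding
                    ->  272 bytes with the 16-byte chunk header
    272 / 6        ~=  45x for a page with a single dead tuple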

I think it's probably wrong to worry that an array-of-arrays is going
to be meaningfully slower than a single array here. It's basically
costing you some small number of additional memory references per
tuple, which I suspect isn't all that relevant for a bulk operation
that does I/O, writes WAL, locks buffers, etc. But if it is relevant,
then I think there are other ways to buy that performance back which
are likely to be more memory efficient than converting this to use a
hash table. For example, we could keep a bitmap with one bit per K
pages. If the bit is set, there is at least 1 dead tuple on that
page; if clear, there are none. When we see an index tuple, we
consult the bitmap to determine whether we need to search the TID
list. We select K to be the smallest power of 2 such that the bitmap
uses less memory than some threshold, perhaps 64kB. Assuming that
updates and deletes to the table have some locality, we should be able
to skip a large percentage of the TID searches with a probe into this
very compact bitmap. Note that we can set K = 1 for tables up to 4GB
in size, and even a 1TB table only needs K = 256. Odds are very good
that a 1TB table being vacuumed has many 256-page ranges containing no
dead tuples at all ... and if this proves to be false and the dead
tuples are scattered uniformly throughout the table, then you should
probably be more worried about the fact that you're dumping a bare
minimum of 4GB of random I/O on your hapless disk controller than
about how efficient the TID search is.
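
Here is a sketch of that filter (names and API invented for
illustration; the dead-TID list itself stays whatever it is):

#include "postgres.h"
#include "storage/block.h"

/* One bit per K heap pages, within a fixed 64kB budget.  A set bit
 * means "this K-page range has at least one dead tuple, so do the
 * real TID search"; a clear bit lets us skip that search entirely. */
#define FILTER_BYTES    (64 * 1024)

typedef struct DeadPageFilter
{
    uint32  pages_per_bit;          /* K: a power of 2 */
    uint8   bits[FILTER_BYTES];
} DeadPageFilter;

static void
filter_init(DeadPageFilter *f, BlockNumber nblocks)
{
    /* Smallest power-of-2 K whose bitmap covers the whole table:
     * K = 1 handles up to 4GB (8kB pages); a 1TB table needs K = 256. */
    f->pages_per_bit = 1;
    while ((uint64) nblocks > (uint64) FILTER_BYTES * 8 * f->pages_per_bit)
        f->pages_per_bit *= 2;
    memset(f->bits, 0, sizeof(f->bits));
}

static void
filter_set_page(DeadPageFilter *f, BlockNumber blkno)
{
    uint32  bit = blkno / f->pages_per_bit;

    f->bits[bit / 8] |= (uint8) (1 << (bit % 8));
}

static bool
filter_might_contain(DeadPageFilter *f, BlockNumber blkno)
{
    uint32  bit = blkno / f->pages_per_bit;

    return (f->bits[bit / 8] & (1 << (bit % 8))) != 0;
}

While scanning the heap we'd call filter_set_page() for every page on
which we record a dead tuple, and lazy_tid_reaped() would probe
filter_might_contain() with the index tuple's block number before
doing any binary search.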

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
