Re: Vacuum: allow usage of more than 1GB of work mem

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum: allow usage of more than 1GB of work mem
Date: 2016-09-15 16:04:25
Message-ID: b8215fc1-d522-ebdf-3808-f8b9e5a42f4a@2ndquadrant.com
Lists: pgsql-hackers

On 09/14/2016 05:17 PM, Robert Haas wrote:
> I am kind of doubtful about this whole line of investigation because
> we're basically trying pretty hard to fix something that I'm not sure
> is broken. I do agree that, all other things being equal, the TID
> lookups will probably be faster with a bitmap than with a binary
> search, but maybe not if the table is large and the number of dead
> TIDs is small, because cache efficiency is pretty important. But even
> if it's always faster, does TID lookup speed even really matter to
> overall VACUUM performance? Claudio's early results suggest that it
> might, but maybe that's just a question of some optimization that
> hasn't been done yet.

Regarding the lookup performance, I don't think the bitmap alone can
significantly improve it. It's more memory-efficient, no doubt about
that, but it's still likely larger than the CPU caches and accessed
mostly randomly (when vacuuming the indexes).
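
For illustration, here is a minimal C sketch of the two lookup structures under discussion: a sorted array of dead TIDs searched with bsearch (the binary-search approach mentioned above), and a per-heap-page bitmap with one bit per possible line pointer. The Tid type, DeadTidBitmap, MAX_OFFSETS and all function names are hypothetical simplifications, not PostgreSQL's actual vacuum code. The array costs memory per dead TID, while the bitmap costs MAX_OFFSETS bits per heap page no matter how few tuples on it are dead, which is why the bitmap can lose for a large table with few dead TIDs. Both are probed at essentially random positions during index vacuuming.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical simplified TID: heap block number + 1-based line pointer offset. */
typedef struct { uint32_t block; uint16_t offset; } Tid;

/* Assumed per-page limit on line pointers, used only to size the bitmap. */
#define MAX_OFFSETS 291

/* Ordering by (block, offset), as required for qsort/bsearch. */
static int tid_cmp(const void *a, const void *b)
{
    const Tid *x = a, *y = b;
    if (x->block != y->block)
        return x->block < y->block ? -1 : 1;
    if (x->offset != y->offset)
        return x->offset < y->offset ? -1 : 1;
    return 0;
}

/* Sorted-array lookup: O(log n) probes into memory proportional to n dead TIDs. */
static bool tid_array_contains(const Tid *sorted, size_t n, Tid key)
{
    return bsearch(&key, sorted, n, sizeof(Tid), tid_cmp) != NULL;
}

/* Bitmap: MAX_OFFSETS bits per heap page, a single probe per lookup. */
typedef struct { uint8_t *bits; uint32_t nblocks; } DeadTidBitmap;

static bool dead_bitmap_init(DeadTidBitmap *bm, uint32_t nblocks)
{
    size_t nbits = (size_t) nblocks * MAX_OFFSETS;
    bm->bits = calloc((nbits + 7) / 8, 1);
    bm->nblocks = nblocks;
    return bm->bits != NULL;
}

static bool dead_bitmap_in_range(const DeadTidBitmap *bm, Tid t)
{
    return t.block < bm->nblocks && t.offset >= 1 && t.offset <= MAX_OFFSETS;
}

static size_t dead_bitmap_pos(Tid t)
{
    return (size_t) t.block * MAX_OFFSETS + (size_t) (t.offset - 1);
}

static bool dead_bitmap_set(DeadTidBitmap *bm, Tid t)
{
    if (!dead_bitmap_in_range(bm, t))
        return false;
    size_t pos = dead_bitmap_pos(t);
    bm->bits[pos / 8] |= (uint8_t) (1u << (pos % 8));
    return true;
}

static bool dead_bitmap_contains(const DeadTidBitmap *bm, Tid t)
{
    if (!dead_bitmap_in_range(bm, t))
        return false;
    size_t pos = dead_bitmap_pos(t);
    return (bm->bits[pos / 8] >> (pos % 8)) & 1u;
}
```

In both cases the structure for a large table easily exceeds the CPU caches, so the per-lookup cost is dominated by cache misses rather than by the number of comparisons.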

IMHO the best way to speed up lookups (if it's really an issue; I haven't
done any benchmarks) would be to build a small bloom filter in front of
the TID array / bitmap. It would be fairly small (depending on the
number of TIDs, the error rate etc.), likely to fit into L2/L3, and it
would eliminate a lot of probes into the much larger array/bitmap.
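
A minimal C sketch of that idea, assuming the dead TIDs are kept in a sorted array: a bloom filter answers "definitely not dead" for most live TIDs without touching the array, and only filter hits fall through to the binary search. A bloom filter has no false negatives, so the combined lookup returns the same answers as the array alone. The hash (a splitmix64-style mix of the packed TID), the double-hashing scheme for deriving the k probe positions, and all type and function names are illustrative choices, not anything proposed in the thread.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical simplified TID (heap block + 1-based line pointer offset). */
typedef struct { uint32_t block; uint16_t offset; } Tid;

static int tid_cmp(const void *a, const void *b)
{
    const Tid *x = a, *y = b;
    if (x->block != y->block)
        return x->block < y->block ? -1 : 1;
    if (x->offset != y->offset)
        return x->offset < y->offset ? -1 : 1;
    return 0;
}

typedef struct {
    uint64_t *words;   /* the filter's bit array */
    uint64_t  nbits;   /* m: number of bits */
    int       k;       /* number of probes per TID */
} TidBloom;

/* splitmix64-style finalizer applied to the packed TID. */
static uint64_t tid_hash(Tid t)
{
    uint64_t x = ((uint64_t) t.block << 16) | t.offset;
    x += 0x9E3779B97F4A7C15ULL;
    x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ULL;
    x = (x ^ (x >> 27)) * 0x94D049BB133111EBULL;
    return x ^ (x >> 31);
}

static bool bloom_init(TidBloom *bf, uint64_t nbits, int k)
{
    bf->nbits = nbits > 0 ? nbits : 64;
    bf->k = k > 0 ? k : 1;
    bf->words = calloc((bf->nbits + 63) / 64, sizeof(uint64_t));
    return bf->words != NULL;
}

/* i-th probe position via double hashing: (h1 + i * h2) mod m. */
static uint64_t bloom_bit(const TidBloom *bf, uint64_t h, int i)
{
    uint64_t h1 = h & 0xFFFFFFFFULL;
    uint64_t h2 = (h >> 32) | 1;   /* odd, hence nonzero, step */
    return (h1 + (uint64_t) i * h2) % bf->nbits;
}

static void bloom_add(TidBloom *bf, Tid t)
{
    uint64_t h = tid_hash(t);
    for (int i = 0; i < bf->k; i++) {
        uint64_t bit = bloom_bit(bf, h, i);
        bf->words[bit / 64] |= 1ULL << (bit % 64);
    }
}

static bool bloom_maybe_contains(const TidBloom *bf, Tid t)
{
    uint64_t h = tid_hash(t);
    for (int i = 0; i < bf->k; i++) {
        uint64_t bit = bloom_bit(bf, h, i);
        if (!(bf->words[bit / 64] & (1ULL << (bit % 64))))
            return false;   /* definitely not a dead TID */
    }
    return true;            /* possibly dead: confirm in the array */
}

/* Cheap filter check first; probe the large sorted array only on a hit. */
static bool is_dead_tid(const TidBloom *bf, const Tid *sorted, size_t n, Tid key)
{
    if (!bloom_maybe_contains(bf, key))
        return false;
    return bsearch(&key, sorted, n, sizeof(Tid), tid_cmp) != NULL;
}
```

Since the filter is filled only after the heap scan has collected all dead TIDs, its size can be chosen from the exact TID count rather than guessed up front.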

Of course, it's another layer of complexity. The good thing is that we
don't need to build the filter until after we collect the TIDs, so we'll
have pretty good inputs for the bloom filter parameters (the number of
TIDs is known exactly at that point).
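
For reference, the standard bloom filter sizing rules show how the known TID count n and a chosen false-positive rate p determine the filter size m (in bits) and the number of hash probes k. The numbers in the worked example are hypothetical, picked only to show the order of magnitude.

```latex
m = \left\lceil -\frac{n \ln p}{(\ln 2)^2} \right\rceil,
\qquad
k = \operatorname{round}\!\left(\frac{m}{n} \ln 2\right)

% Hypothetical example: n = 10^6 dead TIDs, p = 0.01
\frac{m}{n} = \frac{4.605}{0.4805} \approx 9.59 \text{ bits per TID}
\;\Rightarrow\; m \approx 9.59 \times 10^{6} \text{ bits} \approx 1.2\ \text{MB},
\qquad k = \operatorname{round}(9.59 \times 0.693) = 7
```

A filter of that size plausibly fits in L3 (or a large L2), whereas the underlying TID array or bitmap for the same table is typically several times larger.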

But all this is based on the assumption that the lookups are actually
expensive, and I'm not sure about that.

regards
Tomas
