Re: Vacuum: allow usage of more than 1GB of work mem

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Claudio Freire <klaussfreire(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Vacuum: allow usage of more than 1GB of work mem
Date: 2018-04-05 20:02:09
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 03/04/18 17:20, Claudio Freire wrote:
> Ok, rebased patches attached

Thanks! I took a look at this.

First, now that the data structure is more complicated, I think it's
time to abstract it, and move it out of vacuumlazy.c. The Tid Map needs
to support the following operations:

* Add TIDs, in order (in 1st phase of vacuum)
* Random lookup, by TID (when scanning indexes)
* Iterate through all TIDs, in order (2nd pass over heap)

Let's add a new source file to hold the code for the tid map data
structure, with functions corresponding those operations.

I took a stab at doing that, and I think it makes vacuumlazy.c nicer.

Secondly, I'm not a big fan of the chosen data structure. I think the
only reason that the segmented "multi-array" was chosen is that each
"segment" works is similar to the simple array that we used to have.
After putting it behind the abstraction, it seems rather ad hoc. There
are many standard textbook data structures that we could use instead,
and would be easier to understand and reason about, I think.

So, I came up with the attached patch. I used a B-tree as the data
structure. Not sure it's the best one, I'm all ears for suggestions and
bikeshedding on alternatives, but I'm pretty happy with that. I would
expect it to be pretty close to the simple array with binary search in
performance characteristics. It'd be pretty straightforward to optimize
further, and e.g. use a bitmap of OffsetNumbers or some other more dense
data structure in the B-tree leaf pages, but I resisted doing that as
part of this patch.

I haven't done any performance testing of this (and not much testing in
general), but at least the abstraction seems better this way.
Performance testing would be good, too. In particular, I'd like to know
how this might affect the performance of lazy_tid_reaped(). That's a hot
spot when vacuuming indexes, so we don't want to add any cycles there.
Was there any ready-made test kits on that in this thread? I didn't see
any at a quick glance, but it's a long thread..

- Heikki

Attachment Content-Type Size
vacuum-tidmap-1.patch text/x-patch 37.9 KB

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2018-04-05 20:02:20 Re: pgsql: New files for MERGE
Previous Message Andres Freund 2018-04-05 20:00:03 Re: [HACKERS] MERGE SQL Statement for PG11