Quick Links

Re: Vacuum: allow usage of more than 1GB of work mem

From:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To:	Claudio Freire <klaussfreire(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Vacuum: allow usage of more than 1GB of work mem
Date:	2018-04-05 20:02:09
Message-ID:	f8d1dd03-c632-86fa-15cf-6f5633350b78@iki.fi
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On 03/04/18 17:20, Claudio Freire wrote:
> Ok, rebased patches attached

Thanks! I took a look at this.

First, now that the data structure is more complicated, I think it's
time to abstract it, and move it out of vacuumlazy.c. The Tid Map needs
to support the following operations:

* Add TIDs, in order (in 1st phase of vacuum)
* Random lookup, by TID (when scanning indexes)
* Iterate through all TIDs, in order (2nd pass over heap)

Let's add a new source file to hold the code for the tid map data
structure, with functions corresponding those operations.

I took a stab at doing that, and I think it makes vacuumlazy.c nicer.

Secondly, I'm not a big fan of the chosen data structure. I think the
only reason that the segmented "multi-array" was chosen is that each
"segment" works is similar to the simple array that we used to have.
After putting it behind the abstraction, it seems rather ad hoc. There
are many standard textbook data structures that we could use instead,
and would be easier to understand and reason about, I think.

So, I came up with the attached patch. I used a B-tree as the data
structure. Not sure it's the best one, I'm all ears for suggestions and
bikeshedding on alternatives, but I'm pretty happy with that. I would
expect it to be pretty close to the simple array with binary search in
performance characteristics. It'd be pretty straightforward to optimize
further, and e.g. use a bitmap of OffsetNumbers or some other more dense
data structure in the B-tree leaf pages, but I resisted doing that as
part of this patch.

I haven't done any performance testing of this (and not much testing in
general), but at least the abstraction seems better this way.
Performance testing would be good, too. In particular, I'd like to know
how this might affect the performance of lazy_tid_reaped(). That's a hot
spot when vacuuming indexes, so we don't want to add any cycles there.
Was there any ready-made test kits on that in this thread? I didn't see
any at a quick glance, but it's a long thread..

- Heikki

Attachment	Content-Type	Size
vacuum-tidmap-1.patch	text/x-patch	37.9 KB

In response to

Re: Vacuum: allow usage of more than 1GB of work mem at 2018-04-03 14:20:11 from Claudio Freire

Responses

Re: Vacuum: allow usage of more than 1GB of work mem at 2018-04-05 22:59:59 from Claudio Freire

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Bruce Momjian	2018-04-05 20:02:20	Re: pgsql: New files for MERGE
Previous Message	Andres Freund	2018-04-05 20:00:03	Re: [HACKERS] MERGE SQL Statement for PG11