Re: VACUUM memory management

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: VACUUM memory management
Date: 2020-04-05 20:26:25
Message-ID: 20200405202625.GC2228@telsasoft.com
Lists: pgsql-hackers

On Wed, Dec 11, 2019 at 09:29:17PM +0500, Ibrar Ahmed wrote:
> > Did you modify Claudio's patch or write a totally new one?
>
> I wrote a completely new patch. I tried multiple techniques, like using a
> list instead of a fixed-size array, which I thought was most suitable here,
> but dropped that because of conflicts with the parallel vacuum work.

Using a list would hardly work, or at least not well, since the dead-TID
store needs to be searched efficiently by the ambulkdelete callback.
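To illustrate why the store must support fast lookup: the callback is invoked
once per index tuple, so an O(n) list walk per probe is prohibitive, while a
sorted array allows a binary search. The sketch below is simplified and not
PostgreSQL's actual code; TidSketch is a stand-in for ItemPointerData, and
tid_reaped() mimics what a lazy_tid_reaped()-style callback has to do.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for PostgreSQL's ItemPointerData: (block, offset). */
typedef struct
{
	uint32_t	block;
	uint16_t	offset;
} TidSketch;

static int
tid_cmp(const TidSketch *a, const TidSketch *b)
{
	if (a->block != b->block)
		return (a->block < b->block) ? -1 : 1;
	if (a->offset != b->offset)
		return (a->offset < b->offset) ? -1 : 1;
	return 0;
}

/*
 * The ambulkdelete callback must answer "is this TID dead?" for every
 * index tuple, so the dead-TID store needs O(log n) lookup.  A sorted
 * array supports that; a plain linked list does not.
 */
static int
tid_reaped(const TidSketch *dead, size_t ndead, const TidSketch *tid)
{
	size_t		lo = 0,
				hi = ndead;

	while (lo < hi)
	{
		size_t		mid = lo + (hi - lo) / 2;
		int			c = tid_cmp(&dead[mid], tid);

		if (c == 0)
			return 1;			/* found: tuple is dead */
		if (c < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	return 0;					/* not in the dead-TID array */
}
```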

> >> If you wrote a totally new one, have you compared your work with
> >> Claudio's, to see if he covered
> >> anything you might need to cover?
> >
> I checked the patch, and it does not do anything special that my patch is
> not doing, except one thing. That patch also claims to raise the 1GB
> limit, which I have not touched. In my case, we stay under the
> maintenance_work_mem limit but allocate memory in chunks. You then have
> the leverage to set a big maintenance_work_mem (even if you don't need
> it), because it will not allocate all the memory at the start.

After spending a bunch of time comparing them, I disagree. Claudio's patch
does these:

- avoids using multiple chunks when there are no indexes, since in that case
there is no high cost of index scans to avoid;
- rather than doing an index scan for each chunk (bad), the callback function
lazy_tid_reaped() does a custom binary search *over* chunks of different
sizes and then *within* each chunk. That's maybe slightly over-engineered;
I'm not convinced it's needed (though I thought it was pretty clever), but
someone thought it was important.
- properly keep track of *total* number of dead tuples, eg for progress
reporting, and for prev_dead_count for pages with no dead tuples;
- lazy_record_dead_tuple() doubles the allocation when it runs out of space
for dead tuples; some people (myself included) disagree with that, but I'm
including it here since that's what the patch does. This still seems
nontrivial (to me) to adapt to work with parallel query.
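The two-level search in the second point can be sketched as follows. This is
my own simplified illustration, not code from either patch: the outer binary
search locates the chunk by each chunk's highest TID (chunks may have
different sizes, which a single flat binary search cannot handle), then an
inner binary search probes within that chunk. TidChunk and the function name
are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for PostgreSQL's ItemPointerData: (block, offset). */
typedef struct
{
	uint32_t	block;
	uint16_t	offset;
} TidSketch;

/* One variable-sized chunk of the dead-TID store (hypothetical layout). */
typedef struct
{
	TidSketch  *tids;			/* sorted within the chunk */
	size_t		ntids;
	TidSketch	last;			/* highest TID stored in this chunk */
} TidChunk;

static int
tid_cmp(const TidSketch *a, const TidSketch *b)
{
	if (a->block != b->block)
		return (a->block < b->block) ? -1 : 1;
	if (a->offset != b->offset)
		return (a->offset < b->offset) ? -1 : 1;
	return 0;
}

/*
 * Two-level lookup: binary-search for the first chunk whose highest TID
 * is >= the probe, then binary-search within that chunk.
 */
static int
multi_chunk_tid_reaped(const TidChunk *chunks, size_t nchunks,
					   const TidSketch *tid)
{
	size_t		lo = 0,
				hi = nchunks;
	const TidChunk *c;
	size_t		l,
				h;

	/* Outer search: find the only chunk that could contain the probe. */
	while (lo < hi)
	{
		size_t		mid = lo + (hi - lo) / 2;

		if (tid_cmp(&chunks[mid].last, tid) < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	if (lo == nchunks)
		return 0;				/* probe is beyond every chunk */

	/* Inner search: ordinary binary search within the selected chunk. */
	c = &chunks[lo];
	l = 0;
	h = c->ntids;
	while (l < h)
	{
		size_t		m = l + (h - l) / 2;
		int			cmp = tid_cmp(&c->tids[m], tid);

		if (cmp == 0)
			return 1;
		if (cmp < 0)
			l = m + 1;
		else
			h = m;
	}
	return 0;
}
```

The extra outer level is the price of variable-sized chunks; with equal-sized
chunks the target chunk could be computed directly, which is part of why I'm
not convinced the complexity is needed.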

--
Justin
