Re: Rethinking autovacuum.c memory handling

From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Rethinking autovacuum.c memory handling
Date: 2017-09-23 10:30:50
Message-ID: 20170923103050.decvu4b3qqnjti67@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Tom Lane wrote:
> I notice that autovacuum.c calls autovacuum_do_vac_analyze, and
> thereby vacuum(), in TopTransactionContext. This doesn't seem
> like a terribly great idea, because it doesn't correspond to what
> happens during a manually-invoked vacuum. TopTransactionContext
> will go away when vacuum() commits the outer transaction, whereas
> in non-autovac usage, we call vacuum() in a PortalHeapMemory
> context that is not a child of TopTransactionContext and is not
> at risk of being reset multiple times during the vacuum(). This'd
> be a hazard if autovacuum_do_vac_analyze or vacuum did any palloc's
> before getting to the main loop. More generally, I'm not aware of
> other cases where we invoke a function in a context that we know
> that function will destroy as it executes.

Oh, good catch. This must be a very old oversight -- I bet it goes all
the way back to the introduction of autovacuum. I think if there were
any actual bugs, we would have noticed by now.

> I don't see any live bug associated with this in HEAD, but this behavior
> requires a rather ugly (and memory-leaking) workaround in the proposed
> patch to allow multiple vacuum target rels.

Well, it's definitely broken, so I'd vote for fixing it even if we
weren't considering that patch.

> What I think we should do instead is invoke autovacuum_do_vac_analyze
> in the PortalContext that do_autovacuum has created, which we already
> have a mechanism to reset once per table processed in do_autovacuum.

Sounds sensible.

> The attached patch does that, and also modifies perform_work_item()
> to use the same approach. Right now perform_work_item() has a
> copied-and-pasted MemoryContextResetAndDeleteChildren(PortalContext)
> call in its error recovery path, but that seems a bit out of place
> given that perform_work_item() isn't using PortalContext otherwise.

Oops :-(

> PS: I was disappointed to find out that perform_work_item() isn't
> exercised at all in the standard regression tests.

Yeah ... It's fairly simple to create a test that will invoke it a few
times, but one problem is that it depends on autovacuum's timing. If I
add this in brin.sql

-- Test BRIN work items
CREATE TABLE brin_work_items (LIKE brintest) WITH (fillfactor = 10);
CREATE INDEX brin_work_items_idx ON brin_work_items USING brin (textcol)
WITH (autosummarize = on, pages_per_range=1);
INSERT INTO brin_work_items SELECT * FROM brintest;

the work item is only performed about 15 seconds after the complete
regression tests are finished, which I fear would mean "never" in
practice.

One idea I just had is to somehow add it to src/test/modules/brin, and
set up the postmaster in that test with autovacuum_naptime=1s. I'll go
check how feasible that is. (By placing it there we could also verify
that the index does indeed contain the index entries we expect, since it
has pageinspect available.)

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2017-09-23 11:14:15 Re: Rethinking autovacuum.c memory handling
Previous Message Julien Rouhaud 2017-09-23 10:29:05 Re: [Proposal] Make the optimiser aware of partitions ordering