Re: Fun fact about autovacuum and orphan temp tables

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fun fact about autovacuum and orphan temp tables
Date: 2016-11-19 05:16:00
Message-ID: CAB7nPqSbYT6dRwsXVgiKmBdL_ARemfDZMPA+RPeC_ge0GK70hA@mail.gmail.com
Lists: pgsql-hackers

On Fri, Nov 18, 2016 at 1:11 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> So now I think that we probably need to make this logic a bit smarter.
> Add all of the OIDs that need to be dropped to a list. Then have a
> loop prior to the main loop (where it says "Perform operations on
> collected tables.") which iterates over that list and drops those
> tables one by one, starting a transaction every (say) 100 tables or
> after an error. For bonus points, if a transaction fails, put all of
> the OIDs except the one that provoked the failure back into the list
> of OIDs to be dropped, so that we still make a progress even if some
> DROPs are failing for some reason.

Okay.

> That might sound adding unnecessary work just for the sake of
> paranoia, but I don't think it is. Failures here won't be common, but
> since they are entirely automated there will be no human intelligence
> available to straighten things out. Barring considerable caution,
> we'll just enter a flaming death spiral.

Thinking more paranoidly, another way to enter this flaming death
spiral would be to put no limit on the number of failures allowed when
dropping a set of orphaned tables and transactions fail repeatedly.
This matters less because the relation whose drop failed gets removed
from the list, but failing on each one of them is a good way to slow
down autovacuum, so I think that capping the number of failed
transactions at, say, 10 is really important.

I have played with what you suggested, and finished with the patch
attached. I have run some tests using this function to create some
temp tables with several backends to be sure that multiple backend IDs
are used:

CREATE FUNCTION create_temp_tables(n int) RETURNS void
AS $$
BEGIN
  FOR i IN 1..n LOOP
    EXECUTE 'CREATE TEMP TABLE aa' || i || ' (a int);';
  END LOOP;
END
$$ LANGUAGE plpgsql;

Then I killed the instance. At restart I could see a bunch of temp
tables in pg_class, and I let autovacuum do the cleanup. I have also
tested the error code path in the PG_TRY() block by manually forcing
an elog(ERROR), to be sure that the maximum number of failures is
handled correctly; better safe than sorry.
--
Michael

Attachment Content-Type Size
autovacuum-orphan-cleanup-v3.patch application/x-download 6.3 KB
