Temporary tables prevent autovacuum, leading to XID wraparound

From: "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Temporary tables prevent autovacuum, leading to XID wraparound
Date: 2018-01-25 06:14:41
Message-ID: 0A3221C70F24FB45833433255569204D1F8A4DC6@G01JPEXMBYT05
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello,

I've found a problem that an orphaned temporary table could cause XID wraparound. Our customer encountered this problem with PG 9.5.2, but I think this will happen with the latest PG.

I'm willing to fix this, but I'd like to ask you what approach we should take.

PROBLEM
====================================

The customer has a database for application data, which I call it user_db here. They don't store application data in postgres database.

No tables in user_db was autovacuumed for more than a month, leading to user tables bloating. Those tables are eligible for autovacuum according to pg_stat_all_tables and autovacuum settings.

age(datfrozenxid) of user_db and postgres databases are greater than autovacuum_max_freeze_age, so they are eligible for autovacuuming for XID wraparound.

There are user tables in user_db whose age(relfrozenxid) is greater than autovacuum_freeze_max_age, so those tables should get autovacuum treatment.

CAUSE
====================================

postgres database has a table named pg_temp_3.fetchchunks, whose age(relfrozenxid) is greater than autovacuum_freeze_max_age. This temporary table is the culprit. pg_temp_3.fetchchunks is created by pg_rewind. The customer says they ran pg_rewind.

autovacuum launcher always choose postgres, because do_start_worker() scans pg_database and finds that postgres database needs vacuuming for XID wraparound. user_db is never chosen for vacuuming, although it also needs vacuuming for XID wraparound.

autovacuum worker doesn't delete pg_temp3.fetchchunks, because the backendid 3 is used by some application so autovacuum worker thinks that the backend is active and the temporary table cannot be dropped.

I don't know why pg_temp3.fetchchunks still exists. Maybe the user ran pg_ctl stop -mi while pg_rewind was running.

FIX
====================================

I have the following questions. Along which line should I proceed to fix the problem?

* Why does autovacuum launcher always choose only one database when that database need vacuuming for XID wraparound? Shouldn't it also choose other databases?

* I think temporary tables should not require vacuuming for XID wraparound. Furtherover, should updates/deletes to temporary tables be in-place instead of creating garbage, so that any form of vacuum is unnecessary? Other sessions do not need to read temporary tables.

* In this incident, autovacuum worker misjudged that pg_temp_3.fetchchunks can't be deleted, although the creator (pg_rewind) is no longer active. How can we delete orphan temporary tables safely?

Regards
Takayuki Tsunakawa

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Davis 2018-01-25 06:16:04 Re: Rangejoin rebased
Previous Message Masahiko Sawada 2018-01-25 05:31:21 Re: PATCH: Exclude unlogged tables from base backups