Re: Speedup of relation deletes during recovery

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Speedup of relation deletes during recovery
Date: 2018-04-19 02:28:02
Message-ID: 20180419022802.GH957@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Apr 19, 2018 at 01:52:26AM +0900, Fujii Masao wrote:
> No. But after my colleague truncated more than one hundred tables on
> the server with shared_buffers = 300GB, the recovery could not finish
> even after 10 minutes since the startup of the recovery. So I had to
> shutdown the server immediately, set shared_buffers to very small
> temporarily and start the server to cause the recovery to finish soon.

Hm, OK. Here are simple functions to create and drop many relations in
a single transaction:
create or replace function create_tables(numtabs int)
returns void as $$
declare query_string text;
begin
for i in 1..numtabs loop
query_string := 'create table tab_' || i::text || ' (a int);';
execute query_string;
end loop;
end;
$$ language plpgsql;

create or replace function drop_tables(numtabs int)
returns void as $$
declare query_string text;
begin
for i in 1..numtabs loop
query_string := 'drop table tab_' || i::text;
execute query_string;
end loop;
end;
$$ language plpgsql;

With a server running 8GB of shared buffers (you likely need to increase
max_locks_per_transaction) and 10k relations created and dropped, it
took 50 seconds to finish redo of roughly 4 segments:
2018-04-19 11:17:15 JST [7472]: [14-1] db=,user=,app=,client= LOG: redo
starts at 0/15DF2E8
2018-04-19 11:18:05 JST [7472]: [15-1] db=,user=,app=,client= LOG:
invalid record length at 0/4E46160: wanted 24, got 0
2018-04-19 11:18:05 JST [7472]: [16-1] db=,user=,app=,client= LOG: redo
done at 0/4A7C6E

Then with your patch it took... Barely 1 second.
2018-04-19 11:20:33 JST [11690]: [14-1] db=,user=,app=,client= LOG:
redo starts at 0/15DF2E8
2018-04-19 11:20:34 JST [11690]: [15-1] db=,user=,app=,client= LOG:
invalid record length at 0/4E299B0: wanted 24, got 0
2018-04-19 11:20:34 JST [11690]: [16-1] db=,user=,app=,client= LOG:
redo done at 0/4E29978

Looking at profiles with perf I can also see that 95% of the time is
spent in DropRelFileNodesAllBuffers which is where the startup process
spends most of its CPU. So HEAD is booh. And your patch is excellent
in this context.
--
Michael

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2018-04-19 02:35:02 Re: VM map freeze corruption
Previous Message Michael Paquier 2018-04-19 01:53:58 Re: Adding an LWLockHeldByMe()-like function that reports if any buffer content lock is held