logical decoding : exceeded maxAllocatedDescs for .spill files

From: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: logical decoding : exceeded maxAllocatedDescs for .spill files
Date: 2019-09-11 10:44:18
Message-ID: CAJ3gD9c-sECEn79zXw4yBnBdOttacoE-6gAyP0oy60nfs_sabQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

I reproduced the error "exceeded maxAllocatedDescs (492) while trying
to open file ...", which was also discussed about in the thread [1].
This issue is similar but not exactly the same as [1]. In [1], the
file for which this error used to show up was
"pg_logical/mappings/map...." , while here it's the .spill file. And
here the issue , in short, seems to be : The .spill file does not get
closed there and then, unlike in [1] where there was a file descriptor
leak.

I could reproduce it using a transaction containing a long series of
sub-transactions (possibly could be reproduced without
sub-transactions, but looking at the code I could come up with this
script using sub-transactions easily) :

create table tab(id int);

-- Function that does huge changes in a single transaction
create or replace function f(id int) returns int as
$$
begin
-- Iterate this more than 492 times (max transient file
descriptors PG would typically allow)
-- This will create that many sub-transactions due to presence of
exception block.
FOR i IN 1..600 LOOP

BEGIN
-- Iterate more than 4096 times (so that changes spill to
disk: max_changes_in_memory)
FOR j IN 1..5000 LOOP
insert into tab values (1);
END LOOP;
EXCEPTION
when division_by_zero then perform 'dummy';
END;

END LOOP;

return id;
end $$ language plpgsql;

SELECT * FROM pg_create_logical_replication_slot('logical', 'test_decoding');

begin;
select f(1); -- Do huge changes in a single transaction
commit;

\! pg_recvlogical -d postgres --slot=logical --verbose --start -f -

pg_recvlogical: starting log streaming at 0/0 (slot logical)
pg_recvlogical: streaming initiated
pg_recvlogical: confirming write up to 0/0, flush to 0/0 (slot logical)
BEGIN 1869
pg_recvlogical: confirming write up to 0/1B6D6E38, flush to 0/1B6D6E38
(slot logical)
pg_recvlogical: error: unexpected termination of replication stream:
ERROR: exceeded maxAllocatedDescs (492) while trying to open file
"pg_replslot/logical/xid-2362-lsn-0-24000000.spill"
pg_recvlogical: disconnected; waiting 5 seconds to try again

Looking at the code, what might be happening is,
ReorderBufferIterTXNInit()=>ReorderBufferRestoreChanges() opens the
files, but leaves them open if end of file is not reached. Eventually
if end of file is reached, it gets closed. The function returns back
without closing the file descriptor if reorder buffer changes being
restored are more than max_changes_in_memory. Probably later on, the
rest of the changes get restored in another
ReorderBufferRestoreChanges() call. But meanwhile, if there are a lot
of such files opened, we can run out of the max files that PG decides
to keep open (it has some logic that takes into account system files
allowed to be open, and already opened).

Offhand, what I am thinking is, we need to close the file descriptor
before returning from ReorderBufferRestoreChanges(), and keep track of
the file offset and file path, so that next time we can resume reading
from there.

Comments ?

[1] https://www.postgresql.org/message-id/flat/738a590a-2ce5-9394-2bef-7b1caad89b37%402ndquadrant.com

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2019-09-11 12:24:30 Re: Change atoi to strtol in same place
Previous Message Pavel Stehule 2019-09-11 10:05:36 Re: doc: update PL/pgSQL sample loop function