Re: logical decoding : exceeded maxAllocatedDescs for .spill files

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera from 2ndQuadrant <alvherre(at)alvh(dot)no-ip(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: logical decoding : exceeded maxAllocatedDescs for .spill files
Date: 2019-11-19 08:37:46
Message-ID: CAA4eK1+cey6i6a0zD9kk_eaDXb4RPNZqu4UwXO9LbHAgMpMBkg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Nov 18, 2019 at 5:50 PM Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com> wrote:
>
> On Mon, 18 Nov 2019 at 17:20, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > I see that you have made changes in ReorderBufferRestoreChanges to use
> > PathNameOpenFile, but not in ReorderBufferSerializeTXN. Is there a
> > reason for the same? In my test environment, with the test provided
> > by you, I got the error (reported in this thread) via
> > ReorderBufferSerializeTXN.
>
> You didn't get this error with the patch applied, did you ?
>

No, I got this before applying the patch. However, after applying the
patch, I got below error in the same test:

postgres=# SELECT 1 from
pg_logical_slot_get_changes('regression_slot', NULL,NULL) LIMIT 1;
ERROR: could not read from reorderbuffer spill file: Invalid argument

It seems to me that FileRead API used in the patch can return value <
0 on EOF. See the API usage in BufFileLoadBuffer. I got this error
on a windows machine and in the server log the message was "LOG:
unrecognized win32 error code: 38" which indicates "Reached the end of
the file."

> If you were debugging this without the patch applied, I suspect that
> the reason why ReorderBufferSerializeTXN() => OpenTransientFile() is
> generating this error is because the max limit must be already crossed
> because of earlier calls to ReorderBufferRestoreChanges().
>
> Note that in ReorderBufferSerializeTXN(), OpenTransientFile() is
> sufficient because the code in that function has made sure the fd gets
> closed there itself.
>

Okay, then we might not need it there, but we should at least add a
comment in ReorderBufferRestoreChanges to explain why we have used a
different function to operate on the file at that place.

>
> For the API's that use VFDs (like PathNameOpenFile), the files opened
> are always recorded in the VfdCache array. So it is not required to do
> the cleanup at (sub)transaction end, because the kernel fds get closed
> dynamically in ReleaseLruFiles() whenever they reach max_safe_fds
> limit. So if a transaction aborts, the fds might remain open, but
> those will get cleaned up whenever we require more fds, through
> ReleaseLruFiles(). Whereas, for files opened through
> OpenTransientFile(), VfdCache is not involved, so this needs
> transaction end cleanup.
>

Have you tried by injecting some error? After getting the error
mentioned above in email, when I retried the same query, I got the
below message.

postgres=# SELECT 1 from
pg_logical_slot_get_changes('regression_slot', NULL,NULL) LIMIT 1;
ERROR: could not remove file
"pg_replslot/regression_slot/xid-1693-lsn-0-18000000.spill" during
removal of pg_replslot/regression_slot/xid*: Permission denied

And, then I tried to drop the replication slot and I got below error.
postgres=# SELECT * FROM pg_drop_replication_slot('regression_slot');
ERROR: could not rename file "pg_replslot/regression_slot" to
"pg_replslot/regression_slot.tmp": Permission denied

It might be something related to Windows, but you can once try by
injecting some error after reading a few files in the code path and
see the behavior.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2019-11-19 09:16:47 Re: range_agg
Previous Message Thomas Munro 2019-11-19 07:46:21 Re: Planner chose a much slower plan in hashjoin, using a large table as the inner table.