Re: BUG #16722: PG hanging on COPY when table has close to 2^32 toasts in the table.

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
Cc: skoposov(at)ed(dot)ac(dot)uk, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #16722: PG hanging on COPY when table has close to 2^32 toasts in the table.
Date: 2020-11-18 10:57:05
Message-ID: CABUevExw6AHqC_U7nH83=rvF-xY_U8AUDbR94kBDqDR0hGqKXw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Wed, Nov 18, 2020 at 2:18 AM Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
>
>
>
> On 2020/11/17 3:15, PG Bug reporting form wrote:
> > The following bug has been logged on the website:
> >
> > Bug reference: 16722
> > Logged by: Sergey Koposov
> > Email address: skoposov(at)ed(dot)ac(dot)uk
> > PostgreSQL version: 11.9
> > Operating system: debian
> > Description:
> >
> > Hi,
> >
> > When ingesting a billion or so rows in the table that has some array columns
> > (see schema below), at some point all the backends doing the ingestion hang
> > with 100% CPU usage. When investigating, I think I can trace this to the
> > limit of 2^32 toasted records per table.
>
> Yes, this is the limitation of the number of out-of-line values in toast.
> https://wiki.postgresql.org/wiki/TOAST#Total_table_size_limit
>
>
> > See the gdb full backtrace of
> > hanged backends in the bottom. When the problem occurs, it has 272 mill
> > records.
> > Basically the hanging happens in GetNewOidWithIndex called by
> > toast_save_datum.
> > While I understand the limit for the toast number is there to stay, but the
> > behaviour of PG next to the limit is IMO a bug (or at least non-trivial) .
> >
> > I would rather prefer to see an error-message as opposed to backends
> > hanging.
>
> To emit an error, we need to check that there is no unused OID for
> the toast and it would take very long to do that. So I'm not sure
> if to emit an error message really improves the current situation or not.
>
> OTOH it might be good idea to emit a warning message (every time we
> cannot find unused OID in recent 1 million OIDs, for example) or report
> the issue as wait event, or something while the record insertion is hanging
> because of toast limit, so that we can easily detect the issue.

This definitely sounds like something that's worth putting out as a
wait event. Even before you start traversing millions of OIDs it might
gradually start to show up, and being able to monitor that would
definitely be useful.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Jonathan Tauman 2020-11-18 12:34:18 Issues with pg_dump command
Previous Message Amit Kapila 2020-11-18 08:43:13 Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop