Re: [GENERAL] huge RAM use in multi-command ALTER of table heirarchy

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [GENERAL] huge RAM use in multi-command ALTER of table heirarchy
Date: 2017-10-18 19:54:54
Message-ID: 20171018195454.GI17895@telsasoft.com
Lists: pgsql-general pgsql-hackers

Hi,

I just ran into this again in another context (see original discussion, quoted
below).

Some time ago, while initially introducing non-default stats targets, I set our
non-filtering columns to "STATISTICS 10", lower than the default, to minimize the
size of pg_statistic, which (at least at one point) I perceived to have become
bloated and to be causing issues (partially due to having an excessive number of
"daily" granularity partitions, a problem I've since mitigated).

The large number of columns with non-default stats target was (I think) causing
pg_dump --section=pre-data to take 10+ minutes, which makes pg_upgrade more
disruptive than necessary, so now I'm going back and fixing it.

[pryzbyj(at)database ~]$ time sed '/SET STATISTICS 10;$/!d; s//SET STATISTICS -1;/' /srv/cdrperfbackup/ts/2017-10-17/pg_dump-section\=pre-data |psql -1q ts
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
connection to server was lost

[pryzbyj(at)database ~]$ dmesg |tail -n2
Out of memory: Kill process 6725 (postmaster) score 550 or sacrifice child
Killed process 6725, UID 26, (postmaster) total-vm:13544792kB, anon-rss:8977764kB, file-rss:8kB
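
For readers unfamiliar with the sed idiom above: `/re/!d` deletes every line that
does not match, and the empty pattern in `s//.../` reuses that same regex. A
minimal Python sketch of the same filter follows. The function name and the
sample dump lines (table and column names) are made up for illustration and are
not taken from the real dump.

```python
import re

# Same pattern the sed script uses: lines ending in "SET STATISTICS 10;"
PATTERN = re.compile(r"SET STATISTICS 10;$")


def reset_stats_targets(dump_lines):
    """Keep only the matching lines and rewrite their target to -1."""
    out = []
    for line in dump_lines:
        line = line.rstrip("\n")
        # sed: /SET STATISTICS 10;$/!d  -> drop every non-matching line
        if PATTERN.search(line):
            # sed: s//SET STATISTICS -1;/ -> empty regex reuses the last one
            out.append(PATTERN.sub("SET STATISTICS -1;", line))
    return out


# Hypothetical pre-data dump lines; names are invented for this example.
sample = [
    "CREATE TABLE t_child (a integer, b integer);",
    "ALTER TABLE ONLY t_child ALTER COLUMN a SET STATISTICS 10;",
    "ALTER TABLE ONLY t_child ALTER COLUMN b SET STATISTICS 100;",
]

for stmt in reset_stats_targets(sample):
    print(stmt)
```

Only the statement for column `a` survives the filter; the `STATISTICS 100` line
is dropped because the pattern requires `10;` at the end of the line. The output
is then fed to psql with -1, so all the resulting ALTERs run in a single
transaction.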

So I'm hoping to encourage someone to commit the change contemplated earlier.

Thanks in advance.

Justin

On Tue, Jul 18, 2017 at 07:26:30PM -0400, Tom Lane wrote:
> Justin Pryzby <pryzby(at)telsasoft(dot)com> writes:
> > I've seen this before while doing SET STATISTICS on a larger number of columns
> > using xargs, but just came up while doing ADD of a large number of columns.
> > Seems to be roughly linear in number of children but superlinear WRT columns.
> > I think having to do with catalog update / cache invalidation with many
> > ALTERs*children*columns?
>
> I poked into this a bit. The operation is necessarily roughly O(N^2) in
> the number of columns, because we rebuild the affected table's relcache
> entry after each elementary ADD COLUMN operation, and one of the principal
> components of that cost is reading all the pg_attribute entries. However,
> that should only be a time cost not a space cost. Eventually I traced the
> O(N^2) space consumption to RememberToFreeTupleDescAtEOX, which seems to
> have been introduced in Simon's commit e5550d5fe, and which strikes me as
> a kluge of the first magnitude. Unless I am missing something, that
> function's design concept can fairly be characterized as "let's leak
> memory like there's no tomorrow, on the off chance that somebody somewhere
> is ignoring basic coding rules".
>
> I tried ripping that out, and the peak space consumption of your example
> (with 20 child tables and 1600 columns) decreased from ~3GB to ~200MB.
> Moreover, the system still passes make check-world, so it's not clear
> to me what excuse this code has to live.
>
> It's probably a bit late in the v10 cycle to be taking any risks in
> this area, but I'd vote for ripping out RememberToFreeTupleDescAtEOX
> as soon as the v11 cycle opens, unless someone can show an example
> of non-broken coding that requires it. (And if so, there ought to
> be a regression test incorporating that.)
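
As a rough illustration of the quadratic space growth described in the quoted
message, here is a toy model in Python. It assumes (my simplification, not
PostgreSQL code) that after the k-th elementary ALTER a descriptor with k
attributes is kept until end of transaction, for each affected table, starting
from an empty table. The function name is hypothetical.

```python
def retained_attribute_slots(n_columns, n_tables):
    """Toy model: count attribute slots held until end of transaction.

    Assumes one retained descriptor per elementary ALTER per table, where the
    k-th descriptor holds k attributes. This is an illustration only.
    """
    per_table = sum(range(1, n_columns + 1))  # 1 + 2 + ... + N = N(N+1)/2
    return per_table * n_tables


# Doubling the column count roughly quadruples the retained slots,
# while the table count only scales the total linearly.
for n in (400, 800, 1600):
    print(n, retained_attribute_slots(n, 20))
```

Under this model the total is linear in the number of children and quadratic in
the number of columns, which matches the shape of the behavior reported above.
It says nothing about the actual byte counts.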
