Re: Make deparsing of column defaults faster

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Make deparsing of column defaults faster
Date: 2018-07-05 16:58:27
Message-ID: 20180705165827.GK7025@telsasoft.com
Lists: pgsql-hackers

On Mon, Jun 04, 2018 at 10:00:53PM -0400, Peter Eisentraut wrote:
> On 6/4/18 20:55, Jeff Janes wrote:
> > Since defaults can't contain Vars, this patch converts the second
> > parameter to zero in places where pg_get_expr is invoked on column
> > defaults.
>
> My in-progress generated columns patch removes that assumption (since a
> generated column generally refers to another column of the same table).
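
(To illustrate the change Jeff describes: on the pg_dump side, pass 0 instead
of adrelid as pg_get_expr()'s second argument when fetching column defaults,
so the server never builds a deparse context for them.  A rough sketch; the
query text and helper name below are mine, not pg_dump's actual code, and
Peter's generated-columns point above is an argument against relying on it.)

#include "postgres_fe.h"
#include "pqexpbuffer.h"

/*
 * Illustrative only: fetch column defaults without passing a relation OID,
 * so the server skips building per-relation column aliases.
 */
static void
build_attrdef_query(PQExpBuffer q)
{
	appendPQExpBufferStr(q,
						 "SELECT adrelid, adnum, "
						 "pg_get_expr(adbin, 0) AS adsrc "	/* was (adbin, adrelid) */
						 "FROM pg_catalog.pg_attrdef");
}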

On Thu, Jul 05, 2018 at 04:45:07PM +0200, Peter Eisentraut wrote:
> On 29.06.18 05:15, Jeff Janes wrote:
> > Since pg_dump calls pg_get_expr once over and over again on the same
> > table consecutively, perhaps we could cache the column alias assignments
> > in a single-entry cache, so if it is called on the same table as last
> > time it just re-uses the aliases from last time.  I am not planning on
>
> I looked into that. deparse_context_for() is actually not that
> expensive on its own, well below one second, but it gets somewhat
> expensive when you call it 1600 times for one table. So to address that
> case, we can cache the deparse context between calls in the fn_extra
> field of pg_get_expr. The attached patch does that. This makes the
> pg_dump -s times pretty much constant even with 1600 columns with
> defaults.
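
(To sketch the idea in the attached patch as I understand it: keep the deparse
context in pg_get_expr's fn_extra, keyed by the relation OID, and rebuild it
only when the OID changes.  The struct and helper names below are mine and not
necessarily what the patch uses; the backend calls are the usual ones from
fmgr.h, lsyscache.h and ruleutils.h.)

#include "postgres.h"
#include "fmgr.h"
#include "nodes/pg_list.h"
#include "utils/lsyscache.h"
#include "utils/ruleutils.h"

/* illustrative per-call-site cache, hung off flinfo->fn_extra */
typedef struct GetExprDeparseCache
{
	Oid			relid;			/* relation the context was built for */
	List	   *dpcontext;		/* cached deparse context */
} GetExprDeparseCache;

static List *
get_cached_deparse_context(FunctionCallInfo fcinfo, Oid relid)
{
	GetExprDeparseCache *cache = (GetExprDeparseCache *) fcinfo->flinfo->fn_extra;

	if (cache == NULL)
	{
		/* allocate in fn_mcxt so the cache survives across calls */
		cache = MemoryContextAllocZero(fcinfo->flinfo->fn_mcxt,
									   sizeof(GetExprDeparseCache));
		fcinfo->flinfo->fn_extra = cache;
	}

	if (cache->relid != relid)
	{
		/*
		 * Rebuild only when called for a different relation.  (The previous
		 * context is simply left behind in fn_mcxt in this sketch.)
		 */
		MemoryContext oldcxt = MemoryContextSwitchTo(fcinfo->flinfo->fn_mcxt);

		cache->dpcontext = deparse_context_for(get_rel_name(relid), relid);
		cache->relid = relid;
		MemoryContextSwitchTo(oldcxt);
	}

	return cache->dpcontext;
}

A single-entry cache keyed on the OID is enough here because pg_dump fetches
all of a table's defaults consecutively, which is the pattern Jeff described
above.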

I checked on one customer database running PG10.4, for which pg_dump -s takes
8 minutes.

I imported the existing schema into PG12dev (which itself took 25 minutes) and compared:
patched: 2m33.616s
unpatched: 7m19.578s

Note that I recently reduced the number of child tables in this DB (by
repartitioning tables from daily to monthly granularity), which cut the total
number of columns across the largest tables' children by a factor of about 30
and shrank the pg_dump -s output from 120MB (6 months ago) to 51MB. I expect
this patch would have saved even more before that cleanup.

> How realistic is this use case? Is it worth it?

Note that this affects pg_upgrade, which is how this issue originally came up [0].
(But since pg_get_expr is evaluated by the old server during the dump, a
pg_upgrade to v11 couldn't benefit unless this were also included in PG10.5.)

[pryzbyj(at)database postgresql]$ grep -c 'SET DEFAULT' /srv/cdrperfbackup/ts/2018-07-04/pg_dump-section=pre-data
183915

Justin

[0] https://www.postgresql.org/message-id/CAMkU%3D1x-e%2BmaqefhM1yMeSiJ8J9Z%2BSJHgW7c9bqo3E3JMG4iJA%40mail.gmail.com
