Re: Make deparsing of column defaults faster

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Justin Pryzby <pryzby(at)telsasoft(dot)com>
Subject: Re: Make deparsing of column defaults faster
Date: 2018-06-29 03:15:54
Message-ID: CAMkU=1w+GM3D0H45mEp_A1xZs=tECD03syC8bSY0O6SrsNqeoA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jun 4, 2018 at 10:00 PM, Peter Eisentraut <
peter(dot)eisentraut(at)2ndquadrant(dot)com> wrote:

> On 6/4/18 20:55, Jeff Janes wrote:
> > The user docs say about column defaults: "The value is any variable-free
> > expression (subqueries and cross-references to other columns in the
> > current table are not allowed)"
> >
> > And also say about pg_get_expr "If the expression might contain Vars,
> > specify the OID of the relation they refer to as the second parameter;
> > if no Vars are expected, zero is sufficient"
> >
> > Since defaults can't contain Vars, this patch converts the second
> > parameter to zero in places where pg_get_expr is invoked on column
> > defaults.
>
> My in-progress generated columns patch removes that assumption (since a
> generated column generally refers to another column of the same table).
>

Will your patch put the generated columns data into pg_attrdef, or create a
new catalog to hold them? I guess it would make sense to re-use the
catalog since you can't have both a default value and be a generated column.

> > Doing this drops the time needed to run `pg_dump -s` on a
> > 1600 column table with defaults on all columns by a factor of 40. So
> > pg_upgrade users with such tables should see a big win (cc Justin).
>
> That's impressive but also a bit disturbing. Where does this
> performance change come from? Can we harness it in different ways?
>

The most fundamental way would be to change make_colname_unique to use a
hash table, rather than an N^2 algorithm. It would probably be similar to
Tom's commit 8004953b5a2449c26c4e "Speed up ruleutils' name de-duplication
code" but he does warn in the commit message that his method would be
trickier to use for column aliases than table aliases. That would speed up
a lot more cases than just DEFAULTS (check constraints, maybe triggers) ,
but would be harder to do.

Even once we fix this, it still wouldn't make sense to create a deparse
context which cannot be necessary. Perhaps the call to deparse_context_for
could be made lazy, so that it is delayed until a Var is actually found in
the expression node tree, rather than being done preemptively. Then all
cases without Vars would be improved, not just cases where it could be
proved by static code analysis that Vars are not possible.

When setting up the deparse context for a single table, the column names
already need to be unique (I think). So another possibility is to pass a
flag down from set_simple_column_names to set_relation_column_names to
indicate no unique checking is needed.

Since pg_dump calls pg_get_expr once over and over again on the same table
consecutively, perhaps we could cache the column alias assignments in a
single-entry cache, so if it is called on the same table as last time it
just re-uses the aliases from last time. I am not planning on working on
that, I don't know where such a cache could be stored such that is freed
and invalidated at the appropriate times.

Cheers,

Jeff

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2018-06-29 03:53:10 Re: pgsql: Fix "base" snapshot handling in logical decoding
Previous Message Michael Paquier 2018-06-29 02:16:01 Re: SCRAM with channel binding downgrade attack