Re: ALTER TABLE ADD COLUMN fast default

From: Andres Freund <andres(at)anarazel(dot)de>
To: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
Cc: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ALTER TABLE ADD COLUMN fast default
Date: 2018-03-29 18:30:50
Message-ID: 20180329183050.u5trol4gtgtqe77h@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2018-03-29 10:16:23 +1030, Andrew Dunstan wrote:
> On Wed, Mar 28, 2018 at 5:30 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > Hi,
> >
> > On 2018-03-26 09:02:10 +1030, Andrew Dunstan wrote:
> >> Thanks for this, all looks good. Here is the consolidate patch
> >> rebased. If there are no further comments I propose to commit this in
> >> a few days time.
> >
> > Here's a bit of post commit review:
> >
> > @@ -1310,13 +1679,11 @@ slot_getsomeattrs(TupleTableSlot *slot, int attnum)
> >
> > /*
> > * If tuple doesn't have all the atts indicated by tupleDesc, read the
> > - * rest as null
> > + * rest as NULLs or missing values
> > */
> > - for (; attno < attnum; attno++)
> > - {
> > - slot->tts_values[attno] = (Datum) 0;
> > - slot->tts_isnull[attno] = true;
> > - }
> > + if (attno < attnum)
> > + slot_getmissingattrs(slot, attno, attnum);
> > +
> > slot->tts_nvalid = attnum;
> > }
> >
> > It's worthwhile to note that this'll re-process *all* missing values,
> > even if they've already been deformed. I.e. if
> > slot_getmissingattrs(.., first-missing)
> > slot_getmissingattrs(.., first-missing + 1)
> > slot_getmissingattrs(.., first-missing + 2)
> > is called, all three missing values will be copied every time. That's
> > because tts_nvalid isn't taken into account. I wonder if slot_getmissingattrs
> > could take tts_nvalid into account?
> >
> > I also wonder if this doesn't deserve an unlikely(), to avoid the cost
> > of spilling registers in the hot branch of not missing any values.

> One of us at least is very confused about this function.
> slot_getmissingattrs() gets called at most once by slot_getsomeattrs
> and never for any attribute that's already been deformed.

Imagine the same slot being used in two different expressions. The
typical case for that is e.g. something like
SELECT a,b,c,d FROM foo WHERE b = 1;

this'll trigger one slot_getsomeattrs(slot, 2) call from within qual
evaluation, and then a slot_getsomeattrs(slot, 4) from within the
projection code. If you then imagine a tuple where only the first
column exists physically, we'd copy b twice, because attno is only going
to be one 1. I think we might just want to check tts nvalid in
getmissingattrs?

> > I'm still strongly object to do doing this unconditionally for queries
> > that never need any of this. We're can end up with a significant number
> > of large tuples in memory here, and then copy that through dozens of
> > tupledesc copies in queries.

> We're just doing the same thing we do for default values.

That's really not a defense. It's hurting us there too.

> None of the tests I did with large numbers of missing values seemed to
> show performance impacts of the kind you describe. Now, none of the
> queries were particularly complex, but the worst case was from
> actually using very large numbers of attributes with missing values,
> where we'd need this data anyway. If we just selected a few attributes
> performance went up rather than down.

Now imagine a partitioned workload with a few thousand partitions over a
few tables. The additional memory is going to start being noticeable.

> pg_attrdef isn't really a good place for it (what if they drop the
> default?). So the only alternative then would be a completely new
> catalog. I'd need a deal of convincing that that was justified.

There's plenty databases with pg_attribute being many gigabytes large,
and this is going to make that even worse.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2018-03-29 18:33:58 Re: pgsql: Add documentation for the JIT feature.
Previous Message Alvaro Herrera 2018-03-29 18:20:58 Re: pgsql: Add documentation for the JIT feature.