Re: ALTER TABLE ADD COLUMN fast default

From: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ALTER TABLE ADD COLUMN fast default
Date: 2018-03-29 23:38:54
Message-ID: CAA8=A7_MS4biaP2nuAcOMAcZ++_2e9uks8kYkHYPKB0q3SB44Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Mar 30, 2018 at 5:00 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> Hi,
>
> On 2018-03-29 10:16:23 +1030, Andrew Dunstan wrote:
>> On Wed, Mar 28, 2018 at 5:30 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>> > Hi,
>> >
>> > On 2018-03-26 09:02:10 +1030, Andrew Dunstan wrote:
>> >> Thanks for this, all looks good. Here is the consolidate patch
>> >> rebased. If there are no further comments I propose to commit this in
>> >> a few days time.
>> >
>> > Here's a bit of post commit review:
>> >
>> > @@ -1310,13 +1679,11 @@ slot_getsomeattrs(TupleTableSlot *slot, int attnum)
>> >
>> > /*
>> > * If tuple doesn't have all the atts indicated by tupleDesc, read the
>> > - * rest as null
>> > + * rest as NULLs or missing values
>> > */
>> > - for (; attno < attnum; attno++)
>> > - {
>> > - slot->tts_values[attno] = (Datum) 0;
>> > - slot->tts_isnull[attno] = true;
>> > - }
>> > + if (attno < attnum)
>> > + slot_getmissingattrs(slot, attno, attnum);
>> > +
>> > slot->tts_nvalid = attnum;
>> > }
>> >
>> > It's worthwhile to note that this'll re-process *all* missing values,
>> > even if they've already been deformed. I.e. if
>> > slot_getmissingattrs(.., first-missing)
>> > slot_getmissingattrs(.., first-missing + 1)
>> > slot_getmissingattrs(.., first-missing + 2)
>> > is called, all three missing values will be copied every time. That's
>> > because tts_nvalid isn't taken into account. I wonder if slot_getmissingattrs
>> > could take tts_nvalid into account?
>> >
>> > I also wonder if this doesn't deserve an unlikely(), to avoid the cost
>> > of spilling registers in the hot branch of not missing any values.
>
>> One of us at least is very confused about this function.
>> slot_getmissingattrs() gets called at most once by slot_getsomeattrs
>> and never for any attribute that's already been deformed.
>
> Imagine the same slot being used in two different expressions. The
> typical case for that is e.g. something like
> SELECT a,b,c,d FROM foo WHERE b = 1;
>
> this'll trigger one slot_getsomeattrs(slot, 2) call from within qual
> evaluation, and then a slot_getsomeattrs(slot, 4) from within the
> projection code. If you then imagine a tuple where only the first
> column exists physically, we'd copy b twice, because attno is only going
> to be one 1. I think we might just want to check tts nvalid in
> getmissingattrs?
>
>

OK. so add something like this to the top of slot_getmissingattrs()?

startAttNum = Max(startAttNum, slot->tts_nvalid);

>> > I'm still strongly object to do doing this unconditionally for queries
>> > that never need any of this. We're can end up with a significant number
>> > of large tuples in memory here, and then copy that through dozens of
>> > tupledesc copies in queries.
>
>> We're just doing the same thing we do for default values.
>
> That's really not a defense. It's hurting us there too.
>

I will take a look and see if it can be avoided easily.

>
>> None of the tests I did with large numbers of missing values seemed to
>> show performance impacts of the kind you describe. Now, none of the
>> queries were particularly complex, but the worst case was from
>> actually using very large numbers of attributes with missing values,
>> where we'd need this data anyway. If we just selected a few attributes
>> performance went up rather than down.
>
> Now imagine a partitioned workload with a few thousand partitions over a
> few tables. The additional memory is going to start being noticeable.
>
>
>> pg_attrdef isn't really a good place for it (what if they drop the
>> default?). So the only alternative then would be a completely new
>> catalog. I'd need a deal of convincing that that was justified.
>
> There's plenty databases with pg_attribute being many gigabytes large,
> and this is going to make that even worse.
>

I'll change it if there's a consensus about it, but so far the only
other opinion has been from Tom who doesn't apparently see much of a
problem.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2018-03-29 23:40:29 Re: ALTER TABLE ADD COLUMN fast default
Previous Message Peter Geoghegan 2018-03-29 23:33:37 Re: WIP: Covering + unique indexes.