Re: ALTER TABLE ADD COLUMN fast default

From: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ALTER TABLE ADD COLUMN fast default
Date: 2018-03-28 23:46:23
Message-ID: CAA8=A78gKti6QuJi0rHD=qR4JmFD59ktwW0N+OKy2DLgCOfEng@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Mar 28, 2018 at 5:30 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> Hi,
>
> On 2018-03-26 09:02:10 +1030, Andrew Dunstan wrote:
>> Thanks for this, all looks good. Here is the consolidate patch
>> rebased. If there are no further comments I propose to commit this in
>> a few days time.
>
> Here's a bit of post commit review:
>
> @@ -1310,13 +1679,11 @@ slot_getsomeattrs(TupleTableSlot *slot, int attnum)
>
> /*
> * If tuple doesn't have all the atts indicated by tupleDesc, read the
> - * rest as null
> + * rest as NULLs or missing values
> */
> - for (; attno < attnum; attno++)
> - {
> - slot->tts_values[attno] = (Datum) 0;
> - slot->tts_isnull[attno] = true;
> - }
> + if (attno < attnum)
> + slot_getmissingattrs(slot, attno, attnum);
> +
> slot->tts_nvalid = attnum;
> }
>
> It's worthwhile to note that this'll re-process *all* missing values,
> even if they've already been deformed. I.e. if
> slot_getmissingattrs(.., first-missing)
> slot_getmissingattrs(.., first-missing + 1)
> slot_getmissingattrs(.., first-missing + 2)
> is called, all three missing values will be copied every time. That's
> because tts_nvalid isn't taken into account. I wonder if slot_getmissingattrs
> could take tts_nvalid into account?
>
> I also wonder if this doesn't deserve an unlikely(), to avoid the cost
> of spilling registers in the hot branch of not missing any values.
>
>

One of us at least is very confused about this function.
slot_getmissingattrs() gets called at most once by slot_getsomeattrs
and never for any attribute that's already been deformed.

> + else
> + {
> + /* if there is a missing values array we must process them one by one */
> + for (missattnum = lastAttNum - 1;
> + missattnum >= startAttNum;
> + missattnum--)
> + {
> + slot->tts_values[missattnum] = attrmiss[missattnum].ammissing;
> + slot->tts_isnull[missattnum] =
> + !attrmiss[missattnum].ammissingPresent;
> + }
> + }
> +}
>
> Why is this done backwards? It's noticeably slower to walk arrays
> backwards on some CPU microarchitectures.
>
>

I'll change it.

>
> @@ -176,6 +179,23 @@ CreateTupleDescCopyConstr(TupleDesc tupdesc)
> }
> }
>
>
>
> @@ -469,6 +503,29 @@ equalTupleDescs(TupleDesc tupdesc1, TupleDesc tupdesc2)
> if (strcmp(defval1->adbin, defval2->adbin) != 0)
> return false;
> }
> + if (constr1->missing)
> + {
> + if (!constr2->missing)
> + return false;
> + for (i = 0; i < tupdesc1->natts; i++)
> + {
> + AttrMissing *missval1 = constr1->missing + i;
> + AttrMissing *missval2 = constr2->missing + i;
>
> It's a bit odd to not use array indexing here?
>

*shrug* Maybe. I'll change it if you like, doesn't seem that important or odd.

>
> @@ -625,7 +625,15 @@ ExecCopySlotMinimalTuple(TupleTableSlot *slot)
> if (slot->tts_mintuple)
> return heap_copy_minimal_tuple(slot->tts_mintuple);
> if (slot->tts_tuple)
> - return minimal_tuple_from_heap_tuple(slot->tts_tuple);
> + {
> + if (TTS_HAS_PHYSICAL_TUPLE(slot) &&
> + HeapTupleHeaderGetNatts(slot->tts_tuple->t_data)
> + < slot->tts_tupleDescriptor->natts)
> + return minimal_expand_tuple(slot->tts_tuple,
> + slot->tts_tupleDescriptor);
> + else
> + return minimal_tuple_from_heap_tuple(slot->tts_tuple);
> + }
>
>
> What's the TTS_HAS_PHYSICAL_TUPLE doing here? How can this be relevant
> given the previous tts_mintuple check? Am I missing something?
>
>

Hmm, that dates back to the original patch. My bad, I should have
picked that up. I'll just remove it.

>
> @@ -563,10 +569,63 @@ RelationBuildTupleDesc(Relation relation)
> MemoryContextAllocZero(CacheMemoryContext,
> relation->rd_rel->relnatts *
> sizeof(AttrDefault));
> - attrdef[ndef].adnum = attp->attnum;
> + attrdef[ndef].adnum = attnum;
> attrdef[ndef].adbin = NULL;
> +
> ndef++;
> }
> +
> + /* Likewise for a missing value */
> + if (attp->atthasmissing)
> + {
> + Datum missingval;
> + bool missingNull;
> +
> + /* Do we have a missing value? */
> + missingval = heap_getattr(pg_attribute_tuple,
> + Anum_pg_attribute_attmissingval,
> + pg_attribute_desc->rd_att,
> + &missingNull);
> + if (!missingNull)
> + {
> + /* Yes, fetch from the array */
> + MemoryContext oldcxt;
> + bool is_null;
> + int one = 1;
> + Datum missval;
> +
> + if (attrmiss == NULL)
> + attrmiss = (AttrMissing *)
> + MemoryContextAllocZero(CacheMemoryContext,
> + relation->rd_rel->relnatts *
> + sizeof(AttrMissing));
> +
> + missval = array_get_element(missingval,
> + 1,
> + &one,
> + -1,
> + attp->attlen,
> + attp->attbyval,
> + attp->attalign,
> + &is_null);
> + Assert(!is_null);
> + if (attp->attbyval)
> + {
> + /* for copy by val just copy the datum direct */
> + attrmiss[attnum - 1].ammissing = missval;
> + }
> + else
> + {
> + /* otherwise copy in the correct context */
> + oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
> + attrmiss[attnum - 1].ammissing = datumCopy(missval,
> + attp->attbyval,
> + attp->attlen);
> + MemoryContextSwitchTo(oldcxt);
> + }
> + attrmiss[attnum - 1].ammissingPresent = true;
> + }
> + }
> need--;
> if (need == 0)
> break;
>
> I'm still strongly object to do doing this unconditionally for queries
> that never need any of this. We're can end up with a significant number
> of large tuples in memory here, and then copy that through dozens of
> tupledesc copies in queries.
>
>

We're just doing the same thing we do for default values.

None of the tests I did with large numbers of missing values seemed to
show performance impacts of the kind you describe. Now, none of the
queries were particularly complex, but the worst case was from
actually using very large numbers of attributes with missing values,
where we'd need this data anyway. If we just selected a few attributes
performance went up rather than down.

> @@ -167,6 +170,12 @@ CATALOG(pg_attribute,1249) BKI_BOOTSTRAP BKI_WITHOUT_OIDS BKI_ROWTYPE_OID(75) BK
>
> /* Column-level FDW options */
> text attfdwoptions[1] BKI_DEFAULT(_null_);
> +
> + /*
> + * Missing value for added columns. This is a one element array which lets
> + * us store a value of the attribute type here.
> + */
> + anyarray attmissingval BKI_DEFAULT(_null_);
> #endif
> } FormData_pg_attribute;
>
>
> Still think this is a bad location, and it'll reduce cache hit ratio for
> catalog lookups.
>

As I think I mentioned before, this was discussed previously and as I
understood it this was the consensus location for it.

pg_attrdef isn't really a good place for it (what if they drop the
default?). So the only alternative then would be a completely new
catalog. I'd need a deal of convincing that that was justified.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2018-03-28 23:49:59 Re: ALTER TABLE ADD COLUMN fast default
Previous Message David G. Johnston 2018-03-28 23:38:51 Re: csv format for psql