Re: ALTER TABLE ADD COLUMN fast default

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ALTER TABLE ADD COLUMN fast default
Date: 2018-02-19 02:48:27
Message-ID: CAKJS1f84LdLAcYQ66aoRjM-q_3Wxr=JfXV1BJWyRTf_Bg5=7WA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 19 February 2018 at 13:48, Andrew Dunstan
<andrew(dot)dunstan(at)2ndquadrant(dot)com> wrote:
> On Sun, Feb 18, 2018 at 2:52 PM, David Rowley
> <david(dot)rowley(at)2ndquadrant(dot)com> wrote:
>> I'll try to spend some time reviewing the code in detail soon.
>>
>
> Thanks. I'll fix the typos in the next version of the patch, hopefully
> after your review.

Here's a more complete review... I reserve the right to be wrong about
a few things and be nitpicky on a few more...

3. All doc changes are indented 1 space too much. Many lines have also
been broken needlessly before 80 chars

4. Surplus "it" at end of line.

+ * Return the missing value of an attribute, or NULL if it
+ * there is no missing value.

5. "sttributes" -> "attributes"

+ /* initialize all the missing sttributes to null */

6. Surplus space before >=

+ if (missattn >= startAttNum && !attrmiss[i].ammissingNull)

7. Why bother with the else here?

+ if (slot->tts_tupleDescriptor->constr)
+ {
+ missingnum = slot->tts_tupleDescriptor->constr->num_missing -1;
+ attrmiss = slot->tts_tupleDescriptor->constr->missing;
+ }
+ else
+ {
+ missingnum = -1;
+ attrmiss = NULL;
+ }

You may as well just do:

+ if (!slot->tts_tupleDescriptor->constr)
+ return;

+ missingnum = slot->tts_tupleDescriptor->constr->num_missing -1;
+ attrmiss = slot->tts_tupleDescriptor->constr->missing;

8. -1; should be - 1;

+ missingnum = slot->tts_tupleDescriptor->constr->num_missing -1;

9. Surplus braces can be removed:

+ if (attno < attnum)
{
- slot->tts_values[attno] = (Datum) 0;
- slot->tts_isnull[attno] = true;
+ slot_getmissingattrs(slot, attno);
}

10. You've made a behavioral change in the following code:

- for (; attno < attnum; attno++)
+ if (attno < attnum)
{
- slot->tts_values[attno] = (Datum) 0;
- slot->tts_isnull[attno] = true;
+ slot_getmissingattrs(slot, attno);
}

Previously this just used to initialise up to attnum, but
slot_getmissingattrs does not know about attnum and always seems to
fill the entire slot with all atts in the TupleDesc
This may cause some performance regression when in cases where only
part of the tuple needs to be deformed. I think Tomas' benchmarks used
the final column in the table, so likely wouldn't have notice this,
although it may have if he'd aggregated one of the middle columns.

Once that's fixed, you could ditch the Min() call in the following code:

attno = Min(attno, attnum);

slot_deform_tuple(slot, attno);

/*
* If tuple doesn't have all the atts indicated by tupleDesc, read the
* rest as NULLs or missing values
*/
if (attno < attnum)
{
slot_getmissingattrs(slot, attno);
}

And change it to something more like:

/*
* If tuple doesn't have all the atts indicated by tupleDesc, deform as
* much as possible, then fill the remaining required atts with nulls or
* missing values.
*/
if (attno < attnum)
{
slot_deform_tuple(slot, attno);
slot_getmissingattrs(slot, attno, attnum); // <-- (includes new parameter)
}
else
slot_deform_tuple(slot, attnum);

Which should result in a small performance increase due to not having
to compare attno < attnum twice. Although, perhaps the average compile
may optimise this anyway. I've not checked.

11. Is there a reason why the following code in getmissingattr() can't
be made into a bsearch?

+ for (missingnum = tupleDesc->constr->num_missing - 1;
+ missingnum >= 0; missingnum--)
+ {
+ if (attrmiss[missingnum].amnum == attnum)
+ {
+ if (attrmiss[missingnum].ammissingNull)
+ {
+ *isnull = true;
+ return (Datum) 0;
+ }
+ else
+ {
+ *isnull = false;
+ return attrmiss[missingnum].ammissing;
+ }
+ }
+ }
+ Assert(false); /* should not be able to get here */
+ }

As far as I can see, the attrmiss is sorted by amnum. But maybe I just
failed to find a case where it's not.

I'd imagine doing this would further improve Tomas' benchmark score
for the patched version, since he was performing a sum() on the final
column.

Although, If done, I'm actually holding some regard to the fact that
users may one day complain that their query became slower after a
table rewrite :-)

Update: I've stumbled upon the following code:

+ /*
+ * We have a dependency on the attrdef array being filled in in
+ * ascending attnum order. This should be guaranteed by the index
+ * driving the scan. But we want to be double sure
+ */
+ if (!(attp->attnum > attnum))
+ elog(ERROR, "attribute numbers not ascending");

and the code below this seems to also assemble the missing values in
attnum order.

While I'm here, I'd rather see this if test written as: if
(attp->attnum <= attnum)

Since the attnum order in the missing values appears to be well
defined in ascending order, you can also simplify the comparison logic
in equalTupleDescs(). The inner-most nested loop shouldn't be
required.

12. Additional braces not required in:

+ if (tupleDesc &&
+ attnum <= tupleDesc->natts &&
+ TupleDescAttr(tupleDesc, attnum - 1)->atthasmissing)
+ {
+ return false;
+ }
+ else
+ {
+ return true;
+ }

13. Also, just on study of the following condition:

+ if (tupleDesc &&
+ attnum <= tupleDesc->natts &&
+ TupleDescAttr(tupleDesc, attnum - 1)->atthasmissing)

I think that instead of testing "attnum <= tupleDesc->natts", you
should remove that condition an just add;

Assert(!tupleDesc || attnum <= tupleDesc->natts);

to the start of the function. No code should ever pass an attnum
greater than the tupleDesc's natts, so there's probably not much point
in checking that now if we didn't before.

14. In expand_tuple, the following can be declared in inner scope:

+ HeapTupleHeader targetTHeader;

(It would be nice if expand_tuple was a bit easier on the eye, but I
have no suggestions to improve it right now)

15. Why not: slot->tts_values[missattnum] = (Datum) 0;

+ slot->tts_values[missattnum] = PointerGetDatum(NULL);

There are a few other places you've done this. I'm unsure if there's
any sort of standard to follow.

16. Additional braces not required in:

{
- *isnull = true;
- return (Datum) 0;
+ return getmissingattr(slot->tts_tupleDescriptor, attnum, isnull);
}

Likewise in:

- for (; attnum < tdesc_natts; attnum++)
+ if (attnum < tdesc_natts)
{
- slot->tts_values[attnum] = (Datum) 0;
- slot->tts_isnull[attnum] = true;
+ slot_getmissingattrs(slot, attnum);
}
+

and:

- for (; attno < attnum; attno++)
+ if (attno < attnum)
{
- slot->tts_values[attno] = (Datum) 0;
- slot->tts_isnull[attno] = true;
+ slot_getmissingattrs(slot, attno);
}

maybe just fix this in all places your patch is touching.

17. Unrelated change:

diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl
index ed90a02303..3b0be39101 100644
--- a/src/backend/catalog/genbki.pl
+++ b/src/backend/catalog/genbki.pl
@@ -416,7 +416,6 @@ sub morph_row_for_pgattr
}
elsif ($priornotnull)
{
-
# attnotnull will automatically be set if the type is
# fixed-width and prior columns are all NOT NULL ---
# compare DefineAttr in bootstrap.c. oidvector and

18. Can you explain what you mean by "doomed" here?

+ * If we add a column that is not null and there is no missing value
+ * (i.e. the missing value is NULL) then this ADD COLUMN is doomed.
+ * Unless the table is empty...

19. A few lines below are belong 80 chars:

+ if (HeapTupleHeaderGetNatts(tuple.t_data) <
RelationGetDescr(erm->relation)->natts)
+ {
+ copyTuple = heap_expand_tuple(&tuple,
+ RelationGetDescr(erm->relation));

Likewise in:

+ if (HeapTupleHeaderGetNatts(slot->tts_tuple->t_data) <
slot->tts_tupleDescriptor->natts)

and:

+ if (att_tup->atthasdef && rd_att->constr && rd_att->constr->num_defval > 0)

20. "missin" -> "missing", "Fill" -> "fill"

+ * If the column has a default or missin value Fill it into the
+ * attrdef array

21. Spaces instead of tabs. Maybe just pgindent the whole patch?

+ AttrMissing *missing; /* missing attributes values, NULL if none */
uint16 num_defval;
uint16 num_check;
+ uint16 num_missing;

22. Missing <tab><space> before " 24"

+#define Anum_pg_attribute_attmissingval 24

Also, missing <tab> before " 15"

+#define Anum_pg_attribute_atthasmissing 15

23. Looks like you'll need to find a new home for the test, since the
comment claims we want no more than 19. You've added the 20th.

# NB: temp.sql does a reconnect which transiently uses 2 connections,
# so keep this parallel group to at most 19 tests
# ----------
-test: plancache limit plpgsql copy2 temp domain rangefuncs prepare
without_oid conversion truncate alter_table sequence polymorphism
rowtypes returning largeobject with xml
+test: plancache limit plpgsql copy2 temp domain rangefuncs prepare
without_oid conversion truncate alter_table sequence polymorphism
rowtypes returning largeobject with xml fast_default

I'll await the next version before looking again.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tsunakawa, Takayuki 2018-02-19 03:01:15 RE: [bug fix] Cascaded standby cannot start after a clean shutdown
Previous Message Tomas Vondra 2018-02-19 02:11:32 Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath