Re: ALTER TABLE ADD COLUMN fast default

From: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ALTER TABLE ADD COLUMN fast default
Date: 2018-02-20 06:33:07
Message-ID: CAA8=A78MxTsSUbcsuR=E_W8hvDV3qVSnM0gJJEMvPdUdC7+GLw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

http://www.publix.com/myaccount/verify?validationCode=889fd4cb-6dbb-4f93-9d4f-c701410cd8a2

On Mon, Feb 19, 2018 at 1:18 PM, David Rowley
<david(dot)rowley(at)2ndquadrant(dot)com> wrote:
> On 19 February 2018 at 13:48, Andrew Dunstan
> <andrew(dot)dunstan(at)2ndquadrant(dot)com> wrote:
>> On Sun, Feb 18, 2018 at 2:52 PM, David Rowley
>> <david(dot)rowley(at)2ndquadrant(dot)com> wrote:
>>> I'll try to spend some time reviewing the code in detail soon.
>>>
>>
>> Thanks. I'll fix the typos in the next version of the patch, hopefully
>> after your review.
>
> Here's a more complete review... I reserve the right to be wrong about
> a few things and be nitpicky on a few more...

I've dealt with most of these issues. The only change of substance is
the suggested change in slot_getmissingattrs().

> Once that's fixed, you could ditch the Min() call in the following code:
>
> attno = Min(attno, attnum);
>
> slot_deform_tuple(slot, attno);
>
> /*
> * If tuple doesn't have all the atts indicated by tupleDesc, read the
> * rest as NULLs or missing values
> */
> if (attno < attnum)
> {
> slot_getmissingattrs(slot, attno);
> }
>
> And change it to something more like:
>
> /*
> * If tuple doesn't have all the atts indicated by tupleDesc, deform as
> * much as possible, then fill the remaining required atts with nulls or
> * missing values.
> */
> if (attno < attnum)
> {
> slot_deform_tuple(slot, attno);
> slot_getmissingattrs(slot, attno, attnum); // <-- (includes new parameter)
> }
> else
> slot_deform_tuple(slot, attnum);
>
> Which should result in a small performance increase due to not having
> to compare attno < attnum twice. Although, perhaps the average compile
> may optimise this anyway. I've not checked.
>

Maybe it would make a difference, but it seems unlikely to be significant.

> 11. Is there a reason why the following code in getmissingattr() can't
> be made into a bsearch?
>
[...]
> As far as I can see, the attrmiss is sorted by amnum. But maybe I just
> failed to find a case where it's not.
>
> I'd imagine doing this would further improve Tomas' benchmark score
> for the patched version, since he was performing a sum() on the final
> column.
>
> Although, If done, I'm actually holding some regard to the fact that
> users may one day complain that their query became slower after a
> table rewrite :-)
>
> Update: I've stumbled upon the following code:
>
> + /*
> + * We have a dependency on the attrdef array being filled in in
> + * ascending attnum order. This should be guaranteed by the index
> + * driving the scan. But we want to be double sure
> + */
> + if (!(attp->attnum > attnum))
> + elog(ERROR, "attribute numbers not ascending");
>
> and the code below this seems to also assemble the missing values in
> attnum order.
>
> While I'm here, I'd rather see this if test written as: if
> (attp->attnum <= attnum)
>
> Since the attnum order in the missing values appears to be well
> defined in ascending order, you can also simplify the comparison logic
> in equalTupleDescs(). The inner-most nested loop shouldn't be
> required.
>

Well, this comment in the existing code in equalTupleDescs() suggests
that in fact we can't rely on the attrdef items being in attnum order:

/*
* We can't assume that the items are always read from the system
* catalogs in the same order; so use the adnum field to identify
* the matching item to compare.
*/

And in any case the code out of date since the code no longer ties
missing values to default values. So I've removed the comment and the
error check.

There are probably a few optimisations we can make if we can assume
that the missing values are in the Tuple Descriptor are in attnum
order. But it's unclear to me why they should be and the attrdef
entries not.

>
> 15. Why not: slot->tts_values[missattnum] = (Datum) 0;
>
> + slot->tts_values[missattnum] = PointerGetDatum(NULL);
>
> There are a few other places you've done this. I'm unsure if there's
> any sort of standard to follow.
>

This is a pretty widely used idiom in the source.

>
> I'll await the next version before looking again.
>

Thanks

attached.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
fast_default-v11.patch application/octet-stream 103.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2018-02-20 06:42:58 Re: ALTER TABLE ADD COLUMN fast default
Previous Message Tsunakawa, Takayuki 2018-02-20 02:43:32 RE: [doc fix] Correct calculation of vm.nr_hugepages