Re: Declarative partitioning - another take

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Declarative partitioning - another take
Date: 2016-09-06 07:13:39
Message-ID: CAFjFpRcR=8J0kSx6kHVWGtun1k+Jfb3JfV0R038ns0a_287HAg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I found a server crash while running make check in regress folder. with
this set of patches. Problem is RelationBuildPartitionKey() partexprsrc may
be used uninitialized. Initializing it with NIL fixes the crash. Here's
patch to fix it. Came up with the fix after discussion with Amit.

On Fri, Aug 26, 2016 at 1:45 PM, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp
> wrote:

>
> Sorry it took me a while to reply. Attached updated patches including the
> review comments on 0001 at [1].
>
> On 2016/08/17 3:54, Robert Haas wrote:
> > On Wed, Aug 10, 2016 at 7:09 AM, Amit Langote
> > <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> >> 0002-psql-and-pg_dump-support-for-partitioned-tables.patch
> >
> > + if (pset.sversion >= 90600 && tableinfo.relkind == 'P')
> >
> > Version check is redundant, right?
>
> Yep, fixed.
>
> > +) PARTITION BY RANGE ((a+b));
> > +\d describe_range_key
> > +Partitioned table "public.describe_range_key"
> > + Column | Type | Modifiers
> > +--------+---------+-----------
> > + a | integer |
> > + b | integer |
> > +Partition Key: PARTITION BY RANGE (((a + b)))
> >
> > I understand that it's probably difficult not to end up with two sets
> > of parentheses here, but can we avoid ending up with three sets?
>
> Fixed. This code was copy-pasted from the index code which has other
> considerations for adding surrounding parentheses which don't apply to the
> partition key code.
>
> > Also, I wonder if pg_get_partkeydef() should omit "PARTITION BY" and
> > pg_dump can add that part back. Then this could say:
> >
> > Partition Key: RANGE ((a + b))
> >
> > ...which seems a good deal more natural than what you have now.
>
> Agreed, so done that way.
>
> >> 0003-Catalog-and-DDL-for-partition-bounds.patch
> >>
> >> Partition DDL includes both a way to create new partition and "attach"
> an
> >> existing table as a partition of parent partitioned table. Attempt to
> >> drop a partition using DROP TABLE causes an error. Instead a partition
> >> needs first to be "detached" from parent partitioned table. On the
> other
> >> hand, dropping the parent drops all the partitions if CASCADE is
> specified.
> >
> > Hmm, I don't think I like this. Why should it be necessary to detach
> > a partition before dropping it? That seems like an unnecessary step.
>
> I thought we had better lock the parent table when removing one of its
> partitions and it seemed a bit odd to lock the parent table when dropping
> a partition using DROP TABLE? OTOH, with ALTER TABLE parent DETACH
> PARTITION, the parent table is locked anyway.
>
> > [ DEFERRABLE | NOT DEFERRABLE ] [ INITIALLY DEFERRED | INITIALLY
> > IMMEDIATE ]
> > +
> > </synopsis>
> >
> > Unnecessary hunk.
> >
> > + <para>
> > + If this table is a partition, one cannot perform <literal>DROP
> > NOT NULL</>
> > + on a column if it is marked not null in the parent table.
> > + not null.
> > + </para>
> >
> > Sentence fragment.
>
> Fixed.
>
> >
> > + <para>
> > + Note that unlike the <literal>ATTACH PARTITION</> command, a
> partition
> > + being detached can be itself partitioned. In that case, it
> continues
> > + to exist as such.
> > + </para>
> >
> > This is another restriction I don't understand. Why can't I attach a
> > partitioned table?
>
> I removed this restriction.
>
> ATExecAttachPartition() adds a AT work queue entry for the table being
> attached to perform a heap scan in the rewrite phase. The scan is done
> for checking that no row violates the partition boundary condition. When
> attaching a partitioned table as partition, multiple AT work queue entries
> are now added - one for each leaf partition of the table being attached.
>
> > + indicate that descendant tables are included. Note that whether
> > + <literal>ONLY</> or <literal>*</> is specified has no effect in
> case
> > + of a partitioned table; descendant tables (in this case,
> partitions)
> > + are always included.
> >
> > Ugh, why? I think this should work exactly the same way for
> > partitioned tables that it does for any other inheritance hierarchy.
> > Sure, you'll get no rows, but who cares?
>
> Agreed, done that way.
>
> > +CREATE FOREIGN TABLE measurement_y2016m07
> > + PARTITION OF measurement FOR VALUES START ('2016-07-01') END
> > ('2016-08-01');
> > + SERVER server_07;
> >
> > Extra semicolon?
>
> Fixed.
>
> >
> > + A partition cannot have columns other than those inherited from
> the
> > + parent. That includes the <structfield>oid</> column, which can
> be
> >
> > I think experience suggests that this is a good restriction, but then
> > why does the syntax synopsis indicate that PARTITION BY can be
> > specified along with column definitions? Similarly for CREATE FOREIGN
> > TABLE.
>
> The syntax synopsis of CREATE TABLE ... PARTITION OF indicates that a list
> of column WITH OPTION and/or table_constraint can be specified. It does
> not allow column definitions.
>
> In this case, inherited columns will be listed in the PARTITION BY clause.
>
> Do you mean that the CREATE TABLE ... PARTITION OF syntax should allow
> column definitions just like INHERITS does and unlike regular inheritance,
> throw error if columns other than those to be merged are found?
>
> >
> > + When specifying for a table being created as partition, one needs
> to
> > + use column names from the parent table as part of the key.
> >
> > This is not very clear.
>
> Sentence removed because it may be clear from the context that inherited
> columns are to be used for the partition key.
>
> > - /* Remove NO INHERIT flag if rel is a partitioned table */
> > - if (relid_is_partitioned(relid))
> > + /* Discard NO INHERIT, if relation is a partitioned table or a
> > partition */
> > + if (relid_is_partitioned(relid) || relid_is_partition(relid))
> > is_no_inherit = false;
> >
> > It might be right to disallow NO INHERIT in this case, but I don't
> > think it can be right to just silently ignore it.
>
> OK, I changed this to instead throw an error if a NO INHERIT check
> constraint is added to a partitioned table or a partition.
>
> > + * Not flushed from the cache by RelationClearRelation() unless changed
> because
> > + * of addition or removal of partitions.
> >
> > This seems unlikely to be safe, unless I'm missing something.
>
> Like TupleDesc, a table's PartitionDesc is preserved across relcache
> rebuilds. PartitionDesc consists of arrays of OIDs and partition bounds
> for a table's immediate partitions. It can only change by adding/removing
> partitions to/from the table which requires an exclusive lock on it.
> Since this data can grow arbitrarily big it seemed better to not have to
> copy it around, so a direct pointer to the relcache field (rd_partdesc) is
> given to callers. relcache rebuilds that do not logically change
> PartitionDesc leave it intact so that some user of it is not left with a
> dangling pointer. Am I missing something?
>
> >
> > + form = (Form_pg_inherits) GETSTRUCT(tuple);
> > +
> > + systable_endscan(scan);
> > + heap_close(catalogRelation, AccessShareLock);
> > +
> > + return form->inhparent;
> >
> > This is unsafe. After systable_endscan, it is no longer OK to access
> > form->inhparent.
> >
> > Try building with CLOBBER_CACHE_ALWAYS to find other cache flush hazards.
> >
> > There should probably be a note in the function header comment that it
> > is unsafe to use this for an inheritance child that is not a
> > partition, because there could be more than one parent in that case.
> > Or maybe the whole idea of this function just isn't very sound...
>
> Fixed unsafe coding and added a comment to the function saying it should
> be called on tables known to be partitions.
>
> >
> > +static List *
> > +get_partitions(Oid relid, int lockmode)
> > +{
> > + return find_inheritance_children(relid, lockmode);
> > +}
> >
> > What's the point? If we're going to have a wrapper here at all, then
> > shouldn't it have a name that matches the existing convention - e.g.
> > find_partitions() or find_child_partitions()? But I think you might
> > as well just use find_inheritance_children() directly.
>
> OK, changed to just use find_inheritance_children() directly.
>
> >
> > + * Happens when we have created the pg_inherits entry
> > but not the
> > + * pg_partition entry yet.
> >
> > Why do we ever allow the flow of control to reach this point while we
> > are in such an intermediate state?
>
> Fixed to prevent this from happening.
>
> >
> > +free_partition_info(PartitionInfoData **p, int num)
> >
> > Seems very error-prone. Isn't this why MemoryContextReset was invented?
>
> Got rid of this function.
>
> >
> > +relid_is_partition(Oid relid)
> > +{
> > + return SearchSysCacheExists1(PARTRELID,
> ObjectIdGetDatum(relid));
> > +}
> >
> > This is used in a lot of places, and the overhead of checking it in
> > all of those places is not necessarily nil. Syscache lookups aren't
> > free. What if we didn't create a new catalog for this and instead
> > just added a relpartitionbound attribute to pg_class? It seems a bit
> > silly to have a whole extra catalog to store one extra column...
>
> OK, I got rid of the pg_partition catalog and added a pg_class attribute
> for storing partition bound. Because the newly added attribute is a
> pg_node_tree and hence not readily accessible without a heap_getattr()
> call, I added a boolean relispartition as well. Now all the
> relid_is_partition() calls have been replaced by checks using
> relation->rd_rel->relispartition.
>
> >
> > /*
> > + * If this foreign table is a partition, check that the FDW
> supports
> > + * insert.
> > + */
> > + if (stmt->base.partbound != NULL)
> > + {
> > + FdwRoutine *fdw_routine;
> > +
> > + fdw_routine = GetFdwRoutine(fdw->fdwhandler);
> > + if (fdw_routine->ExecForeignInsert == NULL)
> > + ereport(ERROR,
> > + (errcode(ERRCODE_FDW_NO_
> SCHEMAS),
> > + errmsg("cannot create foreign
> > table as partition"),
> > + errdetail("foreign-data
> > wrapper \"%s\" does not support insert",
> > + fdw->fdwname)));
> > + }
> >
> > Why? This seems like an entirely arbitrary prohibition. If inserts
> > aren't supported, then they'll fail at runtime. Same for updates or
> > deletes, for which you have not added checks. I think you should just
> > remove this.
>
> OK, removed this check.
>
> Instead, ExecInitModifyTable()/BeginCopy() call CheckValidResultRel() for
> every leaf partition to check that they are ready for CMD_INSERT.
>
> >
> > + /* Force inheritance recursion, if partitioned table. */
> > + if (recurse || relid_is_partitioned(myrelid))
> >
> > Disagree with this, too. There's no reason for partitioned tables to
> > be special in this way. Similarly, disagree with all of the places
> > that do something similar.
>
> Removed the forced recursion bit here and a few other places.
>
> >
> > - errmsg("column \"%s\" in child table
> > must be marked NOT NULL",
> > - attributeName)));
> > + errmsg("column \"%s\" in %s table must
> > be marked NOT NULL",
> > + attributeName,
> > + is_attach_partition ?
> > "source" : "child")));
> >
> > You have a few of these; they cause problems for translators, because
> > different languages have different word ordering. Repeat the entire
> > message instead: is_attach_partition ? "column \"%s\" in source table
> > must be marked NOT NULL" : "column \"%s\" in child table must be
> > marked NOT NULL".
>
> Changed so that the entire message is repeated.
>
> >
> > +-- XXX add ownership tests
> >
> > So do that. :-)
>
> Done, sorry about that.
>
> >
> > +ERROR: column "b" is not null in parent
> > +HINT: Please drop not null in the parent instead
> >
> > Hmm. That hint doesn't seem like project style, and I'm not sure
> > that it really makes sense to issue such a hint anyway. Who knows
> > whether that is the right thing to do? I think you should somehow be
> > complaining about the fact that this is a partition, rather than
> > complaining about the fact that the column is NOT NULL in the parent.
> > Are we insisting that the flags match exactly, or only that the child
> > may not allow nulls unless the parent does?
>
> I think the latter. It is assumed that all the parent's constraints are
> present in a child table, because in ExecInsert()/CopyFrom() we perform
> ExecConstraints() using the child relation even if the actual insert was
> on the parent. Also, if an inherited NOT NULL constraint on a child's
> column is dropped irrespective of parent's, selecting a parent's NOT NULL
> column might return nulls from the child table that no longer has the
> constraint.
>
> I recently came across a related proposal whereby dropping *inherited* NOT
> NULL from child tables will be prevented. Problems in letting it be be
> dropped are mentioned here:
>
> https://www.postgresql.org/message-id/21633.1448383428@sss.pgh.pa.us
>
> That proposal is probably being reworked such that NOT NULL constraints
> get a pg_constraint entry with proper accounting of inheritance count.
>
> > +ERROR: new partition's list of values overlaps with partition
> > "lpart1" of "list_parted"
> >
> > Maybe just:
> >
> > ERROR: partitions must not overlap
> > -or-
> > ERROR: partition "%s" would overlap partition "%s"
>
> OK, I changed to the second message.
>
> >
> > As before, this is just an initial read-through, so apologies for
> > whatever I may have missed.
>
> Thanks a lot for the review.
>
> By the way, I am still working on the following items and will be included
> in the next version of the patch.
>
> * Fix internal representation of list partition bounds to be more efficient
> * Add PartitionOptInfo
>
> As mentioned in [2], I have refactored the inheritance expansion code
> within optimizer so that a partitioned table's inheritance hierarchy is
> preserved in resulting AppendRelInfos (patch 0005). One immediate benefit
> of that is that if constraint exclusion determines that an intermediate
> partition is to be excluded then all partitions underneath it are excluded
> automatically. With the current flattened model, all partitions in that
> subtree would have been processed and excluded one-by-one.
>
> Regards,
> Amit
>
> [1]
> https://www.postgresql.org/message-id/CA%2BTgmoZ008qTgd_Qg6_
> oZb3i0mOYrS6MdhncwgcqPKahixjarg%40mail.gmail.com
> [2]
> https://www.postgresql.org/message-id/f2a9592a-17e9-4c6a-
> e021-03b802195ce7%40lab.ntt.co.jp
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachment Content-Type Size
fix_server_crash.patch text/x-patch 469 bytes

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2016-09-06 07:34:54 Re: Parallel tuplesort (for parallel B-Tree index creation)
Previous Message Michael Paquier 2016-09-06 07:11:52 Re: pgsql: Add putenv support for msvcrt from Visual Studio 2013