Re: Fwd: [PATCHES] Auto Partitioning Patch - WIP version 1

From: "Nikhil Sontakke" <nikhil(dot)sontakke(at)enterprisedb(dot)com>
To: "Emmanuel Cecchet" <manu(at)frogthinker(dot)org>
Cc: "Jaime Casanova" <jcasanov(at)systemguards(dot)com(dot)ec>, "Gregory Stark" <stark(at)enterprisedb(dot)com>, "Josh Berkus" <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fwd: [PATCHES] Auto Partitioning Patch - WIP version 1
Date: 2008-12-16 06:35:11
Message-ID: a301bfd90812152235j7af3bfabh5265c1e3ec813fd9@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches

Hi,

> I will be working on a roadmap for the partitioning features. I think that
> there are different needs and that we will not be able to address them all
> in 8.5 or even 8.6.
> The goal will be to get things done step by step but possibly with a design
> that will not require major refactoring to support new features. I will try
> to setup the new wiki page tomorrow.
>

+1.

>
> In the meantime, I have made some more tests with the trigger in C (see
> attached patch). To prevent duplicating too much code, it requires the
> ExecRelCheck method to be exported (that would be nice to have this function
> exported in 8.4 so that we can start experimenting in 8.4 and don't have to
> wait another year for 8.5). If there is locality in the inserts (which might
> be the case if you COPY sorted data), the performance remains constant
> regardless the number of child tables.

A similar DELETE trigger should be pretty easy to write up in C. I think the
main challenge is with UPDATE triggers especially if the new row will fall
into another child table - but we can always throw an error for such a case
initially.

>
> Right now if the row cannot be inserted in any child table, it is inserted
> in the parent. If you want to fail, we can add a 'fail trigger' (after all
> other triggers) that generates an error if previous triggers did not capture
> the row. If you want to create a new partition, you can have another trigger
> to handle that.
>

One of the work items related to partitioning eventually is to avoid having
to APPEND the parent in all queries involving children. Maybe having an
overflow child table might help to catch failed triggers for those cases?

Regards,
Nikhils

>
> So I think that this trigger approach is pretty flexible like people used
> AOP in J2EE servers to process requests. It has also the advantage of
> allowing fast prototyping. It should also be easy to push that functionality
> down in the core as needed.
>
> Is it ok if I move Simon's requirement document under a more generic 'Table
> partitioning' page on the Wiki?
>
> Thanks for your feedback,
> manu
>
> Jaime Casanova wrote:
>
>> On Mon, Dec 15, 2008 at 1:46 PM, Gregory Stark <stark(at)enterprisedb(dot)com>
>> wrote:
>>
>>
>>> Josh Berkus <josh(at)agliodbs(dot)com> writes:
>>>
>>>
>>>
>>>> Hackers,
>>>>
>>>> We don't yet seem to have a clear specification for this feature, and
>>>> the Other
>>>> Open Source DB has shown us how problematic it is to get
>>>> auto-partitioning
>>>> wrong.
>>>>
>>>> Should we defer auto-partitioning to 8.5?
>>>>
>>>>
>>> If we're serious about having a "next generation" partitioning with a
>>> concept
>>> of partition keys then it seems to me to make more sense to do that first
>>> and
>>> then add on a feature like this.
>>>
>>>
>>>
>>
>> +1
>>
>>
>>
>>> This is still very useful. I haven't looked at the actual patch, does it
>>> require core changes or can it be stashed in a pgfoundry or contrib
>>> module?
>>>
>>>
>>>
>>
>> what i don't like about this one is that it creates partitions at
>> create table time and to manually add all new partitions (inherit
>> tables and modify the trigger)... and what i want to see is an
>> automatic creation when it's needed...
>>
>>
>>
>>
>
>
> --
> Emmanuel Cecchet
> FTO @ Frog Thinker Open Source Development & Consulting
> --
> Web: http://www.frogthinker.org
> email: manu(at)frogthinker(dot)org
> Skype: emmanuel_cecchet
>
>
> ### Eclipse Workspace Patch 1.0
> #P Postgres-HEAD
> Index: src/test/regress/regress.c
> ===================================================================
> RCS file: /root/cvsrepo/pgsql/src/test/regress/regress.c,v
> retrieving revision 1.71
> diff -u -r1.71 regress.c
> --- src/test/regress/regress.c 25 Mar 2008 22:42:46 -0000 1.71
> +++ src/test/regress/regress.c 16 Dec 2008 01:42:50 -0000
> @@ -10,6 +10,9 @@
> #include "utils/geo_decls.h" /* includes <math.h> */
> #include "executor/executor.h" /* For GetAttributeByName */
> #include "commands/sequence.h" /* for nextval() */
> +#include "catalog/namespace.h"
> +#include "executor/executor.h"
> +#include "executor/tuptable.h"
>
> #define P_MAXDIG 12
> #define LDELIM '('
> @@ -732,3 +735,141 @@
> *--walk = '\0';
> PG_RETURN_CSTRING(result);
> }
> +
> +
> +/*
> + * Partition trigger test
> + *
> + * The trigger should be used this way:
> + * CREATE TRIGGER trigger_name
> + BEFORE INSERT ON master_table
> + FOR EACH ROW EXECUTE PROCEDURE
> partition_insert_trigger(child_table_name, column_number, min_val, max_val);
> + */
> +
> +extern Datum partition_insert_trigger(PG_FUNCTION_ARGS);
> +
> +static Datum
> +check_constraints_and_insert_tuple(Relation child_table_relation,
> TriggerData *trigdata ,HeapTuple trigtuple)
> +{ // Check the constraints
> + ResultRelInfo *resultRelInfo;
> + TupleTableSlot *slot;
> + EState *estate = CreateExecutorState();
> + Datum result;
> +
> + result = PointerGetDatum(trigdata->tg_trigtuple);
> +
> + resultRelInfo = makeNode(ResultRelInfo);
> + resultRelInfo->ri_RangeTableIndex = 1; /* dummy */
> + resultRelInfo->ri_RelationDesc = child_table_relation;
> +
> + estate->es_result_relations = resultRelInfo;
> + estate->es_num_result_relations = 1;
> + estate->es_result_relation_info = resultRelInfo;
> +
> + /* Set up a tuple slot too */
> + slot = MakeSingleTupleTableSlot(trigdata->tg_relation->rd_att);
> + ExecStoreTuple(trigtuple, slot, InvalidBuffer, false);
> +
> + if (ExecRelCheck(resultRelInfo, slot, estate) == NULL)
> + { // Constraints satisfied, insert the row in the child table
> + bool use_wal = true;
> + bool use_fsm=true;
> +
> + /* BEFORE ROW INSERT Triggers */
> + if (resultRelInfo->ri_TrigDesc &&
> +
> resultRelInfo->ri_TrigDesc->n_before_row[TRIGGER_EVENT_INSERT] > 0)
> + {
> + HeapTuple newtuple;
> + newtuple = ExecBRInsertTriggers(estate,
> resultRelInfo, trigtuple);
> +
> + if (newtuple != trigtuple) /* modified by
> Trigger(s) */
> + {
> + heap_freetuple(trigtuple);
> + trigtuple = newtuple;
> + }
> + }
> +
> + /* Perform the insert
> + * TODO: Check that we detect constraint violation if
> before row insert does something bad */
> + heap_insert(child_table_relation, trigtuple,
> GetCurrentCommandId(true), use_wal, use_fsm);
> +
> + /* Update indices */
> + ExecOpenIndices(resultRelInfo);
> + if (resultRelInfo->ri_NumIndices > 0)
> + ExecInsertIndexTuples(slot, &(trigtuple->t_self),
> estate, false);
> +
> + /* AFTER ROW INSERT Triggers */
> + ExecARInsertTriggers(estate, resultRelInfo, trigtuple);
> +
> + result = PointerGetDatum(NULL);
> + }
> + // Free resources
> + FreeExecutorState(estate);
> + ExecDropSingleTupleTableSlot(slot);
> +
> + return result;
> +}
> +
> +
> +PG_FUNCTION_INFO_V1(partition_insert_trigger);
> +
> +static Relation last_inserted_relation;
> +
> +Datum
> +partition_insert_trigger(PG_FUNCTION_ARGS)
> +{
> + TriggerData *trigdata = (TriggerData *) fcinfo->context;
> + HeapTuple trigtuple= trigdata->tg_trigtuple;
> + TupleConstr *constr;
> + char *child_table_name;
> + Relation child_table_relation;
> + Oid relation_id;
> + Datum result;
> +
> + // Try to exploit locality for bulk inserts
> + // We expect consecutive insert to go to the same child table
> + if (last_inserted_relation != NULL)
> + { // Try the last table we used
> + result =
> check_constraints_and_insert_tuple(last_inserted_relation, trigdata,
> trigtuple);
> + if (result == PointerGetDatum(NULL))
> + return result;
> + // We got a miss
> + last_inserted_relation = NULL;
> + }
> +
> + /* make sure it's called as a trigger at all */
> + if (!CALLED_AS_TRIGGER(fcinfo))
> + elog(ERROR, "partition_insert_trigger: not called by trigger
> manager");
> +
> + /* Sanity checks */
> + if (!TRIGGER_FIRED_BY_INSERT(trigdata->tg_event) ||
> !TRIGGER_FIRED_BEFORE(trigdata->tg_event))
> + elog(ERROR, "partition_insert_trigger: not called on insert
> before");
> +
> + // Child table name is either given as the unique parameter or it is
> the name of the trigger
> + if (trigdata->tg_trigger->tgnargs == 1)
> + child_table_name = trigdata->tg_trigger->tgargs[0];
> + else
> + child_table_name = trigdata->tg_trigger->tgname;
> +
> + // Lookup the child relation
> + relation_id = RelnameGetRelid(child_table_name);
> + if (relation_id == InvalidOid)
> + elog(ERROR, "partition_insert_trigger: Invalid child table %s",
> child_table_name);
> + child_table_relation = RelationIdGetRelation(relation_id);
> + if (child_table_relation == NULL)
> + elog(ERROR, "partition_insert_trigger: Failed to locate relation
> for child table %s", child_table_name);
> +
> + constr = child_table_relation->rd_att->constr;
> + if (constr->num_check == 0)
> + elog(ERROR, "partition_insert_trigger: No constraint found
> for child table %s", child_table_name);
> +
> + result = check_constraints_and_insert_tuple(child_table_relation,
> trigdata, trigtuple);
> +
> + if (result == PointerGetDatum(NULL))
> + last_inserted_relation = child_table_relation;
> +
> + RelationClose(child_table_relation);
> +
> + return result;
> +}
> +
> Index: src/backend/executor/execMain.c
> ===================================================================
> RCS file: /root/cvsrepo/pgsql/src/backend/executor/execMain.c,v
> retrieving revision 1.314
> diff -u -r1.314 execMain.c
> --- src/backend/executor/execMain.c 31 Oct 2008 21:07:54 -0000
> 1.314
> +++ src/backend/executor/execMain.c 16 Dec 2008 01:42:50 -0000
> @@ -1947,7 +1947,7 @@
> /*
> * ExecRelCheck --- check that tuple meets constraints for result relation
> */
> -static const char *
> +const char *
> ExecRelCheck(ResultRelInfo *resultRelInfo,
> TupleTableSlot *slot, EState *estate)
> {
> Index: src/test/regress/output/create_function_1.source
> ===================================================================
> RCS file:
> /root/cvsrepo/pgsql/src/test/regress/output/create_function_1.source,v
> retrieving revision 1.34
> diff -u -r1.34 create_function_1.source
> --- src/test/regress/output/create_function_1.source 31 Oct 2008
> 19:37:56 -0000 1.34
> +++ src/test/regress/output/create_function_1.source 16 Dec 2008
> 01:42:50 -0000
> @@ -47,6 +47,10 @@
> RETURNS int4
> AS '@libdir@/regress(at)DLSUFFIX@'
> LANGUAGE C STRICT;
> +CREATE FUNCTION partition_insert_trigger ()
> + RETURNS trigger
> + AS '@libdir@/regress(at)DLSUFFIX@'
> + LANGUAGE C STRICT;
> -- Things that shouldn't work:
> CREATE FUNCTION test1 (int) RETURNS int LANGUAGE SQL
> AS 'SELECT ''not an integer'';';
> @@ -80,3 +84,70 @@
> CREATE FUNCTION test1 (int) RETURNS int LANGUAGE internal
> AS 'nosuch';
> ERROR: there is no built-in function named "nosuch"
> +-- Partitioning trigger test
> +CREATE TABLE master (
> + id int not null,
> + date date not null,
> + value int
> +);
> +CREATE TABLE child_y2008m01 (
> + CHECK ( date >= DATE '2008-01-01' AND date < DATE '2008-02-01' )
> +) INHERITS (master);
> +CREATE TABLE child_y2008m02 (
> + CHECK ( date >= DATE '2008-02-01' AND date < DATE '2008-03-01' )
> +) INHERITS (master);
> +CREATE TABLE child_y2008m03 (
> + CHECK ( date >= DATE '2008-03-01' AND date < DATE '2008-04-01' )
> +) INHERITS (master);
> +CREATE TRIGGER insert_child_y2008m01
> + BEFORE INSERT ON master
> + FOR EACH ROW EXECUTE PROCEDURE
> partition_insert_trigger('child_y2008m01');
> +CREATE TRIGGER child_y2008m02
> + BEFORE INSERT ON master
> + FOR EACH ROW EXECUTE PROCEDURE partition_insert_trigger();
> +CREATE TRIGGER child_y2008m03
> + BEFORE INSERT ON master
> + FOR EACH ROW EXECUTE PROCEDURE partition_insert_trigger();
> +INSERT INTO master VALUES (1, '2008-01-15', 1);
> +INSERT INTO master VALUES (2, '2008-02-15', 2);
> +INSERT INTO master VALUES (3, '2008-03-15', 3);
> +INSERT INTO master VALUES (4, '2008-04-15', 4);
> +COPY master FROM '/root/pg_partitions/copy_input.txt';
> +select * from master;
> + id | date | value
> +----+------------+-------
> + 4 | 04-15-2008 | 4
> + 1 | 01-15-2008 | 1
> + 11 | 01-10-2008 | 11
> + 2 | 02-15-2008 | 2
> + 12 | 02-15-2008 | 12
> + 3 | 03-15-2008 | 3
> + 13 | 03-15-2008 | 13
> +(7 rows)
> +
> + select * from child_y2008m01;
> + id | date | value
> +----+------------+-------
> + 1 | 01-15-2008 | 1
> + 11 | 01-10-2008 | 11
> +(2 rows)
> +
> + select * from child_y2008m02;
> + id | date | value
> +----+------------+-------
> + 2 | 02-15-2008 | 2
> + 12 | 02-15-2008 | 12
> +(2 rows)
> +
> + select * from child_y2008m03;
> + id | date | value
> +----+------------+-------
> + 3 | 03-15-2008 | 3
> + 13 | 03-15-2008 | 13
> +(2 rows)
> +
> +DROP TABLE master CASCADE;
> +NOTICE: drop cascades to 3 other objects
> +DETAIL: drop cascades to table child_y2008m01
> +drop cascades to table child_y2008m02
> +drop cascades to table child_y2008m03
> Index: src/include/executor/executor.h
> ===================================================================
> RCS file: /root/cvsrepo/pgsql/src/include/executor/executor.h,v
> retrieving revision 1.152
> diff -u -r1.152 executor.h
> --- src/include/executor/executor.h 31 Oct 2008 21:07:55 -0000
> 1.152
> +++ src/include/executor/executor.h 16 Dec 2008 01:42:50 -0000
> @@ -155,6 +155,8 @@
> extern bool ExecContextForcesOids(PlanState *planstate, bool *hasoids);
> extern void ExecConstraints(ResultRelInfo *resultRelInfo,
> TupleTableSlot *slot, EState *estate);
> +extern const char *ExecRelCheck(ResultRelInfo *resultRelInfo,
> + TupleTableSlot *slot, EState *estate);
> extern TupleTableSlot *EvalPlanQual(EState *estate, Index rti,
> ItemPointer tid, TransactionId priorXmax);
> extern PlanState *ExecGetActivePlanTree(QueryDesc *queryDesc);
> Index: src/test/regress/input/create_function_1.source
> ===================================================================
> RCS file:
> /root/cvsrepo/pgsql/src/test/regress/input/create_function_1.source,v
> retrieving revision 1.19
> diff -u -r1.19 create_function_1.source
> --- src/test/regress/input/create_function_1.source 1 Oct 2008 22:38:57
> -0000 1.19
> +++ src/test/regress/input/create_function_1.source 16 Dec 2008
> 01:42:50 -0000
> @@ -52,6 +52,12 @@
> AS '@libdir@/regress(at)DLSUFFIX@'
> LANGUAGE C STRICT;
>
> +CREATE FUNCTION partition_insert_trigger ()
> + RETURNS trigger
> + AS '@libdir@/regress(at)DLSUFFIX@'
> + LANGUAGE C STRICT;
> +
> +
> -- Things that shouldn't work:
>
> CREATE FUNCTION test1 (int) RETURNS int LANGUAGE SQL
> @@ -77,3 +83,48 @@
>
> CREATE FUNCTION test1 (int) RETURNS int LANGUAGE internal
> AS 'nosuch';
> +
> +-- Partitioning trigger test
> +
> +CREATE TABLE master (
> + id int not null,
> + date date not null,
> + value int
> +);
> +
> +CREATE TABLE child_y2008m01 (
> + CHECK ( date >= DATE '2008-01-01' AND date < DATE '2008-02-01' )
> +) INHERITS (master);
> +
> +CREATE TABLE child_y2008m02 (
> + CHECK ( date >= DATE '2008-02-01' AND date < DATE '2008-03-01' )
> +) INHERITS (master);
> +
> +CREATE TABLE child_y2008m03 (
> + CHECK ( date >= DATE '2008-03-01' AND date < DATE '2008-04-01' )
> +) INHERITS (master);
> +
> +CREATE TRIGGER insert_child_y2008m01
> + BEFORE INSERT ON master
> + FOR EACH ROW EXECUTE PROCEDURE
> partition_insert_trigger('child_y2008m01');
> +
> +CREATE TRIGGER child_y2008m02
> + BEFORE INSERT ON master
> + FOR EACH ROW EXECUTE PROCEDURE partition_insert_trigger();
> +
> +CREATE TRIGGER child_y2008m03
> + BEFORE INSERT ON master
> + FOR EACH ROW EXECUTE PROCEDURE partition_insert_trigger();
> +
> +INSERT INTO master VALUES (1, '2008-01-15', 1);
> +INSERT INTO master VALUES (2, '2008-02-15', 2);
> +INSERT INTO master VALUES (3, '2008-03-15', 3);
> +INSERT INTO master VALUES (4, '2008-04-15', 4);
> +COPY master FROM '/root/pg_partitions/copy_input.txt';
> +
> +select * from master;
> +select * from child_y2008m01;
> +select * from child_y2008m02;
> +select * from child_y2008m03;
> +
> +DROP TABLE master CASCADE;
>
>

--
http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2008-12-16 06:56:07 Re: Function with defval returns error
Previous Message Rushabh Lathia 2008-12-16 06:32:24 Re: Function with defval returns error

Browse pgsql-patches by date

  From Date Subject
Next Message ITAGAKI Takahiro 2008-12-16 07:02:17 Re: Fwd: [PATCHES] Auto Partitioning Patch - WIP version 1
Previous Message Emmanuel Cecchet 2008-12-16 02:05:16 Re: Fwd: [PATCHES] Auto Partitioning Patch - WIP version 1