Automating partitions in PostgreSQL - - Partitions - Partitioning support in PostgreSQL (existing) & limitations - Automating partitioning in PostgreSQL - Features implemented - Release notes ------------------------------------------------------------------------------------------------------------- Partitions - Partitioning refers to splitting what is logically one large table into smaller physical pieces. Partitioning can provide several benefits: • Query performance can be improved dramatically for certain kinds of queries. • Update performance can be improved too, since each piece of the table has indexes smaller than an index on the entire data set would be. When an index no longer fits easily in memory, both read and write operations on the index take progressively more disk accesses. • Bulk deletes may be accomplished by simply removing one of the partitions, if that requirement is planned into the partitioning design. DROP TABLE is far faster than a bulk DELETE, to say nothing of the ensuing VACUUM overhead. • Seldom-used data can be migrated to cheaper and slower storage media. The benefits will normally be worthwhile only when a table would otherwise be very large. The exact point at which a table will benefit from partitioning depends on the application, although a rule of thumb is that the size of the table should exceed the physical memory of the database server. ------------------------------------------------------------------------------------------------------------- Partitioning Support in PostgreSQL (existing) & limitations - Currently, PostgreSQL supports partitioning via table inheritance. Each partition must be created as a child table of a single parent table. The parent table itself is normally empty; it exists just to represent the entire data set. The following forms of partitioning can be implemented in PostgreSQL: • Range Partitioning - The table is partitioned into "ranges" defined by a key column or set of columns, with no overlap between the ranges of values assigned to different partitions. For example : one might partition by date ranges, or by ranges of identifiers for particular business objects. • List Partitioning - The table is partitioned by explicitly listing which key values appear in each partition. • Hash partitioning is not currently supported. Limitations - • There is currently no way to verify that all of the CHECK constraints are mutually exclusive. Care is required by the database designer. • There is currently no simple way to specify that rows must not be inserted into the master table. A CHECK (false) constraint on the master table would be inherited by all child tables, so that cannot be used for this purpose. ------------------------------------------------------------------------------------------------------------- Automating partitioning in PostgreSQL Implementation makes of "Inheritance" feature to create partitions. As opposed to rules, the existing mechanism for routing the data, low-level 'C' triggers are used which are optimised to improve the speed of execution. A special system catalog ‘pg_partition’ is being maintained to store partition metadata. This catalog table will store the information such as partitioned table, partition, type of partition, partition key attributes, partition key order, values for ranges/list/hash. Column | Type | Modifiers -------------+----------+----------- partrelid | oid | not null parentrelid | oid | not null parttype | smallint | not null partkey | oid | not null keytype | oid | not null keyorder | smallint | not null minval | bytea | maxval | bytea | listval | bytea | hashval | smallint | Indexes: "pg_partition_oid_index" UNIQUE, btree (oid) "pg_partition_primary_index" btree (parentrelid, partrelid, keyorder) When a row is being inserted on partitioned table, system queries the partition catalog to find out relevant partition which satisfies the constraints specified. If a valid partition is not found, then the row is inserted into overflow table of the specified partitioned table. Updates are handled as a combination of delete and insert. For hash partititions, existing function hash_any() is being used to determine the hash values of incoming row, depending on the partition attributes in the order as specified. As of now, list partition is not being supported but current implementation can be extended to add support for list partitions as well. ------------------------------------------------------------------------------------------------------------- Features implemented - ------------------------------------------------------------------------------------------------------------- Creating partitions - Range partition - CREATE TABLE emp ( emp_id int not null primary key, designation text not null, location varchar(50) not null, jdate date not null, ctc float not null ) PARTITION BY RANGE (jdate, emp_id) ( emp80 (START '01-01-1980',MINVALUE END '01-01-1990',500), emp90 (START '01-01-1990',500 END '01-01-2000',1500+10), emp00 (START '1 Jan 2000',1500+20 END 'Jan 1, 2010', MAXVALUE), DEFAULT ); System would check for overlaps and would create the range partitioned table accordingly. User can opt for an overflow partition by specifying 'DEFAULT' while creating partitioned table. Hash partition - CREATE TABLE employee ( emp_id int not null primary key, designation text not null, location varchar(50) not null unique, jdate date not null, ctc float not null ) PARTITION BY HASH (location, jdate, emp_id) PARTITIONS 3; ------------------------------------------------------------------------------------------------------------- Insert/Delete/Update rows on partitions - For the DML queries on partitions, the system would query the partition catalog to figure out 'relevant' partition, depending on partition attributes specified. The constraints for target partitition would be evaluated and the row would be inserted/deleted/updated as appropriate. for example, INSERT INTO employee VALUES (1, 'Project Manager', 'London', '01-01-1984', 80000); INSERT INTO employee VALUES (737, 'Business Analyst', 'Honolulu', '12-11-1998', 70000); INSERT INTO employee VALUES (1567, 'Programmer', 'Guangzhou', '7 Jan, 2001', 30000); INSERT INTO employee VALUES (1515, 'System Administrator', 'Chennai', '7 July, 1994', 35000); UPDATE emp SET jdate = '1995-10-9', emp_id = 750 where emp_id = 737; UPDATE emp SET jdate = 'Feb 23, 2009' where jdate = '2001-01-07'; UPDATE emp SET jdate = '1994-01-01' where emp_id = 120; ------------------------------------------------------------------------------------------------------------- Dropping partitions - For Range partitions - Partitions can be dropped either by name or by range specified. ALTER TABLE emp DROP partition emp90; ALTER TABLE emp DROP partition (START minvalue,minvalue END '01-01-1990',500); For hash partitions - Number of hash partitions can be reduced by - ALTER TABLE emp update partitions to 1; ------------------------------------------------------------------------------------------------------------- Renaming partitions - Partition can be renamed by following command. ALTER TABLE emp RENAME partition emp00 to emp1; ------------------------------------------------------------------------------------------------------------- Adding partitions - Range partition - ALTER TABLE emp ADD range partition emp90 (START '01-01-1990',500 END '01-01-1995',1000); Hash partition - Number of hash partitions can be increased by - ALTER TABLE emp update partitions to 3; ------------------------------------------------------------------------------------------------------------- Updating partitions - Altering a Range partition to extend ranges - ALTER TABLE emp update partition emp90(START '01-01-1990',500 END '01-01-2000',1500+10); here the maxvalue for date is extended to 01-01-2000. Splitting a Range partition - User can split a partition by specifying valid split-point as shown below. ALTER TABLE emp split partition emp00 at values ('1 Jan, 2005', 2500) into emp1 and emp2; Merging Range partitions - User can merge range partitions, if they dont overlap with any other partitions. ALTER TABLE emp merge partitions emp1 and emp2 into emp00; Exchange - User can exchange a normal table with one of the partition with same schema. To successfully exchange a table with partition, the table should not have any constraints, triggers, indexes and the rows present in it should obey partition key constraints of the partition it is to be exchanged with. ALTER TABLE emp exchange emp00 with table a; Here, table 'a' has same schema and if it satisfied aforementioned conditions, would be exchanged with partition emp00; ------------------------------------------------------------------------------------------------------------- VACUUM/ANALYSE,TRUNCATE, COPY,DUMP and RESTORE - VACCUM, ANALYSE, TRUNCATE, COPY commands are extended for partitions. So when these commands are fired on partitioned table, system would apply those on individual partitions of that partitioned table as well. We have also extended backup and restore commands for partitions. For restore functionality, implementation assumes that partitioning feature is enabled and all the required functions are defined beforehand as specified in the testcases. ------------------------------------------------------------------------------------------------------------- Extending Indexes for partitions - Global indexes would be extended to operate on partitions as well. This feature would be available once merging of 'partitions' and 'global index' patches is done. ------------------------------------------------------------------------------------------------------------- Release notes Miscellaneous - 1. For partitioned table to exist, atleast one partition should be present. 2. Partition trigger is the last trigger to be executed. 3. Before restoring the database containing partitions, user needs to create partition functions as specified at the start in this test case document. As PostgreSQL restores the tables first and then functions, user might see few errors * cribbing that partition functions already exist, when trying to restore partitions. This is a known issue. Pending issues/tasks - 1. Extending PK and FK over partitions by merging of partitioning feature patch with global index feature.