Re: Syntax for partitioning

From: pg(at)thetdh(dot)com
To: "Peter Eisentraut" <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org
Cc: "Hudson, T(dot) David" <pg1(at)thetdh(dot)com>
Subject: Re: Syntax for partitioning
Date: 2009-10-30 15:03:21
Message-ID: W1165229281297021256915001@webmail42
Lists: pgsql-hackers

>> PARTITION BY RANGE ( a_expr )
>> ...
>> PARTITION BY HASH ( a_expr )
>> PARTITIONS num_partitions;

> Unless someone comes up with a maintenance plan for stable hash functions, we should probably not dare look into this yet.

What would cover the common use case of per-day quals and drops over an extended history period, say six or nine months? With an unpartitioned table you generally don't get quite the same locality of reference, due to slop in the arrival of rows. Ideally, you don't want to depend on an administrator, or even an administrative script, to continually intervene in the structure of a table, as would be the case with partitioning by range, and you don't want to coalesce multiple dates into one partition, as an arbitrary hash might do. What the administrator wants is to decide which rows are too old to keep, then process (e.g. archive, summarize, filter) and delete them.

Suppose that the number of partitions were taken as a hint rather than as a naming modulus, and that any quasi-hash function had to be specified explicitly (although storage assignment could be based on a hash of the quasi-hash output). If a_expr were allowed to include a to-date conversion of a timestamp, day-by-day partitioning would fall out naturally. If, in addition, single-parameter (?) functions were characterized as range-preserving and order-preserving, plan generation could be improved for time ranges on quasi-hash-partitioned tables, without a formal indexing requirement.
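
To make the idea concrete, here is a minimal Python sketch of what I mean, not a proposal for the actual implementation. It assumes the quasi-hash is a timestamp-to-date conversion, that partitions are created on demand rather than declared up front, and that the planner knows the function is order-preserving. All names (day_key, DayPartitionedTable, scan_range, delete_older_than) are hypothetical and exist only for illustration.

```python
from datetime import date, datetime


def day_key(ts: datetime) -> date:
    """Hypothetical quasi-hash: an order-preserving map from timestamp to day."""
    return ts.date()


class DayPartitionedTable:
    """Toy model of a table partitioned by day_key (illustration only)."""

    def __init__(self):
        # Partitions appear on demand; nobody declares ranges in advance.
        self.partitions = {}

    def insert(self, ts: datetime, row):
        self.partitions.setdefault(day_key(ts), []).append((ts, row))

    def scan_range(self, lo: datetime, hi: datetime):
        """Return (partitions visited, rows) for lo <= ts < hi.

        Because day_key is order-preserving, only partitions whose key lies
        in [day_key(lo), day_key(hi)] can hold matching rows.
        """
        lo_key, hi_key = day_key(lo), day_key(hi)
        visited = [k for k in sorted(self.partitions) if lo_key <= k <= hi_key]
        rows = [
            (ts, row)
            for k in visited
            for ts, row in self.partitions[k]
            if lo <= ts < hi
        ]
        return visited, rows

    def delete_older_than(self, cutoff: date):
        """Delete all rows from days before cutoff; return the days emptied."""
        old = sorted(k for k in self.partitions if k < cutoff)
        for k in old:
            del self.partitions[k]
        return old
```

A range qual then touches only the days it spans, and the retention policy is expressed in terms of rows ("older than the cutoff") rather than in terms of table structure.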

There are cases where additional partition dimensions would be useful, for eventual parallelized operation on large databases, and randomizing quasi-hash functions would help. IMHO stability is not needed, except to the extent that hash functions have properties that lend themselves to plan generation and/or table maintenance.

It is not clear to me what purpose there would be in dropping a partition. This would be tantamount to deleting all of the rows in a partition, if it were analogous to dropping a table, and would require some sort of compensatory aggregation of existing partitions (in effect, a second partitioning dimension), if it were merely structural.

Perhaps I'm missing something here.

David Hudson
