Re: Should we warn against using too many partitions?

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Should we warn against using too many partitions?
Date: 2019-06-08 06:38:58
Message-ID: 20190608063858.GF3079@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I made another pass, hopefully it's useful and not too much of a pain.

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index cce1618fc1..be2ca3be48 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -4674,6 +4675,88 @@ EXPLAIN SELECT count(*) FROM measurement WHERE logdate &gt;= DATE '2008-01-01';
</itemizedlist>
</para>
</sect2>
+
+ <sect2 id="ddl-partitioning-declarative-best-practices">
+ <title>Declarative Partitioning Best Practices</title>
+
+ <para>
+ The choice of how to partition a table should be made carefully as the
+ performance of query planning and execution can be negatively affected by
+ poorly made design decisions.

Maybe just "poor design"

+ partitioned table. <literal>WHERE</literal> clause items that match and
+ are compatible with the partition key can be used to prune away unneeded

remove "away" ?

+ requirements for the <literal>PRIMARY KEY</literal> or a
+ <literal>UNIQUE</literal> constraint. Removal of unwanted data is also
+ a factor to consider when planning your partitioning strategy as an entire
+ partition can be removed fairly quickly. However, if data that you want

Can we just say "dropped" ? On my first (re)reading, I briefly thought this
was now referring to "pruning" as "removal".

+ to keep exists in that partition then that means having to resort to using
+ <command>DELETE</command> instead of removing the partition.
+ </para>
+
+ <para>
+ Choosing the target number of partitions by which the table should be
+ divided into is also a critical decision to make. Not having enough

Should be: ".. target number .. into which .. should be divided .."

+ partitions may mean that indexes remain too large and that data locality
+ remains poor which could result in poor cache hit ratios. However,

Change the 2nd remains to "is" and the second poor to "low" ?

+ dividing the table into too many partitions can also cause issues.
+ Too many partitions can mean slower query planning times and higher memory

s/slower/longer/

+ consumption during both query planning and execution. It's also important
+ to consider what changes may occur in the future when choosing how to
+ partition your table. For example, if you choose to have one partition

Remove "when choosing ..."? Or say:

|When choosing how to partition your table, it's also important to consider
|what changes may occur in the future.

+ per customer and you currently have a small number of large customers,
+ what will the implications be if in several years you obtain a large
+ number of small customers. In this case, it may be better to choose to
+ partition by <literal>HASH</literal> and choose a reasonable number of
+ partitions rather than trying to partition by <literal>LIST</literal> and
+ hoping that the number of customers does not increase significantly over
+ time.
+ </para>

It's an unusual thing for which to hope :)

+ <para>
+ Sub-partitioning can be useful to further divide partitions that are
+ expected to become larger than other partitions, although excessive
+ sub-partitioning can easily lead to large numbers of partitions and can
+ cause the problems mentioned in the preceding paragraph.
+ </para>

cause the SAME problems ?

+ It is also important to consider the overhead of partitioning during
+ query planning and execution. The query planner is generally able to
+ handle partition hierarchies up a few thousand partitions fairly well,
+ provided that typical queries prune all but a small number of partitions
+ during query planning. Planning times become slower and memory

s/slower/longer/

Hm, maybe say "typical queries ALLOW PRUNNING .."

+ consumption becomes higher when more partitions remain after the planner
+ performs partition pruning. This is particularly true for the

Just say: "remain after planning" ?

+ <command>UPDATE</command> and <command>DELETE</command> commands. Also,
+ even if most queries are able to prune a large number of partitions during
+ query planning, it still may be undesirable to have a large number of

may still ?

+ partitions as each partition requires metadata about the partition to be
+ stored in each session that touches it. If each session touches a large

stored for ?

+ number of partitions over a period of time then the memory consumption for
+ this may become significant.
+ </para>

Remove "over a period of time" ?
Add a comma?

Maybe say:

|If each session touches a large number of partitions, then the memory
|overhead may become significant.

+ <para>
+ With data warehouse type workloads it can make sense to use a larger
+ number of partitions than with an OLTP type workload. Generally, in data
+ warehouses, query planning time is less of a concern as the majority of

VAST majority? Or "essentially all"? Or " .. query planning time is
insignificant compared to the time spent during query execution.

+ processing time is spent during query execution. With either of these two
+ types of workload it is important to make the right decisions early as

early COMMA

Justin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thierry Husson 2019-06-08 09:06:39 Re: Temp table handling after anti-wraparound shutdown (Was: BUG #15840)
Previous Message Fabien COELHO 2019-06-08 06:11:03 Re: Bloom Indexes - bit array length and the total number of bits (or hash functions ?? ) !