Re: Define hash partition for certain column values

From: Голубева Яна <ishsha(at)yandex(dot)ru>
To: Michael Lewis <mlewis(at)entrata(dot)com>, Alban Hertroys <haramrae(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-general(at)lists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Define hash partition for certain column values
Date: 2021-01-13 13:30:34
Message-ID: 2664831610544448@mail.yandex.ru
Lists: pgsql-general

Sorry, I think I've described my case not precisely enough.

"Randomly" is not purely random in my case.

My solution is planned to be used on different servers with different DBs. The initial data of the base table depends on the DB, but I know that the key values of new rows are increasing. Not monotonically, but still increasing. I need a common solution for all DBs.

The size of the base table can vary greatly (from millions to hundreds of billions of rows). For tests I've used 2 different dumps.

Ranges that were suitable for the first dump produce, for the second, the situation I've described (2-3 partitions holding 95% of the data), and vice versa. Besides, the constant increase of the key values of new rows means that some range partitions will keep growing while others will hold the same amount of data or even less (outdated data is cleared).

Hash partitioning gives me partitions that are not exactly the same size, but similar enough, and this result holds for both dumps.

So I've decided to use hash partitioning.

Thank you,
Iana Golubeva

12.01.2021, 19:41, "Michael Lewis" <mlewis(at)entrata(dot)com>:
> On Tue, Jan 12, 2021 at 9:37 AM Alban Hertroys <haramrae(at)gmail(dot)com> wrote:
>> > On 12 Jan 2021, at 16:51, Голубева Яна <ishsha(at)yandex(dot)ru> wrote:
>> >
>> > Values for the key partitioning column are generated randomly and I can't predict their distribution between ranges.
>> > If I just create some ranges I won't have any guarantee that partitions will have a similar amount of data. It is possible that I will have 2 or 3 extremely big partitions and a bit of data in the others.
>>
>> A hash of a random number is also random, so when using hashes for partitioning you will get the same problem.
>>
>> If you want to distribute values equally over a fixed number of partitions, I suggest you partition on a modulo of a monotonically increasing number (a sequence, for example) instead of relying on a random number.
>>
>> > 12.01.2021, 17:55, "Michael Lewis" <mlewis(at)entrata(dot)com>:
>> > On Tue, Jan 12, 2021 at 1:21 AM Голубева Яна <ishsha(at)yandex(dot)ru> wrote:
>> > List or range partitioning isn't suitable for my case.
>> > I am using a column of numeric(20) type as a base for partitioning. The values of the column are generated randomly.
>> > So there will be too many partitions if I use list partitioning as is.
>> >
>> > Sorry, but why is range not suited for this? It would seem fairly trivial to create 50 or 1000 partitions to break up the range of values allowed by your field definition.
>>
>> Alban Hertroys
>
> That said, there is no reason you should need near-perfectly-even distribution anyway. You can also split partitions later, or do another level of partitioning on large partitions if they somehow end up significantly unbalanced.
>
> How many rows are we talking about initially/over time? Do you plan to drop old data at all? Perhaps the initial decision to partition was made a bit too hastily.
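[Editor's note: a minimal sketch of the hash-partitioning setup discussed above, using PostgreSQL declarative partitioning. Table and column names are hypothetical; the thread does not specify the schema or the partition count.]

```sql
-- Parent table, hash-partitioned on the randomly generated numeric(20) key.
CREATE TABLE base_table (
    key_col numeric(20) NOT NULL,
    payload text
) PARTITION BY HASH (key_col);

-- Each partition receives the rows whose hashed key matches its remainder.
-- With MODULUS 8 you would create eight partitions, REMAINDER 0 through 7:
CREATE TABLE base_table_p0 PARTITION OF base_table
    FOR VALUES WITH (MODULUS 8, REMAINDER 0);
CREATE TABLE base_table_p1 PARTITION OF base_table
    FOR VALUES WITH (MODULUS 8, REMAINDER 1);
-- ... and so on for the remaining remainders.
```

Because the hash of the key, not the key's magnitude, routes each row, ever-increasing key values spread across all partitions instead of piling into the newest range, which is the behavior described above.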

