Re: suggestion about time based partitioning and hibernate

From: Ron <ronljohnsonjr(at)gmail(dot)com>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: suggestion about time based partitioning and hibernate
Date: 2023-07-18 06:47:58
Message-ID: 92ccfffe-8bcf-249e-82c1-a0ae9cebae41@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 7/18/23 01:18, Luca Ferrari wrote:
> Dear all,
> I'm looking for ideas here, and it could be someone already stepped
> into declarative partitioning of an existing database where Hibernate
> (a Java ORM) handles the tables.
> The situation is as follows:
>
> create table foo( id primary key, a_date date, ... );
>
> Now, the trivial way to partition this would be on a range based on
> a_date, so that the primary key of the tables shifts from id to (id,
> a_date). One thing that frightens me is that Hibernate does a lot of
> per-row lookups by means of the id, so while the partitioning is
> probably going to make things more manageable and even faster in some
> scenarios, could lead to drawbacks when Hibernate queries by id.
> Moreover, hibernate will think id is unique while it is not anymore.
> Last but not least, referencing foreign keys are made by Hibernate
> thru the id column, and it means that incoming foreign keys to foo
> will not be in place anymore.
>
> Now, I know that I can define a composite key in hibernate, in order
> to match the effective new data structure, but this requires a huge
> rewrite of the application code.
> And after all, we are talking about a non-PostgreSQL related piece, so
> the problem is not on the PostgreSQL side.
>
> Anyone has already done a partitioning in such scenario?
>
> I am thinking that partitioning on an hash of id could be the only way
> to go without having to touch the hibernate side, even if this would
> bring up a less balanced partitioned structure. In such case, I mean
> partitioning by hash, having a table with 60 millions rows per 50 GB
> in size, what would be the rule of thumb to select the number of
> partitions (i.e., a suggested modulus)?

We're in the exact same situation, using Hibernate and having many tables
like foo, partitioned by a_date with PK of (id, a_date) and FK definitions
(id, a_date).

It was a *massive mistake*, since many queries span partitions.  Within a
year, I "departitioned" all tables except two giant tables that have large
bytea columns.  (All the formerly partitioned tables still have (id, a_date)
as PK and FK.  I'd like to change that, but the amount of code change is
untennable given them amount of new features that need to be added.)

Thus, my recommendations are:
1. only partition the tables you /must/, and
2. partition by id.

--
Born in Arizona, moved to Babylonia.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Zhijie Hou (Fujitsu) 2023-07-18 07:09:23 RE: Support logical replication of DDLs
Previous Message Luca Ferrari 2023-07-18 06:18:01 suggestion about time based partitioning and hibernate