Quick Links

Re: suggestion about time based partitioning and hibernate

From:	Ron <ronljohnsonjr(at)gmail(dot)com>
To:	pgsql-general(at)lists(dot)postgresql(dot)org
Subject:	Re: suggestion about time based partitioning and hibernate
Date:	2023-07-18 06:47:58
Message-ID:	92ccfffe-8bcf-249e-82c1-a0ae9cebae41@gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

On 7/18/23 01:18, Luca Ferrari wrote:
> Dear all,
> I'm looking for ideas here, and it could be someone already stepped
> into declarative partitioning of an existing database where Hibernate
> (a Java ORM) handles the tables.
> The situation is as follows:
>
> create table foo( id primary key, a_date date, ... );
>
> Now, the trivial way to partition this would be on a range based on
> a_date, so that the primary key of the tables shifts from id to (id,
> a_date). One thing that frightens me is that Hibernate does a lot of
> per-row lookups by means of the id, so while the partitioning is
> probably going to make things more manageable and even faster in some
> scenarios, could lead to drawbacks when Hibernate queries by id.
> Moreover, hibernate will think id is unique while it is not anymore.
> Last but not least, referencing foreign keys are made by Hibernate
> thru the id column, and it means that incoming foreign keys to foo
> will not be in place anymore.
>
> Now, I know that I can define a composite key in hibernate, in order
> to match the effective new data structure, but this requires a huge
> rewrite of the application code.
> And after all, we are talking about a non-PostgreSQL related piece, so
> the problem is not on the PostgreSQL side.
>
> Anyone has already done a partitioning in such scenario?
>
> I am thinking that partitioning on an hash of id could be the only way
> to go without having to touch the hibernate side, even if this would
> bring up a less balanced partitioned structure. In such case, I mean
> partitioning by hash, having a table with 60 millions rows per 50 GB
> in size, what would be the rule of thumb to select the number of
> partitions (i.e., a suggested modulus)?

We're in the exact same situation, using Hibernate and having many tables
like foo, partitioned by a_date with PK of (id, a_date) and FK definitions
(id, a_date).

It was a *massive mistake*, since many queries span partitions. Within a
year, I "departitioned" all tables except two giant tables that have large
bytea columns. (All the formerly partitioned tables still have (id, a_date)
as PK and FK. I'd like to change that, but the amount of code change is
untennable given them amount of new features that need to be added.)

Thus, my recommendations are:
1. only partition the tables you /must/, and
2. partition by id.

--
Born in Arizona, moved to Babylonia.

In response to

suggestion about time based partitioning and hibernate at 2023-07-18 06:18:01 from Luca Ferrari

Responses

RE: suggestion about time based partitioning and hibernate at 2023-07-18 07:32:19 from n.kobzarev

Browse pgsql-general by date

	From	Date	Subject
Next Message	Zhijie Hou (Fujitsu)	2023-07-18 07:09:23	RE: Support logical replication of DDLs
Previous Message	Luca Ferrari	2023-07-18 06:18:01	suggestion about time based partitioning and hibernate