Guidance Needed: Scaling PostgreSQL for 12 TB Data Growth - New Feature Implementation

From: Motog Plus <mplus7535(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Guidance Needed: Scaling PostgreSQL for 12 TB Data Growth - New Feature Implementation
Date: 2025-06-25 08:48:55
Message-ID: CAL5Gnit8qfT_10cADQnJxbCCWBXkzWqE_xbrUj7TLzeH+oXrog@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-admin pgsql-performance

Dear PostgreSQL Community,

We are implementing a new feature in our application that is expected to
generate a significant amount of data, and we are seeking your expert
guidance on how to best handle this growth within our existing PostgreSQL
setup.

Currently, our PostgreSQL instance runs on an EC2 c5.4xlarge Ubuntu
instance with the following specifications:

- *RAM:* 32 GB
- *Disk:* 1.2 TB
- *vCPUs:* 16

Our database architecture utilizes a primary-standby streaming replication
setup. Application modules (running in Kubernetes pods) connect to the
database through Pgpool-II, using HikariCP for connection pooling.

We have multiple databases on our primary server, with their approximate
current sizes as follows:

- *C:* 620 GB
- *M:* 225 GB
- *P:* 59 GB
- *K:* 13 MB

The total current size of our databases is around *1 TB*. With the new
feature, we anticipate a substantial increase in data, potentially reaching *10
TB* over the next 5-7 years.

Below is the table for current size and expected growth in size:

*S.No.*

*DB*

*Current DB size*

*Future DB size*

*Schema Name*

*Current Schema size*

*Future Schema size *

1

C

1 TB

8 TB - 10 TB

acc

297 GB

3 TB - 4 TB

po

270 GB

2.6 TB - 3.5 TB

pa

27 GB

270 GB

pra

13 GB

130 GB

fu

13 GB

130 GB

te

167 MB

2 GB

pro

30 MB

300 MB

2

M

225 GB

2.2 TB - 3 TB

bi

82 GB

820 GB

co

80 GB

800 GB

ps

17 GB

170 GB

qo

16 GB

160 GB

to

7 GB

70 GB

in

7 GB

70 GB

di

6 GB

60 GB

no

4 GB

40 GB

do

4 GB

40 GB

cl

3 GB

30 GB

3

P

60 GB

600 GB

au

45 GB

450 GB

fi

8 GB

80 GB

con

4 GB

40 GB

ba

1 GB

10 GB

li

2 MB

20 GB

We would greatly appreciate your insights on the following points:

1. *Scalability for Large Datasets:* Conceptually, PostgreSQL is known
to handle large datasets. However, we'd like to confirm if a single
PostgreSQL instance can realistically and efficiently manage 10-12 TB of
data in a production environment, considering typical transaction loads.
2. *Database Split Strategy:* Our largest database, "C," currently
occupies 620 GB. It contains multiple schemas. We are considering splitting
database "C" into two new databases: "C1" to exclusively house the "acc"
schema, and "C2" for the remaining schemas. Is this a recommended approach
for managing growth, and what are the potential pros and cons?
3. *Server Allocation for Split Databases:* If we proceed with splitting
"C" into "C1" and "C2," would it be advisable to assign a new, separate
database server for "C2," or could both "C1" and "C2" reside on the same
database server? What factors should we consider in making this decision?
4. *Performance Limits per Database and Database Server:* From a
performance perspective, is there a general "limit" or best practice for
the maximum amount of data a single database server should handle (e.g., 10
TB) and similarly general limit per database? How does this influence the
decision to add more database servers?
5. *Best Practices for Large-Scale Data Management:* Beyond standard
practices like indexing and partitioning, what other best practices should
we consider implementing to ensure optimal performance and manageability
with such a large dataset? This could include configurations, maintenance
strategies, etc.
6. *Hardware Configuration Recommendations:* Based on our projected data
growth and desired performance, what hardware configurations (e.g., RAM,
CPU, storage I/O, storage type like NVMe) would you recommend for future
database servers to efficiently handle 10-12 TB?
7. *Open-Source Horizontal Scaling Solutions:* Are there any open-source
horizontal scaling solutions for PostgreSQL (other than Citus Data) that
the community recommends or has experience with for managing extremely
large datasets? Any pointers or guidance on this would be highly valuable.

Thank you in advance for your time and expertise. We look forward to your
valuable insights.

Thanks & Regards,

Ramzy

In response to

Browse pgsql-admin by date

  From Date Subject
Next Message raphi 2025-06-25 14:55:07 pgstat_snap - create adhoc pg_stat_statements snapshots
Previous Message Motog Plus 2025-06-25 08:35:25 Guidance Needed: Scaling PostgreSQL for 12 TB Data Growth - New Feature Implementation

Browse pgsql-performance by date

  From Date Subject
Next Message James Pang 2025-06-25 09:10:55 Re: many sessions waiting DataFileRead and extend
Previous Message Motog Plus 2025-06-25 08:35:25 Guidance Needed: Scaling PostgreSQL for 12 TB Data Growth - New Feature Implementation