Re: Postgresql RDS DB Latency Chossing Hash join Plan

From: github kran <githubkran(at)gmail(dot)com>
To: Michael Lewis <mlewis(at)entrata(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Postgresql RDS DB Latency Chossing Hash join Plan
Date: 2019-11-14 17:21:47
Message-ID: CACaZr5Qgb4AR4gMCF0dBTqFHe2zvH7mA3orXcRMjwQCuWmOUhQ@mail.gmail.com
Lists: pgsql-general pgsql-sql

Hello Team,

Hope everyone is doing great!

*Background*

We are on PostgreSQL version 10.6 and rely on it heavily for our day-to-day reads and writes. We run two clusters: cluster-1 keeps data for up to 60 days, and cluster-2 retains data beyond one year. The data is partitioned roughly weekly (about one partition every 5 days), so we have around 5 partitions per month per table; with our 2 primary tables that is about 10 new partitions per month. Cluster-1 currently has around 20 partitions and cluster-2 has around 160 partitions (data going back to 2018). We also want to keep up to 2 years of data in cluster-2 to serve customer needs; so far we have reached about 1 year.
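For context, the physical layout is roughly like the sketch below. The table name, columns, and partition bounds are made up for illustration (and we could equally be using inheritance-style partitions); the real point is range partitioning on a timestamp, about 5 days per partition.

    -- hypothetical parent table, range-partitioned on a timestamp column
    CREATE TABLE events (
        event_id   bigint,
        event_time timestamptz NOT NULL,
        payload    jsonb
    ) PARTITION BY RANGE (event_time);

    -- one ~5-day partition (about 5 of these per month per table)
    CREATE TABLE events_2019_11_01 PARTITION OF events
        FOR VALUES FROM ('2019-11-01') TO ('2019-11-06');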

*Current activity*

We have a custom weekly migration script that moves data from cluster-1 to cluster-2. It does the following three steps (a rough command sketch follows the list):

1) Use COPY to export the data from cluster-1, splitting it into binary files

2) Write the binary data into the cluster-2 table

3) Create the indexes after the data is copied
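The table, index, and file names below are placeholders rather than our real objects, and the actual job loops over many partitions/files, but the steps look roughly like this:

    -- Step 1: on cluster-1, export a partition's rows to a binary file
    \copy (SELECT * FROM events_2019_11_01) TO '/data/events_2019_11_01.bin' WITH (FORMAT binary)

    -- Step 2: on cluster-2, load the binary file into the matching table
    \copy events_2019_11_01 FROM '/data/events_2019_11_01.bin' WITH (FORMAT binary)

    -- Step 3: build the indexes only after the load finishes
    CREATE INDEX idx_events_2019_11_01_event_time ON events_2019_11_01 (event_time);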

*Problem we have right now*

During the last two weekly migration runs, the cluster-2 read replica instance restarted because it fell behind the master (writer instance). Everything afterwards worked seamlessly, but we want to avoid the replica restarting. To avoid the restarts we started producing smaller binary files and copying those over to cluster-2 instead of writing one big file of ~450 million records. In the most recent migration this helped: the reader instance did not restart after we split the one big file into multiple files, but it did restart after the indexes were created on the new table, presumably because index creation is also write intensive.
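To see how far the replica falls behind during a load, a query like the one below could be used. This assumes standard streaming-replication statistics are visible from the writer; on RDS/Aurora the replica-lag CloudWatch metrics are usually the more reliable signal, so treat this purely as a sketch.

    -- run on the writer; shows replay lag per standby (PostgreSQL 10+)
    SELECT application_name,
           state,
           pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes,
           replay_lag
    FROM pg_stat_replication;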

*DB parameters set for the migration job*

work_mem set to 8 GB and maintenance_work_mem set to 32 GB.

Indexes per table = 3

total indexes for 2 tables = 5
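Assuming the job applies these per session (rather than at the parameter-group level), the equivalent statements would be roughly:

    -- session-level settings used while loading data and building indexes
    SET work_mem = '8GB';
    SET maintenance_work_mem = '32GB';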

*DB size*

Cluster-2 = 8.6 TB

Cluster-1 = 3.6 TB

Peak table row count = 400-480 million rows

Average table row count = 300-350 million rows

Per-table size = 90-95 GB; per-table index size is about 45 GB

*Questions*

1) Can we decrease maintenance_work_mem to 16 GB? Would that slow down the writes to the cluster enough for the reader instance to keep up with the data?

2) Based on the above use case, what are your recommendations for keeping the data for up to 2 years?

3) What other recommendations do you have?

Appreciate your replies.

Thanks,
githubkran

