Re: Seeking Suggestions for Best Practices: Archiving and Migrating Historical Data in PostgreSQL

From: Andy Hartman <hartman60home(at)gmail(dot)com>
To: Ron Johnson <ronljohnsonjr(at)gmail(dot)com>
Cc: Pgsql-admin <pgsql-admin(at)lists(dot)postgresql(dot)org>
Subject: Re: Seeking Suggestions for Best Practices: Archiving and Migrating Historical Data in PostgreSQL
Date: 2025-05-30 16:14:52
Message-ID: CAEZv3cpESEGDUu-W5WSDo=LqORjk122YR7UOEdui6ujpTU-eAQ@mail.gmail.com
Lists: pgsql-admin

What was the duration, start to finish, of the migration of the 6 TB of
data? And what do you use for a quick backup after the PG data has been
archived?

Thanks.

On Fri, May 30, 2025 at 11:29 AM Ron Johnson <ronljohnsonjr(at)gmail(dot)com>
wrote:

> On Fri, May 30, 2025 at 3:51 AM Motog Plus <mplus7535(at)gmail(dot)com> wrote:
>
>> Hi Team,
>>
>> We are currently planning a data archival initiative for our production
>> PostgreSQL databases and would appreciate suggestions or insights from the
>> community regarding best practices and proven approaches.
>>
>> **Scenario:**
>> - We have a few large tables (several hundred million rows) where we want
>> to archive historical data (e.g., older than 1 year).
>> - The archived data should be moved to a separate PostgreSQL database (on
>> a same or different server).
>> - Our goals are: efficient data movement, minimal downtime, and safe
>> deletion from the source after successful archival.
>>
>> - PostgreSQL version: 15.12
>> - Both source and target databases are PostgreSQL.
>>
>> We explored using `COPY TO` and `COPY FROM` with CSV files, uploaded to a
>> SharePoint or similar storage system. However, our infrastructure team
>> raised concerns around the computational load of large CSV processing and
>> potential security implications with file transfers.
>>
>> We’d like to understand:
>> - What approaches have worked well for you in practice?
>>
>
> This is how I migrated 6TB of data from an Oracle database to Postgresql,
> and then implemented quarterly archiving of the PG database:
> - COPY (SELECT * FROM live_table WHERE date_fld in
> some_manageable_date_range) TO STDOUT.
> - Compress
> - scp
> - COPY archive_table FROM STDIN.
> - Index
> - DELETE FROM live_table WHERE date_fld in some_manageable_date_range
> (This I only did in the PG archive process.)
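>
> Roughly, the export leg of each pass looked like this (untested sketch;
> the database, table, column, and host names are just placeholders):
>
>   psql -d live_db -c \
>     "COPY (SELECT * FROM live_table
>            WHERE date_fld >= '2024-01-01' AND date_fld < '2025-01-01')
>      TO STDOUT" \
>     | lz4 > live_table_2024.txt.lz4
>   scp live_table_2024.txt.lz4 archive-host:/var/tmp/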
>
> (Naturally, the Oracle migration used Oracle-specific commands.)
>
>> - Are there specific tools or strategies you’d recommend for ongoing
>> archival?
>>
>
> I write generic bash loops to which you pass an array that contains the
> table name, PK, date column and date range.
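>
> Something along these lines (untested sketch; the table specs, database
> name, and host are made up for illustration):
>
>   # table : pk : date column : range start : range end
>   specs=(
>     "orders:order_id:created_at:2024-01-01:2025-01-01"
>     "order_items:item_id:created_at:2024-01-01:2025-01-01"
>   )
>   set -o pipefail                      # a failed psql fails the pipeline
>   for spec in "${specs[@]}"; do
>     IFS=: read -r tbl pk datecol from to <<< "$spec"
>     psql -d live_db -c \
>       "COPY (SELECT * FROM ${tbl}
>              WHERE ${datecol} >= '${from}' AND ${datecol} < '${to}')
>        TO STDOUT" \
>       | lz4 > "${tbl}_${from}.txt.lz4" || exit 1
>     scp "${tbl}_${from}.txt.lz4" archive-host:/var/tmp/ || exit 1
>   done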
>
> Given a list of tables, the first script did the COPY TO STDOUT, lz4 and
> scp. Once that finished successfully, another script dropped the archive
> table's indices, then ran the COPY FROM STDIN and CREATE INDEX statements.
> A third script did the deletes.
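>
> The archive-side script was roughly this (untested; the index and table
> names are hypothetical):
>
>   psql -d archive_db -c "DROP INDEX IF EXISTS live_table_arch_date_idx"
>   lz4 -dc /var/tmp/live_table_2024.txt.lz4 \
>     | psql -d archive_db -c "COPY live_table_arch FROM STDIN" || exit 1
>   psql -d archive_db -c \
>     "CREATE INDEX live_table_arch_date_idx ON live_table_arch (date_fld)"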
>
> This works even when the live database tables are all connected via FK.
> You just need to carefully order the tables in your script.
>
>
>> - Any performance or consistency issues we should watch out for?
>>
>
> My rules for scripting are "bite-sized pieces" and "check those return
> codes!".
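>
> In bash terms, something like (sketch; file and host names are made up):
>
>   set -euo pipefail     # an unchecked failure aborts the whole script
>   scp orders_2024-01-01.txt.lz4 archive-host:/var/tmp/ \
>     || { echo "scp failed; not deleting source rows" >&2; exit 1; }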
>
>
>> Your insights or any relevant documentation/pointers would be immensely
>> helpful.
>>
>
> Index support uber alles. When deleting from a table which relies on a
> foreign key link to a table which _does_ have a date field, don't hesitate
> to join on that table.
>
> And DELETE of bite-sized chunks is faster than people give it credit for.
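>
> For example, when the child table has no date column of its own
> (orders/order_items are stand-ins here), join to the parent that does,
> and delete one month per pass:
>
>   for m in 2024-01-01 2024-02-01 2024-03-01; do
>     psql -d live_db -c \
>       "DELETE FROM order_items i
>        USING orders o
>        WHERE i.order_id = o.order_id
>          AND o.created_at >= DATE '${m}'
>          AND o.created_at <  DATE '${m}' + INTERVAL '1 month'" || exit 1
>   done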
>
> --
> Death to <Redacted>, and butter sauce.
> Don't boil me, I'm still alive.
> <Redacted> lobster!
>
