Re: Insert large number of records

From: Alban Hertroys <haramrae(at)gmail(dot)com>
To: Job <Job(at)colliniconsulting(dot)it>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Insert large number of records
Date: 2017-09-19 16:09:33
Message-ID: D653223D-1492-4F6C-AB54-689F4EF7E4C8@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general


> On 19 Sep 2017, at 15:47, Job <Job(at)colliniconsulting(dot)it> wrote:
>
> Hi guys,
>
> we need to insert from a table to another (Postgresql 9.6.1) a large amount of data (about 10/20 millions of rows) without locking destination table.
> Pg_bulkload is the fastest way but it locks the table.
>
> Are there other ways?
> Classic "COPY" from?

We do something like that using a staging table to load to initially (although not bulk; data arrives in our staging table with batches of 5k to 100k rows) and then we transfer the data using insert/select and "on conflict do".

That data-transfer within PG takes a couple of minutes on our rather limited VM for a wide 37M rows table (~37GB on disk). That only locks the staging table (during initial bulkload) and the rows in the master table that are currently being altered (during the insert/select).

If your data-source is a file in a format supported by COPY, then you can use COPY to do the initial bulk load into the staging table.

Some benefits of this 2-stage approach are that it leaves room to manipulate the data (type conversions, for example) and that it can handle the scenario where a matching target record in the master table already exists. In our case, we convert character fields to varchar (which saves a lot of space(s)).

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message chiru r 2017-09-19 16:57:04 USER Profiles for PostgreSQL
Previous Message Yason TR 2017-09-19 14:53:28 Re: JDBC: logical replication and LSN feedback