R: Insert large number of records

From: Job <Job(at)colliniconsulting(dot)it>
To: Alban Hertroys <haramrae(at)gmail(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: R: Insert large number of records
Date: 2017-09-20 20:55:39
Message-ID: 88EF58F000EC4B4684700C2AA3A73D7A08180ABD212E@W2008DC01.ColliniConsulting.lan
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general


> Even better would be if your bulkload could already be organised such
> that all the data in the "temporary" table can indiscriminately be
> inserted into the same target partition. That though depends a bit on
> your setup - at some point the time saved at one end gets consumed on
> the other or it takes even longer there.

Thank your for the answers and the ideas, really!

We wrote a simple script that split datas directly into the right partition, avoidind any trigger.
We also split into 100k-record portions.

Now performances have really improved, thanks to everybody!

One further question: within a query launched on the MASTER table where i need to scan every table, for exaple to search rows locatd in more partitions.
In there a way to improve "parallel scans" between more table at the same time or not?
I noticed, with explain analyze, the scan in the master table is Always sequential, descending into the partitions.

Thank you again,
F

________________________________________
Da: Alban Hertroys [haramrae(at)gmail(dot)com]
Inviato: mercoledì 20 settembre 2017 17.50
A: Job
Cc: pgsql-general(at)postgresql(dot)org
Oggetto: Re: [GENERAL] Insert large number of records

On 20 September 2017 at 07:42, Job <Job(at)colliniconsulting(dot)it> wrote:
> We use a "temporary" table, populated by pg_bulkload - it takes few minutes in this first step.
> Then, from the temporary table, datas are transferred by a trigger that copy the record into the production table.
> But *this step* takes really lots of time (sometimes also few hours).
> There are about 10 millions of record.

Perhaps the problem isn't entirely on the writing end of the process.

How often does this trigger fire? Once per row inserted into the
"temporary" table, once per statement or only after the bulkload has
finished?

Do you have appropriate indices on the temporary table to guarantee
quick lookup of the records that need to be copied to the target
table(s)?

> We cannot use pg_bulkload to load directly data into production table since pg_bulkload would lock the Whole table, and "COPY" command is slow and would not care about table partitioning (COPY command fire partitioned-table triggers).

As David already said, inserting directly into the appropriate
partition is certainly going to be faster. It removes a check on your
partitioning conditions from the query execution plan; if you have
many partitions, that adds up, because the database needs to check
that condition among all your partitions for every row.

Come to think of it, I was assuming that the DB would stop checking
other partitions once it found a suitable candidate, but now I'm not
so sure it would. There may be good reasons not to stop, for example
if we can partition further into sub-partitions. Anybody?

Since you're already using a trigger, it would probably be more
efficient to query your "temporary" table for batches belonging to the
same partition and insert those into the partition directly, one
partition at a time.

Even better would be if your bulkload could already be organised such
that all the data in the "temporary" table can indiscriminately be
inserted into the same target partition. That though depends a bit on
your setup - at some point the time saved at one end gets consumed on
the other or it takes even longer there.

Well, I think I've thrown enough ideas around for now ;)

--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Jerry Sievers 2017-09-20 21:03:54 Re: Any known issues Pg 9.3 on Ubuntu Xenial kernel 4.4.0?
Previous Message Rob Sargent 2017-09-20 20:54:32 Re: Inserting millions of record in a partitioned Table