Re: Loading 500m json files to database

From: "Peter J(dot) Holzer" <hjp-pgsql(at)hjp(dot)at>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Loading 500m json files to database
Date: 2020-03-24 09:27:34
Message-ID: 20200324092734.GA14300@hjp.at
Lists: pgsql-general

On 2020-03-23 17:18:45 -0700, pinker wrote:
> Christopher Browne-3 wrote
> > Well, you're paying for a lot of overhead in that, as you're
> > establishing a psql command, connecting to a database, spawning a
> > backend process, starting a transaction, committing a transaction,
> > closing the backend process, disconnecting from the database, and
> > cleaning up after the launching of the psql command. And you're
> > doing that 500 million times.
> >
> > The one thing I left off that was the loading of a single tuple into
> > json_parts.
[...]
> > Reducing that overhead is the single most important thing you can do.
>
> Yes, I was thinking about that, but I have no idea how to do it right now.

Do you know any programming language (Python, Perl, ...)? You'll
probably get a huge gain from writing a script that just opens the
connection once and then inserts each file.
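Something along these lines, as a minimal Python sketch with psycopg2
(assuming the json_parts table mentioned above has a single json/jsonb
column called "data"; the connection string, path and column name are
placeholders you'd have to adjust):

    import glob
    import psycopg2

    conn = psycopg2.connect("dbname=mydb")        # placeholder connection string
    cur = conn.cursor()
    for i, path in enumerate(glob.glob("/path/to/files/*.json")):  # placeholder
        with open(path) as f:
            cur.execute("INSERT INTO json_parts (data) VALUES (%s)", (f.read(),))
        if i % 10000 == 0:
            conn.commit()                         # commit in chunks, not per file
    conn.commit()
    conn.close()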

COPY is usually even faster by a fair amount, but since you have to read
the data for each row from a different file (and, if I understood you
correctly, a remote one at that), the additional speedup is probably not
that great in this case.
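If you want to try COPY anyway, a rough sketch (same assumptions about
json_parts as above; in practice you would flush the buffer per batch
rather than building one giant in-memory buffer):

    import csv
    import glob
    import io
    import psycopg2

    buf = io.StringIO()
    writer = csv.writer(buf)
    for path in glob.glob("/path/to/files/*.json"):   # placeholder location
        with open(path) as f:
            # CSV quoting keeps embedded newlines/quotes in the JSON intact
            writer.writerow([f.read()])

    buf.seek(0)
    conn = psycopg2.connect("dbname=mydb")            # placeholder connection string
    with conn, conn.cursor() as cur:
        cur.copy_expert("COPY json_parts (data) FROM STDIN WITH (FORMAT csv)", buf)
    conn.close()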

Splitting the work into batches and executing several batches in parallel
will probably help, too.
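One possible shape for that, again just a sketch (batch size and pool
size are placeholders to tune; each worker opens its own connection):

    from multiprocessing import Pool
    import glob
    import psycopg2

    def load_batch(paths):
        # each worker loads its own slice of files over one connection
        conn = psycopg2.connect("dbname=mydb")    # placeholder connection string
        with conn, conn.cursor() as cur:
            for path in paths:
                with open(path) as f:
                    cur.execute("INSERT INTO json_parts (data) VALUES (%s)",
                                (f.read(),))
        conn.close()

    if __name__ == "__main__":
        files = glob.glob("/path/to/files/*.json")            # placeholder location
        batches = [files[i:i + 10000] for i in range(0, len(files), 10000)]
        with Pool(processes=8) as pool:                       # tune to cores / I/O
            pool.map(load_batch, batches)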

hp

--
   _  | Peter J. Holzer      | Story must make more sense than reality.
|_|_) |                      |
| |   | hjp(at)hjp(dot)at    |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/   |       challenge!"
