Loading 500m json files to database

From: pinker <pinker(at)onet(dot)eu>
To: pgsql-general(at)postgresql(dot)org
Subject: Loading 500m json files to database
Date: 2020-03-23 10:24:48
Message-ID: 1584959088557-0.post@n3.nabble.com
Lists: pgsql-general

Hi, do you maybe have an idea how to make the loading process faster?

I have 500 million JSON files (one JSON document per file) that I need to load
into the database.
My test set is "only" 1 million files.

What I came up with so far is:

# one backgrounded psql process per file, each doing a single-row \copy
time for i in datafiles/*; do
    psql -c "\copy json_parts(json_data) FROM '$i'" &
done

which is the fastest I have found so far, but it's still not what I need. Loading
1 million files takes ~3h, so loading 500 times more is just unacceptable.
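
For reference, a batched variant that feeds many files through a single COPY per
psql invocation would look something like the sketch below. It is only a sketch
(untested here, batch size of 1000 is arbitrary) and assumes each file holds one
single-line JSON document with no tabs or backslashes that COPY's text format
would mangle; the per-file loop above has the same caveat.

# Sketch only: load files in batches of 1000 through one COPY per batch.
# Assumes each file is exactly one newline-terminated, single-line JSON document.
find datafiles/ -type f -print0 \
  | xargs -0 -n 1000 sh -c 'cat "$@" | psql -c "\copy json_parts(json_data) FROM STDIN"' _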

Some facts:
* the target DB is in the cloud, so there is no option for tricks like turning
fsync off
* PostgreSQL version 11
* I can spin up a huge Postgres instance if necessary in terms of CPU/RAM
* I already tried hash partitioning (writing to 10 different tables instead
of 1; a rough sketch of the target follows below)
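
The hash-partitioned target was something along these lines (illustrative DDL
only; the column list and the id-based partition key are assumptions, not the
real schema):

# Illustrative only: a 10-way hash-partitioned target table.
psql <<'SQL'
CREATE TABLE json_parts (
    id        bigserial,
    json_data jsonb
) PARTITION BY HASH (id);
CREATE TABLE json_parts_0 PARTITION OF json_parts
    FOR VALUES WITH (MODULUS 10, REMAINDER 0);
-- ...and likewise json_parts_1 through json_parts_9
SQL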

Any ideas?

--
Sent from: https://www.postgresql-archive.org/PostgreSQL-general-f1843780.html
