| From: | pinker <pinker(at)onet(dot)eu> | 
|---|---|
| To: | pgsql-general(at)postgresql(dot)org | 
| Subject: | Loading 500m json files to database | 
| Date: | 2020-03-23 10:24:48 | 
| Message-ID: | 1584959088557-0.post@n3.nabble.com | 
| Lists: | pgsql-general | 
Hi, do you have any ideas on how to make the loading process faster?
I have 500 million JSON files (one JSON document per file) that I need to load into the
db.
My test set is "only" 1 million files.
What I have come up with so far is:
time for i in datafiles/*; do
  psql -c "\copy json_parts(json_data) FROM $i"&
done
which is the fastest so far. But it's not what I expected. Loading 1 million files
takes me ~3h, so loading 500 times more is just unacceptable.
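The loop above starts one psql process and one connection per file. A minimal sketch of one possible variation, not measured here, is to stream many files through a single COPY. It assumes each file holds one JSON document on a single line with no literal tab characters; the function names `stream_json_lines` and `load_dir` are hypothetical, and `-print0`/`-0` are common find/xargs extensions rather than strict POSIX.

```shell
#!/bin/sh
# Hypothetical helper: print every file under a directory as one stream,
# one line per file, ready for COPY's text format.
stream_json_lines() {
  # awk 1 prints each input line and adds a newline if a file lacks one;
  # sed doubles backslashes, which COPY text format treats as escapes.
  find "$1" -type f -print0 | xargs -0 awk 1 | sed 's/\\/\\\\/g'
}

# Hypothetical wrapper: load a whole directory over a single connection.
# Table and column names are taken from the loop above.
load_dir() {
  stream_json_lines "$1" | psql -c "COPY json_parts(json_data) FROM STDIN"
}
```

Usage would be `load_dir datafiles`. Several such streams could be run in parallel over disjoint subdirectories, which keeps the number of connections small and fixed.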
Some facts:
* the target db is in the cloud, so there is no option to do tricks like turning
fsync off
* PostgreSQL version 11
* I can spin up a huge postgres instance if necessary in terms of CPU/RAM
* I have already tried hash partitioning (to write to 10 different tables instead
of 1)
Any ideas?
--
Sent from: https://www.postgresql-archive.org/PostgreSQL-general-f1843780.html