Re: Loading 500m json files to database

From: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
To: pinker <pinker(at)onet(dot)eu>, pgsql-general(at)postgresql(dot)org
Subject: Re: Loading 500m json files to database
Date: 2020-03-23 15:31:36
Message-ID: bfe3f14a-37b7-2047-20bf-8ece013ae470@aklaver.com
Lists: pgsql-general

On 3/23/20 3:24 AM, pinker wrote:
> Hi, do you maybe have an idea how to make the loading process faster?
>
> I have 500 million JSON files (one JSON document per file) that I need to
> load into the db.
> My test set is "only" 1 million files.
>
> What I have come up with so far is:
>
> time for i in datafiles/*; do
> psql -c "\copy json_parts(json_data) FROM $i"&
> done
>
> which is the fastest so far, but it's not what I expected: loading 1M files
> takes me ~3h, so loading 500 times more is just unacceptable.

Aggregating the JSON files, as others have suggested, would help greatly; a
rough sketch follows.
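
To make that concrete, here is a rough, untested sketch. The datafiles/
directory and the json_parts(json_data) target come from the quoted command;
the batch size, file names, and the use of jq/sed to keep COPY's text format
happy are my assumptions:

#!/usr/bin/env bash
# Sketch: turn many one-document files into a few large batch files,
# then load each batch with a single \copy instead of one psql per file.
set -euo pipefail

batch_size=100000          # documents per batch file (arbitrary choice)
batch=0
count=0
out="batch_${batch}.ndjson"
: > "$out"

for f in datafiles/*; do
    # compact each document onto one line and double backslashes so the
    # line survives COPY's text format unchanged
    jq -c . "$f" | sed 's/\\/\\\\/g' >> "$out"
    count=$((count + 1))
    if (( count % batch_size == 0 )); then
        batch=$((batch + 1))
        out="batch_${batch}.ndjson"
        : > "$out"
    fi
done

# one COPY per batch instead of one round trip per document
for b in batch_*.ndjson; do
    psql -c "\copy json_parts(json_data) FROM '$b'"
done

The point is simply to replace one psql round trip per document with one COPY
per large batch; at 500M files the per-file jq call becomes its own
bottleneck, so the aggregation step itself would probably have to be
parallelized (e.g. with xargs -P) as well.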

Knowing how the json_parts table is defined (and what, if anything, fires
when rows land in it) might help folks provide further tips.
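
For reference, the definition, indexes, and triggers can be pulled straight
out of psql with the \d+ metacommand (json_parts is the table name taken from
the quoted \copy command):

psql -c '\d+ json_parts'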

>
> some facts:
> * the target db is in the cloud, so there is no option for tricks like
> turning fsync off
> * the version is Postgres 11
> * I can spin up a huge Postgres instance if necessary in terms of CPU/RAM
> * I already tried hash partitioning (to write to 10 different tables
> instead of 1)
>
>
> Any ideas?

--
Adrian Klaver
adrian(dot)klaver(at)aklaver(dot)com
