Re: How to load data from CSV into a table that has array types in its columns?

From: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
To: Siddharth Jain <siddhsql(at)gmail(dot)com>, pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: How to load data from CSV into a table that has array types in its columns?
Date: 2022-10-27 00:59:49
Message-ID: 8830ee61-cf09-7ba6-0055-efe7a910ac59@aklaver.com
Lists: pgsql-general

On 10/26/22 17:30, Siddharth Jain wrote:
> Hello,
>
> Given a long list like this:
>
> 1,2
> 2,4 --> 2 appears once
> 7,9
> 8,9
> 5,3
> 2,5 --> note 2 appears twice
>
> I want to load it into this table:
>
> create table tbl (
>   id integer primary key,
>   fids integer[]
> )
>
> so we will have 2 -> [4,5] where 2 is id and [4,5] are the fids
>
> My actual dataset is very large and has 100M rows in it. How can it be
> efficiently loaded into postgres?
>
> I think I have to use a program for this and am trying to use the pg
> library that comes with Node.js. I am reading the data in batches of 1M
> or 100k rows for example. I have created a dictionary in Node.js where I
> am storing the hashmap. The part where I am stuck is how to generate the
> SQL command?
>
> 1. I want to make one call to the server for the batch of 1M rows, not
> 1M calls
> 2. Some keys in the hashmap might already exist in the database. For
> these keys we want to append to the array
> 3. Some keys will not exist and for these we want to insert new rows
>
> Can someone help me please? I am using Postgres for the first time.

The best way to deal with importing large batches of data is to use COPY

https://www.postgresql.org/docs/current/sql-copy.html

But that will not play well with modifying the data as you input it.

What I can see doing is:

1) COPY the data into a staging table:

create table staging_tbl (id integer, fid integer);
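
For the load itself, a sketch using psql's \copy (which runs COPY from the client side, so the file path is on the client machine; the path 'data.csv' is illustrative):

```sql
\copy staging_tbl (id, fid) from 'data.csv' with (format csv)
```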

2) Then use SQL statements to move the data into the final table.

As an example of one possibility, using ON CONFLICT from here:

https://www.postgresql.org/docs/current/sql-insert.html#SQL-ON-CONFLICT

insert into tbl
   select id, array_agg(fid) from staging_tbl group by id
   on conflict (id) do update set fids = tbl.fids || excluded.fids;

Note the group by: a single INSERT ... ON CONFLICT DO UPDATE cannot
update the same row twice, and an id can repeat within a batch, so the
fids have to be aggregated per id before the insert.

I would test with a smaller example data set to verify.
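
A minimal smoke test along those lines, using the six sample rows from above (note the per-id aggregation, since one ON CONFLICT statement cannot update the same row twice; table names as in this thread, the order by inside array_agg is there to make the result deterministic):

```sql
-- load the sample pairs into the staging table
insert into staging_tbl values (1,2),(2,4),(7,9),(8,9),(5,3),(2,5);

-- merge into the final table
insert into tbl
   select id, array_agg(fid order by fid) from staging_tbl group by id
   on conflict (id) do update set fids = tbl.fids || excluded.fids;

select * from tbl order by id;  -- id 2 should show fids = {4,5}
```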


--
Adrian Klaver
adrian(dot)klaver(at)aklaver(dot)com
