From: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
To: "sivapostgres(at)yahoo(dot)com" <sivapostgres(at)yahoo(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Cc: Pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Is there any limit on the number of rows to import using copy command
Date: 2025-07-25 00:48:53
Message-ID: e8fb9b99-40a2-41eb-8932-0959db8356c6@aklaver.com
Lists: pgsql-general
On 7/24/25 16:59, sivapostgres(at)yahoo(dot)com wrote:
> 1. Testcase. Created a new database, modified the triggers (split into
> three), and populated the required master data and lookup tables. Then
> transferred 86420 records. Checked whether all 86420 records were
> inserted into table1 and whether the trigger created the required
> records in table2. Yes, it did.
>
> 2. In the test case above, the total time taken to insert 86420 records
> is only 1.15 min. Earlier (before splitting the triggers) we waited
> for more than 1.5 hrs the first time and 2.5 hrs the second time, with
> no records inserted.
>
> 3. Regarding moving the logic to a procedure. Won't the trigger work?
> Will it be a burden for 86420 records? It works if we insert a few
> thousand records. After splitting the trigger function, it works for
> 86420 records. Are triggers overhead for handling even 100000 records?
> In the production system, the same (single) trigger is working with 3
> million records. There might be better alternatives to triggers,
> but triggers should also work. IMHO.
Reread this post, in the thread, from Laurenz Albe:
https://www.postgresql.org/message-id/de08fd016dd9c630f65c52b80292550e0bcdea4c.camel%40cybertec.at
>
> 4. Staging tables. Yes, I have done that in another case, where there
> was a need to add / transform data for a few more columns. It worked
> like a charm. In this case, since there was no need for any other
> calculations (transformations), and with just column-to-column
> matching, I thought the copy command would do.
There is a transformation: you are moving data to another table. That is
overhead, especially if the triggers are not optimized.
>
> Before splitting the trigger into three, we tried
> 1. Transferring data using DataWindow / PowerBuilder (that's the tool
> we use to develop our front end). With the same single trigger, it took
> a few hours (more than 4 hours; the exact time was not noted down) to
> transfer the same 86420 records. (DataWindow fires an insert statement
> for every row.) Works, but the time taken is not acceptable.
Row-by-row INSERTs are going to be slow, especially if the tool is doing
a commit for each one, which I suspect it is. Check the Postgres logs.
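
For comparison, batching all the rows into one transaction from Python
would look roughly like this (an untested sketch using psycopg 3; the
connection string, table and column names are made up):

    import psycopg

    rows = [(i, f"name-{i}") for i in range(86420)]  # stand-in data

    with psycopg.connect("dbname=test") as conn:
        with conn.cursor() as cur:
            # All rows go in one transaction, not a commit per row.
            cur.executemany(
                "INSERT INTO table1 (col1, col2) VALUES (%s, %s)",
                rows)
        conn.commit()  # a single commit at the end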
>
> 2. Next, we split the larger csv file into 8 files, seven containing
> 10,000 records each and the last one 16420 records. The copy command
> worked, but the time taken to split the file is not acceptable. We
> wrote a batch file to split the larger csv file, since we felt a batch
> file makes it easier to automate the whole process from PowerBuilder.
I find most GUI tools create extra steps and overhead. My preference is
simpler tools, e.g. using the Python csv module to batch/stream rows
that the Python psycopg Postgres driver can insert or copy into the
database.
See:
https://www.psycopg.org/psycopg3/docs/basic/copy.html
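
A minimal sketch of that approach with psycopg 3's copy support
(untested; the file, table and column names are made up):

    import csv
    import psycopg

    with psycopg.connect("dbname=test") as conn:
        with conn.cursor() as cur:
            with open("data.csv", newline="") as f:
                reader = csv.reader(f)
                next(reader)  # skip the header row, if any
                # Stream the rows straight into the table via COPY.
                with cur.copy(
                        "COPY table1 (col1, col2) FROM STDIN") as copy:
                    for row in reader:
                        copy.write_row(row)
        conn.commit()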
>
> 3. What we observed here is that the insert statement succeeds and the
> copy command fails if the records exceed a certain number. We haven't
> arrived at the exact number of rows at which the copy command fails.
>
> Will do further works after my return from a holiday.
>
> Happiness Always
> BKR Sivaprakash
>
>
>
> On Thursday 24 July, 2025 at 08:18:07 pm IST, Adrian Klaver
> <adrian(dot)klaver(at)aklaver(dot)com> wrote:
>
>
> On 7/24/25 05:18, sivapostgres(at)yahoo(dot)com wrote:
> > Thanks Merlin, adrain, Laurenz
> >
> > As a testcase, I split the trigger function into three, one each for
> > insert, update, delete, each called from a separate trigger.
> >
> > IT WORKS!
>
> It worked before; it just slowed down as your cases got bigger. You
> need to provide more information on what test case you used and how
> you define "worked".
>
> >
> > Shouldn't we have one trigger function for all three trigger
> > events? Is it prohibited for a bulk insert like this?
>
> No. Triggers are overhead and they add to the processing that needs to
> be done for moving the data into the table. Whether that is an issue
> is a case by case determination.
>
> >
> > I tried this in pgAdmin only; I will complete the testing from the
> > program which we are developing, after my return from holiday.
>
> From Merlin Moncure's post:
>
> "* reconfiguring your logic to a procedure can be a better idea; COPY
> your data into some staging tables (perhaps temp, and indexed), then
> write to various tables with joins, upserts, etc."
>
> I would suggest looking into implementing the above.
>
>
> >
> > Happiness Always
> > BKR Sivaprakash
>
> >
>
>
>
> --
> Adrian Klaver
> adrian(dot)klaver(at)aklaver(dot)com
>
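
To expand on Merlin's staging table suggestion quoted above, the general
shape of it in Python would be something like this (an untested sketch;
the table, column and file names are made up, and the ON CONFLICT clause
assumes a unique constraint on id):

    import psycopg

    with psycopg.connect("dbname=test") as conn:
        with conn.cursor() as cur:
            # Plain staging table with no triggers attached.
            cur.execute("CREATE TEMP TABLE staging (LIKE table1)")
            with open("data.csv") as f:
                with cur.copy(
                        "COPY staging FROM STDIN "
                        "WITH (FORMAT csv, HEADER)") as copy:
                    while data := f.read(8192):
                        copy.write(data)
            # One set-based statement instead of per-row trigger work.
            cur.execute(
                "INSERT INTO table1 SELECT * FROM staging "
                "ON CONFLICT (id) DO NOTHING")
        conn.commit()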
--
Adrian Klaver
adrian(dot)klaver(at)aklaver(dot)com