Re: Slow Bulk Delete

From: Bob Lunney <bob_lunney(at)yahoo(dot)com>
To: pgsql-performance(at)postgresql(dot)org, thilo <thilo(dot)tanner(at)gmail(dot)com>
Subject: Re: Slow Bulk Delete
Date: 2010-05-13 03:13:42
Message-ID: 821768.44263.qm@web39705.mail.mud.yahoo.com
Lists: pgsql-performance

Thilo,

Just a few thoughts off the top of my head:

1. If you know the ids of the rows you want to delete beforehand, insert them into a table, then run the delete based on a join with this table.
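
For example (an untested sketch; "ids_to_delete" is just a name I made up, while table1 and table2_id come from your post):

    CREATE TEMP TABLE ids_to_delete (table2_id bigint PRIMARY KEY);

    -- load the ids here (COPY, INSERT ... SELECT, whatever is handy), then:
    DELETE FROM table1
    USING ids_to_delete d
    WHERE table1.table2_id = d.table2_id;

That replaces a million single-row statements with one set-based delete.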

2. Better yet, insert the ids into a table using COPY, then use a join to create a new table with the rows you want to keep from the first table. Drop the original source table, truncate the id table, rename the copied table and add indexes and constraints.
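
Something along these lines (again untested, and the file path is only a placeholder):

    COPY ids_to_delete (table2_id) FROM '/path/to/ids.txt';

    CREATE TABLE table1_new AS
    SELECT t.*
    FROM table1 t
    LEFT JOIN ids_to_delete d ON d.table2_id = t.table2_id
    WHERE d.table2_id IS NULL;   -- keep only rows not in the delete list

    DROP TABLE table1;
    TRUNCATE ids_to_delete;
    ALTER TABLE table1_new RENAME TO table1;
    -- then recreate indexes, constraints, grants, etc. on the new table

Writing the keepers once is often cheaper than deleting row by row, and the new table carries no dead tuples for VACUUM to clean up afterwards.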

3. See if you can partition the table somehow so the rows you want to delete end up in a single child table. When it's time to delete them, just drop the child table.
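
On 8.4 that means inheritance-based partitioning, roughly like this (batch_id is an invented column standing in for whatever identifies one batch's rows):

    CREATE TABLE table1_batch_42 (
        CHECK (batch_id = 42)
    ) INHERITS (table1);

    -- when the batch is obsolete:
    DROP TABLE table1_batch_42;

Dropping a child table is practically instantaneous no matter how many rows it holds.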

Of course, if the 1M rows you need to delete are small compared to the total size of the original table, the first two techniques might not buy you anything, but it's worth a try.

Good luck!

Bob Lunney

--- On Sat, 5/8/10, thilo <thilo(dot)tanner(at)gmail(dot)com> wrote:

> From: thilo <thilo(dot)tanner(at)gmail(dot)com>
> Subject: [PERFORM] Slow Bulk Delete
> To: pgsql-performance(at)postgresql(dot)org
> Date: Saturday, May 8, 2010, 7:39 AM
> Hi all!
>
> We moved from MySQL to PostgreSQL for some of our projects. So far
> we're very impressed with the performance (especially INSERTs and
> UPDATEs), except for a strange problem with the following bulk
> delete query:
>
> DELETE FROM table1 WHERE table2_id = ?
>
> I went through these Wiki pages, trying to solve the problem:
>
> http://wiki.postgresql.org/wiki/SlowQueryQuestions and
> http://wiki.postgresql.org/wiki/Performance_Optimization
>
> but unfortunately without much luck.
>
> Our application is doing batch jobs. On every batch run, we must
> delete approx. 1M rows in table1 and recreate these entries. The
> inserts are very fast, but deletes are not. We cannot make updates,
> because there's no identifying property in the objects of table1.
>
> This is what EXPLAIN is telling me:
>
> EXPLAIN ANALYZE DELETE FROM table1 WHERE table2_id = 11242939
>
>                                                          QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------------------
>  Index Scan using sr_index on table1  (cost=0.00..8.56 rows=4 width=6) (actual time=0.111..0.154 rows=4 loops=1)
>    Index Cond: (table2_id = 11242939)
>  Total runtime: 0.421 ms
> (3 rows)
>
> This seems to be very fast (using the index), but running this query
> from JDBC takes up to 20ms each. For 1M rows this sums up to several
> hours. When I have a look at pg_top, psql uses most of the time for
> the deletes. CPU usage is 100% (for the core used by postgresql). So
> it seems that postgresql is doing some sequential scanning or
> constraint checks.
>
> This is the table structure:
>
> id          bigint   (primary key)
> table2_id   bigint   (foreign key constraint to table 2, *indexed*)
> table3_id   bigint   (foreign key constraint to table 3, *indexed*)
> some non-referenced text and boolean fields
>
> My server settings (PostgreSQL 8.4.2):
>
> shared_buffers = 1024MB
> effective_cache_size = 2048MB
> work_mem = 128MB
> wal_buffers = 64MB
> checkpoint_segments = 32
> checkpoint_timeout = 15min
> checkpoint_completion_target = 0.9
>
> It would be very nice if you could give me a hint to solve the
> problem. It drives me crazy ;-)
>
> If you need more details please feel free to ask!
>
> Thanks in advance for your help!
>
> Kind regards
>
> Thilo
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>
