Re: Large insert and delete batches

From: Kääriäinen Anssi <anssi(dot)kaariainen(at)thl(dot)fi>
To: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Large insert and delete batches
Date: 2012-03-01 19:06:40
Message-ID: BC19EF15D84DC143A22D6A8F2590F0A78A9846AA67@EXMAIL.stakes.fi
Lists: pgsql-performance

Quoting myself:
"""
So, is there some common wisdom about the batch sizes? Or is it better
to do the inserts and deletes in just one batch? I think the case for
performance problems needs to be strong before default limits are
considered for PostgreSQL.
"""

I did a little test on this. The goal was to see whether there is any interesting difference
in performance between doing the queries in small batches and doing them all in one go.

The test setup is simple: one table with an integer primary key, containing a million rows.
The queries are "select * from the_table where id = ANY(ARRAY[list_of_numbers])"
and the corresponding delete.
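
As a rough sketch of the one-go variant of that test (assuming Python with psycopg2 and the
table and column names from the query above; the connection details and list sizes are made
up for illustration, this is not the actual test code):

    import psycopg2

    # Hypothetical connection; adjust for your setup.
    conn = psycopg2.connect(dbname="test")
    cur = conn.cursor()

    # One batch: the whole id list goes into a single ANY(ARRAY[...]) query.
    # psycopg2 adapts a Python list to a PostgreSQL array parameter.
    ids = list(range(1, 100001))
    cur.execute("SELECT * FROM the_table WHERE id = ANY(%s)", (ids,))
    rows = cur.fetchall()

    cur.execute("DELETE FROM the_table WHERE id = ANY(%s)", (ids,))
    conn.commit()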

For any sane number of items in the list, the result is that doing the queries in smaller
batches might be a little faster, but I found nothing conclusive. However, once you go into
millions of items in the list, the query will OOM my Postgres server. With a million items
in the list the process uses around 700 MB of memory, with 2 million items about 1.4 GB, and beyond
that it is an OOM condition. The problem seems to be the array, which takes all the memory.
So, you can OOM the server by doing "SELECT ARRAY[large_enough_list_of_numbers]".
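
For reference, the batched variant of the same delete can be sketched like this (again
assuming psycopg2; the batch size of 10,000 is just an arbitrary example). Keeping each
batch small also keeps the per-query array, and with it the memory use described above,
bounded:

    import psycopg2

    conn = psycopg2.connect(dbname="test")  # hypothetical connection
    cur = conn.cursor()

    ids = list(range(1, 1000001))
    batch_size = 10000  # arbitrary example size

    # Same delete as before, but no single statement ever carries a
    # multi-million element array.
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        cur.execute("DELETE FROM the_table WHERE id = ANY(%s)", (batch,))
    conn.commit()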

Conclusion: as long as you are not doing anything really stupid, it seems there isn't any important
performance reason to split the bulk queries into smaller batches.

For inserts the conclusion is similar. A lot of memory is used if you go into the millions-of-items range,
but otherwise it doesn't seem to matter whether you do many smaller batches or one larger batch.
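
For completeness, a sketch of the two insert variants (one big multi-row insert versus
smaller committed batches), again assuming psycopg2; execute_values, the id ranges and
the batch size here are only illustrative, not the actual test code:

    import psycopg2
    from psycopg2.extras import execute_values

    conn = psycopg2.connect(dbname="test")  # hypothetical connection
    cur = conn.cursor()

    batch_a = [(i,) for i in range(1000001, 1100001)]
    batch_b = [(i,) for i in range(1100001, 1200001)]

    # Variant 1, one go: all rows in one call and one transaction
    # (execute_values pages the VALUES list internally).
    execute_values(cur, "INSERT INTO the_table (id) VALUES %s",
                   batch_a, page_size=10000)
    conn.commit()

    # Variant 2, smaller batches: commit after each chunk so every
    # statement and transaction stays modest in size.
    chunk = 10000
    for start in range(0, len(batch_b), chunk):
        execute_values(cur, "INSERT INTO the_table (id) VALUES %s",
                       batch_b[start:start + chunk])
        conn.commit()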

- Anssi
