Re: Large insert and delete batches

From: Marti Raudsepp <marti(at)juffo(dot)org>
To: Kääriäinen Anssi <anssi(dot)kaariainen(at)thl(dot)fi>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Large insert and delete batches
Date: 2012-03-01 20:51:52
Message-ID: CABRT9RAiNkwgbOU2hufF4megQWaZKXEqXmAcyi-+bWK=0y+NTw@mail.gmail.com
Lists: pgsql-performance

On Thu, Mar 1, 2012 at 21:06, Kääriäinen Anssi <anssi(dot)kaariainen(at)thl(dot)fi> wrote:
> The queries are "select * from the_table where id = ANY(ARRAY[list_of_numbers])"
> and the similar delete, too.

> [...] However, once you go into
> millions of items in the list, the query will OOM my Postgres server.

The problem with IN() and ARRAY[] is that the whole list of numbers
has to be parsed by the SQL syntax parser, which has significant
memory and CPU overhead (it has to accept arbitrary expressions in the
list). But there's a shortcut around the parser: you can pass in the
list as an array literal string, e.g.:
select * from the_table where id = ANY('{1,2,3,4,5}')

The SQL parser treats the value as one long string and passes it to
the array input function, which is a much simpler routine. This should
scale up much better.
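
For concreteness, here is a rough psycopg2 sketch of that approach
(the connection details are made up, and I'm assuming "id" is an
integer column):

    import psycopg2

    conn = psycopg2.connect("dbname=test")
    cur = conn.cursor()

    ids = range(1, 1000001)
    # Build the whole list as ONE array-literal string.
    literal = "{" + ",".join(str(i) for i in ids) + "}"

    # psycopg2 interpolates this as a single quoted string literal,
    # so the server's SQL parser sees one value and hands it to the
    # array input function via the ::int[] cast.
    cur.execute("SELECT * FROM the_table WHERE id = ANY(%s::int[])",
                (literal,))
    rows = cur.fetchall()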

Even better would be passing the array as a query parameter, so the
SQL parser doesn't see the long string at all -- but I think you
have to jump through some hoops to do that in psycopg2.
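
Part of the hoop-jumping is that psycopg2's default adaptation of a
Python list expands back into ARRAY[...] syntax, which is exactly the
parser-heavy form you want to avoid. cursor.mogrify() shows the
difference (continuing the sketch above):

    # Default list adaptation: back to the expression parser.
    print(cur.mogrify(
        "SELECT * FROM the_table WHERE id = ANY(%s)", ([1, 2, 3],)))
    # roughly: SELECT * FROM the_table WHERE id = ANY(ARRAY[1,2,3])

    # Passing the array-literal string yourself keeps it one value.
    print(cur.mogrify(
        "SELECT * FROM the_table WHERE id = ANY(%s::int[])",
        ("{1,2,3}",)))
    # roughly: SELECT * FROM the_table WHERE id = ANY('{1,2,3}'::int[])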

Regards,
Marti
