Re: INSERTing lots of data

From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Joachim Worringen <joachim(dot)worringen(at)iathh(dot)de>
Cc: pgsql-general(at)postgresql(dot)org, Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
Subject: Re: INSERTing lots of data
Date: 2010-06-01 03:45:15
Message-ID: 4C04824B.2080205@2ndquadrant.com
Lists: pgsql-general

Joachim Worringen wrote:
> my Python application (http://perfbase.tigris.org) repeatedly needs to
> insert lots of data into an existing, non-empty, potentially large
> table. Currently, the bottleneck is with the Python application, so I
> intend to multi-thread it. Each thread should work on a part of the
> input file.

You are wandering down a path followed by pgloader at one point:
http://pgloader.projects.postgresql.org/#toc6 and one that I fought with
briefly as well. Simple multi-threading may provide only minimal help in
scaling up insert performance here, due to the Python issues involved
with the GIL. Maybe we can get Dimitri to chime in here; he did more of
this than I did.

Two thoughts. First, build a performance test case assuming it will
fail to scale upwards, and look for problems. If you get lucky, great,
but don't assume this will work--it has proven more difficult than
expected for others in the past.
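
A minimal sketch of such a test, untested and assuming psycopg2 plus a
made-up "results(a int, b float)" table--adjust the DSN and the INSERT
to match your schema. It times the same per-thread workload at 1, 2 and
4 threads, so you can see whether adding threads buys you anything
before committing to that design:

import threading, time
import psycopg2

DSN = "dbname=perfbase"          # assumption: adjust to your setup
ROWS_PER_THREAD = 50000

def insert_chunk(rows):
    # Each thread uses its own connection; psycopg2 connections are
    # not meant to be shared across threads without locking.
    conn = psycopg2.connect(DSN)
    cur = conn.cursor()
    for a, b in rows:
        cur.execute("INSERT INTO results (a, b) VALUES (%s, %s)", (a, b))
    conn.commit()
    conn.close()

def run(n_threads):
    chunks = [[(i, i * 0.5) for i in range(ROWS_PER_THREAD)]
              for _ in range(n_threads)]
    threads = [threading.Thread(target=insert_chunk, args=(c,))
               for c in chunks]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.time() - start
    print("%d threads: %.0f rows/sec" %
          (n_threads, n_threads * ROWS_PER_THREAD / elapsed))

for n in (1, 2, 4):
    run(n)

If rows/sec stays roughly flat as the thread count goes up, the GIL is
the wall you're hitting, not the database.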

Second, if you do end up being throttled by the GIL, you can probably
build a solution for Python 2.6/3.0 using the multiprocessing module for
your use case: http://docs.python.org/library/multiprocessing.html
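
A rough sketch of what that might look like, again untested and assuming
psycopg2, the same hypothetical "results" table, and an "input.dat"
file. Each worker process opens its own connection, so the GIL drops
out of the picture entirely:

from multiprocessing import Pool
import psycopg2

DSN = "dbname=perfbase"          # assumption: adjust to your setup

def load_chunk(lines):
    # Runs in a separate process: parse the lines handed to us and
    # insert them over a private connection.
    conn = psycopg2.connect(DSN)
    cur = conn.cursor()
    for line in lines:
        a, b = line.split()
        cur.execute("INSERT INTO results (a, b) VALUES (%s, %s)", (a, b))
    conn.commit()
    conn.close()
    return len(lines)

def split(seq, n):
    # Carve the input into n roughly equal chunks.
    size = (len(seq) + n - 1) // n
    return [seq[i:i + size] for i in range(0, len(seq), size)]

if __name__ == "__main__":
    with open("input.dat") as f:       # hypothetical input file
        lines = f.readlines()
    pool = Pool(processes=4)
    counts = pool.map(load_chunk, split(lines, 4))
    print("inserted %d rows" % sum(counts))

You pay for the extra connections and for shipping the chunks to the
workers, but the parsing and the execute() calls then run on separate
CPUs instead of fighting over one interpreter lock.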

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.us
