Re: INSERTing lots of data

From: Martin Gainty <mgainty(at)hotmail(dot)com>
To: <mabewlun(at)gmail(dot)com>, <joachim(dot)worringen(at)iathh(dot)de>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: INSERTing lots of data
Date: 2010-05-28 10:14:10
Message-ID: BLU142-W33866284954F41889B4C79AEEB0@phx.gbl
Lists: pgsql-general


Good Afternoon Szymon!

Could you explain what the Python GIL is? And is there any workaround for the GIL that we could implement to achieve better performance, possibly at the database level?

Kind regards,
Martin Gainty

Date: Fri, 28 May 2010 11:48:16 +0200
Subject: Re: [GENERAL] INSERTing lots of data
From: mabewlun(at)gmail(dot)com
To: joachim(dot)worringen(at)iathh(dot)de
CC: pgsql-general(at)postgresql(dot)org

2010/5/28 Joachim Worringen <joachim(dot)worringen(at)iathh(dot)de>

Greetings,

my Python application (http://perfbase.tigris.org) repeatedly needs to insert lots of data into an existing, non-empty, potentially large table. Currently, the bottleneck is in the Python application, so I intend to multi-thread it. Each thread should work on a part of the input file.

I have already multi-threaded the query part of the application, which requires using one connection per thread, because cursors are serialized on a single connection.

Provided that
- the threads use their own connection
- the threads perform all INSERTs within a single transaction
- the machine has enough resources

will I get a speedup? Or will table-locking serialize things on the server side?

Suggestions for alternatives are welcome, but the data must go through the Python application via INSERTs (no bulk insert, COPY etc. possible)

Keep in mind that some Python implementations have a global interpreter lock (GIL), so those threads may be serialized at the Python level.
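
For context: in CPython the GIL lets only one thread execute Python bytecode at a time, so CPU-bound work such as parsing input does not speed up with threads, although a thread waiting on the database generally releases the lock. One common workaround is to use worker processes instead of threads. The following is only a minimal sketch, assuming the input can be split into independent chunks; `split_chunks`, `parse_line`, `process_chunk` and `run_in_processes` are hypothetical names, not part of the application discussed here.

```python
from concurrent.futures import ProcessPoolExecutor


def split_chunks(items, n_chunks):
    """Split items into n_chunks contiguous parts of near-equal size."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def parse_line(line):
    """Hypothetical CPU-bound step: turn 'name,value' into a row tuple."""
    name, value = line.split(",")
    return (name.strip(), float(value))


def process_chunk(lines):
    """Runs in a worker process, which has its own interpreter and GIL.

    A real worker would open its own connection here and INSERT the rows
    within one transaction; this sketch only parses and counts them.
    """
    rows = [parse_line(line) for line in lines]
    return len(rows)


def run_in_processes(lines, n_workers):
    """Hand one chunk to each worker process; return total rows handled."""
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        return sum(pool.map(process_chunk, split_chunks(lines, n_workers)))
```

On platforms that start worker processes by spawning, `run_in_processes` should be called from under an `if __name__ == "__main__":` guard.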

It is possible that the inserts will be faster. The speed depends on the table structure, on constraints and triggers, and even on the database configuration. The best answer is to test it: write a simple multithreaded application, run the inserts, and measure.
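
A test along those lines could look roughly like the sketch below. It assumes a DB-API 2.0 driver and takes the connection factory and the INSERT statement as parameters, so nothing here is tied to one driver; `insert_chunk` and `timed_insert` are made-up names for illustration. Each thread opens its own connection and commits its rows as a single transaction, as described in the original question.

```python
import threading
import time


def insert_chunk(connect, sql, rows):
    """Open a private connection and insert rows in a single transaction."""
    conn = connect()
    try:
        cur = conn.cursor()
        for row in rows:
            cur.execute(sql, row)
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        conn.close()


def timed_insert(connect, sql, rows, n_threads):
    """Insert rows using n_threads threads and return the elapsed seconds.

    Note: an exception inside a worker thread is not re-raised here;
    a real test should collect and report worker errors.
    """
    # Give thread i every n_threads-th row, starting at offset i.
    chunks = [rows[i::n_threads] for i in range(n_threads)]
    threads = [
        threading.Thread(target=insert_chunk, args=(connect, sql, chunk))
        for chunk in chunks
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start
```

With psycopg2, for example, `connect` could be `lambda: psycopg2.connect(dsn)` and `sql` something like `"INSERT INTO results (name, value) VALUES (%s, %s)"`, where the table and column names are placeholders. Comparing `timed_insert(..., n_threads=1)` with higher thread counts on a copy of the real table shows whether the GIL or the server is the limit.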

regards
Szymon Guz


