Re: Postgres for a "data warehouse", 5-10 TB

From: Andy Colson <andy(at)squeakycode(dot)net>
To: Robert Klemme <shortcutter(at)googlemail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Postgres for a "data warehouse", 5-10 TB
Date: 2011-09-12 17:22:48
Message-ID: 4E6E3FE8.5070609@squeakycode.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On 9/12/2011 12:15 PM, Robert Klemme wrote:
> On 11.09.2011 19:02, Marti Raudsepp wrote:
>> On Sun, Sep 11, 2011 at 17:23, Andy Colson<andy(at)squeakycode(dot)net> wrote:
>>> On 09/11/2011 08:59 AM, Igor Chudov wrote:
>>>> By the way, does that INSERT UPDATE functionality or something like
>>>> this exist in Postgres?
>>> You have two options:
>>> 1) write a function like:
>>> create function doinsert(_id integer, _value text) returns void as
>>> 2) use two sql statements:
>>
>> Unfortunately both of these options have caveats. Depending on your
>> I/O speed, you might need to use multiple loader threads to saturate
>> the write bandwidth.
>>
>> However, neither option is safe from race conditions. If you need to
>> load data from multiple threads at the same time, they won't see each
>> other's inserts (until commit) and thus cause unique violations. If
>> you could somehow partition their operation by some key, so threads
>> are guaranteed not to conflict each other, then that would be perfect.
>> The 2nd option given by Andy is probably faster.
>>
>> You *could* code a race-condition-safe function, but that would be a
>> no-go on a data warehouse, since each call needs a separate
>> subtransaction which involves allocating a transaction ID.
>
> Wouldn't it be sufficient to reverse order for race condition safety?
> Pseudo code:
>
> begin
> insert ...
> catch
> update ...
> if not found error
> end
>
> Speed is another matter though...
>
> Kind regards
>
> robert
>
>
>

No, I dont think so, if you had two loaders, both would start a
transaction, then neither could see what the other was doing. There are
transaction isolation levels, but they are like playing with fire. (in
my opinion).

-Andy

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Robert Schnabel 2011-09-12 17:33:33 Allow sorts to use more available memory
Previous Message Robert Klemme 2011-09-12 17:15:35 Re: Postgres for a "data warehouse", 5-10 TB