Re: Deduplication and transaction isolation level

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Steven Schlansker <steven(at)likeness(dot)com>
Cc: François Beausoleil <francois(at)teksol(dot)info>, Forums postgresql <pgsql-general(at)postgresql(dot)org>
Subject: Re: Deduplication and transaction isolation level
Date: 2013-09-25 17:52:48
Message-ID: CAHyXU0yeiSxs2bGGjm9fo+CRw0AyYYBKwMc+KhGnpjr=KzJqkQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Wed, Sep 25, 2013 at 12:50 PM, Steven Schlansker <steven(at)likeness(dot)com> wrote:
>
> On Sep 25, 2013, at 6:04 AM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
>
>> On Tue, Sep 24, 2013 at 10:19 PM, François Beausoleil
>> <francois(at)teksol(dot)info> wrote:
>>> Hi all!
>>>
>>> I import many, many rows of data into a table, from three or more computers, 4 times per hour. I have a primary key, and the query I use to import the data is supposed to dedup before inserting, but I still get primary key violations.
>>>
>>> The import process is:
>>>
>>> * Load CSV data into temp table
>>> * INSERT INTO dest SELECT DISTINCT (pkey) FROM temp WHERE NOT EXISTS(temp.pkey = dest.pkey)
>>>
>>> I assumed (erroneously) that this would guarantee no duplicate data could make it into the database. The primary key violations are proving me wrong.
>>
>> Right. Transaction A and B are interleaved: they both run the same
>> check against the same id at the same time. Both checks pass because
>> neither transaction is committed. This problem is not solvable by
>> adjusting the isolation level.
>
> Are you sure that this is the case? It is my understanding that since 9.1 with SSI (https://wiki.postgresql.org/wiki/SSI) if you set the transaction isolation level to SERIALIZABLE, this problem is solved, as the insert will take a "predicate lock" and the other insert cannot succeed.
>
> We use this to detect / resolve concurrent inserts that violate primary keys and it works great.
>
> However in this case it probably doesn't help the OP because the cost of restarting the entire import is likely too high.

ah, you're right!

merlin

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2013-09-25 21:47:27 Re: postgres FDW doesn't support sequences?
Previous Message Steven Schlansker 2013-09-25 17:50:35 Re: Deduplication and transaction isolation level