Re: bigserial continuity safety

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>
Cc: Pawel Veselov <pawel(dot)veselov(at)gmail(dot)com>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: bigserial continuity safety
Date: 2015-04-14 03:13:39
Message-ID: CAKFQuwZMzahFacYkOFw_TQtBM4tdy6-TH0KHFn5o=N2165Lm8g@mail.gmail.com
Lists: pgsql-general

On Mon, Apr 13, 2015 at 7:01 PM, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com> wrote:

> On 4/13/15 7:45 PM, David G. Johnston wrote:
>
>> On Mon, Apr 13, 2015 at 3:05 PM, Pawel Veselov <pawel(dot)veselov(at)gmail(dot)com> wrote:
>>
>>
>> Hi.
>>
>> If I have a table created as:
>>
>> CREATE TABLE xq_agr (
>> id BIGSERIAL PRIMARY KEY,
>> node text not null
>> );
>>
>> and that multiple applications insert into. The applications never
>> explicitly specify the value for 'id'.
>> Is it safe to, on a single connection, do:
>>
>> - open transaction (default transaction isolation)
>> - Open cursor for select * from xq_agr order by id asc
>> - do something with the current record
>> - advance the cursor (repeating the processing), but stop at some point
>> (id = LAST_ID), and
>> - delete from xq_agr where id <= LAST_ID;
>> - commit
>>
>> "safe to" means - whether the cursor will not miss any records that
>> were deleted at the end.
>>
>> I'm suspecting that depending on the commit order, I may have
>> situations when:
>> - TX1 inserts ID 1
>> - TX2 inserts ID 2
>>
>> - TX2 commits
>> - TX3 scans 2
>> - TX1 commits
>> - TX3 deletes <= 2
>> - record ID1 is deleted, but never processed.
>>
>>
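Restating the pattern being asked about, roughly, with LAST_ID standing in
for whatever id the loop stops at:

BEGIN;
DECLARE cur CURSOR FOR SELECT * FROM xq_agr ORDER BY id ASC;
FETCH NEXT FROM cur;   -- process the row; repeat, remembering the
                       -- last id handled
DELETE FROM xq_agr WHERE id <= LAST_ID;   -- LAST_ID: placeholder
COMMIT;
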
>> Going to ignore the MVCC question for the moment and describe a better
>> "state transition mechanism" to consider.
>>
>> pending -> active -> completed
>>
>> If you ensure you never delete (i.e., transition to completed) something
>> that isn't active then you can never delete an item in pending.
>>
>> Limit the locking to the state transitions only.
>>
>> The downside is the need to deal with "active" items that have been
>> abandoned by whatever process marked them active.
>>
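As a sketch, assuming a status column is added for those states (it is not
part of the original xq_agr definition):

ALTER TABLE xq_agr ADD COLUMN status text NOT NULL DEFAULT 'pending';

-- pending -> active: claim the row under lock (sketched further below)
-- active -> completed: only ever remove rows already marked active
DELETE FROM xq_agr WHERE id = $1 AND status = 'active';
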
>
> Another option is DELETE RETURNING. Instead of an initial SELECT to find
> records to work on, you would do DELETE FROM ... WHERE ... RETURNING * and
> deal with those records. I don't know if that's safe with a cursor though; I
> believe the DELETE fully materializes before records start coming back, so
> you need to handle all the rows from the DELETE or abort.
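For concreteness, a minimal sketch of that pattern, with $1 standing in for
whatever bounds the batch:

DELETE FROM xq_agr
 WHERE id <= $1
RETURNING *;
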
>
>> Back to your question: you should probably not use "<=" in your where
>> clause. However, in READ COMMITTED TX3 cannot see ID1 since the
>> snapshot it took out was created before TX1 committed. I am not fluent
>>
>
> Actually, that's not necessarily true. It depends on when TX3 actually takes
> its snapshot, which is NOT when it runs BEGIN. I believe there are other
> problems you'd run into as well. Basically, READ COMMITTED does nothing to
> protect you from "phantom reads".
>
>
I was basing that off of:

- TX2 commits
- TX3 scans 2
- TX1 commits

The scan is what causes the snapshot to be taken, and it occurs before TX1
commits.

As for the isolation levels, I apparently got the two READ ones reversed in
my head... my bad :(
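A quick two-session sketch of that timing under READ COMMITTED:

BEGIN;                        -- session A: no snapshot taken yet
-- (session B inserts a row into xq_agr and commits here)
SELECT count(*) FROM xq_agr;  -- session A's snapshot is taken now,
                              -- so it does see session B's row
COMMIT;
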

> REPEATABLE READ should protect you from phantom reads, but it won't help
> you if someone changes the data. If you're going to try and go this route,
> SERIALIZABLE is your best bet.

While this is likely true, if there is no other use of SERIALIZABLE in the
existing codebase then going that route requires adding transaction retry
to the necessary skills and tools. It's worth considering other approaches
to avoid the cognitive overhead of serializable.
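The retry obligation is roughly this (a sketch; the actual retry loop has
to live in client code):

BEGIN ISOLATION LEVEL SERIALIZABLE;
-- ... read and modify the queue ...
COMMIT;
-- If any statement, or the COMMIT itself, fails with SQLSTATE 40001
-- (serialization_failure), roll back and re-run the whole transaction
-- from BEGIN.
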

>> enough to work through the entire scenario in my head. I'd suggest you
>> actually open up 3 psql sessions and play with them to see how things
>> really behave.
>>
>
> That's really not safe enough. There are just too many different race
> conditions you can encounter, and I'd bet that you couldn't even reproduce
> some of them from a client.

I guess there is a difference between knowing something is obviously wrong
because simple testing failed, and not realizing that you still have a
problem because there was no reasonable way to test the condition you are
hitting.

The question becomes whether you rephrase the solution to make it simpler,
and thus less exposed to race conditions and the like (or failing
gracefully when they do occur: no deadlocks and hopefully minimal waiting),
or whether you simply prevent them outright (and quickly?) by failing with
a serialization exception.

>> For me, a simple "SELECT FOR UPDATE / UPDATE WHERE" command in a
>> function solves the problem at small scale with minimal performance
>> degradation. The transition from "pending" to "active" is effectively
>> serialized, and the transition from "active" to "completed" only occurs
>> when the process has been performed, so it is not possible to have two
>> clients simultaneously processing the same work.
>>
>
> Note that that isn't safe from repeatable reads. What you're describing is
> only safe if the WHERE clause on the update is guaranteed to always find
> only one row (ie, in this example, by using xq_agr.id = something).
> Anything other than that is asking for trouble.
>

Yes, the select returns the IDs of (one or more) "pending" jobs, with a
LIMIT 1 applied, and the first record is immediately updated to reflect its
"active" status; see the sketch below. At the scale I operate at this has
proven acceptable. There are a few nuances: workers are not homogeneous and
can only work on a subset of all available tasks, based upon meta-data
associated with each task.
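
A minimal sketch of that claim step, assuming the status column from the
transition scheme above (again, not part of the original xq_agr
definition):

BEGIN;
UPDATE xq_agr
   SET status = 'active'
 WHERE id = (SELECT id
               FROM xq_agr
              WHERE status = 'pending'
              ORDER BY id
              LIMIT 1
              FOR UPDATE)
RETURNING id;
-- A concurrent claimer can leave the subquery empty, in which case
-- zero rows are updated and the worker simply tries again.
COMMIT;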

David J.
