Re: Can Postgres Not Do This Safely ?!?

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Andy Colson <andy(at)squeakycode(dot)net>, Karl Pickett <karl(dot)pickett(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Can Postgres Not Do This Safely ?!?
Date: 2010-10-31 03:26:55
Message-ID: AANLkTimJAQNgyBHgv8j52u5D9yf1pUNG2zDPPvdZSgby@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Fri, Oct 29, 2010 at 7:59 PM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> On Fri, 2010-10-29 at 16:57 -0500, Andy Colson wrote:
>> begin
>> insert into logged select * from events where processed = false;
>> update events set processed = true where processed = false;
>> commit;
>
> There's a race condition there. The SELECT in the INSERT statement may
> read 5 tuples, then a concurrent transaction inserts a 6th tuple, then
> you do an update on all 6 tuples.
>
>> begin
>> select * from events where processed = false;
>> ... do you processing on each, which would include inserting it...
>> update events set processed = true where processed = false;
>> commit;
>
> Same problem here.
>
>> Just make sure you do it all in the same transaction, so the update sees
>> the exact same set as the select.
>
> You need to use SERIALIZABLE isolation level for this to work. The
> default is READ COMMITTED.
>
> Or better yet, use Merlin's suggestion of PgQ. They've already worked
> this out in a safe, efficient way. It's the basis for Londiste, a
> replication system.

With multiple writers, single reader, it's usually ok to go the ad hoc
route. Just read a bunch of records, do something with them, and
delete/move them. This will become even easier when wCTE becomes
available, and you can do something like: with foo as (delete from
stuff_to_do returning * limit something), bar as (insert into stuff
done select * form foo) select do_stuff(...) from foo;

Heavy artillery like PGQ becomes necessary IMO when you have multiple
readers pulling things off the queue and doing things. Doing this in
ad hoc fashion would require separating the 'getting stuff from the
queue' and the 'doing stuff'. This is harder than it looks, and the
skytools people have apparently got it completely worked out (adding a
3rd party dependency is never something to be taken lightly though).
I'm a little old school in the way I do things -- I try to work the
problem down to where the simpler solutions work. I do a *lot* of
batch processing though, and maybe I should put learning PGQ on my
radar.

merlin

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Guillaume Lelarge 2010-10-31 06:22:37 Re: Unhandled exception in PGAdmin when opening 16-million-record table
Previous Message bricklen 2010-10-31 00:25:53 Re: Can Postgres Not Do This Safely ?!?