Re: SELECT ... FOR UPDATE performance costs? alternatives?

From: "D(dot) Dante Lorenso" <dante(at)lorenso(dot)com>
To: Erik Jones <erik(at)myemma(dot)com>
Cc: btober(at)ct(dot)metrocast(dot)net, pgsql-general(at)postgresql(dot)org
Subject: Re: SELECT ... FOR UPDATE performance costs? alternatives?
Date: 2007-08-16 02:21:44
Message-ID: 46C3B4B8.9040105@lorenso.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Erik Jones wrote:
>
> On Aug 15, 2007, at 2:39 PM, btober(at)ct(dot)metrocast(dot)net wrote:
>
>> Erik Jones wrote:
>>> On Aug 15, 2007, at 1:09 PM, D. Dante Lorenso wrote:
>>>
>>>> ...to ensure that only one server is processing the queue item, so
>>>> inside PGSQL, use SELECT ... FOR UPDATE to lock the row...
>>>> When my server is under severe load, however, this function begins
>>>> to take a long time to execute and I begin to suspect that the FOR
>>>> UPDATE lock might be locking the whole table and not just the row.
>>>> How do I know if this is the case, how can I optimize this
>>>> procedure, and how should I be doing this differently? ...
>>>>
>>>> Thoughts?
>>>
>>> SELECT ... FOR UPDATE should only be locking the rows returned by
>>> your the select statement, in this case the one row. You can check
>>> what locks exist on a table (and their type) with the pg_locks system
>>> view.
>>>
>>
>> Is that correct? Documentation section 12.3.1. Table-Level Locks
>> states 'The list below shows the available lock modes ...Remember that
>> all of these lock modes are table-level locks, even if the name
>> contains the word "row"'.
>
> You will notice that SELECT ... FOR UPDATE is not in that list. It's
> covered in the next section on row level locks.
>>
>> I wonder why bother with the SELECT statement at all. Why not just go
>> straight to the UPDATE statement with something like
>>
>> UPDATE queue SET
>> status = in_new_status,
>> ts_start = NOW(),
>> ts_end = NULL,
>> hostname = COALESCE(in_hostname, hostname)
>> WHERE tcq_id = (SELECT tcq_id FROM queue q WHERE q.status =
>> in_status ORDER BY tcq_id ASC LIMIT 1);
>>
>> He may need to trap an exception for the "not found" case, but what's
>> the big deal with that?
>>
>> UPDATE statements acquire a ROW EXCLUSIVE on the table, which
>> conflicts, among other things, with ROW EXCLUSIVE, so it will block
>> other UPDATE statements initiated by other transactions.
>
> That won't work because the update won't lock the row until the select
> returns. So, if two process execute that at the same time they will
> both execute the subquery and return the same result, the first will
> update it and the second will then (redundantly) update it.

It also won't work because I need to change AND read the row. If I only
do the update, I don't know what was updated. I still need to return
the tcq_id to my application.

Maybe the update could look like this:

UPDATE queue SET
status = in_new_status,
ts_start = NOW(),
ts_end = NULL,
hostname = COALESCE(in_hostname, hostname),
WHERE status = in_status;

But there I don't have the LIMIT 1, and I also don't know which rows got
updated. I supposed there might be some magic to find the OID of the
affected rows, but I don't know how what would be done.

I need logic like "atomic test and set" or pop 1 item off the queue
atomically and tell me what that item was.

In my situation, there are a dozen or so machines polling this queue
periodically looking for work to do. As more polling is occurring, the
locks seem to be taking longer so I was worried table-level locks might
be occurring.

-- Dante

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Mohd Kamal Bin Mustafa 2007-08-16 02:23:37 Re: Writing most code in Stored Procedures
Previous Message =?EUC-KR?B?vK2x4ryu?= 2007-08-16 01:56:54 please! SPI_finish is strange