From: Raj <raj(at)gusw(dot)net>
To: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: CPU hogged by concurrent SELECT..FOR UPDATE SKIP LOCKED
Date: 2020-09-07 18:04:58
Message-ID: eec4db8f-336a-8bb1-fc82-086abadec9bd@gusw.net
Lists: pgsql-performance
Hi all, and especially Jim Jarvie: I only now saw your email about my
related issue. My issue remains this one:
> Well this got interesting - the already moved error showed up:
and I have already gone through all those index pruning and all that
good stuff.
I remain with my original question from the 30th of June; to me it
feels like a bug of some sort:
> "tuple to be locked was already moved to another partition due to
> concurrent update"
>
> This would not exactly look like a bug, because the message says "to
> be locked", so at least it's not allowing two workers to lock the same
> tuple. But it seems that the skip-locked mode should not turn this
> into an error; it should treat it as if the tuple were already locked.
> Why would it want to lock the tuple (representing the job) if another
> worker has already finished its UPDATE of the job to mark it as "done"
> (which is what makes the tuple move to the "completed" partition)?
>
> Either the SELECT for jobs to do returned a stale tuple that had
> already been updated, or there is some lapse in the locking.
>
> Either way it would seem to be a waste of time throwing all these
> errors when the tuple should not even have been selected for update
> and locking.
>
> I wonder if anybody knows anything about that issue? Of course you'll
> want to see the DDL and SQL queries, etc. but you can't really try it
> out unless you do some massively parallel magic.
I still think that this simply should not happen: don't waste time on
old tuples, trying to fetch and lock something that's no longer there.
It's a waste of resources.
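For reference, the dequeue pattern under discussion is roughly the
following sketch (table and column names as used in the thread; the
'tx_done' state name and the surrounding worker loop are my
assumptions, not taken from the actual schema):

```sql
-- One worker claims a batch of jobs; SKIP LOCKED is supposed to make
-- concurrent workers silently pass over rows already claimed by others.
BEGIN;

SELECT queueid, txobject, objectid, state
  FROM mq.queue
 WHERE (state = 'tx_active' OR state = 'tx_fail_retryable')
   AND txobject = 'ticket'
 LIMIT 250
   FOR UPDATE SKIP LOCKED;

-- Process the claimed jobs, then mark each one done.  Because the
-- table is partitioned by state, this UPDATE moves the row into the
-- "completed" partition.  It is this cross-partition row movement
-- that makes a concurrent worker raise "tuple to be locked was
-- already moved to another partition due to concurrent update"
-- instead of simply skipping the row.
-- (hypothetical state name 'tx_done' for illustration)
UPDATE mq.queue
   SET state = 'tx_done'
 WHERE queueid = 42;  -- one of the claimed queueids

COMMIT;
```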
regards,
-Gunther
On 8/20/2020 6:39 PM, Jim Jarvie wrote:
>
>
> On 20-Aug.-2020 17:42, Michael Lewis wrote:
>> Can you share an explain analyze for the query that does the select for
>> update? I wouldn't assume that partition pruning is possible at all with
>> hash, and it would be interesting to see how it is finding those rows.
>
> Well, this got interesting - the "already moved" error showed up.
> Note that the actual in-process partitions are regular table
> partitions; these are not hashed. Only the incoming and completed
> partitions are hashed, due to the row counts at either end of the
> processing; the in-flight set (where the issue shows up) is quite
> small:
>
> [queuedb] # explain analyze select queueid,txobject,objectid,state
> from mq.queue where (state = 'tx_active' or state='tx_fail_retryable')
> and txobject = 'ticket' limit 250 for update skip locked;
> ERROR: 40001: tuple to be locked was already moved to another
> partition due to concurrent update
> LOCATION: heapam_tuple_lock, heapam_handler.c:405
> Time: 579.131 ms
> [queuedb] # explain analyze select queueid,txobject,objectid,state
> from mq.queue where (state = 'tx_active' or state='tx_fail_retryable')
> and txobject = 'ticket' limit 250 for update skip locked;
> ERROR: 40001: tuple to be locked was already moved to another
> partition due to concurrent update
> LOCATION: heapam_tuple_lock, heapam_handler.c:405
> Time: 568.008 ms
> [queuedb] # explain analyze select queueid,txobject,objectid,state
> from mq.queue where (state = 'tx_active' or state='tx_fail_retryable')
> and txobject = 'ticket' limit 250 for update skip locked;
> QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------------------------------------------------
> Limit  (cost=0.00..25.71 rows=250 width=34) (actual time=1306.041..1306.338 rows=250 loops=1)
>   ->  LockRows  (cost=0.00..7934.38 rows=77150 width=34) (actual time=1306.040..1306.315 rows=250 loops=1)
>         ->  Append  (cost=0.00..7162.88 rows=77150 width=34) (actual time=520.685..1148.347 rows=31500 loops=1)
>               ->  Seq Scan on queue_tx_active  (cost=0.00..6764.50 rows=77000 width=34) (actual time=520.683..1145.258 rows=31500 loops=1)
>                     Filter: ((txobject = 'ticket'::mq.queue_object) AND ((state = 'tx_active'::mq.tx_state) OR (state = 'tx_fail_retryable'::mq.tx_state)))
>               ->  Seq Scan on queue_tx_fail_retryable  (cost=0.00..12.62 rows=150 width=34) (never executed)
>                     Filter: ((txobject = 'ticket'::mq.queue_object) AND ((state = 'tx_active'::mq.tx_state) OR (state = 'tx_fail_retryable'::mq.tx_state)))
> Planning Time: 0.274 ms
> Execution Time: 1306.380 ms
> (9 rows)
>
> Time: 1317.150 ms (00:01.317)
> [queuedb] #
>
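A minimal sketch of the partition layout Jim describes (partition names
taken from the plan above; the 'incoming' and 'completed' state values,
the column types, and the hash subpartitioning details are my
assumptions):

```sql
-- Queue table partitioned by job state, so completing a job moves
-- its row from an in-flight partition to the completed partition.
CREATE TABLE mq.queue (
    queueid  bigint,              -- assumed type
    txobject mq.queue_object,
    objectid bigint,              -- assumed type
    state    mq.tx_state
) PARTITION BY LIST (state);

-- Large incoming and completed stages: hash-subpartitioned
-- because of the row counts at either end of the processing.
CREATE TABLE mq.queue_incoming PARTITION OF mq.queue
    FOR VALUES IN ('incoming') PARTITION BY HASH (queueid);
CREATE TABLE mq.queue_completed PARTITION OF mq.queue
    FOR VALUES IN ('completed') PARTITION BY HASH (queueid);

-- Small in-flight stages: plain partitions.  These are the ones
-- scanned by the SELECT .. FOR UPDATE SKIP LOCKED above, and where
-- the "already moved to another partition" error shows up.
CREATE TABLE mq.queue_tx_active PARTITION OF mq.queue
    FOR VALUES IN ('tx_active');
CREATE TABLE mq.queue_tx_fail_retryable PARTITION OF mq.queue
    FOR VALUES IN ('tx_fail_retryable');
```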