From: | Jim Jarvie <jim(at)talentstack(dot)to> |
---|---|
To: | pgsql-performance(at)lists(dot)postgresql(dot)org |
Subject: | CPU hogged by concurrent SELECT..FOR UPDATE SKIP LOCKED |
Date: | 2020-08-18 23:52:56 |
Message-ID: | c192f8bf-a747-6ad9-c54d-1bd6febafc4f@talentstack.to |
Lists: | pgsql-performance |
Using PostgreSQL 12 on Linux (Ubuntu 16.04 LTS).
I have a system which implements a message queue with the basic pattern
that a process selects a batch of rows (for example, 250) for processing
via SELECT .. LIMIT 250 FOR UPDATE SKIP LOCKED.
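For illustration, the dequeue pattern looks roughly like the sketch
below. The table and column names (queue_items, id, status, payload),
the WHERE filter and the ORDER BY are hypothetical placeholders, not the
real schema; only the LIMIT 250 FOR UPDATE SKIP LOCKED part is taken
from the description above.

```sql
-- Hypothetical queue table, assumed only for this sketch.
CREATE TABLE IF NOT EXISTS queue_items (
    id      bigserial PRIMARY KEY,
    status  text NOT NULL DEFAULT 'pending',
    payload text
);

BEGIN;

-- Each worker claims up to 250 rows. Rows already locked by another
-- worker's open transaction are skipped instead of waited on.
SELECT id, payload
FROM queue_items
WHERE status = 'pending'      -- assumed filter
ORDER BY id                   -- assumed ordering
LIMIT 250
FOR UPDATE SKIP LOCKED;

-- ... the worker processes the claimed rows here, then updates or
-- deletes them before committing, which releases the row locks ...

COMMIT;
```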
When there are a small number of concurrent connections to process the
queue, this seems to work as expected and connections quickly obtain a
unique block of 250 rows for processing.
However, as I scale up the number of concurrent connections, I see a
spike in CPU (to 100% across 80 cores) when the SELECT FOR UPDATE SKIP
LOCKED executes and the select processes wait for multiple minutes
(10-20 minutes) before completing. My use case requires around 256
concurrent processors for the queue but I've been unable to scale beyond
128 without everything grinding to a halt.
The queue table itself fits in RAM (with 2M hugepages) and during the
wait, all the performance counters drop to almost 0: no disk reads or
writes (semi-expected, since the table fits in memory), a 100% buffer
hit rate in pg_top, and rows read at around 100/s, which is much lower
than expected.
Once processes complete the select and the number of waiting selects
starts to fall, CPU load falls, and then the remaining processes all
suddenly complete within a few seconds. Things then perform normally
until the next time a group of SELECT FOR UPDATE statements bunches
together, at which point the cycle repeats.
I found that running VACUUM ANALYZE very frequently (every 30 minutes)
helps a small amount, but not enough: the problems are still very
apparent.
I've exhausted all the performance tuning and analysis results I can
find that seem even a little bit relevant but cannot get this cracked.
Is anyone on the list able to suggest what I can do to track down why
this CPU hogging happens, as it does seem to be the root of the
problem?
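A possible starting point for that kind of tracking is a snapshot of
backend wait events taken while a spike is in progress. The query below
is only a sketch against the standard pg_stat_activity view; it is not
part of the analysis already done, and the grouping chosen is an
assumption about what would be useful to see.

```sql
-- Count client backends by state and wait event during a CPU spike.
-- A NULL wait_event on an 'active' backend means it is running on CPU
-- rather than waiting on a lock, LWLock or I/O.
SELECT state,
       wait_event_type,
       wait_event,
       count(*) AS backends
FROM pg_stat_activity
WHERE backend_type = 'client backend'
GROUP BY state, wait_event_type, wait_event
ORDER BY backends DESC;
```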
Thanks in advance,
Jim