From: | Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at> |
---|---|
To: | Jim Jarvie <jim(at)talentstack(dot)to>, pgsql-performance(at)lists(dot)postgresql(dot)org |
Subject: | Re: CPU hogged by concurrent SELECT..FOR UPDATE SKIP LOCKED |
Date: | 2020-08-19 06:04:20 |
Message-ID: | d851b0b12c0bca0af27d6918a41d3787f5725272.camel@cybertec.at |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
On Tue, 2020-08-18 at 19:52 -0400, Jim Jarvie wrote:
> I have a system which implements a message queue with the basic pattern that a process selects a group of,
> for example 250, rows for processing via SELECT .. LIMIT 250 FOR UPDATE SKIP LOCKED.
>
> When there are a small number of concurrent connections to process the queue, this seems to work as
> expected and connections quickly obtain a unique block of 250 rows for processing.
> However, as I scale up the number of concurrent connections, I see a spike in CPU (to 100% across 80 cores)
> when the SELECT FOR UPDATE SKIP LOCKED executes and the select processes wait for multiple minutes
> (10-20 minutes) before completing. My use case requires around 256 concurrent processors for the queue
> but I've been unable to scale beyond 128 without everything grinding to a halt.
>
> The queue table itself fits in RAM (with 2M hugepages) and during the wait, all the performance counters
> drop to almost 0 - no disk read or write (semi-expected due to the table fitting in memory) with 100%
> buffer hit rate in pg_top and row read around 100/s which is much smaller than expected.
>
> After processes complete the select and the number of waiting selects starts to fall, CPU load falls and
> then suddenly the remaining processes all complete within a few seconds and things perform normally until
> the next time there are a group of SELECT FOR UPDATE statements which bunch together and things then repeat.
>
> I found that performing extremely frequent vacuum analyze (every 30 minutes) helps a small amount but
> this is not that helpful so problems are still very apparent.
>
> I've exhausted all the performance tuning and analysis results I can find that seem even a little bit
> relevant but cannot get this cracked.
>
> Is anyone on the list able to help with suggestions of what I can do to track why this CPU hogging happens
> as this does seem to be the root of the problem?
You should
- check with "pgstattuple" if the table is bloated.
- use "perf" to see where the CPU time is spent.
- look at "pg_stat_activity" for wait events (unlikely if the CPU is busy).
Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com
From | Date | Subject | |
---|---|---|---|
Next Message | Kyotaro Horiguchi | 2020-08-19 11:43:18 | Re: Replication lag due to lagging restart_lsn |
Previous Message | Michael Lewis | 2020-08-19 00:45:26 | Re: CPU hogged by concurrent SELECT..FOR UPDATE SKIP LOCKED |