CPU hogged by concurrent SELECT..FOR UPDATE SKIP LOCKED

From: Jim Jarvie <jim(at)talentstack(dot)to>
To: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: CPU hogged by concurrent SELECT..FOR UPDATE SKIP LOCKED
Date: 2020-08-18 23:52:56
Message-ID: c192f8bf-a747-6ad9-c54d-1bd6febafc4f@talentstack.to
Lists: pgsql-performance

Using PostgreSQL 12 on Linux (Ubuntu 16.04 LTS).

I have a system which implements a message queue with the basic pattern
that each process selects a batch of rows (for example, 250) for
processing via SELECT .. LIMIT 250 FOR UPDATE SKIP LOCKED.
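
As a rough sketch of that pattern (not the actual application code), a
worker's claim step could look something like the following. The table
name "queue", the columns "id" and "processed", and the ORDER BY are all
assumptions made for illustration; the connection is assumed to be any
DB-API 2.0 connection that uses the %s parameter style, as psycopg2 does.

```python
# Hypothetical names: table "queue" and columns "id" / "processed" are
# assumptions for illustration, not the schema of the real system.
CLAIM_SQL = """
    SELECT id
    FROM queue
    WHERE NOT processed
    ORDER BY id
    LIMIT %s
    FOR UPDATE SKIP LOCKED
"""


def claim_batch(conn, batch_size=250):
    """Lock and return up to batch_size row ids not locked by other workers.

    conn is assumed to be a DB-API 2.0 connection to PostgreSQL using the
    %s parameter style. The row locks last until the caller commits or
    rolls back, so the batch should be processed inside the same
    transaction.
    """
    cur = conn.cursor()
    try:
        cur.execute(CLAIM_SQL, (batch_size,))
        return [row[0] for row in cur.fetchall()]
    finally:
        cur.close()
```

Each worker would call claim_batch(conn), process the returned rows, mark
them done, and then commit to release the locks.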

When there are a small number of concurrent connections to process the
queue, this seems to work as expected and connections quickly obtain a
unique block of 250 rows for processing.

However, as I scale up the number of concurrent connections, I see a
spike in CPU (to 100% across 80 cores) when the SELECT FOR UPDATE SKIP
LOCKED executes, and the SELECT statements take 10-20 minutes to
complete.  My use case requires around 256 concurrent processors for
the queue but I've been unable to scale beyond 128 without everything
grinding to a halt.

The queue table itself fits in RAM (with 2M hugepages) and during the
wait, all the performance counters drop to almost 0: no disk reads or
writes (semi-expected, since the table fits in memory), a 100% buffer
hit rate in pg_top, and rows read at around 100/s, which is far lower
than expected.

After some processes complete the select and the number of waiting
selects starts to fall, CPU load falls and then suddenly the remaining
processes all complete within a few seconds. Things then perform
normally until the next time a group of SELECT FOR UPDATE statements
bunches together, and the cycle repeats.

I found that running VACUUM ANALYZE very frequently (every 30 minutes)
helps a little, but not enough: the problems are still very apparent.

I've exhausted all the performance tuning and analysis results I can
find that seem even a little bit relevant but cannot get this cracked.

Is anyone on the list able to suggest what I can do to track down why
this CPU hogging happens? It does seem to be the root of the problem.

Thanks in advance,

Jim
