Re: BUG #18966: Invalid SQL queries hang indefinitely until server restart

From: Jérémy Spriet <jeremy(dot)spriet(at)gmail(dot)com>
To: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #18966: Invalid SQL queries hang indefinitely until server restart
Date: 2025-06-25 14:23:01
Message-ID: CA+JbTT+aS+gM0dicANUfDWwBTka6FXQzyT5VSMmAmj+2wBsrDw@mail.gmail.com
Lists: pgsql-bugs

> What exactly is "invalid" about the troublesome queries? What does
> pg_stat_activity show them as doing? When the problem is occurring,
> can you run EXPLAIN (*not* EXPLAIN ANALYZE) on them, or does that
> hang too?
Yes, they hang when I run EXPLAIN on them, too.
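For illustration, the kind of statement I mean is the one shown for pid 914 in the dump below; a plain EXPLAIN of it (just a sketch reconstructed from that dump rather than copied from my terminal) hangs the same way:

EXPLAIN SELECT * FROM bad_table;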

Here is the output of 'select * from pg_stat_activity', in CSV format, captured while the problem was occurring:
datid,datname,pid,leader_pid,usesysid,usename,application_name,client_addr,client_hostname,client_port,backend_start,xact_start,query_start,state_change,wait_event_type,wait_event,state,backend_xid,backend_xmin,query_id,query,backend_type
16384,mydb,61,,10,role_superadmin,pg_cron scheduler,,,,2025-06-25
08:15:40.365377+00,,,,Extension,Extension,,,,,,pg_cron launcher
,,60,,,,,,,,2025-06-25
08:15:40.365266+00,,,,Activity,AutoVacuumMain,,,,,,autovacuum launcher
,,63,,10,role_superadmin,,,,,2025-06-25
08:15:40.366251+00,,,,Activity,LogicalLauncherMain,,,,,,logical replication
launcher
,,62,,10,role_superadmin,TimescaleDB Background Worker
Launcher,,,,2025-06-25
08:15:40.365899+00,,,,Extension,Extension,,,,,,TimescaleDB Background
Worker Launcher
16384,mydb,69,,10,role_superadmin,,127.0.0.1,,49760,2025-06-25
08:15:40.518269+00,,2025-06-25 08:15:40.539926+00,2025-06-25
08:15:40.539959+00,Client,ClientRead,idle,,,-8817007929064609608,SELECT
pg_advisory_unlock($1),client backend
16384,mydb,168,,16386,role_admin,,,,-1,2025-06-25
08:37:46.868834+00,2025-06-25 08:44:11.558469+00,2025-06-25
08:44:11.558469+00,2025-06-25
08:44:11.558472+00,,,active,,,3749380189022910195,ROLLBACK,client backend
5,postgres,78,,16385,role_monitoring,,127.0.0.1,,49766,2025-06-25
08:15:42.344303+00,,2025-06-25 13:08:42.355812+00,2025-06-25
13:08:42.355985+00,Client,ClientRead,idle,,,4452344162736673220,SELECT *
FROM pg_stat_database_conflicts;,client backend
5,postgres,79,,16385,role_monitoring,,127.0.0.1,,49770,2025-06-25
08:15:42.345051+00,,2025-06-25 13:08:42.34076+00,2025-06-25
13:08:42.347132+00,Client,ClientRead,idle,,,8133081727313365291,"SELECT
pg_database.datname
,pg_database_size(pg_database.datname)
FROM pg_database;",client backend
16384,mydb,207,,16386,role_admin,,,,-1,2025-06-25
08:55:00.037971+00,2025-06-25 08:55:15.378432+00,2025-06-25
08:55:15.378432+00,2025-06-25
08:55:15.378433+00,,,active,,,2064869707185898531,COMMIT,client backend
16384,mydb,719,,16386,role_admin,,,,-1,2025-06-25
11:17:57.910472+00,2025-06-25 11:17:57.932789+00,2025-06-25
11:17:57.94765+00,2025-06-25
11:17:57.947651+00,,,active,,323299,1708853975061214229,SELECT
pg_cancel_backend(0);,client backend
16384,mydb,320,,16386,role_admin,,,,-1,2025-06-25
09:44:53.568006+00,2025-06-25 09:44:53.597946+00,2025-06-25
09:44:53.939852+00,2025-06-25 09:44:53.939856+00,,,active,,323299,,"WITH
infos_base AS (
SELECT DISTINCT aca, region, departement, libelle_nature
FROM contexte.comptage
WHERE uai = 'S105806'
LIMIT 1
),
derniere_annee AS (
SELECT MAX(annee_sc) AS annee
FROM contexte.comptage
WHERE uai = 'S105806'
)

-- uai_base
SELECT 'uai_base' AS type, e.uai AS id,
e.type_etab,
e.libelle_nature,
e.annee_sc,
SUM(e.effectif) AS nb_peoples
FROM contexte.comptage e
JOIN derniere_annee da ON e.annee_sc = da.annee
WHERE e.uai = 'S105806'
AND e.libelle_nature = (SELECT libelle_nature FROM infos_base LIMIT 1)
GROUP BY e.uai, e.type_etab, e.libelle_nature, e.annee_sc

UNION

-- uai_comp
SELECT 'uai_comp' AS type, e.uai AS id,
e.type_etab,
e.libelle_nature,
e.annee_sc,
SUM(e.effectif) AS nb_peoples
FROM contexte.comptage e
JOIN dernier",client backend
16384,mydb,321,,16386,role_admin,,,,-1,2025-06-25
09:45:01.856679+00,2025-06-25 09:47:47.979227+00,2025-06-25
09:47:48.046105+00,2025-06-25 09:47:48.046109+00,,,active,,323299,,"SELECT
'uai_base' AS type, e.uai AS id,
e.type_etab,
e.libelle_nature,
e.annee_sc,
SUM(e.nb_peoples) AS nb_peoples
FROM contexte.comptage e
JOIN derniere_annee da ON e.annee_sc = da.annee
WHERE e.uai = 'S105806'
AND e.libelle_nature = (SELECT libelle_nature FROM infos_base LIMIT 1)
GROUP BY e.uai, e.type_etab, e.libelle_nature, e.annee_sc
LIMIT 1001",client backend
16384,mydb,781,,16386,role_admin,,,,-1,2025-06-25
11:46:45.062895+00,2025-06-25 11:46:45.082754+00,2025-06-25
11:46:45.096873+00,2025-06-25
11:46:45.096874+00,,,active,,,6650430479887907887,BEGIN,client backend
16384,mydb,789,,16386,role_admin,,,,-1,2025-06-25
11:50:10.987069+00,2025-06-25 11:50:15.903804+00,2025-06-25
11:50:15.903804+00,2025-06-25
11:50:15.903807+00,,,active,,,2064869707185898531,COMMIT,client backend
16384,mydb,914,,10,role_superadmin,psql,,,-1,2025-06-25
12:29:28.685433+00,2025-06-25 12:31:49.087451+00,2025-06-25
12:31:49.087451+00,2025-06-25 12:31:49.087454+00,,,active,,323329,,select *
from bad_table;,client backend
,,53,,,,,,,,2025-06-25
08:15:40.338819+00,,,,Activity,BgWriterHibernate,,,,,,background writer
,,52,,,,,,,,2025-06-25 08:15:40.337776+00,,,,,,,,,,,checkpointer
,,59,,,,,,,,2025-06-25
08:15:40.364229+00,,,,Activity,WalWriterMain,,,,,,walwriter

Also, when I queried the locks, I saw this:
SELECT pid, locktype, mode, relation::regclass AS table,
       virtualtransaction, granted
FROM pg_locks
WHERE relation IS NOT NULL
ORDER BY relation, mode;

 pid | locktype |      mode       |       table       | virtualtransaction | granted
-----+----------+-----------------+-------------------+--------------------+---------
 914 | relation | AccessShareLock | pg_locks          | 18/17              | t
 321 | relation | AccessShareLock | contexte.comptage | 12/43              | t
 320 | relation | AccessShareLock | contexte.comptage | 11/157             | t
(3 rows)
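
If it helps, the next time this happens I can also run something along these lines to see whether the stuck backends are waiting on another backend (just a sketch; the pids are the ones from the output above):

SELECT pid, state, wait_event_type, wait_event,
       pg_blocking_pids(pid) AS blocked_by
FROM pg_stat_activity
WHERE pid IN (320, 321, 914);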

> I don't suppose you can take timescaledb out of the mix to see
> if it's contributing to this.
I will check, but that is not easy to do in this environment.
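As a first step, I can at least confirm which extensions are installed and what is preloaded (just a sketch; I have not included the output here):

SELECT extname, extversion FROM pg_extension;
SHOW shared_preload_libraries;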

> Yes. One speculation that occurs to me is that the control group settings
> of your container are such that a process that exceeds the memory quota is
> suspended until enough memory becomes free.
When the problem occurred, the Docker container was using about 1 GB of its 4 GB of RAM. However, as I kept running queries, memory usage grew to the full 4 GB.
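For reference, these are the memory-related server settings I would compare against the 4 GB container limit (sketch only; I have not pasted the actual values):

SHOW shared_buffers;
SHOW work_mem;
SHOW maintenance_work_mem;
SHOW max_connections;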

Jérémy
