| From: | "S(dot) Bob" <sbob(at)quadratum-braccas(dot)com> |
|---|---|
| To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
| Cc: | pgsql-admin <pgsql-admin(at)postgresql(dot)org> |
| Subject: | Re: Vacuum not cleaning up rows. |
| Date: | 2019-06-20 18:25:48 |
| Message-ID: | 053c95f2-1425-23ac-0308-ca22887db156@quadratum-braccas.com |
| Lists: | pgsql-admin |
I took a look at the oldest query, transaction, and backend start times in
pg_stat_activity, but did not see any old dates; as of now (2019-06-20
13:54:12.041426-04) no start value is even an hour old:
=# select state, backend_start, xact_start, query_start
   from pg_stat_activity where datname = 'problem_database';

 state  |         backend_start         |          xact_start           |          query_start
--------+-------------------------------+-------------------------------+-------------------------------
 idle   | 2019-06-20 13:28:02.342849-04 |                               | 2019-06-20 13:41:30.561416-04
 idle   | 2019-06-20 13:40:33.578012-04 |                               | 2019-06-20 13:40:33.861842-04
 idle   | 2019-06-20 13:33:06.638612-04 |                               | 2019-06-20 13:41:30.56762-04
 idle   | 2019-06-20 13:38:06.549275-04 |                               | 2019-06-20 13:41:30.59876-04
 idle   | 2019-06-20 13:28:39.431864-04 |                               | 2019-06-20 13:41:30.462939-04
 idle   | 2019-06-20 13:38:02.583636-04 |                               | 2019-06-20 13:41:30.078732-04
 active | 2019-06-20 13:39:09.761125-04 | 2019-06-20 13:39:10.058196-04 | 2019-06-20 13:39:10.058196-04
 idle   | 2019-06-20 13:41:23.021731-04 |                               | 2019-06-20 13:41:23.289443-04
 idle   | 2019-06-20 13:28:10.023462-04 |                               | 2019-06-20 13:41:30.563567-04
 active | 2019-06-20 13:23:21.697794-04 | 2019-06-20 13:23:44.26898-04  | 2019-06-20 13:23:44.26898-04
 active | 2019-06-20 13:41:01.806997-04 | 2019-06-20 13:41:02.203739-04 | 2019-06-20 13:41:02.203739-04
 active | 2019-06-20 13:35:38.363437-04 | 2019-06-20 13:41:30.642168-04 | 2019-06-20 13:41:30.642168-04
(12 rows)
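Start times alone don't show which session is pinning the oldest snapshot; the backend_xmin column does. A sketch of a query to surface the oldest snapshot holders (the pid and age() columns are my additions, not part of the output above):

```sql
-- Sessions holding the oldest snapshots; these are the ones that can
-- keep VACUUM from removing dead rows. age(xid) gives the distance
-- from the current transaction id, so larger = older.
select pid, state, xact_start, backend_xmin,
       age(backend_xmin) as xmin_age
from pg_stat_activity
where backend_xmin is not null
order by age(backend_xmin) desc;
```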
If I run a VACUUM VERBOSE it still reports 1,805,636 nonremovable row
versions, 1,803,634 of them dead rows that cannot be removed yet:
=# vacuum verbose problem_table;
INFO:  vacuuming "problem_table"
INFO:  index "problem_table_pk" now contains 1792227 row versions in 17224 pages
DETAIL:  0 index row versions were removed.
3 index pages have been deleted, 0 are currently reusable.
CPU 0.03s/0.02u sec elapsed 0.05 sec.
INFO:  index "problem_table_1_idx" now contains 1792227 row versions in 17254 pages
DETAIL:  0 index row versions were removed.
4 index pages have been deleted, 0 are currently reusable.
CPU 0.02s/0.02u sec elapsed 0.04 sec.
INFO:  index "problem_table_2_idx" now contains 1792228 row versions in 4943 pages
DETAIL:  0 index row versions were removed.
1 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.01 sec.
INFO:  index "problem_table_3_idx" now contains 1792228 row versions in 9164 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.01s/0.00u sec elapsed 0.02 sec.
INFO:  "problem_table": found 0 removable, 1805636 nonremovable row versions in 34770 out of 34770 pages
DETAIL:  1803634 dead row versions cannot be removed yet.
There were 697 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU 0.17s/0.26u sec elapsed 0.43 sec.
VACUUM
I also checked the current transaction id, and pulled (a) the
pg_stat_activity rows including backend_xid and backend_xmin, and (b) a
select from the table in question showing its oldest xmin values:
=# select txid_current();
 txid_current
--------------
    141325566
(1 row)

=# select state, backend_start, xact_start, query_start, backend_xid,
   backend_xmin from pg_stat_activity where datname = 'problem_database';

 state  |         backend_start         |          xact_start           |          query_start          | backend_xid | backend_xmin
--------+-------------------------------+-------------------------------+-------------------------------+-------------+--------------
 idle   | 2019-06-20 13:28:02.342849-04 |                               | 2019-06-20 13:41:44.71051-04  |             |
 idle   | 2019-06-20 13:40:33.578012-04 |                               | 2019-06-20 13:40:33.861842-04 |             |
 idle   | 2019-06-20 13:33:06.638612-04 |                               | 2019-06-20 13:41:44.701796-04 |             |
 idle   | 2019-06-20 13:38:06.549275-04 |                               | 2019-06-20 13:41:44.652429-04 |             |
 idle   | 2019-06-20 13:28:39.431864-04 |                               | 2019-06-20 13:41:44.234558-04 |             |
 idle   | 2019-06-20 13:38:02.583636-04 |                               | 2019-06-20 13:41:42.000154-04 |             |
 idle   | 2019-06-20 13:41:23.021731-04 |                               | 2019-06-20 13:41:23.289443-04 |             |
 idle   | 2019-06-20 13:28:10.023462-04 |                               | 2019-06-20 13:41:41.832205-04 |             |
 active | 2019-06-20 13:23:21.697794-04 | 2019-06-20 13:23:44.26898-04  | 2019-06-20 13:23:44.26898-04  |             |    141309498
 active | 2019-06-20 13:41:01.806997-04 | 2019-06-20 13:41:02.203739-04 | 2019-06-20 13:41:02.203739-04 |             |    141324650
 active | 2019-06-20 13:35:38.363437-04 | 2019-06-20 13:41:44.908629-04 | 2019-06-20 13:41:44.908629-04 |             |    141325259
(11 rows)
t=# select xmin::text, xmax, primary_id from problem_table order by 1;
xmin | xmax | primary_id
-----------+-----------+--------------------------------------
141306153 | 0 | 7aae6212-854b-428c-9a87-d07a178387dc
141306169 | 0 | 8b9abcd7-a683-4ae3-af1b-e51fa373b836
141306174 | 0 | f63943c0-3119-4b0b-96e8-d19d04fe48dc
141306204 | 0 | 95e81769-5ad4-4285-b9dc-a7e2360f1fa2
141306205 | 0 | 0c6fb4c1-4f17-490c-9e7e-ec7f90edd094
141306224 | 141306224 | 05c833b6-efc9-4faf-8b3a-760b03d43abc
141306245 | 0 | 9b0959de-2b3b-474a-ab72-4664f7c1d850
141306254 | 0 | 0cedd9ef-de2a-4d44-bd28-50587a16174a
141306264 | 0 | 70145477-26bc-4e2b-b51e-5533bcce0658
141306265 | 0 | 1b312489-96df-4b32-9d4e-4f872dca1f86
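If one of the long-lived active sessions above turns out to be the snapshot holder, it can be ended as Tom suggested. A sketch, assuming the offending pid has already been identified (12345 is a placeholder, not a pid from the output above):

```sql
-- Ask the backend to cancel its current query first; if the session
-- still holds its snapshot afterward, terminate the connection.
select pg_cancel_backend(12345);
select pg_terminate_backend(12345);
```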
I also ran select * from pg_prepared_xacts, which returned no rows.
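For completeness, besides open transactions and prepared transactions, my understanding is that a replication slot can also hold back the xmin horizon on 9.4 and later; a hedged check:

```sql
-- Replication slots can pin an xmin just like an open transaction;
-- any non-null xmin or catalog_xmin here is worth investigating.
select slot_name, active, xmin, catalog_xmin
from pg_replication_slots;
```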
Thanks in advance for any additional help / direction...
On 6/19/19 5:50 PM, Tom Lane wrote:
> "S. Bob" <sbob(at)quadratum-braccas(dot)com> writes:
>> we have a table that consistently has 2x the number of dead rows than
>> live rows.
>> I've done some digging and it looks like somehow the application is
>> holding shared locks that force vacuum to not cleanup rows.
> This is not a locking issue; if it were, vacuum would be unable to
> scan the table at all.
>
> What it is is a snapshot issue: that is, some transaction has a snapshot
> that's old enough to allow it to see those dead rows if it were to look.
> So vacuum can't remove the rows for fear of causing transactional
> consistency failures.
>
> What you want to do is look into pg_stat_activity for transactions with
> very old start times, and then cancel them or otherwise force them to
> terminate. Also, if there are no obvious candidates there, you might
> be having a problem with prepared transactions --- look into
> pg_prepared_xacts to see if there are any old ones of those.
>
> regards, tom lane