From: "S(dot) Bob" <sbob(at)quadratum-braccas(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-admin <pgsql-admin(at)postgresql(dot)org>
Subject: Re: Vacuum not cleaning up rows.
Date: 2019-06-20 18:25:48
Message-ID: 053c95f2-1425-23ac-0308-ca22887db156@quadratum-braccas.com
Lists: pgsql-admin

I took a look at the oldest query, xact, and backend start times in
pg_stat_activity but did not see any old timestamps; as of now (2019-06-20
13:54:12.041426-04), no start value is even an hour old:

=# select state, backend_start, xact_start, query_start from pg_stat_activity where datname = 'problem_database';
 state  |         backend_start         |          xact_start           |          query_start
--------+-------------------------------+-------------------------------+-------------------------------
 idle   | 2019-06-20 13:28:02.342849-04 |                               | 2019-06-20 13:41:30.561416-04
 idle   | 2019-06-20 13:40:33.578012-04 |                               | 2019-06-20 13:40:33.861842-04
 idle   | 2019-06-20 13:33:06.638612-04 |                               | 2019-06-20 13:41:30.56762-04
 idle   | 2019-06-20 13:38:06.549275-04 |                               | 2019-06-20 13:41:30.59876-04
 idle   | 2019-06-20 13:28:39.431864-04 |                               | 2019-06-20 13:41:30.462939-04
 idle   | 2019-06-20 13:38:02.583636-04 |                               | 2019-06-20 13:41:30.078732-04
 active | 2019-06-20 13:39:09.761125-04 | 2019-06-20 13:39:10.058196-04 | 2019-06-20 13:39:10.058196-04
 idle   | 2019-06-20 13:41:23.021731-04 |                               | 2019-06-20 13:41:23.289443-04
 idle   | 2019-06-20 13:28:10.023462-04 |                               | 2019-06-20 13:41:30.563567-04
 active | 2019-06-20 13:23:21.697794-04 | 2019-06-20 13:23:44.26898-04  | 2019-06-20 13:23:44.26898-04
 active | 2019-06-20 13:41:01.806997-04 | 2019-06-20 13:41:02.203739-04 | 2019-06-20 13:41:02.203739-04
 active | 2019-06-20 13:35:38.363437-04 | 2019-06-20 13:41:30.642168-04 | 2019-06-20 13:41:30.642168-04
(12 rows)
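
A variant ordered by transaction age makes the oldest open transaction easier to spot; a sketch, with pid included so the session can be tracked down:

=# select pid, state, now() - xact_start as xact_age
   from pg_stat_activity
   where xact_start is not null
   order by xact_start;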

If I run a VACUUM VERBOSE, it still reports 1,805,636 nonremovable row
versions, 1,803,634 of which are dead rows that cannot be removed yet:

=# vacuum verbose problem_table;
INFO:  vacuuming "problem_table"
INFO:  index "problem_table_pk" now contains 1792227 row versions in 17224 pages
DETAIL:  0 index row versions were removed.
3 index pages have been deleted, 0 are currently reusable.
CPU 0.03s/0.02u sec elapsed 0.05 sec.
INFO:  index "problem_table_1_idx" now contains 1792227 row versions in 17254 pages
DETAIL:  0 index row versions were removed.
4 index pages have been deleted, 0 are currently reusable.
CPU 0.02s/0.02u sec elapsed 0.04 sec.
INFO:  index "problem_table_2_idx" now contains 1792228 row versions in 4943 pages
DETAIL:  0 index row versions were removed.
1 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.01 sec.
INFO:  index "problem_table_3_idx" now contains 1792228 row versions in 9164 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.01s/0.00u sec elapsed 0.02 sec.
INFO:  "problem_table": found 0 removable, 1805636 nonremovable row versions in 34770 out of 34770 pages
DETAIL:  1803634 dead row versions cannot be removed yet.
There were 697 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU 0.17s/0.26u sec elapsed 0.43 sec.
VACUUM
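
For a cross-check on the dead-row count, the statistics views tell the same story; a sketch, assuming track_counts is enabled (the default):

=# select n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
   from pg_stat_user_tables
   where relname = 'problem_table';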

I also checked the current transaction id and pulled (a) the
pg_stat_activity rows, this time including backend_xid and backend_xmin,
and (b) a select from the table in question showing the oldest xmin values:
=# select txid_current();
 txid_current
--------------
    141325566
(1 row)

=# select state, backend_start, xact_start, query_start, backend_xid, backend_xmin from pg_stat_activity where datname = 'problem_database';
 state  |         backend_start         |          xact_start           |          query_start          | backend_xid | backend_xmin
--------+-------------------------------+-------------------------------+-------------------------------+-------------+--------------
 idle   | 2019-06-20 13:28:02.342849-04 |                               | 2019-06-20 13:41:44.71051-04  |             |
 idle   | 2019-06-20 13:40:33.578012-04 |                               | 2019-06-20 13:40:33.861842-04 |             |
 idle   | 2019-06-20 13:33:06.638612-04 |                               | 2019-06-20 13:41:44.701796-04 |             |
 idle   | 2019-06-20 13:38:06.549275-04 |                               | 2019-06-20 13:41:44.652429-04 |             |
 idle   | 2019-06-20 13:28:39.431864-04 |                               | 2019-06-20 13:41:44.234558-04 |             |
 idle   | 2019-06-20 13:38:02.583636-04 |                               | 2019-06-20 13:41:42.000154-04 |             |
 idle   | 2019-06-20 13:41:23.021731-04 |                               | 2019-06-20 13:41:23.289443-04 |             |
 idle   | 2019-06-20 13:28:10.023462-04 |                               | 2019-06-20 13:41:41.832205-04 |             |
 active | 2019-06-20 13:23:21.697794-04 | 2019-06-20 13:23:44.26898-04  | 2019-06-20 13:23:44.26898-04  |             |    141309498
 active | 2019-06-20 13:41:01.806997-04 | 2019-06-20 13:41:02.203739-04 | 2019-06-20 13:41:02.203739-04 |             |    141324650
 active | 2019-06-20 13:35:38.363437-04 | 2019-06-20 13:41:44.908629-04 | 2019-06-20 13:41:44.908629-04 |             |    141325259
(11 rows)

t=# select xmin::text, xmax, primary_id from problem_table order by 1;
   xmin    |   xmax    |              primary_id
-----------+-----------+--------------------------------------
 141306153 |         0 | 7aae6212-854b-428c-9a87-d07a178387dc
 141306169 |         0 | 8b9abcd7-a683-4ae3-af1b-e51fa373b836
 141306174 |         0 | f63943c0-3119-4b0b-96e8-d19d04fe48dc
 141306204 |         0 | 95e81769-5ad4-4285-b9dc-a7e2360f1fa2
 141306205 |         0 | 0c6fb4c1-4f17-490c-9e7e-ec7f90edd094
 141306224 | 141306224 | 05c833b6-efc9-4faf-8b3a-760b03d43abc
 141306245 |         0 | 9b0959de-2b3b-474a-ab72-4664f7c1d850
 141306254 |         0 | 0cedd9ef-de2a-4d44-bd28-50587a16174a
 141306264 |         0 | 70145477-26bc-4e2b-b51e-5533bcce0658
 141306265 |         0 | 1b312489-96df-4b32-9d4e-4f872dca1f86

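To see which session holds the oldest snapshot in transaction-id terms rather than wall-clock terms, something like this (a sketch using the built-in age() function) should surface it:

=# select pid, state, backend_xmin, age(backend_xmin) as xmin_age
   from pg_stat_activity
   where backend_xmin is not null
   order by age(backend_xmin) desc;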

I also ran select * from pg_prepared_xacts, which returned no rows.
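
Since replication slots can also hold back the xmin horizon, pg_replication_slots seems worth checking too; a sketch, where an old xmin or catalog_xmin would be the red flag:

=# select slot_name, slot_type, active, xmin, catalog_xmin
   from pg_replication_slots;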

Thanks in advance for any additional help / direction...

On 6/19/19 5:50 PM, Tom Lane wrote:
> "S. Bob" <sbob(at)quadratum-braccas(dot)com> writes:
>> we have a table that consistently has 2x the number of dead rows than
>> live rows.
>> I've done some digging and it looks like somehow the application is
>> holding shared locks that force vacuum to not cleanup rows.
> This is not a locking issue; if it were, vacuum would be unable to
> scan the table at all.
>
> What it is is a snapshot issue: that is, some transaction has a snapshot
> that's old enough to allow it to see those dead rows if it were to look.
> So vacuum can't remove the rows for fear of causing transactional
> consistency failures.
>
> What you want to do is look into pg_stat_activity for transactions with
> very old start times, and then cancel them or otherwise force them to
> terminate. Also, if there are no obvious candidates there, you might
> be having a problem with prepared transactions --- look into
> pg_prepared_xacts to see if there are any old ones of those.
>
> regards, tom lane
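
Acting on that advice might look like the sketch below; 12345 is a placeholder pid taken from pg_stat_activity, and pg_terminate_backend is the heavier hammer if a plain cancel isn't enough:

=# select pg_cancel_backend(12345);    -- ask the backend to cancel its current query
=# select pg_terminate_backend(12345); -- forcibly end the session if cancel doesn't help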
