Re: BUG #17449: Disk space not released

From: Giorgio Saviane <gsaviane(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #17449: Disk space not released
Date: 2022-05-19 20:15:26
Message-ID: CAHs6c0c9j5P+kBTX67KbJ5_e4wtdrAumyRk-k3vOnJRYUmR0YA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

> On Sat, Mar 26, 2022 at 7:13 AM PG Bug reporting form
> <noreply(at)postgresql(dot)org> wrote:
> >
> > The following bug has been logged on the website:
> >
> > Bug reference: 17449
> > Logged by: Giorgio Saviane
> > Email address: gsaviane(at)gmail(dot)com
> > PostgreSQL version: 11.13
> > Operating system: Linux 5.8
> > Description:
> >
> > Hello, I noticed an uncontrolled disk occupation growth caused by a
> Postgres
> > database in one of my deployments.
> > The actual database size took more than 500Gb (checked with select
> > pg_size_pretty(pg_database_size('dbname')) although tables accounted for
> a
> > total of ~ 50Gb (checked with pg_total_relation_size()).
>
> So, the output of pg_database_size is 500GB and pg_total_relation_size
> for all the tables is 50GB right? pg_database_size includes table
> space size as well, were there any table spaces and files under it?
> BTW, pg_total_relation_size also includes all the files of a relation
> irrespective of whether the vacuum happened or not.
>

The database was created in a single tablespace. Tables did not hold any
other
additional files than those inherently used for storing their data and
indexes.

>
> > Despite any attempt
> > of full vacuum the discrepancy remained the same. I suspect that Postgres
> > started leaking disk space. I could see many 1Gb files with a timestamp
> of
> > two months back in time in the postgres data folder.
> > Restarting the server did not have any effect, so I decided to pg_dump
> the
> > database and pg_restore the backup in a new instance. That worked, the
> new
> > database is now ~ 50 Gb and dropping the old one released that 500Gb of
> disk
> > space.
> > The database was under streaming replication and I noticed the postgres
> log
> > reporting many of these messages
> >
> > requested WAL segment 0000000100000000000000E3 has already been removed
> >
> > Could be that the leak started in correspondence of that error?
> > If so, is there anything we can do to prevent it? I already set
> > wal_keep_segments = 100, but I'm not sure it is enough and how to tune
> it.
> > Is there any possible workaround to release the leaked space without
> going
> > through a backup? (It took two hours)
>
> Which all other directories in the data directory took more space? Is
> it pg_wal or base or some other? Were there any inactive replication
> slots? Was the archiving happening properly?
>

The content of /var/lib/postoresql/11/main/base dir was as following:
$ du -hs
7.6M 1
78M 1194171
7.6M 1236634587
7.6M 13052
7.6M 13053
561G 16385
20M pgsql_tmp

And that 16385 dir was full of 1Gb files whose name had a <number>.<digit>
pattern. I cannot say for sure if there were other subdirectories in it.
I didn't check the replication status unfortunately. I just removed it and
recreated
it over the new instance.

Kind regards

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2022-05-19 20:47:56 Re: BUG #17487: Parallel execution fails when original user is removed
Previous Message PG Bug reporting form 2022-05-19 18:46:41 BUG #17487: Parallel execution fails when original user is removed