Re: BUG #15427: DROP INDEX did not free up disk space

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: ap(at)zip(dot)com(dot)au, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #15427: DROP INDEX did not free up disk space
Date: 2018-10-12 04:51:48
Message-ID: 20181012045148.rhohmjjy7ehrczsi@alap3.anarazel.de
Lists: pgsql-bugs

Hi,

On 2018-10-12 00:33:14 -0400, Tom Lane wrote:
> Andres Freund <andres(at)anarazel(dot)de> writes:
> > On 2018-10-11 23:57:16 -0400, Tom Lane wrote:
> >> Uh, what's that got to do with it?
>
> > If you look at the bugreport: As soon as the op, on my suggestion,
> > triggered sinval processing (by issuing a SELECT 1;) the space was
> > freed. So clearly the open FDs were part of the problem.
>
> TBH, I think the space-freeup was more likely driven off a background
> checkpoint completion, which is where the truncation happens.

Uh, as I wrote, mdunlinkfork(), which backs dropping an index via
index_drop()->RelationDropStorage() and then
smgrDoPendingDeletes()->smgrdounlinkall()->mdunlink()->mdunlinkfork(),
unlinks all segments beyond the first itself:

static void
mdunlinkfork(RelFileNodeBackend rnode, ForkNumber forkNum, bool isRedo)
{
    char       *path;
    int         ret;

    path = relpath(rnode, forkNum);

    /*
     * Delete or truncate the first segment.
     */
    if (isRedo || forkNum != MAIN_FORKNUM || RelFileNodeBackendIsTemp(rnode))
    {
        ret = unlink(path);
        if (ret < 0 && errno != ENOENT)
            ereport(WARNING,
                    (errcode_for_file_access(),
                     errmsg("could not remove file \"%s\": %m", path)));
    }
    else
    {
        /* truncate(2) would be easier here, but Windows hasn't got it */
        int         fd;

        fd = OpenTransientFile(path, O_RDWR | PG_BINARY);
        if (fd >= 0)
        {
            int         save_errno;

            ret = ftruncate(fd, 0);
            save_errno = errno;
            CloseTransientFile(fd);
            errno = save_errno;
        }
        else
            ret = -1;
        if (ret < 0 && errno != ENOENT)
            ereport(WARNING,
                    (errcode_for_file_access(),
                     errmsg("could not truncate file \"%s\": %m", path)));

        /* Register request to unlink first segment later */
        register_unlink(rnode);
    }

    /*
     * Delete any additional segments.
     */
    if (ret >= 0)
    {
        char       *segpath = (char *) palloc(strlen(path) + 12);
        BlockNumber segno;

        /*
         * Note that because we loop until getting ENOENT, we will correctly
         * remove all inactive segments as well as active ones.
         */
        for (segno = 1;; segno++)
        {
            sprintf(segpath, "%s.%u", path, segno);
            if (unlink(segpath) < 0)
            {
                /* ENOENT is expected after the last segment... */
                if (errno != ENOENT)
                    ereport(WARNING,
                            (errcode_for_file_access(),
                             errmsg("could not remove file \"%s\": %m", segpath)));
                break;
            }
        }
        pfree(segpath);
    }

    pfree(path);
}

As you can clearly see, unlink() is called directly here for anything
but the first segment (which is registered to be unlinked in
checkpointer via register_unlink()).
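
For reference on why open FDs matter for those directly unlinked
segments: on POSIX systems unlink() only removes the directory entry,
and the blocks are reclaimed once the last open descriptor is closed.
A minimal standalone sketch of that behaviour, not PostgreSQL code (the
helper name unlink_while_open and the /tmp scratch path are made up for
illustration):

```c
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical demo helper, not PostgreSQL code: create a scratch file,
 * unlink it while keeping the descriptor open, and report what fstat()
 * still sees through that descriptor.  Returns 0 on success, -1 on failure.
 */
static int
unlink_while_open(off_t *size_after_unlink, nlink_t *links_after_unlink)
{
    char        path[] = "/tmp/unlink_demo_XXXXXX";
    char        buf[8192];
    struct stat st;
    int         fd;
    int         ret = -1;

    fd = mkstemp(path);
    if (fd < 0)
        return -1;

    memset(buf, 'x', sizeof(buf));
    if (write(fd, buf, sizeof(buf)) != (ssize_t) sizeof(buf))
    {
        /* Clean up the scratch file before giving up. */
        close(fd);
        unlink(path);
        return -1;
    }

    /* The name goes away here, but the inode and its blocks do not. */
    if (unlink(path) == 0 && fstat(fd, &st) == 0)
    {
        *size_after_unlink = st.st_size;
        *links_after_unlink = st.st_nlink;
        ret = 0;
    }

    /* Only this close() lets the kernel reclaim the space. */
    close(fd);
    return ret;
}
```

After the unlink() the file has no name left, yet fstat() on the open
descriptor still reports its full size. A backend holding such a
descriptor keeps the space allocated until it closes it.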

Note that the checkpointer-based machinery doesn't even *support*
unlinking anything beyond the first segment:

void
mdpostckpt(void)
{
...
    while (pendingUnlinks != NIL)
...
        /* Unlink the file */
        path = relpathperm(entry->rnode, MAIN_FORKNUM);
        if (unlink(path) < 0)

There's no segment handling here.

You're right that mdtruncate() leaves later segments around, truncated
to zero length rather than unlinked. But that's because of an
orthogonal concern:
* The full and partial segments are collectively the "active" segments.
* Inactive segments are those that once contained data but are currently
* not needed because of an mdtruncate() operation. The reason for leaving
* them present at size zero, rather than unlinking them, is that other
* backends and/or the checkpointer might be holding open file references to
* such segments. If the relation expands again after mdtruncate(), such
* that a deactivated segment becomes active again, it is important that
* such file references still be valid --- else data might get written
* out to an unlinked old copy of a segment file that will eventually
* disappear.

That doesn't apply to dropping relations.
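
The hazard that comment describes can be reproduced outside PostgreSQL.
The following is a standalone sketch under made-up names (the helper
size_at_path_after_stale_write and the /tmp scratch path are
hypothetical): a descriptor opened before a truncate still refers to the
live file, while one opened before an unlink-and-recreate writes into an
orphan that nobody will ever read.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical demo helper, not PostgreSQL code.  A "stale" descriptor is
 * opened first; the file is then either truncated in place or unlinked and
 * re-created under the same name.  Five bytes are written through the stale
 * descriptor, and the size of whatever now lives at the path is returned
 * (-1 on failure).
 */
static long
size_at_path_after_stale_write(int unlink_instead_of_truncate)
{
    char        path[] = "/tmp/stale_fd_demo_XXXXXX";
    struct stat st;
    long        result = -1;
    int         stale_fd;
    int         new_fd = -1;

    stale_fd = mkstemp(path);
    if (stale_fd < 0)
        return -1;

    if (unlink_instead_of_truncate)
    {
        /* Drop the name, then create a fresh file under the same name. */
        if (unlink(path) == 0)
            new_fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0600);
    }
    else
    {
        /* The same inode stays behind the name; only its length changes. */
        if (truncate(path, 0) == 0)
            new_fd = open(path, O_RDWR);
    }

    /* Write via the old descriptor, then look at the file by name. */
    if (new_fd >= 0 &&
        pwrite(stale_fd, "hello", 5, 0) == 5 &&
        stat(path, &st) == 0)
        result = (long) st.st_size;

    if (new_fd >= 0)
        close(new_fd);
    close(stale_fd);
    unlink(path);
    return result;
}
```

With truncation the write through the old descriptor shows up in the
file at the path; with unlink plus re-create it lands in the orphaned
inode and the file at the path stays empty. That is the reason
mdtruncate() keeps zero-length segments, and it is not a concern once
the whole relation is being dropped.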

Greetings,

Andres Freund
