Re: [HACKERS] tables > 1 gig

From: Bruce Momjian <maillist(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>, Inoue(at)tpf(dot)co(dot)jp
Subject: Re: [HACKERS] tables > 1 gig
Date: 1999-06-17 16:03:37
Message-ID: 199906171603.MAA27337@candle.pha.pa.us
Lists: pgsql-hackers

> Bruce Momjian <maillist(at)candle(dot)pha(dot)pa(dot)us> writes:
> >> I think what we ought to do is finish working out how to make mdtruncate
> >> safe for concurrent backends, and then do it. That's the right
> >> long-term answer anyway.
>
> > Problem is, no one knows how right now. I liked unlinking every
> > segment, but was told by Hiroshi that causes a problem with concurrent
> > access and vacuum because the old backends still think it is there.
>
> I haven't been paying much attention, but I imagine that what's really
> going on here is that once vacuum has collected all the still-good
> tuples at the front of the relation, it doesn't bother to go through
> the remaining blocks of the relation and mark everything dead therein?
> It just truncates the file after the last block that it put tuples into,
> right?
>
> If this procedure works correctly for vacuuming a simple one-segment
> table, then it would seem that truncation of all the later segments to
> zero length should work correctly.
>
> You could truncate to zero length *and* then unlink the files if you
> had a mind to do that, but I can see why unlink without truncate would
> not work reliably.

That seems like the issue. The more complex problem is that when the
relation loses a segment via vacuum, things go strange on the other
backends. Hiroshi seems to have a good testbed for this, and I thought
it was fixed, so I didn't notice.

Unlinking allows other backends to keep their open segments of the
tables, but it causes problems when a backend tries to open a segment
it thinks still exists and the open fails.

Truncating segments causes problems because backends are still accessing
their own copies of the tables, and truncation changes what they see
through their open file descriptors.
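
The two behaviours can be seen outside the backend with a small C
sketch. This is not PostgreSQL code: make_segment(), unlink_case() and
truncate_case() are hypothetical helper names, the 8-byte file only
stands in for a table segment, and a POSIX system is assumed.

```c
#define _POSIX_C_SOURCE 200809L  /* exposes truncate() under strict compilers */
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create "path" holding 8 bytes and return an open
 * descriptor, playing the part of a backend that has the segment open. */
static int
make_segment(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;
    if (write(fd, "12345678", 8) != 8)
    {
        close(fd);
        unlink(path);
        return -1;
    }
    return fd;
}

/* Returns 1 if, after unlink(), the old descriptor still sees 8 bytes
 * but opening the segment by name fails with ENOENT. */
static int
unlink_case(const char *path)
{
    struct stat st;
    int fd = make_segment(path);
    int fd2;
    int ok;

    if (fd < 0)
        return 0;
    if (unlink(path) != 0)
    {
        close(fd);
        return 0;
    }
    ok = (fstat(fd, &st) == 0 && st.st_size == 8);
    fd2 = open(path, O_RDWR);   /* a second "backend" opening by name */
    if (fd2 >= 0)
    {
        close(fd2);
        ok = 0;
    }
    else if (errno != ENOENT)
        ok = 0;
    close(fd);
    return ok;
}

/* Returns 1 if, after truncate(), the old descriptor sees a
 * zero-length file. */
static int
truncate_case(const char *path)
{
    struct stat st;
    int fd = make_segment(path);
    int ok;

    if (fd < 0)
        return 0;
    ok = (truncate(path, 0) == 0 && fstat(fd, &st) == 0 && st.st_size == 0);
    close(fd);
    unlink(path);
    return ok;
}
```

On a local POSIX filesystem both helpers should return 1: unlink leaves
the old descriptor intact but removes the name, while truncate keeps the
name but changes the file under every open descriptor.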

We basically have two methods, and both have problems under certain
circumstances. I wonder whether we could unlink the files, but then
create zero-length segments for the ones we unlink. If people think that
may fix the problems, it is easy to do, and we can do it atomically
using the rename() system call. Create the zero-length file under a
temp name, then rename it to the segment file name. That may do the
trick of allowing existing file descriptors to stay active, while having
segments in place for those that need to see them.
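
A minimal sketch of the temp-file-plus-rename() idea, assuming a POSIX
system. It is only an illustration, not backend code:
replace_segment_with_empty() and old_descriptor_survives() are
hypothetical names, and the ".tmp" suffix for the temp file is an
assumption. It shows the file-level behaviour only and says nothing
about whether this cures the concurrent vacuum problem.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: put a fresh zero-length file in place of the
 * segment at seg_path. rename() swaps the name atomically, so the name
 * always refers to either the old file or the new empty one.
 * Returns 0 on success, -1 on failure. */
static int
replace_segment_with_empty(const char *seg_path)
{
    char tmp_path[1024];
    int  n;
    int  fd;

    n = snprintf(tmp_path, sizeof(tmp_path), "%s.tmp", seg_path);
    if (n < 0 || n >= (int) sizeof(tmp_path))
        return -1;

    /* Create the zero-length file under the temp name. */
    fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    close(fd);

    /* Rename it over the segment file name. */
    if (rename(tmp_path, seg_path) != 0)
    {
        unlink(tmp_path);
        return -1;
    }
    return 0;
}

/* Returns 1 if a descriptor opened before the replacement still sees
 * the old 8 bytes while the name now refers to a zero-length file. */
static int
old_descriptor_survives(const char *seg_path)
{
    struct stat by_fd;
    struct stat by_name;
    int old_fd;
    int ok;

    old_fd = open(seg_path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (old_fd < 0)
        return 0;
    if (write(old_fd, "12345678", 8) != 8 ||
        replace_segment_with_empty(seg_path) != 0)
    {
        close(old_fd);
        unlink(seg_path);
        return 0;
    }

    ok = (fstat(old_fd, &by_fd) == 0 &&
          stat(seg_path, &by_name) == 0 &&
          by_fd.st_size == 8 &&
          by_name.st_size == 0);

    close(old_fd);
    unlink(seg_path);
    return ok;
}
```

The old descriptor keeps pointing at the original file, which is freed
only when the last backend closes it, while any backend that opens the
segment by name finds the new empty file.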

Comments?

--
Bruce Momjian | http://www.op.net/~candle
maillist(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
