Re: [BUGS] BUG #6733: All Tables Empty After pg_upgrade (PG 9.2.0 beta 2)

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Mike Wilson <mfwilson(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: [BUGS] BUG #6733: All Tables Empty After pg_upgrade (PG 9.2.0 beta 2)
Date: 2012-07-25 17:01:13
Message-ID: 20120725170113.GA22286@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

On Thu, Jul 19, 2012 at 02:24:36AM -0400, Bruce Momjian wrote:
> On Wed, Jul 18, 2012 at 01:30:34AM -0400, Tom Lane wrote:
> > I wrote:
> > > ... I'm wondering if maybe
> > > Solaris has a weird definition of struct dirent that breaks the
> > > calculation used here:
> >
> > > entrysize = sizeof(struct dirent) - sizeof(direntry->d_name) +
> > > strlen(direntry->d_name) + 1;
> >
> > Some googling suggests that on Solaris, this calculation would actually
> > give a result that's a little too *large* not too small, because of
> > padding in the declaration of struct dirent. That would not be a
> > problem so far as callers were concerned, but it's conceivable that the
> > memcpy in load_directory would run off the end of memory and crash.
> > Could that explain Mike's problem? You'd have to assume that the stack
> > trace he showed us had omitted to mention load_directory, but I don't
> > have a huge problem with making such an assumption. Anyway the offsetof
> > formulation should avoid this hazard, so I'm still interested to know
> > if that helps.
>
> Great to hear this fixed the problem. The only way I can see this
> causing a crash is:
>
> * the padding of direntry->d_name is less than the padding of
> struct dirent
> * there is a non-mapped memory range after the struct
> * the file name is near the full length of d_name
>
> Those are pretty rare odds. The only other issue is that this code
> assumes direntry->d_name is the last element of the struct, and I am not
> sure how we can assume that is true.
>
> Also, this code tries to be tricky by reducing the allocated memory
> instead of allocating the maximum path. Shouldn't there a comment
> about the propose of this code?

This bug report was fixed mostly via private email because a private
schema dump was sent to Tom and me. Tom fixed the problem by changing
the way we interact with struct dirent, but neither of us is sure
exactly why the fix worked. It is something about Solaris.

Tom suggested that load_directory() return a (char *) array, rather than
a struct dirent array, greatly simplifying the code.

I have done this in the attached patch, and because of the uncertainty
of ths fix, I would like to apply this to 9.2 as well.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

Attachment Content-Type Size
pg_upgrade.diff text/x-diff 6.0 KB

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2012-07-25 17:05:37 Re: BUG #6758: ./configure script sets HAVE_WCSTOMBS_L 1
Previous Message Tom Lane 2012-07-25 16:53:03 Re: BUG #6761: unexpected behaviour of 'now'::timestamp

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2012-07-25 19:37:09 filenames in pg_basebackup
Previous Message Christoph Berg 2012-07-25 14:04:45 Re: Notify system doesn't recover from "No space" error