Long paths for tablespace leads to uninterruptible hang in Windows

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Long paths for tablespace leads to uninterruptible hang in Windows
Date: 2013-10-10 13:34:59
Message-ID: CAA4eK1JxaBofxpcgLqCx9EB=m3PaXr9iFU9=V3ddDswsPZooxw@mail.gmail.com
Lists: pgsql-hackers

One of the users of PostgreSQL has reported that if the tablespace path
is long, it leads to a hang, and the hang cannot be interrupted.

A simple test case to reproduce the hang is:
a. initdb -D E:\WorkSpace\PostgreSQL\master\RM30253_Data\aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\db
b. Create tablespace tbs location 'E:\WorkSpace\PostgreSQL\master\Data\idb';
c. Drop tablespace tbs;

In this test the path length used is 174, but I observed that the hang
occurs if the length is greater than about 130.

I have run this test on a few different Windows platforms (Windows XP
32-bit, Windows 7 64-bit). The hang occurs on Windows 7 64-bit. The user
has reported it on Windows 2008 64-bit.

On further analysis, I found that the hang occurs in some Windows
APIs (FindFirstFile, RemoveDirectory) when the symlink path
(pg_tblspc/spcoid/TABLESPACE_VERSION_DIRECTORY) is passed to these
APIs. For the above test case, it hangs in the call path
destroy_tablespace_directories->ReadDir->readdir->FindFirstFile

I have tried using mklink /J (a utility in Windows 7 and above) to
create a junction point instead of the current method in pgsymlink; it
still hangs in a similar way.

Some ways to resolve the problem are described below:

1. I found that if the link path is accessed as a full path during
readdir or stat, it works fine.

For example, in destroy_tablespace_directories(), the path used to
access the tablespace directory has the form
"pg_tblspc/16235/PG_9.4_201309051", built by the sprintf below:
sprintf(linkloc_with_version_dir,
"pg_tblspc/%u/%s",tablespaceoid,TABLESPACE_VERSION_DIRECTORY);
When the code accesses this path, it assumes that the corresponding
OS API will resolve it relative to the current working directory,
which is right as per the specs. However, the OS API (FindFirstFile)
hangs if the path length is > 130 for the symlink, whereas if I use
the full path instead of one starting with pg_tblspc, it works fine.
So one way to resolve this issue is to use the full path when accessing
the symbolic link path, instead of relying on the OS to expand it.

2. Resolve the symbolic link to the actual path in code, using
pgreadlink, whenever we try to access it. pgreadlink is already used
in pg_basebackup.

3. Another way is to add a check in the code (initdb and create
tablespace) that disallows paths longer than 100 or 120.
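
To make the three approaches above concrete, here is a rough sketch in C.
It is not a patch: every function name and the SKETCH_MAX_TBLSPC_PATH
constant are hypothetical, the data directory is passed in as a plain
argument, and POSIX readlink() stands in for pgreadlink so that the
sketch compiles outside the PostgreSQL tree.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical limit for approach 3; the text above suggests 100 or 120. */
#define SKETCH_MAX_TBLSPC_PATH 120

/*
 * Approach 1: build an absolute path to the tablespace link instead of
 * one relative to the current working directory.
 * Returns 0 on success, -1 if dst is too small (or on a format error).
 */
int
sketch_full_link_path(char *dst, size_t dstlen, const char *datadir,
                      unsigned int spcoid, const char *version_dir)
{
    int n;

    assert(dst != NULL && dstlen > 0);
    n = snprintf(dst, dstlen, "%s/pg_tblspc/%u/%s",
                 datadir, spcoid, version_dir);
    return (n < 0 || (size_t) n >= dstlen) ? -1 : 0;
}

/*
 * Approach 2: resolve the link to its target before using it.
 * readlink() does not NUL-terminate, so we do it here; a result that
 * fills the buffer is treated as possibly truncated.
 * Returns 0 on success, -1 on failure.
 */
int
sketch_resolve_link(const char *linkpath, char *target, size_t targetlen)
{
    ssize_t n;

    assert(target != NULL && targetlen > 1);
    n = readlink(linkpath, target, targetlen - 1);
    if (n < 0 || (size_t) n >= targetlen - 1)
        return -1;
    target[n] = '\0';
    return 0;
}

/*
 * Approach 3: reject over-long locations up front.
 * Returns 1 if the path is within the (hypothetical) limit, else 0.
 */
int
sketch_path_length_ok(const char *location)
{
    return strlen(location) <= SKETCH_MAX_TBLSPC_PATH;
}
```

With these, the 174-character path from the test case would be rejected
by sketch_path_length_ok(), while approaches 1 and 2 would instead hand
the OS API an absolute path or the link's real target.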

Kindly let me know your suggestions regarding the above approaches to
resolving the problem, or whether you think there is a better way
to address it.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
