Re: BUG #13561: "could not write to log file, Bad file descriptor" error when working with huge table

From: "Pallier, Manuel / BEKO Graz" <Manuel(dot)Pallier(at)beko(dot)at>
To: Venkata Balaji N <nag1010(at)gmail(dot)com>, "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #13561: "could not write to log file, Bad file descriptor" error when working with huge table
Date: 2015-08-12 05:31:59
Message-ID: 6AAA19FB3874DA43A53E151C03F40D0B06FEC3A4@EAMSGBEK003.bekoag.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Hi,

> Could you help us know the size of the database you are dumping ?
The size of the data directory is 3.37 GB. The size of the dump file with --compress=9 is 300 MB.

> This should probably be a Windows memory allocation problem. Do you see any such issues like corruption with memory or hard-disk ?
No, all other operations on the machine works fine, including copying large files and installation of software. But I can’t do real memory or hard-disk tests because this is just a virtual machine and I don’t have any access to the physical machine.

> Assuming all is well with the hardware, If you can try and increase the shared_buffers and see if that improves the situation.
As written, I’ve already increased shared_buffers to 128MB and it indeed improved the situation. But it isn’t resolved. What value would you suggest?

Best regards,
Manuel

Von: Venkata Balaji N [mailto:nag1010(at)gmail(dot)com]
Gesendet: Mittwoch, 12. August 2015 02:48
An: Pallier, Manuel / BEKO Graz
Betreff: Re: [BUGS] BUG #13561: "could not write to log file, Bad file descriptor" error when working with huge table

On Tue, Aug 11, 2015 at 3:41 PM, <manuel(dot)pallier(at)beko(dot)at<mailto:manuel(dot)pallier(at)beko(dot)at>> wrote:
The following bug has been logged on the website:

Bug reference: 13561
Logged by: Manuel Pallier
Email address: manuel(dot)pallier(at)beko(dot)at<mailto:manuel(dot)pallier(at)beko(dot)at>
PostgreSQL version: 9.4.4
Operating system: Windows Server 2008 SP2 x64
Description:

I'm having a big problem with a PostreSQL database on a specific machine.
Whenever I do some specific things with a huge table (over 9 million rows)
the whole postgres process crashes with the following entries in the log
file:

PANIC: could not write to log file 000000010000000000000022 at offset
11673600, length 909312: Bad file descriptor
LOG: WAL writer process (PID 17056) exited with exit code 3
LOG: terminating any other active server processes
WARNING: terminating connection because of crash of another server process
CONTEXT: COPY fund_records, line 2088885: "..."
LOG: PID 31820 in cancel request did not match any process
LOG: all server processes terminated; reinitializing
FATAL: pre-existing shared memory block is still in use
HINT: Check if there are any old server processes still running, and
terminate them.

The `CONTEXT` is of course differnt depending on what I did. The above
`COPY` command was logged when restoring a dump with the big table with
pg_restore. Other occurrences were `VACUUM` (when I run vacuum on either the
single big table or the whole database) or even `SELECT` (when I run a
simple 1 row result select joining the big table).

I can reproduce this crash 100% by either running vacuum on the table or
trying to restore the table data from a dump. Other things, like select
statements, do not reproduce always, but at least they crash often enough
that it is very annoying. It's always the `could not write to log file x at
offset y, length z: Bad file descriptor` error. I've already tried disabling
autovacuum during pg_resotre but it still fails.

This problem seems to be specific to one server, because the same table/dump
works fine when I import it on my local machine. The affected server is a
virtual (on Parallels Virtuozzo base) Windows Server 2008 SP2 x64 with 2 GB
RAM. There is only a single HDD partition and nearly 40 GB of free space on
it. Nothing other than PostgreSQL is running on the server, also no anti
virus software. But PostgreSQL is the only application on this server that
shows problems, all other operations I've tried work fine, so it seems like
it's not a general problem with this machine.

I've tried PostgreSQL server versions 8.4.4 (this was running for years
until it first showed the error), 8.4.22 and 9.4.4, all show the same error.
I've tried both upgrading from 8.4.4 to 8.4.22 and fresh installs of 8.4.22
and 9.4.4 (there I've also completely deleted the data folder) multiple
times.

Since I've changed some values in postgresql.conf away from their default
values, the the crash now happens much rarer. Simple SQL and DML statements
only crash very rarely (once every few days), but VACUUM still crashes 100%.
Here are the values I changed:

shared_buffers = 128MB
work_mem = 16MB
maintenance_work_mem = 64MB
wal_buffers = 4MB

Can someone please explain what's wrong here?

Could you help us know the size of the database you are dumping ? This should probably be a Windows memory allocation problem. Do you see any such issues like corruption with memory or hard-disk ?
Assuming all is well with the hardware, If you can try and increase the shared_buffers and see if that improves the situation.
Regards,
Venkata Balaji N
Fujitsu Australia

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Noah Misch 2015-08-12 05:57:19 Re: BUG #13427: postgres.exe fails to start on Korean Windows Server 2008: cannot perform encoding conversion outsid
Previous Message navaldeep.sandhu 2015-08-11 15:25:52 BUG #13566: postgis_21_sample won't install using the stack builder