Re: Large number of open(2) calls with bulk INSERT into empty table

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Florian Weimer <fweimer(at)bfk(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Large number of open(2) calls with bulk INSERT into empty table
Date: 2011-11-30 18:30:58
Message-ID: CA+Tgmoac+6qTNp2U+wedY8-PU6kK_b6hbdhR5xYGBG3GtdFcww@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Nov 30, 2011 at 12:29 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Sun, Nov 27, 2011 at 10:24 AM, Florian Weimer <fweimer(at)bfk(dot)de> wrote:
>>> I noticed that a bulk INSERT into an empty table (which has been
>>> TRUNCATEd in the same transaction, for good measure) results in a
>>> curious number of open(2) calls for the FSM resource fork:
>
>> That's kind of unfortunate.  It looks like every time we extend the
>> relation, we try to read the free space map to see whether there's a
>> block available with free space in it.  But since we never actually
>> make any entries in the free space map, the fork never gets created,
>> so every attempt to read it involves a system call to see whether it's
>> there.
>
> I wonder whether it'd help if we went ahead and created the FSM file,
> with length zero, as soon as the relation is made (or maybe when it
> first becomes of nonzero length).  That would at least save the failed
> open()s.  We'd still be doing lseeks on the FSM file, but those ought
> to be cheaper.
>
> A less shaky way to do it would be to just create the first page of the
> FSM file immediately, but that would represent an annoying percentage
> increase in the disk space needed for small tables.

Well, unfortunately, we're not really doing a good job dodging that
problem as it is. For example:

rhaas=# create table foo (a int);
CREATE TABLE
rhaas=# select pg_relation_size('foo'), pg_table_size('foo');
pg_relation_size | pg_table_size
------------------+---------------
0 | 0
(1 row)

rhaas=# insert into foo values (1);
INSERT 0 1
rhaas=# select pg_relation_size('foo'), pg_table_size('foo');
pg_relation_size | pg_table_size
------------------+---------------
8192 | 8192
(1 row)

rhaas=# vacuum foo;
VACUUM
rhaas=# select pg_relation_size('foo'), pg_table_size('foo');
pg_relation_size | pg_table_size
------------------+---------------
8192 | 40960
(1 row)

rhaas=#

Yikes! A table with 4 bytes of useful data is consuming 40kB on disk
- 8kB in the main form, 8kB in the VM fork, and 24kB in the FSM fork.
Ouch!

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2011-11-30 18:34:07 Re: review: CHECK FUNCTION statement
Previous Message Tom Lane 2011-11-30 18:30:03 Re: review: CHECK FUNCTION statement