Re: perf pb solved only after pg_dump and restore

From: Guillaume Cottenceau <gc(at)mnc(dot)ch>
To: Markus Schaber <schabi(at)logix-tt(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: perf pb solved only after pg_dump and restore
Date: 2006-08-28 13:07:33
Message-ID: 87veodxb3e.fsf@meuh.mnc.lan
Lists: pgsql-performance

> > We have a couple of logs files which get larger over time
> > (millions of rows). As they are log files, they can be trimmed
> > from older values.
>
> Ah, ok, you DELETEd the old rows.

Yes.

> So I assume that you never UPDATE, but only INSERT new entries and
> sometimes DELETE a big bunch of entries from the beginning.

Actually, in the version of the software where we have the
problem, that's exactly the case. But in newer versions, UPDATEs
come into the picture (typically on recently inserted rows, one
or two updates per row). Does UPDATE change anything? Row
selection is done on the primary key (of SERIAL type).
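
To illustrate, the pattern is roughly this (table and column
names invented for the example, not our real schema):

    CREATE TABLE log (
        id      SERIAL PRIMARY KEY,
        stamp   TIMESTAMP NOT NULL DEFAULT now(),
        message TEXT
    );

    -- steady stream of INSERTs...
    INSERT INTO log (message) VALUES ('...');
    -- ...one or two UPDATEs per row, soon after insertion,
    -- selected by primary key
    UPDATE log SET message = 'amended' WHERE id = 42;
    -- ...and a periodic trim of old entries
    DELETE FROM log WHERE stamp < now() - interval '30 days';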

> This is a special usage pattern that the normal "VACUUM" is not
> well suited for.
>
> DELETEing rows itself does not free any space. Only after your
> transaction is committed can a following VACUUM FULL or CLUSTER
> actually free the space.
>
> VACUUM and VACUUM ANALYZE only remove obsolete rows from the
> pages and mark them as free (by entering them into the free
> space map, as long as that one is large enough). That means
> that your table will actually stay as large as before, having
> 90% free pages at the beginning and 10% used pages at the very
> end. New INSERTs and UPDATEs will prefer to use pages from the
> free space map before allocating new pages, but the existing
> rows will stay forever.

Yes, that's what I had in mind. But I assumed that performance
would be reclaimed (as if VACUUM FULL had been run) because the
statistics after analyzing accurately reflect the data
distribution; only disk space would not be reclaimed (which we
don't care about, at least for the moment).
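
(For what it's worth, I suppose the gap between on-disk size and
live rows can be watched in pg_class after a VACUUM ANALYZE,
using the invented table name from above, e.g.:

    SELECT relname, relpages, reltuples
    FROM pg_class
    WHERE relname = 'log';
    -- relpages stays high after a plain VACUUM;
    -- reltuples reflects the live rows
)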

> Now, VACUUM FULL actively moves rows to the beginning of the
> table, allowing the end of the table to be cut off, while
> CLUSTER recreates the table from scratch, in index order. Both
> lead to compact storage, with all used rows at the beginning
> and no free pages.

I actually assumed that VACUUM ANALYZE would order rows
sequentially on disk (mainly because it was taking quite some
time and generating a lot of disk write activity), but obviously
this was wrong.

> So, I think, in your case VACUUM FULL and CLUSTER would both have solved
> your problem.

Ok.
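
So, if I follow you, after each trim either of these should do
(sticking with the invented "log" table, whose primary key index
would be named log_pkey):

    VACUUM FULL log;          -- compact in place, return space
    -- or, in 7.4 syntax, rewrite the table in index order:
    CLUSTER log_pkey ON log;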

> > max_fsm_pages is 20000
> > Do they look low?
> > Notice: table data is only 600M after trim (without indexes),
> > while it was probably 3x to 10x this size before the trim.
>
> 10x the size means 6G, so 5.4G of data were freed by the trim.
> Each page is 8k in size, so the FSM needs about 675000 page
> slots. So, yes, for your usage, your setting looks low and
> gives very suboptimal results.

"max_fsm_pages = 675000" means we also need to enlarge shared
buffers, or the shared buffers available space for data caching
would be reduced, right?
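
For reference, I suppose the relevant postgresql.conf lines
would be, with your estimate plugged in (if I read the docs
right, each page slot costs roughly 6 bytes of shared memory):

    max_fsm_pages = 675000      # one slot per page with free space
    max_fsm_relations = 1000    # default; tables/indexes tracked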

I guess the bottom line is that I don't understand what the Free
Space Map behaviour really is. Is it a map containing the
locations of free disk pages, "free" meaning pages whose rows
were removed with DELETE but not yet released to the OS with
VACUUM FULL, and which are reused for INSERTs in preference to
enlarging the data files on disk? If that's correct, am I right
in assuming that we don't care about the Free Space Map size if
we perform a VACUUM FULL right after large bunches of DELETEs?
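
That is, would a routine like this make the FSM size mostly
irrelevant for us (same invented table; note that VACUUM FULL
cannot run inside a transaction block):

    DELETE FROM log WHERE stamp < now() - interval '30 days';
    VACUUM FULL ANALYZE log;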

> >> have index bloat.
> >
> > Can you elaborate? I have created a couple of indexes (according
> > to multiple models of use in our application) and they do take up
> > quite some disk space (table dump is 600M but after restore it
> > takes up 1.5G on disk) but I thought they could only do good or
> > never be used, not impair performance..
>
> Like tables, indices may suffer from getting bloated by old,
> unused entries. Especially the GiST-based indices in 7.4 (used
> by PostGIS and other plugins) suffered from that problem[1],
> but recent PostgreSQL versions have improved in this area.

We actually are obliged to use 7.4.5 :/

Am I correct in assuming that regularly (daily) running REINDEX
would cut this bloat?

(the documentation very much insists on solving index data
corruption with REINDEX and doesn't say much about removing old,
obsolete entries)

(also, is there any way to easily REINDEX all indexes of all
tables, as when we just run "VACUUM ANALYZE" for the whole
database?)
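
If there is no single command for that in 7.4, I imagine
something along these lines would work (untested sketch, "ourdb"
being a made-up database name):

    psql -t -A -d ourdb -c "SELECT 'REINDEX TABLE '
        || quote_ident(schemaname) || '.' || quote_ident(tablename)
        || ';' FROM pg_tables
        WHERE schemaname NOT IN ('pg_catalog', 'information_schema')" \
      | psql -d ourdb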

> Now, when the query planner decides to use an index, the index access is
> extremely slow because of all the deleted entries the index scan has to
> skip.

I see.

> However, from the additional information you gave above, I doubt it was
> index bloat.

[...]

--
Guillaume Cottenceau
Create your personal SMS or WAP Service - visit http://mobilefriends.ch/
