Re: [HACKERS] More detail on settings for pgavd?

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: "Matthew T(dot) O'Connor" <matthew(at)zeut(dot)net>, Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>
Cc: Shridhar Daithankar <shridhar_daithankar(at)myrealbox(dot)com>, pgsql-performance(at)postgresql(dot)org, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] More detail on settings for pgavd?
Date: 2003-11-21 17:09:00
Message-ID: 200311210909.00978.josh@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

Matthew,

> As long as pg_autovacuum remains a contrib module, I don't think any
> changes to the system catelogs will be make. If pg_autovacuum is
> deemed ready to move out of contrib, then we can talk about the above.

But we could create a config file that would store stuff in a flatfile table,
OR we could add our own "system table" that would be created when one
"initializes" pg_avd.

Just an idea. Mind you, I'm not so sure that we want to focus immediately on
per-table settings. I think that we want to get the "automatic" settings
working fairly well first; a lot of new DBAs would use the per-table settings
to shoot themselves in the foot. So we need to be able to make a strong
recommendation to "try the automatic settings first."

> Are you saying that you can vacuum a 1 million row table in 2-4
> minutes? While a vacuum of the same table with an additional 1 million
> dead tuples would take an hour?

I'm probably exaggerating. I do know that I can vacuum a fairly clean 1-5
million row table in less than 4 mintues. I've never let such a table get
to 50% dead tuples, so I don't really know how long that takes. Call me a
coward if you like ...

> >I'd be really reluctant to base pv-avd frequency on the fsm settings
> > instead. What if the user loads 8GB of data but leaves fsm_pages at the
> > default of 10,000? You can't do much with that; you'd have to vacuum if
> > even 1% of the data changed.
>
> Ok, but as you said above it's very easy to set the FSM once you know
> your db size.

Actually, thinking about this I realize that PG_AVD and the Perl-based
postgresql.conf configuration script I was working on (darn, who was doing
that with me?) need to go togther. With pg_avd, setting max_fsm_pages is
very easy; without it its a bit of guesswork.

So I think we can do this: for 'auto' settings:

If max_fsm_pages is between 13% and 100% of the total database pages, then set
the vacuum scale factor to match 3/4 of the fsm_pages setting, e.g.
database = 18,000,000 data pages;
max_fsm_pages = 3,600,000;
set vacuum scale factor = 3.6mil/18mil * 3/4 = 0.15

If max_fsm_pages is less than 13% of database pages, issue a warning to the
user (log it, if possible) and set scale factor to 0.1. If it's greater
than 100% set it to 1 and leave it alone.

> I don't have a good plan as to how to incorporate
> this data, but to a large extent the FSM already tracks table activity
> and gives us the most accurate answer about storage growth (short of
> using something like contrib/pgstattuple which takes nearly the same
> amount of time as an actual vacuum)

I don't really think we need to do dynamic monitoring at this point. It
would be a lot of engineering to check data page pollution without having
significant performance impact. It's doable, but something I think we
should hold off until version 3. It would mean hacking the FSM, which is a
little beyond me right now.

> In my testing, I never changed the default statistics settings.

Ah. Well, a lot of users do to resolve query problems.

> But we track tuples because we can compare against the count given by
> the stats system. I don't know of a way (other than looking at the FSM,
> or contrib/pgstattuple ) to see how many dead pages exist.

No, but for scaling you don't need the dynamic count of tuples or of dead
tuples; pg_class holds a reasonable accurate count of pages per table as of
last vacuum.

--
Josh Berkus
Aglio Database Solutions
San Francisco

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kurt Roeckx 2003-11-21 17:25:51 Re: 7.4 logging bug.
Previous Message Andreas Pflug 2003-11-21 16:46:55 Re: logical column position

Browse pgsql-performance by date

  From Date Subject
Next Message Josh Berkus 2003-11-21 21:23:18 Re: [HACKERS] More detail on settings for pgavd?
Previous Message Matthew T. O'Connor 2003-11-21 15:17:31 Re: [HACKERS] More detail on settings for pgavd?