Re: ANALYZE to be ignored by VACUUM

From: "Dawid Kuroczko" <qnex42(at)gmail(dot)com>
To: "ITAGAKI Takahiro" <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: ANALYZE to be ignored by VACUUM
Date: 2008-02-20 16:45:14
Message-ID: 758d5e7f0802200845v7b5dde22v1bd82c49755a1a53@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Feb 19, 2008 8:31 AM, ITAGAKI Takahiro
<itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp> wrote:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> > ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp> writes:
> > > In my workload, ANALYZE takes long time (1min at statistics_target = 10,
> > > and 5min at 100), but the updated table needs to be vacuumed every 30 seconds
> > > because seqscans run on the table repeatedly.
> >
> > There is something *seriously* wrong with that. If vacuum can complete
> > in under 30 seconds, how can analyze take a minute? (I'm also wondering
> > whether you'll still need such frantic vacuuming with HOT...)
>
> There are two tables here:
> [S] A small table, that is frequently updated and seqscan-ed
> [L] A large table, that takes a long time to be analyzed
>
> The table [S] should be vacuumed every 30 seconds, because dead tuples
> affects the performance of seqscan seriously. HOT and autovacuum are
> very useful here *unless* long transactions begins.
> Analyzing [L] takes 1-5 minutes and both HOT and vacuums don't work
> during it. I want to use statistics_target = 100 at heart for more
> accurate statistics, but I'm using 10 instead because of avoiding
> long transactions by analyze.

I am sure the idea is not original, yet still I would like to know how hard
would it be to support local (per table) oldest visible XIDs.

I mean, when transaction start you need to keep all tuples with xmin >=
oldest_xid in all tables, because who knows what table will that transaction
like to touch.

But then again, there is relatively large list of cases when we don't need
to hold vacuum on _all_ relations. These include:

SELECTs in auto-commit mode -- provided the SELECT is not something
fancy (not immutable PL-functions), we just need to keep a snapshot of
affected tables.

DMLs in auto-commit mode -- provided no PL-functions or triggers are in
effect.

WITH HOLD CURSORS. Yes, I know, WITH HOLD cursor on first COMMIT
will create a copy of rows to be returned (which can take a looong time in
some cases), but perhaps it could be possible to just "lock" the table from
vacuuming and skip the temporary store.

And lots of other, when done in auto-commit. Like ALTER TABLEs, CREATE
TABLE AS SELECT, COPY, etc...

I am sure that such an idea isn't original. What are the main obstacles
in making it happen except "time&money"? :)

Regards,
Dawid

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2008-02-20 16:48:16 Re: Permanent settings
Previous Message Naz Gassiep 2008-02-20 16:44:25 Re: Timezone view