Re: Heavily modified big table bloat even in auto vacuum is running

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Haribabu kommi <haribabu(dot)kommi(at)huawei(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Heavily modified big table bloat even in auto vacuum is running
Date: 2013-11-29 06:30:10
Message-ID: CAA4eK1Kh1OXrGo20Na=3ZiRrui7KfPseXBxJ67pAmH6CueVBng@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Nov 26, 2013 at 7:26 PM, Haribabu kommi
<haribabu(dot)kommi(at)huawei(dot)com> wrote:
> On 25 November 2013 10:43 Amit Kapila wrote:
>> On Fri, Nov 22, 2013 at 12:12 PM, Haribabu kommi
>> <haribabu(dot)kommi(at)huawei(dot)com> wrote:
>> > On 19 November 2013 10:33 Amit Kapila wrote:
>> >> If I understood correctly, then your patch's main intention is to
>> >> correct the estimate of dead tuples, so that it can lead to Vacuum
>> >> cleaning the table/index which otherwise is not happening as per
>> >> configuration value (autovacuum_vacuum_threshold) in some of the
>> >> cases, also it is not reducing the complete bloat (Unpatched -
>> 1532MB
>> >> ~Patched - 1474MB), as the main reason of bloat is extra space in
>> >> index which can be reclaimed by reindex operation.
>> >>
>> >> So if above is correct then this patch has 3 advantages:
>> >> a. Extra Vacuum on table/index due to better estimation of dead
>> tuples.
>> >> b. Space reclaim due to this extra vacuum c. may be some performance
>> >> advantage as it will avoid the delay in cleaning dead tuples
>> >>
>> >> I think better way to test the patch is to see how much benefit is
>> >> there due to above (a and b points) advantages. Different values of
>> >> autovacuum_vacuum_threshold can be used to test.
>> >
>> >
>> > The performance effect of the patch is not much visible as I think
>> the
>> > analyze on the table estimates the number of dead tuples of the table
>> with some estimation.
>>
>> Yes, that seems to be the reason why you are not seeing any
>> performance benefit, but still I think this is useful optimization to
>> do, as
>> analyze updates both the livetuples and dead tuples and similarly
>> vacuum should also update both the counts. Do you see any reason
>> why Vacuum should only update live tuples and not deadtuples?
>
> As vacuum touches all the pages where the dead tuples are present. This is not the
> Same with analyzer. Because of this reason, the analyzer estimates the dead tuples also.
> With the proposed patch the vacuum also estimates the dead tuples.

Few questions about your latest patch:
a. Is there any reason why you are doing estimation of dead tuples
only for Autovacuum and not for Vacuum.

/* clear and get the new stats for calculating proper dead tuples */
pgstat_clear_snapshot();
tabentry = pgstat_fetch_stat_tabentry(RelationGetRelid(onerel));
b. In the above code, to get latest data you are first clearing
snapshot and then calling pgstat function. It will inturn perform I/O
(read of stats file) and send/receive message from stats collector
to ensure it can read latest data. I think it will add overhead
to Vacuum, especially if 'nkeep' calculated in function
lazy_scan_heap() can serve the purpose. In my simple test[1], I
observed
that value of keep can serve the purpose.

Can you please once try the test on 'nkeep' approach patch.

>> > Because of this reason not much performance improvement is not
>> visible
>> > as the missed dead tuple calculation in vacuum is covered by the
>> analyze.
>>
>> Yeah, so might be we can check once by configuring
>> analyze_threshold/scalefactor in a way that analyze doesn't get trigger
>> during your test.
>
> I ran the test for one hour with a high analyze_threshold and results are below.
>
> Auto vacuum count Bloat size
> Master 15 155MB
> Patched 23 134MB
>
> Updated test script and configuration is attached in the mail.

I just had a brief look on your test, please check if you can simplify
your script file and make the test results to come in 15~20 mins.
Don't put too much effort on it, if you can do it easily then it is okay.

[1] Simple test case to verify the value of dead tuples:
Session-1
-----------------
a. Create table t1(c1 int);
b. insert into t1 values(generate_series(1,1000));
c. delete from t1;
d. Vacuum t1;
-- here I stopped in debugger, after fetching dead tuple count first
time (line 235, vacuumlazy.c, after applying your patch) as per your
code (modified a bit so that I can get the value for Vacuum)

Session-2
-----------------
a. insert into t1 values (generate_series(1000,1500));
b. delete from t1;

Session -1
-----------------
b. Verified the value of nkeep in lazy_scan_heap(), it is 501 which is
what we expect.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2013-11-29 06:38:49 Re: TODO: Split out pg_resetxlog output into pre- and post-sections
Previous Message Robert Haas 2013-11-29 06:16:39 Re: logical changeset generation v6.7