Re: [BUGS] Routine analyze of single column prevents standard autoanalyze from running at all

From: Tomasz Ostrowski <tometzky+pg(at)ato(dot)waw(dot)pl>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
Cc: Josh berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [BUGS] Routine analyze of single column prevents standard autoanalyze from running at all
Date: 2016-06-08 06:19:58
Message-ID: 3eebdfc2-95dc-9ede-7467-47a3c26b8892@ato.waw.pl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

W dniu 2016-06-08 o 05:04, Tom Lane pisze:
> Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com> writes:
>> Is there any significant advantage to not analyzing all columns? Only
>> case I can think of is if you have a fair number of columns that have
>> been toasted; otherwise I'd think IO would completely swamp any other
>> considerations.
>
> Yeah, my guess is that the OP's example where analyzing just one column
> was significantly cheaper boiled down to some of the other columns being
> mostly toasted data. Otherwise it's hard to see how there's much more
> expense in analyzing them all.

Actually no - this volatile column has smaller "statistics" than most of
the table, so analyzing it is much faster when it's data is not in RAM.
Here is a small exaggerated example showing a difference:

$ psql
tometzky=> create table test (id serial, data text);
tometzky=> insert into test(data)
select 'Lorem ipsum dolor sit amet, consectetur adipiscing elit. '
||generate_series(0,10000000)::text;
tometzky=> alter table test alter column id set statistics 10;
tometzky=> alter table test alter column data set statistics 1000;
tometzky=> \q

# Drop OS page cache and restart postgres
# so the table data won't be in RAM anymore:
$ sudo sh -c 'echo 1 >/proc/sys/vm/drop_caches'
$ sudo systemctl restart postgresql;

# Test single column analyze:
$ psql
tometzky=> \timing
Timing is on.
tometzky=> analyze verbose test(id);
INFO: analyzing "public.test"
INFO: "test": scanned 3000 of 123457 pages, containing 243000 live rows
and 0 dead rows; 3000 rows in sample, 2712238 estimated total rows
ANALYZE
Time: 422,521 ms
tometzky=> \q

# Drop OS page cache and restart postgres again
$ sudo sh -c 'echo 1 >/proc/sys/vm/drop_caches'
$ sudo systemctl restart postgresql;

$ psql
tometzky=> \timing
Timing is on.
tometzky=> analyze verbose test;
INFO: analyzing "public.test"
INFO: "test": scanned 123457 of 123457 pages, containing 10000001 live
rows and 0 dead rows; 300000 rows in sample, 10000001 estimated total rows
ANALYZE
Time: 9447,519 ms

--
Tomasz "Tometzky" Ostrowski

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Emiel Hermsen 2016-06-08 08:39:14 Re: Case in Order By Ignored without warning or error
Previous Message Peter Geoghegan 2016-06-08 04:23:58 Re: BUG #14150: Attempted to delete invisible tuple

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro HORIGUCHI 2016-06-08 06:26:15 Re: slower connect from hostnossl clients
Previous Message Masahiko Sawada 2016-06-08 06:15:33 Re: Reviewing freeze map code