Re: ANALYZE sampling is too good

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Claudio Freire <klaussfreire(at)gmail(dot)com>, Jim Nasby <jim(at)nasby(dot)net>, Greg Stark <stark(at)mit(dot)edu>, Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>, Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ANALYZE sampling is too good
Date: 2013-12-16 22:06:08
Message-ID: CAMkU=1w6qOQadwCwVCMkGGpGptfe2R6k5Cc0JzseARt49Z5LPQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Dec 9, 2013 at 3:14 PM, Heikki Linnakangas
<hlinnakangas(at)vmware(dot)com>wrote:

> I took a stab at using posix_fadvise() in ANALYZE. It turned out to be
> very easy, patch attached. Your mileage may vary, but I'm seeing a nice
> gain from this on my laptop. Taking a 30000 page sample of a table with
> 717717 pages (ie. slightly larger than RAM), ANALYZE takes about 6 seconds
> without the patch, and less than a second with the patch, with
> effective_io_concurrency=10. If anyone with a good test data set loaded
> would like to test this and post some numbers, that would be great.

Performance is often chaotic near transition points, so I try to avoid data
sets that are slightly bigger or slightly smaller than RAM (or some other
limit).

Do you know how many io channels your SSD has (or whatever the term of art
is for SSD drives)?

On a RAID with 12 spindles, analyzing pgbench_accounts at scale 1000 (13GB)
with 4 GB of RAM goes from ~106 seconds to ~19 seconds.

However, I'm not sure what problem we want to solve here. I certainly
would not wish to give a background maintenance process permission to
confiscate my entire RAID throughput for its own operation. Perhaps this
could only be active for explicit analyze, and only if vacuum_cost_delay=0?

Perhaps there should be something like "alter background role autovac set
...". Otherwise we are going to end up with an "autovacuum_*" shadow
parameter for many of our parameters, see "autovacuum_work_mem" discussions.

Cheers,

Jeff

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2013-12-16 22:06:54 Re: [PATCH] Negative Transition Aggregate Functions (WIP)
Previous Message Robert Haas 2013-12-16 21:50:30 Re: pg_rewarm status