Re: pretty bad n_distinct estimate, causing HashAgg OOM on TPC-H

From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pretty bad n_distinct estimate, causing HashAgg OOM on TPC-H
Date: 2015-06-22 00:49:40
Message-ID: 55875BA4.1060706@BlueTreble.com
Lists: pgsql-hackers

On 6/20/15 12:55 PM, Tomas Vondra wrote:
> Well, actually I think it would be even more appropriate for very large
> tables. With a 2.5TB table, you don't really care whether analyze
> collects 5GB or 8GB sample, the difference is rather minor compared to
> I/O generated by the other queries etc. The current sample is already
> random enough not to work well with read-ahead, and it scans only a
> slightly lower number of blocks.

Have we ever looked at generating new stats as part of a seqscan? I
don't know how expensive the math is, but if it's too much to push into
the scanning backend, perhaps a bgworker could follow behind the seqscan.
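For illustration only (nothing like this exists in the tree): below is a
minimal, standalone KMV ("k minimum values") sketch of the kind of streaming
n_distinct estimator a scan could feed one value per tuple. The hash
function, the KMV_K size, and all function names here are made up for the
example; a real patch would hook into the executor and use the existing
hashing infrastructure.

/*
 * Illustration only: a standalone KMV ("k minimum values") estimator.
 * Keep the KMV_K smallest distinct hash values seen; if h_k is the k-th
 * smallest normalized hash, n_distinct is estimated as (KMV_K - 1) / h_k.
 */
#include <stdio.h>
#include <stdint.h>

#define KMV_K 1024

typedef struct
{
    uint64_t minima[KMV_K];   /* smallest distinct hashes, ascending */
    int      nstored;         /* slots filled so far */
} KMVState;

/* any well-mixed 64-bit hash would do; this is splitmix64's finalizer */
static uint64_t
hash_u64(uint64_t x)
{
    x ^= x >> 30;
    x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27;
    x *= 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return x;
}

/* feed one value, e.g. once per tuple as the scan reads it */
static void
kmv_add(KMVState *st, uint64_t value)
{
    uint64_t h = hash_u64(value);
    int      pos, last, i;

    /* larger than everything we already keep? nothing to do */
    if (st->nstored == KMV_K && h >= st->minima[KMV_K - 1])
        return;

    /* find the insertion point, bailing out on an exact duplicate */
    pos = st->nstored;
    while (pos > 0 && st->minima[pos - 1] >= h)
    {
        if (st->minima[pos - 1] == h)
            return;
        pos--;
    }

    /* shift larger entries up, dropping the current maximum if full */
    last = (st->nstored < KMV_K) ? st->nstored : KMV_K - 1;
    for (i = last; i > pos; i--)
        st->minima[i] = st->minima[i - 1];
    st->minima[pos] = h;
    if (st->nstored < KMV_K)
        st->nstored++;
}

static double
kmv_estimate(const KMVState *st)
{
    double hk;

    if (st->nstored < KMV_K)
        return (double) st->nstored;   /* we kept every hash seen: exact */

    hk = (double) st->minima[KMV_K - 1] / (double) UINT64_MAX;
    return (KMV_K - 1) / hk;
}

int
main(void)
{
    KMVState st = {.nstored = 0};
    uint64_t row;

    /* simulate scanning 10M rows holding 1M distinct values */
    for (row = 0; row < 10000000; row++)
        kmv_add(&st, row % 1000000);

    printf("estimated n_distinct: %.0f\n", kmv_estimate(&st));
    return 0;
}

The per-tuple cost after the first KMV_K distinct hashes is essentially one
hash plus one comparison, which is the sort of thing that might be cheap
enough to piggyback on the scan itself rather than handing off to a bgworker.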
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com
