Re: [PERFORM] Bad n_distinct estimation; hacks suggested?

From: "Dave Held" <dave(dot)held(at)arrayservicesgrp(dot)com>
To: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PERFORM] Bad n_distinct estimation; hacks suggested?
Date: 2005-04-25 16:15:22
Message-ID: 49E94D0CFCD4DB43AFBA928DDD20C8F9026184D7@asg002.asg.local
Lists: pgsql-hackers

> -----Original Message-----
> From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
> Sent: Monday, April 25, 2005 10:23 AM
> To: Simon Riggs
> Cc: josh(at)agliodbs(dot)com; Greg Stark; Marko Ristola; pgsql-perform;
> pgsql-hackers(at)postgresql(dot)org
> Subject: Re: [HACKERS] [PERFORM] Bad n_distinct estimation; hacks
> suggested?
>
> [...]
> It's not just the scan --- you also have to sort, or something like
> that, if you want to count distinct values. I doubt anyone is really
> going to consider this a feasible answer for large tables.

How about an option to create a statistics hash map for a column
that maps each distinct value to its number of occurrences? Obviously
the map would need to be updated on every INSERT/DELETE/UPDATE, but if
the table is dominated by reads, and an accurate n_distinct is very
important, there may be people willing to pay the extra time and space
cost.
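
To make the idea concrete, here is a minimal sketch in Python of such a
per-column map. It is only an illustration under stated assumptions, not
PostgreSQL code: the class name DistinctValueStats and the on_insert /
on_delete / on_update hooks are hypothetical, and the sketch ignores
concurrency and transaction visibility, which anything living inside the
server would have to deal with.

```python
class DistinctValueStats:
    """Hypothetical per-column statistic: distinct value -> occurrence count."""

    def __init__(self):
        self.counts = {}

    def on_insert(self, value):
        # One more row carries this value.
        self.counts[value] = self.counts.get(value, 0) + 1

    def on_delete(self, value):
        # Raises KeyError if the value was never recorded.
        remaining = self.counts[value] - 1
        if remaining:
            self.counts[value] = remaining
        else:
            # Drop the key so len(counts) always equals n_distinct.
            del self.counts[value]

    def on_update(self, old_value, new_value):
        # An UPDATE that changes the column is a delete plus an insert.
        if old_value != new_value:
            self.on_delete(old_value)
            self.on_insert(new_value)

    @property
    def n_distinct(self):
        # Exact distinct count, with no scan or sort of the table.
        return len(self.counts)


stats = DistinctValueStats()
for v in ["a", "b", "a", "c"]:
    stats.on_insert(v)
stats.on_delete("c")
stats.on_update("b", "a")
```

After these calls the map holds a single entry, "a" with a count of 3, so
n_distinct is 1. The cost is visible in the sketch too: every write touches
the map, and the map needs one entry per distinct value in the column.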

__
David B. Held
Software Engineer/Array Services Group
200 14th Ave. East, Sartell, MN 56377
320.534.3637 320.253.7800 800.752.8129
