Skip site navigation (1) Skip section navigation (2)

Re: Query Sampling

From: Varun Kacholia <kacholia(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Query Sampling
Date: 2005-08-29 21:58:28
Message-ID: 500f00610508291458793813f9@mail.gmail.com (view raw or flat)
Thread:
Lists: pgsql-hackers
 > I assume you realise that Bernoulli sampling is currently possibly using
> the random() function and setseed() ?
Yes, select * from table where random() < x, does the job.
 
> I can't see why TABLESAMPLE effects a sequential scan *only*, in all
> cases. I agree that there seems little point in sampling rows from a
> table when it is already sufficiently restricted that the query could
> use an index.
> AFAICS this clause would potentially effect Index and Bitmap scans also,
> and would be required for full correctness to the standard.

As I see it, there are 3 ways of implementing the sample operator:
1. modify node[Seq|Tid|Index|..]scan.c to consider sampling
2. create new nodes for each of the possible scans..sequential, index, tid et al
3. support sequential scan only for sampling.

(1) does not seem to be attractive, while (2) is a lot of work to
begin with. I was
planning to start with (3) and approach to (2) in the long run.
I would appreciate your opinion on this.

> You would need to be careful to sample the whole table though, rather
> than to follow the temptation to just scan the first X% of it. The start
> of a table has a tendency to be dead tuples, which was an error that the
> sampling logic in 7.4 made, so it would be wise to avoid repeating that.
Correct. I have that in mind.
 
> I'd be willing to lend a hand over the coming year - since 8.1 just went
> beta we can expect a good few months before the next code deadline.
Great! I would appreciate an extra hand.

Thanks
Varun 

>

In response to

Responses

pgsql-hackers by date

Next:From: Martijn van OosterhoutDate: 2005-08-29 22:56:21
Subject: Simple tester for MVCC in PostgreSQL
Previous:From: Mary Edie MeredithDate: 2005-08-29 21:36:37
Subject: Re: data on devel code perf dip

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group