Re: Gsoc2012 idea, tablesample

From: Florian Pflug <fgp(at)phlo(dot)org>
To: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: <josh(at)agliodbs(dot)com>, <andres(at)anarazel(dot)de>, <alvherre(at)commandprompt(dot)com>, <ants(at)cybertec(dot)at>, <heikki(dot)linnakangas(at)enterprisedb(dot)com>, <cbbrowne(at)gmail(dot)com>, <neil(dot)conway(at)gmail(dot)com>, <robertmhaas(at)gmail(dot)com>, <daniel(at)heroku(dot)com>, <huangqiyx(at)hotmail(dot)com>, <pgsql-hackers(at)postgresql(dot)org>, <sfrost(at)snowman(dot)net>
Subject: Re: Gsoc2012 idea, tablesample
Date: 2012-05-11 15:55:24
Message-ID: 9C5FE8E4-488B-4598-B5F6-A70AF520B6AD@phlo.org
Lists: pgsql-hackers

On May 11, 2012, at 16:03, Kevin Grittner wrote:
>> [more complex alternatives]
>
> I really think your first suggestion covers it perfectly; these more
> complex techniques don't seem necessary to me.

The point of the more complex techniques (especially the algorithm in
my second mail, the "reply to self") was simply to optimize the generation
of a random, uniformly distributed, unique and sorted list of TIDs.

The basic idea is to make sure we generate the TIDs in physical order,
and thus automatically ensure that they are unique. This reduces the memory
(or disk) requirement to O(1) instead of O(n), and (more importantly,
actually) makes the implementation much simpler.
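
To illustrate the idea, here is a minimal sketch using selection sampling
(Knuth's "Algorithm S"), which is not necessarily the algorithm from the
earlier mail. It walks the TID space in physical order and decides per
position whether to emit it, so the output is sorted and unique without
remembering earlier picks. The function name, the fixed tuples_per_block,
and the (block, offset) numbering are assumptions made for the example.

```python
import random
from typing import Iterator, Optional, Tuple


def sorted_random_tids(
    n_blocks: int,
    tuples_per_block: int,
    sample_size: int,
    rng: Optional[random.Random] = None,
) -> Iterator[Tuple[int, int]]:
    """Yield sample_size distinct (block, offset) pairs in physical order.

    Hypothetical helper: assumes every block holds exactly
    tuples_per_block tuples, blocks are numbered from 0 and offsets
    from 1. Only two counters are kept, so memory use is O(1).
    """
    rng = rng or random.Random()
    total = n_blocks * tuples_per_block
    if not 0 <= sample_size <= total:
        raise ValueError("sample_size must be between 0 and the TID count")

    needed = sample_size
    for pos in range(total):
        if needed == 0:
            return
        remaining = total - pos
        # Select this position with probability needed / remaining.
        # Every subset of size sample_size is then equally likely.
        if rng.random() * remaining < needed:
            block, offset = divmod(pos, tuples_per_block)
            yield (block, offset + 1)
            needed -= 1


if __name__ == "__main__":
    for tid in sorted_random_tids(n_blocks=100, tuples_per_block=50, sample_size=5):
        print(tid)
```

Note that this sketch still visits every position, so its running time is
proportional to the size of the TID space; only its memory use is constant.
Skip-based variants avoid that cost by computing how far to jump to the next
selected position.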

best regards,
Florian Pflug
