Re: TABLESAMPLE patch

From: Petr Jelinek <petr(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: TABLESAMPLE patch
Date: 2015-01-10 19:59:15
Message-ID: 54B18493.5030303@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 09/01/15 09:27, Michael Paquier wrote:
>
> Some comments about the 1st patch:
> 1) Nitpicking:
> + * Block sampling routines shared by ANALYZE and TABLESAMPLE.
> TABLESAMPLE is not added yet. You may as well mention simplify this
> description with something like "Sampling routines for relation
> blocks".
>

Changed.

> 2) file_fdw and postgres_fdw do not compile correctly as they still
> use anl_random_fract. This function is still mentioned in vacuum.h as
> well.

Gah, didn't notice this, fixed. And also since the Vitter's reservoir
sampling methods are used by other component than just analyze, I moved
those to sampling.c/h.

>
> 3) Not really an issue of this patch, but I'd think that this comment
> should be reworked:
> + * BlockSampler is used for stage one of our new two-stage tuple
> + * sampling mechanism as discussed on pgsql-hackers 2004-04-02 (subject
> + * "Large DB"). It selects a random sample of samplesize blocks out of
> + * the nblocks blocks in the table. If the table has less than
> + * samplesize blocks, all blocks are selected.
>

I changed the wording slightly, it's still not too great though.

> 4) As a refactoring patch, why is the function providing a random
> value changed? Shouldn't sample_random_fract be consistent with
> anl_random_fract?
>

Yeah I needed that for TABLESAMPLE and it should not really affect the
randomness but you are correct it should be part of second patch of the
patch-set.

> 5) The seed numbers can be changed to RAND48_SEED_0, RAND48_SEED_1 and
> RAND48_SEED_2 instead of being hardcoded?
> Regards,
>

Removed this part from the first patch as it's not needed there anymore.

In second patch which implements the TABLESAMPLE itself I changed the
implementation of random generator because when I looked at the code
again I realized the old one would produce wrong results if there were
multiple TABLESAMPLE statements in same query or multiple cursors in
same transaction.

In addition to the above changes I added test for cursors and test for
the issue with random generator I mentioned above. Also fixed some typos
in comments and function name. And finally I added note to docs saying
that same REPEATABLE might produce different results in subsequent queries.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
0001-separate-block-sampling-functions-v2.patch text/x-diff 21.9 KB
0002-tablesample-v5.patch text/x-diff 98.9 KB
0003-tablesample-ddl-v2.patch text/x-diff 47.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Christoph Berg 2015-01-10 21:09:28 Re: libpq 9.4 requires /etc/passwd?
Previous Message Tom Lane 2015-01-10 19:53:13 Sloppy SSPI error reporting code