Skip site navigation (1) Skip section navigation (2)

Re: patch review : Add ability to constrain backend temporary file space

From: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
To: Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch review : Add ability to constrain backend temporary file space
Date: 2011-06-01 00:27:46
Message-ID: 4DE58782.2010001@catalyst.net.nz (view raw or flat)
Thread:
Lists: pgsql-hackers
On 01/06/11 09:24, Cédric Villemain wrote:
> Hello
>
> here is a partial review of your patch, better than keeping it
> sleeping in the commitfest queue I hope.
>
>   Submission review
> ================
>
>      * The patch is not in context diff format.
>      * The patch apply, but contains some extra whitespace.
>      * Documentation is here but not explicit about 'temp tables',
> maybe worth adding that this won't limit temporary table size ?
>      * There is no test provided. One can be expected to check that the
> feature work.
>
> Code review
> =========
>
> * in fd.c, I think that "temporary_files_size -=
> (double)vfdP->fileSize;" should be done later in the function once we
> have successfully unlink the file, not before.
>
> * I am not sure it is better to add a fileSize like you did or use
> relationgetnumberofblock() when file is about to be truncated or
> unlinked, this way the seekPos should be enough to increase the global
> counter.
>
> * temporary_files_size, I think it is better to have a number of pages
> à la postgresql than a kilobyte size
>
> * max_temp_files_size, I'll prefer an approach like shared_buffers
> GUC: you can use pages, or KB, MB, ...
>
>
> Simple Feature test
> ==============
>
> either explain buffers is wrong or the patch is wrong:
> cedric=# explain (analyze,buffers) select * from foo  order by 1 desc ;
>                                                     QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------
>   Sort  (cost=10260.02..10495.82 rows=94320 width=4) (actual
> time=364.373..518.940 rows=100000 loops=1)
>     Sort Key: generate_series
>     Sort Method: external merge  Disk: 1352kB
>     Buffers: local hit=393, temp read=249 written=249
>     ->   Seq Scan on foo  (cost=0.00..1336.20 rows=94320 width=4)
> (actual time=0.025..138.754 rows=100000 loops=1)
>           Buffers: local hit=393
>   Total runtime: 642.874 ms
> (7 rows)
>
> cedric=# set max_temp_files_size to 1900;
> SET
> cedric=# explain (analyze,buffers) select * from foo  order by 1 desc ;
> ERROR:  aborting due to exceeding max temp files size
> STATEMENT:  explain (analyze,buffers) select * from foo  order by 1 desc ;
> ERROR:  aborting due to exceeding max temp files size
>
> Do you have some testing method I can apply to track that without
> explain (analyze, buffers) before going to low-level monitoring ?
>
> Architecture review
> ==============
>
> max_temp_files_size is used for the global space used per backend.
> Based on how work_mem work I expect something like "work_disk" to
> limit per file and maybe a backend_work_disk (and yes maybe a
> backend_work_mem ?!) per backend.
> So I propose to rename the current GUC to something like backend_work_disk.
>
> Patch is not large and easy to read.
> I like the idea and it sounds useful.
>

Hi Cédric,

Thanks for reviewing!

The diff format is odd - I'm sure I told git to do a context diff, 
however running 'file' on the v2 patch results it it saying 'HTML 
document'... hmm, I'll check more carefully for the next one I do.

Yeah, I agree about explicitly mentioning how it does not constraint 
temp tables.

Hmm, test - good idea - I'll look into it.

Re the code comments - I agree with most of them. However with respect 
to the Guc units, I copied the setup for work_mem as that seemed the 
most related. Also I used bytes for the internal variable in fd.c to 
limit the number of arithmetic operations while comparing.

For testing I set 'log_temp_files = 0' in postgresql.conf and the 
numbers seemed to agree exactly - I didn't think to use EXPLAIN + 
buffers... not sure why they would disagree. Have a go with 
log_temp_files set and see what you find!

I like yhour suggesting for the name. Given that 'work_mem' is per 
backend, I'm leaning towards the idea of 'work_disk' being sufficiently 
descriptive.

Thanks again, and I'll come up with a v3 patch.

Mark



In response to

Responses

pgsql-hackers by date

Next:From: Craig RingerDate: 2011-06-01 00:29:12
Subject: Re: BUG #6046: select current_date crashes postgres
Previous:From: Thom BrownDate: 2011-06-01 00:18:18
Subject: Re: creating CHECK constraints as NOT VALID

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group