Skip site navigation (1) Skip section navigation (2)

Re: Sampling profiler updated

From: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Sampling profiler updated
Date: 2009-07-18 20:09:01
Message-ID: 7E9C11D8-7471-48A5-A22D-F587D53CFB78@hi-media.com (view raw or flat)
Thread:
Lists: pgsql-hackers
Hi,

Le 14 juil. 09 à 11:47, Itagaki Takahiro a écrit :
> I updated Sampling profiler patch to be applied to HEAD cleanly.

Which I confirm, as I just spent some time to reviewing the patch (it  
was left unassigned in the commit fest). Reviewing code didn't strike  
anything so obvious that I'd notice...

> Basic concept of the patch is same as DTrace probes:
> Call pgstat_push_condition(condition) before a operation and call
> pgstat_pop_condition() at the end of the operation. Those functions
> should be light-weight because they only change a variable on shared
> memory without any locks.


...except that the typical integration is like this:

+ 	pgstat_push_condition(PGCOND_XLOG_OPEN);
   	fd = BasicOpenFile(path, O_RDWR | PG_BINARY |  
get_sync_bit(sync_method),
   					   S_IRUSR | S_IWUSR);
+ 	pgstat_pop_condition();

And we have this:

! void
! pgstat_push_condition(PgCondition condition)
! {
! 	volatile PgBackendStatus *beentry = MyBEEntry;
! 	PgCondition	prev;
!
! 	if (profiling_interval <= 0 || !beentry)
! 		return;

I'm wondering if it'll be enough for people not interested into  
profiling not to bother. The patch contains a lot of added call sites.  
I'm not sure if it's possible to arrange for not calling the function  
at all when profiling is disabled (GUC profiling_interval = 0).

On the same vein:

+ static void
+ LockPageContent(volatile BufferDesc *buf, LWLockMode mode)
+ {
+ 	pgstat_push_condition(PGCOND_LWLOCK_PAGE);
+ 	LWLockAcquire(buf->content_lock, mode);
+ 	pgstat_pop_condition();
+ }
+
+ static void
+ LockPageIO(volatile BufferDesc *buf, LWLockMode mode)
+ {
+ 	pgstat_push_condition(PGCOND_LWLOCK_IO);
+ 	LWLockAcquire(buf->io_in_progress_lock, mode);
+ 	pgstat_pop_condition();
+ }

With the call site being (in src/backend/storage/buffer/bufmgr.c,  
FlushRelationBuffers(Relation rel), FlushDatabaseBuffers(Oid dbid)):
! 			LWLockAcquire(bufHdr->content_lock, LW_SHARED);
...
! 			LockPageContent(bufHdr, LW_SHARED);

Maybe there's nothing to worry about, but I figured I'd better raise  
the concert for further reviewing.

Oh, and this too:
*************** LockBuffer(Buffer buffer, int mode)
*** 2372,2380 ****
   	if (mode == BUFFER_LOCK_UNLOCK)
   		LWLockRelease(buf->content_lock);
   	else if (mode == BUFFER_LOCK_SHARE)
! 		LWLockAcquire(buf->content_lock, LW_SHARED);
   	else if (mode == BUFFER_LOCK_EXCLUSIVE)
! 		LWLockAcquire(buf->content_lock, LW_EXCLUSIVE);


Now the build went fine, but the testing (default configuration) not  
so much:

dim=# create table series(i integer);
dim=# insert into series select generate_series(1, 10000000);
LOG:  checkpoints are occurring too frequently (4 seconds apart)
HINT:  Consider increasing the configuration parameter  
"checkpoint_segments".
WARNING:  condition stack overflow: 10
...
WARNING:  condition stack overflow: 11
WARNING:  condition stack overflow: 11
WARNING:  condition stack overflow: 11
ERROR:  canceling statement due to user request

dim=# select count(*) from series;
  count
-------
      0
(1 row)

Time: 9504.624 ms

dim=# select * from pg_profiles;
  profid |      profname       | profnum
--------+---------------------+---------
   83000 | LWLock:Page         |      15
   58000 | Data:Extend         |       7
   80018 | LWLock:BgWriterComm |       1
   10000 | CPU                 |      85
   22000 | Network:Send        |     128
   80009 | LWLock:WALWrite     |       6
   55000 | Data:Read           |       1
   45000 | XLog:Write          |       1
   15000 | CPU:Utility         |       5
   15100 | CPU:Commit          |       1
   57000 | Data:Write          |      15
   42000 | XLog:Insert         |      24
   46000 | XLog:Flush          |       4
(13 rows)

Time: 11.372 ms

The error I got is matching this part of the patch:
! void
! pgstat_push_condition(PgCondition condition)
! {
...
! 	if (condition_stack_top < MAX_COND_STACK)
! 		condition_stack[condition_stack_top] = prev;
! 	else
! 		elog(WARNING, "condition stack overflow: %d", condition_stack_top);


So I'm going to change patch state to "Returned with Feedback" as I  
guess we'll need to talk about the issue and find a way to solve it,  
and I don't think this state prevent from getting back to the patch in  
this same fest.

Regards,
-- 
dim

In response to

Responses

pgsql-hackers by date

Next:From: Robert HaasDate: 2009-07-18 20:38:20
Subject: Re: Sampling profiler updated
Previous:From: Robert HaasDate: 2009-07-18 20:01:58
Subject: Re: SE-PostgreSQL?

Privacy Policy | About PostgreSQL
Copyright © 1996-2013 The PostgreSQL Global Development Group