
Re: a question about relkind of RelationData handed over to heap_update function

From:	노홍찬 <fallsmal(at)cs(dot)yonsei(dot)ac(dot)kr>
To:	"'Greg Stark'" <gsstark(at)mit(dot)edu>
Cc:	<pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: a question about relkind of RelationData handed over to heap_update function
Date:	2009-10-27 01:32:12
Message-ID:	002701ca56a5$4fbbc9e0$ef335da0$@yonsei.ac.kr
Lists:	pgsql-hackers

Dear Greg Stark,

You are exactly right. I want to record every updated region,
so this is not a small amount of work.

However, I am trying to touch the existing code as little as I can.
So I have mostly added code of my own. I did not change the MarkBufferDirty function at all, although I have, of course, created a function that is meant to work similarly to what you described.

I am sorry, but I could not understand the following sentence: 'The "some more work" may be some function call which doesn't usually do much either.'
What did you mean by it? Please excuse my poor English; it would be great if you could explain it once more.

So far it looks like this: I have appended several fields to the BufferDesc structure, and my own structure (IclNewLog) is used for recording the dirty regions.

------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------

/* IclNewLog has to be defined before BufferDesc, which embeds an array of it */
typedef struct IclNewLog {
    int change_start;            /* start offset of the changed region */
    int change_end;              /* end offset of the changed region */
    uint32 file;                 /* for ICL_DEBUG */
    int line;                    /* for ICL_DEBUG */
    int icl_log_global_seq;      /* for ICL_DEBUG */
} IclNewLog;

typedef struct sbufdesc {
    BufferTag tag;               /* ID of page contained in buffer */
    BufFlags flags;              /* see bit definitions above */
    uint16 usage_count;          /* usage counter for clock sweep code */
    unsigned refcount;           /* # of backends holding pins on buffer */
    int wait_backend_pid;        /* backend PID of pin-count waiter */

    slock_t buf_hdr_lock;        /* protects the above fields */

    int buf_id;                  /* buffer's index number (from 0) */
    int freeNext;                /* link in freelist chain */

    LWLockId io_in_progress_lock; /* to wait for I/O to complete */
    LWLockId content_lock;       /* to lock access to buffer contents */

    /* hongs added */
#ifdef USE_ICL
    bool isBufferPageNewOrXlogRead;
    int icl_length;              /* number of entries used in icl_logs */
    IclNewLog icl_logs[ICL_LEN_LIMIT];
#endif
    /* hongs added */

} BufferDesc;

------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------

* a part of heap_update function *

Line number: 2761: oldtup.t_data->t_ctid = heaptup->t_self;

/* hongs added; ICL logs the old tuple's tuple header */
#ifdef USE_ICL
if (doIcl) {
    /*
     * The buffer header lock and the buffer content lock are separate,
     * so I guess the buffer header lock is needed here.
     */
    LockBufHdr(bufHdr);
    if (bufHdr->icl_length < ICL_LEN_LIMIT-1) {
        bufHdr->icl_logs[bufHdr->icl_length].change_start = lp->lp_off;
        bufHdr->icl_logs[bufHdr->icl_length].change_end = lp->lp_off + sizeof(HeapTupleHeaderData);
        bufHdr->icl_logs[bufHdr->icl_length].file = HEAPAM;
        bufHdr->icl_logs[bufHdr->icl_length].line = 3003;
        /* make sure the recorded log range is valid */
        IclAssert( IsIclLogValid(bufHdr->icl_logs[bufHdr->icl_length]) );
        bufHdr->icl_length++;
    }
    UnlockBufHdr(bufHdr);
}
#endif
/* hongs added end */

Line number: 2762: if (newbuf != buffer)
Line number: 2763: MarkBufferDirty(newbuf);
Line number: 2764: MarkBufferDirty(buffer);
------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------

I call this log the "icl log".
The code above records the update to the old tuple's header in the log array field of the buffer descriptor whose buffer page is about to be marked dirty.

I am not interested in buffers that are updated frequently. I am interested in buffers that are about to be flushed while containing only a very small genuinely updated area.
Because pgsql's update policy uses the MVCC time-snapshot model, every update also modifies the old tuple's header (it changes the xmax field).
So some of the buffer pages being flushed may contain only one or two small genuinely updated regions, such as an updated xmax field or an updated XLogRecPtr.
In my opinion, flush operations that cover such a small genuinely updated region are inefficient.
This is not a problem of pgsql alone, though; the in-place update operations of every DBMS have similar problems.
I think pgsql's update logic is less problematic than others,
since the main updates (the new tuples themselves, not the old tuple's header) can pile up in one buffer page rather than in scattered pages,
and the HOT update mechanism addresses the earlier problems of time-snapshot MVCC in pgsql well.
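
To make "a small amount of genuine update" concrete, here is a minimal sketch of how the recorded ranges could be checked at flush time. It is not the code used in this work: IclRange is a simplified, hypothetical stand-in for IclNewLog, SKETCH_BLCKSZ assumes the default 8192-byte page, and the function names and the byte threshold are made up for illustration. It also assumes the ranges do not overlap.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_BLCKSZ 8192          /* assumed page size (default BLCKSZ) */

/* Hypothetical, reduced form of IclNewLog: a half-open range [start, end). */
typedef struct IclRange {
    int change_start;
    int change_end;
} IclRange;

/* Sum the bytes covered by the recorded ranges (assumed non-overlapping). */
int icl_dirty_bytes(const IclRange *logs, int n)
{
    int total = 0;
    int i;

    for (i = 0; i < n; i++) {
        assert(0 <= logs[i].change_start);
        assert(logs[i].change_start <= logs[i].change_end);
        assert(logs[i].change_end <= SKETCH_BLCKSZ);
        total += logs[i].change_end - logs[i].change_start;
    }
    return total;
}

/* True if the page has recorded changes but they cover few bytes. */
bool icl_is_sparse_flush(const IclRange *logs, int n, int max_dirty_bytes)
{
    return n > 0 && icl_dirty_bytes(logs, n) <= max_dirty_bytes;
}
```

For a page whose only recorded changes are the ranges 20 ~ 80 and 8190 ~ 8192, icl_dirty_bytes returns 62, so a full 8192-byte flush would carry 62 bytes of genuine change.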

Therefore, I limited the maximum log array size to 20. If I apply some log-merging logic (because many logs can be merged together, e.g. 8152 ~ 8172 and 8162 ~ 8192 -> 8152 ~ 8192),
then the array size should be enough to locate the buffers that have small genuinely updated regions. I do not care about buffers that have more logs than the maximum log array size.
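
The log merge step mentioned above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from this work: IclRange is a hypothetical, reduced form of IclNewLog, ranges are treated as half-open (start inclusive, end exclusive), and icl_merge_ranges is a made-up name. It sorts the entries by start offset and folds every range that overlaps or touches the previous one into it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, reduced form of IclNewLog: a half-open range [start, end). */
typedef struct IclRange {
    int change_start;
    int change_end;
} IclRange;

/* qsort comparator: order ranges by their start offset. */
static int icl_range_cmp(const void *a, const void *b)
{
    const IclRange *ra = (const IclRange *) a;
    const IclRange *rb = (const IclRange *) b;

    return (ra->change_start > rb->change_start) -
           (ra->change_start < rb->change_start);
}

/* Merge overlapping or touching ranges in place; returns the new count. */
int icl_merge_ranges(IclRange *logs, int n)
{
    int out = 0;    /* index of the range currently being extended */
    int i;

    if (n <= 0)
        return 0;

    qsort(logs, (size_t) n, sizeof(IclRange), icl_range_cmp);

    for (i = 1; i < n; i++) {
        assert(logs[i].change_start <= logs[i].change_end);
        if (logs[i].change_start <= logs[out].change_end) {
            /* Overlaps or touches the current merged range: extend it. */
            if (logs[i].change_end > logs[out].change_end)
                logs[out].change_end = logs[i].change_end;
        } else {
            /* Disjoint: start a new merged range. */
            logs[++out] = logs[i];
        }
    }
    return out + 1;
}
```

Given 8152 ~ 8172 and 8162 ~ 8192, the function leaves a single entry 8152 ~ 8192. Sorting changes the order in which the logs were recorded, so a real implementation that needs that order would have to keep it elsewhere.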

This is only an example; the current code does not look exactly like this.
I am trying not to touch the existing code but only to append my logic, so that my code can later be applied as an additional module for a specific purpose such as flash-based storage.

I want to emphasize once more that this attempt is not meant as a pgsql patch or enhancement but is for my own research, at least for now.
Besides, it is only a preparation for implementing my research idea.
So if you see a lot of inefficiency and clumsiness in it, please bear with me.
Later, when I am confident enough to show the whole picture of my idea together with working code (at least after it passes the regression tests and my own tests with the dbt2 benchmark),
I will present it to you and the hackers.

Thank you very much for your interest in my attempt.

As for my original question, I found my mistake: I confused the relation oid with relNode (of relFileNode). Sorry for the hasty question.

Thank you for reading this.

- Best Regards
Hongchan Roh -

-----Original Message-----
From: gsstark(at)gmail(dot)com [mailto:gsstark(at)gmail(dot)com] On Behalf Of Greg Stark
Sent: Tuesday, October 27, 2009 2:22 AM
To: 노홍찬
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: a question about relkind of RelationData handed over to heap_update function

On Sun, Oct 25, 2009 at 9:37 AM, 노홍찬 <fallsmal(at)cs(dot)yonsei(dot)ac(dot)kr> wrote:
> What I am trying to do now is to examine the real dirty portion of buffer pages to be flushed like the following.
>
> page 1
> -------------
> | | dportion1 (real dirty portion 1) ranges between 20 ~ 80
> | dportion1 |
> | | dportion2 (real dirty portion 2) ranges between 8190 ~ 8192
> | |
> | dportion2 |
> -------------
>
> Since there are many different kinds of page-updates such as updates to local buffer, temp relation, indexes, toasted attributes, and so forth.
>
> It would be a big burden to me if I inspect all that codes.
>
> Therefore, I decided to make a start point as inspecting only updates to the ordinary tables.
>
> I added a log array field to BufferDesc struct, and added logs to the designated bufferDesc of the updated buffer
>
> when it comes to ordinary table updates (The logs specifies the real dirty portion ranges of the buffer).
>

I would think you would want to modify MarkBufferDirty to take a start
and end point and store that in your log. Then modify every existing
MarkBufferDirty operation that you can to specify the range that the
subsequent operation is going to modify. You're going to run into
problems where you have code which looks like:

- mark buffer dirty
- do some work which modifies a predictable portion
- if (some rare condition)
- do some more work which modifies other parts of the buffer

The "some more work" may be some function call which doesn't usually
do much either.

So you may end up having to restructure a lot of code so that every
function is responsible for marking the buffer range dirty itself
instead of assuming it's already been marked.
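
The restructuring described above might look roughly like the following sketch. It runs against a mock buffer rather than PostgreSQL's BufferDesc, and every name in it (SketchBuffer, sketch_mark_range_dirty, and so on) is hypothetical; the real MarkBufferDirty takes only a buffer argument. The point is that the rarely taken path records its own range instead of relying on the caller having marked the buffer dirty.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_BLCKSZ   8192    /* assumed page size */
#define SKETCH_MAX_LOGS 20      /* assumed cap on recorded ranges */

typedef struct SketchRange {
    int start;
    int end;
} SketchRange;

/* Mock buffer descriptor; not PostgreSQL's BufferDesc. */
typedef struct SketchBuffer {
    bool        dirty;
    bool        overflowed;     /* set once a range no longer fits */
    int         nranges;
    SketchRange ranges[SKETCH_MAX_LOGS];
} SketchBuffer;

/* Hypothetical range-aware variant of a mark-dirty call. */
void sketch_mark_range_dirty(SketchBuffer *buf, int start, int end)
{
    assert(0 <= start && start <= end && end <= SKETCH_BLCKSZ);
    buf->dirty = true;
    if (buf->nranges < SKETCH_MAX_LOGS) {
        buf->ranges[buf->nranges].start = start;
        buf->ranges[buf->nranges].end = end;
        buf->nranges++;
    } else {
        buf->overflowed = true; /* range list is no longer complete */
    }
}

/* The rare extra work reports the range it touches by itself. */
void sketch_rare_extra_work(SketchBuffer *buf)
{
    sketch_mark_range_dirty(buf, 8190, 8192);
}

/* Common path: a predictable range, plus the rare path when it triggers. */
void sketch_update(SketchBuffer *buf, bool rare_condition)
{
    sketch_mark_range_dirty(buf, 20, 80);
    if (rare_condition)
        sketch_rare_extra_work(buf);
}
```

The offsets 20 ~ 80 and 8190 ~ 8192 are taken from the example page in the quoted message; in real code each call site would compute its own range.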

--
greg

In response to

Re: a question about relkind of RelationData handed over to heap_update function at 2009-10-26 17:22:28 from Greg Stark
