Re: refactoring basebackup.c

From: Jeevan Ladhe <jeevan(dot)ladhe(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, tushar <tushar(dot)ahuja(at)enterprisedb(dot)com>
Subject: Re: refactoring basebackup.c
Date: 2021-10-15 11:54:15
Message-ID: CAOgcT0N1zxV0swyR3hDVVXjvun-zO1HhDx4+_F4nUiMv4RiUNw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Robert,

> The loop is gone, but CHUNK_SIZE itself seems to have evaded the
executioner.

I am sorry, but I did not really get it. Or it is what you have pointed
in the following paragraphs?

> I think this is not the best way to accomplish the goal. Adding
> LZ4F_compressBound(0) to next_buf_len makes the buffer substantially
> bigger for something that's only going to happen once.

Yes, you are right. I missed this.

> We are assuming in any case, I think, that LZ4F_compressBound(0) <=
> LZ4F_compressBound(mysink->base.bbs_buffer_length), so all you need to
> do is have bbsink_end_archive() empty the buffer, if necessary, before
> calling LZ4F_compressEnd().

This is a fair enough assumption.

> With just that change, you can set
> next_buf_len = LZ4F_HEADER_SIZE_MAX + mysink->output_buffer_bound --
> but that's also more than you need. You can instead do next_buf_len =
> Min(LZ4F_HEADER_SIZE_MAX, mysink->output_buffer_bound). Now, you're
> probably thinking that won't work, because bbsink_lz4_begin_archive()
> could fill up the buffer partway, and then the first call to
> bbsink_lz4_archive_contents() could overrun it. But that problem can
> be solved by reversing the order of operations in
> bbsink_lz4_archive_contents(): before you call LZ4F_compressUpdate(),
> test whether you need to empty the buffer first, and if so, do it.

I am still not able to get - how can we survive with a mere
size of Min(LZ4F_HEADER_SIZE_MAX, mysink->output_buffer_bound).
LZ4F_HEADER_SIZE_MAX is defined as 19 in lz4 library. With this
proposal, it is almost guaranteed that the next buffer length will
be always set to 19, which will result in failure of a call to
LZ4F_compressUpdate() with the error LZ4F_ERROR_dstMaxSize_tooSmall,
even if we had called bbsink_archive_contents() before.

> That's actually less confusing than the way you've got it, because as
> you have it written, we don't really know why we're emptying the
> buffer -- is it to prepare for the next call to LZ4F_compressUpdate(),
> or is it to prepare for the call to LZ4F_compressEnd()? How do we know
> now how much space the next person writing into the buffer is going to
> need? It seems better if bbsink_lz4_archive_contents() empties the
> buffer before calling LZ4F_compressUpdate() if that call might not
> have enough space, and likewise bbsink_lz4_end_archive() empties the
> buffer before calling LZ4F_compressEnd() if that's needed. That way,
> each callback makes the space *it* needs, not the space the *next*
> caller needs. (bbsink_lz4_end_archive() still needs to ALSO empty the
> buffer after LZ4F_compressEnd(), so we don't orphan any data.)

Sure, I get your point here.

> On another note, if the call to LZ4F_freeCompressionContext() is
> required in bbsink_lz4_end_archive(), then I think this code is going
> to just leak the memory used by the compression context if an error
> occurs before this code is reached. That kind of sucks.

Yes, the LZ4F_freeCompressionContext() is needed to clear the
LZ4F_cctx. The structure LZ4F_cctx_s maintains internal stages
of compression, internal buffers, etc.

> The way to fix
> it, I suppose, is a TRY/CATCH block, but I don't think that can be
> something internal to basebackup_lz4.c: I think the bbsink stuff would
> need to provide some kind of infrastructure for basebackup_lz4.c to
> use. It would be a lot better if we could instead get LZ4 to allocate
> memory using palloc(), but a quick Google search suggests that you
> can't accomplish that without recompiling liblz4, and that's not
> workable since we don't want to require a liblz4 built specifically
> for PostgreSQL. Do you see any other solution?

You mean the way gzip allows us to use our own alloc and free functions
by means of providing the function pointers for them. Unfortunately,
no, LZ4 does not have that kind of provision. Maybe that makes a
good proposal for LZ4 library ;-).
I cannot think of another solution to it right away.

Regards,
Jeevan Ladhe

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Gustafsson 2021-10-15 12:01:30 Re: [PATCH] test/ssl: rework the sslfiles Makefile target
Previous Message Daniel Gustafsson 2021-10-15 11:44:12 Re: Unbounded %s in sscanf