Re: 64-bit API for large object

From: Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>
To: Nozomi Anzai <anzai(at)sraoss(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: 64-bit API for large object
Date: 2012-10-05 15:52:54
Message-ID: CADyhKSW9OmO92+=k8G5yCqN4k=ebkwvRuLezm-sT6C-bG72xcA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Anzai-san,

The latest patch is fair enough for me, so let me hand over its reviewing
for comitters.

Thanks,

2012/10/1 Nozomi Anzai <anzai(at)sraoss(dot)co(dot)jp>:
> Here is 64-bit API for large object version 3 patch.
>
>> I checked this patch. It looks good, but here are still some points to be
>> discussed.
>>
>> * I have a question. What is the meaning of INT64_IS_BUSTED?
>> It seems to me a marker to indicate a platform without 64bit support.
>> However, the commit 901be0fad4034c9cf8a3588fd6cf2ece82e4b8ce
>> says as follows:
>> | Remove all the special-case code for INT64_IS_BUSTED, per decision that
>> | we're not going to support that anymore.
>
> Removed INT64_IS_BUSTED.
>
>
>> * At inv_seek(), it seems to me it checks offset correctness with wrong way,
>> as follows:
>> | case SEEK_SET:
>> | if (offset < 0)
>> | elog(ERROR, "invalid seek offset: " INT64_FORMAT, offset);
>> | obj_desc->offset = offset;
>> | break;
>> It is a right assumption, if large object size would be restricted to 2GB.
>> But the largest positive int64 is larger than expected limitation.
>> So, it seems to me it should be compared with (INT_MAX * PAGE_SIZE)
>> instead.
>
> Fixed.
>
>
>> * At inv_write(), it definitely needs a check to prevent data-write upper 4TB.
>> In case when obj_desc->offset is a bit below 4TB, an additional 1GB write
>> will break head of the large object because of "pageno" overflow.
>
> Added a such check.
>
>
>> * Please also add checks on inv_read() to prevent LargeObjectDesc->offset
>> unexpectedly overflows 4TB boundary.
>
> Added a such check.
>
>
>> * At inv_truncate(), variable "off" is re-defined to int64. Is it really needed
>> change? All its usage is to store the result of "len % LOBLKSIZE".
>
> Fixed and back to int32.
>
>
>> Thanks,
>>
>> 2012/9/24 Nozomi Anzai <anzai(at)sraoss(dot)co(dot)jp>:
>> > Here is 64-bit API for large object version 2 patch.
>> >
>> >> I checked this patch. It can be applied onto the latest master branch
>> >> without any problems. My comments are below.
>> >>
>> >> 2012/9/11 Tatsuo Ishii <ishii(at)postgresql(dot)org>:
>> >> > Ok, here is the patch to implement 64-bit API for large object, to
>> >> > allow to use up to 4TB large objects(or 16TB if BLCKSZ changed to
>> >> > 32KB). The patch is based on Jeremy Drake's patch posted on September
>> >> > 23, 2005
>> >> > (http://archives.postgresql.org/pgsql-hackers/2005-09/msg01026.php)
>> >> > and reasonably updated/edited to adopt PostgreSQL 9.3 by Nozomi Anzai
>> >> > for the backend part and Yugo Nagata for the rest(including
>> >> > documentation patch).
>> >> >
>> >> > Here are changes made in the patch:
>> >> >
>> >> > 1) Frontend lo_* libpq functions(fe-lobj.c)(Yugo Nagata)
>> >> >
>> >> > lo_initialize() gathers backend 64-bit large object handling
>> >> > function's oid, namely lo_lseek64, lo_tell64, lo_truncate64.
>> >> >
>> >> > If client calls lo_*64 functions and backend does not support them,
>> >> > lo_*64 functions return error to caller. There might be an argument
>> >> > since calls to lo_*64 functions can automatically be redirected to
>> >> > 32-bit older API. I don't know this is worth the trouble though.
>> >> >
>> >> I think it should definitely return an error code when user tries to
>> >> use lo_*64 functions towards the backend v9.2 or older, because
>> >> fallback to 32bit API can raise unexpected errors if application
>> >> intends to seek the area over than 2GB.
>> >>
>> >> > Currently lo_initialize() throws an error if one of oids are not
>> >> > available. I doubt we do the same way for 64-bit functions since this
>> >> > will make 9.3 libpq unable to access large objects stored in pre-9.2
>> >> > PostgreSQL servers.
>> >> >
>> >> It seems to me the situation to split the case of pre-9.2 and post-9.3
>> >> using a condition of "conn->sversion >= 90300".
>> >
>> > Fixed so, and tested it by deleteing the lo_tell64's row from pg_proc.
>> >
>> >
>> >> > To pass 64-bit integer to PQfn, PQArgBlock is used like this: int *ptr
>> >> > is a pointer to 64-bit integer and actual data is placed somewhere
>> >> > else. There might be other way: add new member to union u to store
>> >> > 64-bit integer:
>> >> >
>> >> > typedef struct
>> >> > {
>> >> > int len;
>> >> > int isint;
>> >> > union
>> >> > {
>> >> > int *ptr; /* can't use void (dec compiler barfs) */
>> >> > int integer;
>> >> > int64 bigint; /* 64-bit integer */
>> >> > } u;
>> >> > } PQArgBlock;
>> >> >
>> >> > I'm a little bit worried about this way because PQArgBlock is a public
>> >> > interface.
>> >> >
>> >> I'm inclined to add a new field for the union; that seems to me straight
>> >> forward approach.
>> >> For example, the manner in lo_seek64() seems to me confusable.
>> >> It set 1 on "isint" field even though pointer is delivered actually.
>> >>
>> >> + argv[1].isint = 1;
>> >> + argv[1].len = 8;
>> >> + argv[1].u.ptr = (int *) &len;
>> >
>> > Your proposal was not adopted per discussion.
>> >
>> >
>> >> > Also we add new type "pg_int64":
>> >> >
>> >> > #ifndef NO_PG_INT64
>> >> > #define HAVE_PG_INT64 1
>> >> > typedef long long int pg_int64;
>> >> > #endif
>> >> >
>> >> > in postgres_ext.h per suggestion from Tom Lane:
>> >> > http://archives.postgresql.org/pgsql-hackers/2005-09/msg01062.php
>> >> >
>> >> I'm uncertain about context of this discussion.
>> >>
>> >> Does it make matter if we include <stdint.h> and use int64_t instead
>> >> of the self defined data type?
>> >
>> > Your proposal was not adopted per discussion.
>> > Per discussion, endiannness translation was moved to fe-lobj.c.
>> >
>> >
>> >> > 2) Backend lo_* functions (be-fsstubs.c)(Nozomi Anzai)
>> >> >
>> >> > Add lo_lseek64, lo_tell64, lo_truncate64 so that they can handle
>> >> > 64-bit seek position and data length. loread64 and lowrite64 are not
>> >> > added because if a program tries to read/write more than 2GB at once,
>> >> > it would be a sign that the program need to be re-designed anyway.
>> >> >
>> >> I think it is a reasonable.
>> >>
>> >> > 3) Backend inv_api.c functions(Nozomi Anzai)
>> >> >
>> >> > No need to add new functions. Just extend them to handle 64-bit data.
>> >> >
>> >> > BTW , what will happen if older 32-bit libpq accesses large objects
>> >> > over 2GB?
>> >> >
>> >> > lo_read and lo_write: they can read or write lobjs using 32-bit API as
>> >> > long as requested read/write data length is smaller than 2GB. So I
>> >> > think we can safely allow them to access over 2GB lobjs.
>> >> >
>> >> > lo_lseek: again as long as requested offset is smaller than 2GB, there
>> >> > would be no problem.
>> >> >
>> >> > lo_tell:if current seek position is beyond 2GB, returns an error.
>> >> >
>> >> Even though iteration of lo_lseek() may move the offset to 4TB, it also
>> >> makes unavailable to use lo_tell() to obtain the current offset, so I think
>> >> it is reasonable behavior.
>> >>
>> >> However, error code is not an appropriate one.
>> >>
>> >> + if (INT_MAX < offset)
>> >> + {
>> >> + ereport(ERROR,
>> >> + (errcode(ERRCODE_UNDEFINED_OBJECT),
>> >> + errmsg("invalid large-object
>> >> descriptor: %d", fd)));
>> >> + PG_RETURN_INT32(-1);
>> >> + }
>> >>
>> >> According to the manpage of lseek(2)
>> >> EOVERFLOW
>> >> The resulting file offset cannot be represented in an off_t.
>> >>
>> >> Please add a new error code such as ERRCODE_BLOB_OFFSET_OVERFLOW.
>> >
>> > Changed the error code and error message. We added a new error code
>> > "ERRCODE_UNDEFINED_OBJECT (22P07)".
>> >
>> >
>> >> > 4) src/test/examples/testlo64.c added for 64-bit API example(Yugo Nagata)
>> >> >
>> >> > Comments and suggestions are welcome.
>> >> >
>> >> miscellaneous comments are below.
>> >>
>> >> Regression test is helpful. Even though no need to try to create 4TB large
>> >> object, it is helpful to write some chunks around the design boundary.
>> >> Could you add some test cases that writes some chunks around 4TB offset.
>> >
>> > Added 64-bit lobj test items into regression test and confirmed it worked
>> > rightly.
>> >
>> >
>> >> Thanks,
>> >> --
>> >> KaiGai Kohei <kaigai(at)kaigai(dot)gr(dot)jp>
>> >>
>> >>
>> >> --
>> >> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
>> >> To make changes to your subscription:
>> >> http://www.postgresql.org/mailpref/pgsql-hackers
>> >
>> >
>> > --
>> > Nozomi Anzai
>> > SRA OSS, Inc. Japan
>> >
>> >
>> > --
>> > Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
>> > To make changes to your subscription:
>> > http://www.postgresql.org/mailpref/pgsql-hackers
>> >
>>
>>
>>
>> --
>> KaiGai Kohei <kaigai(at)kaigai(dot)gr(dot)jp>
>>
>>
>> --
>> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-hackers
>
>
> --
> Nozomi Anzai
> SRA OSS, Inc. Japan
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

--
KaiGai Kohei <kaigai(at)kaigai(dot)gr(dot)jp>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dimitri Fontaine 2012-10-05 15:54:46 Re: Deparsing DDL command strings
Previous Message Andres Freund 2012-10-05 15:28:27 Re: Deparsing DDL command strings