Re: BUG #16403: set_bit function does not have expected effect

From: Francisco Olarte <folarte(at)peoplecall(dot)com>
To: Alex Movitz <amovitz(at)bncpu(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #16403: set_bit function does not have expected effect
Date: 2020-04-30 18:19:47
Message-ID: CA+bJJbye+JYinM-CRX4RzFsbbG=J7KjH60AMiOBSnkaRK4GcxQ@mail.gmail.com
Lists: pgsql-bugs

On Thu, Apr 30, 2020 at 6:58 PM Alex Movitz <amovitz(at)bncpu(dot)com> wrote:
> I see, so the get/set_bit functions act on bits within individual bytes, not as a bit stream. Some other languages will act directly on the bits, rather than iterating over bytes.

I doubt it, there is no such thing as a bitstream in normal memory (
bubble or acoustic memories would be a different thing ). You have
some chip-implementation-defined arrays which the CPU sees as arrays of
bytes.

> A good example of this is in pure C when setting a bit, it is easy to do programmatically with shifting. This is how I've performed these operations in the past, specifically when looping over bits.

The example works exactly as C. The problem is when you do it in PURE
C you do it with, typically, a long int.

The problem here is you are doing it in a BYTE ARRAY. In C, typically,
you would use something like an unsigned char array and do something
like "void set_bit(unsigned char * bytea, int bit) { bytea[bit/8] |=
(1<<(bit%8)); }"
and, if you print a byte array in hex with "for (int i=0;i<nbytes;i++)
printf("%02x",(unsigned)bytea[i]);" you would get that exact result.

If you want to manipulate long integers, just use an integer type,
INTEGER seems to be the right one for your case ( 32 bits ), and then
set the bit using logical ops ( update t set f = f | (1 << n) ) and
print it in hex ( select to_hex(f) ), which is the same as in C ( same
for setting, printf("%08x",f) ).

bytea is for arrays of bytes, and works the same as any similar C
package. It has the advantage of being variable length, as a C char
array. What you are trying to get is C-integer behaviour; for that, use
postgres integers, which are similar.

> unsigned int x = 13;
> unsigned int i = 0;
> i |= 1 << x;

You should use unsigned long or uint32_t for 32 bits, int is only
guaranteed to have 16 bits ( IIRC, short>=16, long>=32,
short<=int<=long is the only guarantee you get in C ).

> This will set the bit in the bytes relative to the right-most position. In this case, it would return an integer with a value of 32, having bytes with the hex representation 0x00002000 (when using printf, MSB).

This will work when using %08X for printf, but, as you are using
bytea, what you are doing is the equivalent of
unsigned char * bytea = (unsigned char *) (&i);
printf("0x%02x%02x%02x%02x", bytea[0], bytea[1], bytea[2], bytea[3]);
which, IIRC, will give you "0x00200000" on Intel.

> Now that I understand the implementation reasoning, I also understand that this will probably not change. If there are some bit functions implemented with the BYTEA type similar to the C above, however, I would definitely expect it to perform the same way.

Now I'm really convinced it is an expectation problem, not a display
convention problem. You want a long and are using a bytea(4). This has
the same problems as using a char[4] as an int32_t in C. Want ints?
Use them. And I suspect they will also be faster.

And I suspect this may be due to you using bytea because it prints in
hex by default. Because, in your C examples, how would you do the long
and bit shifting stuff with a bytea(2345) equivalent? The unsigned
char stuff will directly translate.

I located int bit manipulation easily under "9.3. Mathematical
Functions and Operators", but not finding hex formatting in "9.8. Data
Type Formatting Functions" I had to do a quick grepping for hex in the
index to locate it under "9.4. String Functions and Operators", and
this is after having used postgres since before it got the ql tail. I
may be too used to *printf for these conversions. That's why I suspect
bytea was chosen for its hex display default.

Francisco Olarte.
