
pg_read_file() and non-ascii input file

From:	Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	pg_read_file() and non-ascii input file
Date:	2009-10-28 02:27:35
Message-ID:	20091028112735.7F97.52131E4D@oss.ntt.co.jp
Lists:	pgsql-hackers

pg_read_file() takes a byte offset and length as arguments,
but we don't check the resulting text with pg_verify_mbstr().
Should pg_read_file() return bytea instead of text, or should we
add some code to verify the input? Only superusers are allowed
to use the function, but it is still dangerous.
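
To illustrate the problem outside the server, here is a minimal Python sketch
(not PostgreSQL code). It assumes a UTF-8 encoding and works on an in-memory
byte string instead of a real file; read_byte_range and is_valid_utf8 are
hypothetical helper names, and the validity check only stands in for what a
verifier such as pg_verify_mbstr() would do.

```python
def read_byte_range(data: bytes, offset: int, length: int) -> bytes:
    """Return `length` bytes starting at byte `offset`, with no encoding check."""
    return data[offset:offset + length]


def is_valid_utf8(chunk: bytes) -> bool:
    """Report whether the chunk decodes cleanly as UTF-8 (assumed encoding)."""
    try:
        chunk.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Three characters, three bytes each in UTF-8 (nine bytes in total).
sample = "日本語".encode("utf-8")

aligned = read_byte_range(sample, 0, 3)      # exactly one character
cut_tail = read_byte_range(sample, 0, 4)     # length ends inside the 2nd character
cut_head = read_byte_range(sample, 1, 3)     # offset starts inside the 1st character

print(is_valid_utf8(aligned), is_valid_utf8(cut_tail), is_valid_utf8(cut_head))
```

Only the aligned range is valid text; the other two would hand invalid
byte sequences to anything that treats the result as text.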

If we leave the result as text and add a verifier, we also need to
consider how to handle multi-byte text. The offset and length should not
split a multi-byte character. We can assume the offset is a correct
boundary if we can trust users, but no one knows the correct length before
the function call.
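
One possible way to cope with an unknown length is to return only the
complete characters and report how many bytes were actually consumed. The
following Python sketch shows that idea under the assumption of UTF-8 input;
decode_complete_prefix is a hypothetical name, not an existing or proposed
PostgreSQL function.

```python
import codecs
from typing import Tuple


def decode_complete_prefix(chunk: bytes) -> Tuple[str, int]:
    """Decode the complete UTF-8 characters at the start of `chunk`.

    Returns the decoded text and the number of bytes it covers. A trailing
    incomplete character is left out. Invalid bytes elsewhere (for example,
    an offset that starts mid-character) raise UnicodeDecodeError.
    """
    decoder = codecs.getincrementaldecoder("utf-8")()
    # final=False keeps a partial trailing sequence buffered instead of failing.
    text = decoder.decode(chunk, final=False)
    consumed = len(text.encode("utf-8"))
    return text, consumed


sample = "日本語".encode("utf-8")  # three bytes per character

# A requested length of 4 bytes ends inside the second character.
text, consumed = decode_complete_prefix(sample[0:4])
print(repr(text), consumed)
```

The caller can then use offset + consumed as the next offset, so that a
correct starting boundary always leads to another correct boundary.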

An idea is to have binary and text versions of pg_read_file:
* pg_read_binary_file(filename, offset, length) : bytea
* pg_read_text_file(filename, offset) : ROW( text, nextline_offset )
-- it returns the next line starting at 'offset'.
However, such changes could cause compatibility problems.
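
The intended behavior of the two variants can be sketched in Python as
follows. This is only an illustration of the proposed signatures, assuming
UTF-8 files and newline-terminated lines; read_binary_file and read_text_file
are stand-in names, and the temporary file exists only for the demonstration.

```python
import os
import tempfile
from typing import Tuple


def read_binary_file(filename: str, offset: int, length: int) -> bytes:
    """Binary variant: return raw bytes, so no encoding check is needed."""
    with open(filename, "rb") as f:
        f.seek(offset)
        return f.read(length)


def read_text_file(filename: str, offset: int) -> Tuple[str, int]:
    """Text variant: return the line starting at `offset` and the next offset.

    Decoding raises UnicodeDecodeError if the line is not valid UTF-8,
    for example when `offset` points into the middle of a character.
    """
    with open(filename, "rb") as f:
        f.seek(offset)
        line = f.readline()  # reads up to and including the newline
    return line.decode("utf-8"), offset + len(line)


# Demonstration on a temporary two-line file.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write("一行目\n二行目\n".encode("utf-8"))
    path = tmp.name

lines = []
offset = 0
while True:
    text, next_offset = read_text_file(path, offset)
    if next_offset == offset:  # nothing more to read
        break
    lines.append(text)
    offset = next_offset

first_bytes = read_binary_file(path, 0, 4)  # may split a character; still fine as bytes
os.remove(path)
```

Because the text variant always stops at a line end, the caller never has
to guess a length that might split a multi-byte character.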

Comments, better ideas?

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center

Responses

Re: pg_read_file() and non-ascii input file at 2009-11-30 09:36:05 from Itagaki Takahiro
