On 2020-01-23 18:04, Robert Haas wrote:
> Now, you might say "well, why don't we just do an encoding
> conversion?", but we can't. When the filesystem tells us what the file
> names are, it does not tell us what encoding the person who created
> those files had in mind. We don't know that they had *any* encoding in
> mind. IIUC, a file in the data directory can have a name that consists
> of any sequence of bytes whatsoever, so long as it doesn't contain
> prohibited characters like a path separator or \0 byte. But only some
> of those possible octet sequences can be stored in a manifest that has
> to be valid UTF-8.

I think it wouldn't be unreasonable to require that file names in the
database directory be consistently encoded (as defined by pg_control,
probably). After all, this information is sometimes also shown in
system views, so it's already difficult to process total junk. In
practice, this shouldn't be an onerous requirement.
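
To illustrate what such a requirement would mean, here is a minimal sketch of a check that flags file names whose raw bytes are not valid in a given encoding. The function names are hypothetical and this is not PostgreSQL code. It assumes the expected encoding is already known and passed in by the caller; how it would be obtained is left open.

```python
import os


def is_valid_name(name: bytes, encoding: str = "utf-8") -> bool:
    """Return True if the raw file-name bytes decode cleanly in `encoding`."""
    try:
        name.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


def undecodable_names(directory: bytes, encoding: str = "utf-8") -> list[bytes]:
    """List names in `directory` whose bytes are not valid in `encoding`.

    Passing a bytes path makes os.listdir() return the names as raw bytes,
    so nothing is decoded before we get to look at it.
    """
    return [n for n in os.listdir(directory) if not is_valid_name(n, encoding)]
```

For example, b"caf\xc3\xa9" passes as UTF-8, while b"caf\xe9" (the same name written in Latin-1) does not. A tool writing a UTF-8 manifest could reject or report such names instead of trying to guess their encoding.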
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services