Обсуждение: Generic File Access Function to read program output

Поиск
Список
Период
Сортировка

Generic File Access Function to read program output

От
Carsten Klein
Дата:
Hi there,

on PostgreSQL 14, I'm using function pg_read_file to read a JSON file on 
the server. After that, the JSON file gets casted to jsonb and with 
function jsonb_array_elements I'm iterating over the "records", which I 
transform into a PostgreSQL ROWTYPE with jsonb_populate_record...

Since the source files are actually XML files, these are turned into 
JSON files with Node JS and the fast-xml-parser module (processing JSON 
is much faster and more comfortable than processing XML in PostgreSQL).

The command line of this conversion process is like this:

# node /opt/my_node_apps/xml_to_json.js <path_to_xml_file>

In order to do this without temporary JSON files (which need to be 
deleted at some time), it would be great to have a new Generic File 
Access Function

pg_read_program_output(command)


Although one could argue, that it's not a Generic *File* Access 
Function, that function would be a worthwhile addition and could use the 
same semantics and rules as with the

COPY table_name FROM PROGRAM 'command'

statement. Also the implementation (running a command with the shell and 
capture it's STDOUT) is nearly the same.

In contrast to the other Generic File Access Functions, it will be 
almost impossible to restrict access to programs or commands within the 
database cluster directory (could be a complex shell command). Aside 
from that this makes no sense since, typically, there are no executable 
programs in those directories.

Even worse, it's likely also not possible to restrict the source of the 
content read (the STDOUT) to be any of these directories, since the 
program could just dump anything to its STDOUT.

AFAIT, that's not really an issue but only makes this new Generic File 
Access Function special, in that these restrictions and the meaning of 
role pg_read_server_files just do not apply for it.

Do you know if there is already such a function, maybe provided by an 
extension I do not yet know?


Cheers
Carsten





Re: Generic File Access Function to read program output

От
Joe Conway
Дата:
On 2/7/24 05:28, Carsten Klein wrote:
> Hi there,
> 
> on PostgreSQL 14, I'm using function pg_read_file to read a JSON file on
> the server. After that, the JSON file gets casted to jsonb and with
> function jsonb_array_elements I'm iterating over the "records", which I
> transform into a PostgreSQL ROWTYPE with jsonb_populate_record...
> 
> Since the source files are actually XML files, these are turned into
> JSON files with Node JS and the fast-xml-parser module (processing JSON
> is much faster and more comfortable than processing XML in PostgreSQL).
> 
> The command line of this conversion process is like this:
> 
> # node /opt/my_node_apps/xml_to_json.js <path_to_xml_file>
> 
> In order to do this without temporary JSON files (which need to be
> deleted at some time), it would be great to have a new Generic File
> Access Function
> 
> pg_read_program_output(command)
> 
> 
> Although one could argue, that it's not a Generic *File* Access
> Function, that function would be a worthwhile addition and could use the
> same semantics and rules as with the
> 
> COPY table_name FROM PROGRAM 'command'
> 
> statement. Also the implementation (running a command with the shell and
> capture it's STDOUT) is nearly the same.
> 
> In contrast to the other Generic File Access Functions, it will be
> almost impossible to restrict access to programs or commands within the
> database cluster directory (could be a complex shell command). Aside
> from that this makes no sense since, typically, there are no executable
> programs in those directories.
> 
> Even worse, it's likely also not possible to restrict the source of the
> content read (the STDOUT) to be any of these directories, since the
> program could just dump anything to its STDOUT.
> 
> AFAIT, that's not really an issue but only makes this new Generic File
> Access Function special, in that these restrictions and the meaning of
> role pg_read_server_files just do not apply for it.
> 
> Do you know if there is already such a function, maybe provided by an
> extension I do not yet know?


Maybe write your own in plpython or plperlu?

-- 
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Generic File Access Function to read program output

От
Carsten Klein
Дата:
Am 07.02.2024 um 15:54 schrieb Joe Conway:
> 
> Maybe write your own in plpython or plperlu?
> 

Yeah... why didn't I think of if? PL/Python would be a first class option.

Nevertheless, I still believe such a function in PostgreSQL's core would 
be a good addition. Maybe someone feels like implementing one some day... :)

Carsten