Re: Schema variables - new implementation for Postgres 15

Поиск
Список
Период
Сортировка
От Pavel Stehule
Тема Re: Schema variables - new implementation for Postgres 15
Дата
Msg-id CAFj8pRB=ON4n1xbmXxW36bowW2r904txn+1N1wC_oSmNYLqKKw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Schema variables - new implementation for Postgres 15  (Dmitry Dolgov <9erthalion6@gmail.com>)
Ответы Re: Schema variables - new implementation for Postgres 15
Список pgsql-hackers
Hi

pá 24. 5. 2024 v 13:32 odesílatel Dmitry Dolgov <9erthalion6@gmail.com> napsal:
> On Wed, May 22, 2024 at 08:44:28PM +0200, Pavel Stehule wrote:
> st 22. 5. 2024 v 19:25 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
>
> > Peter Eisentraut <peter@eisentraut.org> writes:
> > > On 18.05.24 13:29, Alvaro Herrera wrote:
> > >> I want to note that when we discussed this patch series at the dev
> > >> meeting in FOSDEM, a sort-of conclusion was reached that we didn't want
> > >> schema variables at all because of the fact that creating a variable
> > >> would potentially change the meaning of queries by shadowing table
> > >> columns.  But this turns out to be incorrect: it's_variables_  that are
> > >> shadowed by table columns, not the other way around.
> >
> > > But that's still bad, because seemingly unrelated schema changes can
> > > make variables appear and disappear.  For example, if you have
> > >       SELECT a, b FROM table1
> > > and then you drop column b, maybe the above query continues to work
> > > because there is also a variable b.
> >
> > Yeah, that seems pretty dangerous.  Could we make it safe enough
> > by requiring some qualification on variable names?  That is, if
> > you mean b to be a variable, then you must write something like
> >
> >         SELECT a, pg_variables.b FROM table1
> >
> > This is still ambiguous if you use "pg_variables" as a table alias in
> > the query, but the alias would win so the query still means what it
> > meant before.  Also, table aliases (as opposed to actual table names)
> > don't change readily, so I don't think there's much risk of the query
> > suddenly meaning something different than it did yesterday.
> >
>
> With active shadowing variable warning for described example you will get a
> warning before dropping.

I assume you're talking about a warning, which one will get querying the
table with shadowed columns. If no such query has happened yet and the
column was dropped, there will be no warning.

sure - the possible identifier collision cannot be solved in SQL perfectly. It is the same with tables.
When I add badly named column to table, I'll get an error "ambiguous  columns" just when I'll execute
query. The system catalog just cannot protect against collisions - it is true for columns, variables, tables.
Little bit protected are views, that are stored in parsed format, but any other object can be broken when
somebody choose bad names in catalog or queries. There is not any protection.
 

Aside that, I'm afraid dropping a warning in log does not have
sufficient visibility to warn about the issue, since one needs to read
those logs first. I guess what folks are looking for is more constraints
out of the box, preventing any ambiguity.

We can increase (optionality) the level of this message to error. It is not perfect, but it can work well.

I think so there is not higher risk with variables than current risk with just tables.

a) the possibility to create variables is limited by rights on schema. So nobody can create variables without necessary rights (invisibly)

b) if user has own schema with CREATE right, then it can create variables just for self, and with default setting, just visible for self, 
and just accessible for self. When other users try to use these variables, then the query fails due to missing access rights (usually).
Common user cannot to create variables in application schema and cannot to set search_path for applications.

c) the changes of schema are usually tested on some testing stages before are applied on production. So when there
will be possible collision or some other defect, probably it will be detected there. Untested changes of catalog on production is not too common
today.

d) any risk that can be related for variables, is related just to renaming column or table.

 

> Session variables are joined with schema (in my proposal). Do anybody can
> do just
>
> CREATE SCHEMA svars; -- or what (s)he likes
> CREATE VARIABLE svars.b AS int;
>
> SELECT a, b FROM table1
>
> and if somebody can be really safe, the can write
>
> SELECT t.a, t.b FROM table1 t
>
> or
>
> SELECT t.a, svars.b FROM table1 t
>
> It can be customized in the way anybody prefers - just creating dedicated
> schemas and setting search_path. Using its own schema for session variables
> without enhancing search_path for this schema forces the necessity to set
> only qualified names for session variables.
>
> Sure the naming of schemas, aliases can be unhappy wrong, and there can be
> the problem. But this can be a problem today too.

If I understand you correctly, you're saying that there are "best
practices" how to deal with session variables to avoid any potential
issues. But I think it's more user-friendly to have something that will
not allow shooting yourself in the foot right out of the box. You're
right, similar things could probably happen with the already existing
functionality, but it doesn't give us rights to add more to it.
Especially if it's going to be about a brand-new feature.

Unfortunately, there is not any possibility - just in SQL (without introduction of variables).

Example - Tom's proposal using dedicated schema

ok - I can limit the possibility to create variables just for schema "pg_var"

CREATE VARIABLE pg_var.a AS int;

but if somebody will write query like

SELECT pg_var.a FROM tab pg_var

then we are back on start.

 

As far as I can see now, it's a major design flaw that could keep the
patch from being accepted. Fortunately there are few good proposals how
to address this, folks are genuinely trying to help. What do you think
about trying some of them out, as an alternative approach, to compare
functionality and user experience?

It is a design flaw of SQL. The issue we talk about is the generic property of SQL, and then you cannot fix it.

I thought about possibility to introduce dedicated function

svalue(regvariable) returns any - with planner support

and possibility to force usage of this function. Another possibility is using some simple dedicated operator (syntax) for force using of variables
so theoretically this can looks like:

set strict_usage_of_session_variables to on;
SELECT * FROM tab WHERE a = svalue('myvar.var');
or

SELECT * FROM tab WHERE a = @ myvar.var;

This can be really safe. Personally It is not my cup of tea, but I can live it (and this mode can be default).

Theoretically we can limit usage of variables just for PL/pgSQL. It can reduce risks too, but it breaks usage variables for parametrization of DO blocks (what is my primary motivation), but it can be good enough to support migration from PL/SQL.
 

In the meantime I'm afraid I have to withdraw "Ready for committer"
status, sorry. I've clearly underestimated the importance of variables
shadowing, thanks Alvaro and Peter for pointing out some dangerous
cases. I still believe though that the majority of the patch is in a
good shape and the question about variables shadowing is the only thing
that keeps it from moving forward.

I understand.

I'll try to recapitulate my objections against proposed designs

a) using syntax like MS - DECLARE command and '@@' prefix - it is dynamic, so there is not possibility of static check. It is not joined with schema, so there are possible collisions between variables and and the end the variables are named like @@mypackage_myvar - so some custom naming convention is necessary too. There is not possibility to set access rights.

b) using variables like MySQL - first usage define it, and access by '@' prefix. It is simple, but without possibility of static check. There is not possibility to set access rights.

c) using variables with necessity to define it in FROM clause. It is safe, but it can be less readable, when you use more variables, and it is not too readable, and user friendly, because you need to write FROM. And can be messy, because you usually will use variables in queries, and it is introduce not relations into FROM clause. But I can imagine this mode as alternative syntax, but it is very unfriendly and not intuitive (I think). More probably it doesn't fast execution in simple expression execution mode.

d) my proposal - there is possibility of collisions, but consistent with naming of database objects, allows set of access rights, allows static analyze, consistent with PL/pgSQL and similar to PL/pgSQL.

There is not any other possibility. Any time this is war between be user friendly, be readable, be correctly - but there is not perfect solution, because just SQL is not perfect. Almost all mentioned objections against proposed variables are valid just for tables and columns.

Regards

Pavel
 

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Yao Wang
Дата:
Сообщение: Re: 回复: An implementation of multi-key sort
Следующее
От: Richard Guo
Дата:
Сообщение: Re: Wrong results with grouping sets