Обсуждение: Modifying SQL parser with extensions?

Поиск
Список
Период
Сортировка

Modifying SQL parser with extensions?

От
Matthias Lüdtke
Дата:
Hi everyone,

I am searching for the easiest way to let PostgreSQL parse a special
dialect of SQL. The dialect I want to parse augments common SQL in a way
that enables expressions within a WHERE clause to be annotated and is
thus not compatible with the standard SQL syntax anymore.

I am totally new to PostgreSQL extension development and skimming
through the Server Programming chapter I didn't notice an obvious way to
get down to the parser internals.

Is there any way of using existing extension techniques or do I have to
compile my own special version of PostgreSQL in order to modify the parser?

Any suggestions are appreciated.

Regards,
Matthias

Re: Modifying SQL parser with extensions?

От
Alvaro Herrera
Дата:
Matthias Lüdtke wrote:
> Hi everyone,
>
> I am searching for the easiest way to let PostgreSQL parse a special
> dialect of SQL. The dialect I want to parse augments common SQL in a way
> that enables expressions within a WHERE clause to be annotated and is
> thus not compatible with the standard SQL syntax anymore.

No, there's no mechanism for that.  You'd have to preprocess the query
before passing it to PostgreSQL.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: Modifying SQL parser with extensions?

От
Richard Troy
Дата:

On Sun, 29 Oct 2006, Alvaro Herrera wrote:
>
> Matthias Lüdtke wrote:
> > Hi everyone,
> >
> > I am searching for the easiest way to let PostgreSQL parse a special
> > dialect of SQL. The dialect I want to parse augments common SQL in a way
> > that enables expressions within a WHERE clause to be annotated and is
> > thus not compatible with the standard SQL syntax anymore.
>
> No, there's no mechanism for that.  You'd have to preprocess the query
> before passing it to PostgreSQL.
>

Right, no mechanism within PostgreSql, however there are packages that can
do it for you. I don't know of any in the OpenSource world, but my
company, as but one example, has software that lets you run any dialect of
SQL against nearly any RDBMS - unique and embeded features excluded
(they're just passed through). Our package lets you pass individual
statements or entire files full of SQL with embeded comments as it's not
that uncommon among SQL dialects - they're just stripped out before
getting to the engine, as Alvaro suggested.

Regards,
Richard

--
Richard Troy, Chief Scientist
Science Tools Corporation
510-924-1363 or 202-747-1263
rtroy@ScienceTools.com, http://ScienceTools.com/


Re: Modifying SQL parser with extensions?

От
Matthias Lüdtke
Дата:
Alvaro Herrera wrote:
>> I am searching for the easiest way to let PostgreSQL parse a special
>> dialect of SQL. The dialect I want to parse augments common SQL in a way
>> that enables expressions within a WHERE clause to be annotated and is
>> thus not compatible with the standard SQL syntax anymore.
>
> No, there's no mechanism for that.  You'd have to preprocess the query
> before passing it to PostgreSQL.

Actually I didn't describe my whole story:

In fact, parsing this SQL dialect would just be the first step, as the
annotations within the query induce an ordering of the result set.

So I need this extra information in the query to accomplish the
subsequent task of sorting the result set in a certain way before the
result is returned to the client. I'll have to use some hand crafted
internal data structures to do this sorting.

It seems to me that at least part of the code that needs to be written
might be implemented with the existing extension mechanism, though the
parser does not belong to that category.

Regards,
Matthias

Re: Modifying SQL parser with extensions?

От
Matthias Lüdtke
Дата:
Richard Troy wrote:
> Our package lets you pass individual statements or entire files full
> of SQL with embeded comments as it's not that uncommon among SQL
> dialects - they're just stripped out before getting to the engine, as
> Alvaro suggested.

Unfortunately I need the information in the statement to sort the result
set in a certain way later on. Sorry, I should have mentioned that from
the beginning.

This whole thing I want to implement was already written as a proxy JDBC
driver - from parser to result sorter - and works quite fine. I am now
investigating if it's possible to implement it directly in an RDBMS,
i.e. PostgreSQL.

Regards,
Matthias

Re: Modifying SQL parser with extensions?

От
Richard Troy
Дата:
Matthias,

what you want might be a type of collation sequence. You should be able to
code that within Postgres, I'd think, but I wouldn't think that passing
arguments as comments is appropriate.

RT

On Sun, 29 Oct 2006, Matthias Lüdtke wrote:

> Date: Sun, 29 Oct 2006 18:26:30 +0100
> From: Matthias Lüdtke <matthias-luedtke@gmx.de>
> To: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Modifying SQL parser with extensions?
>
>
> Richard Troy wrote:
> > Our package lets you pass individual statements or entire files full
> > of SQL with embeded comments as it's not that uncommon among SQL
> > dialects - they're just stripped out before getting to the engine, as
> > Alvaro suggested.
>
> Unfortunately I need the information in the statement to sort the result
> set in a certain way later on. Sorry, I should have mentioned that from
> the beginning.
>
> This whole thing I want to implement was already written as a proxy JDBC
> driver - from parser to result sorter - and works quite fine. I am now
> investigating if it's possible to implement it directly in an RDBMS,
> i.e. PostgreSQL.
>
> Regards,
> Matthias
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster
>

--
Richard Troy, Chief Scientist
Science Tools Corporation
510-924-1363 or 202-747-1263
rtroy@ScienceTools.com, http://ScienceTools.com/


Re: Modifying SQL parser with extensions?

От
Alvaro Herrera
Дата:
Matthias Lüdtke wrote:
> Alvaro Herrera wrote:
> >>I am searching for the easiest way to let PostgreSQL parse a special
> >>dialect of SQL. The dialect I want to parse augments common SQL in a way
> >>that enables expressions within a WHERE clause to be annotated and is
> >>thus not compatible with the standard SQL syntax anymore.
> >
> >No, there's no mechanism for that.  You'd have to preprocess the query
> >before passing it to PostgreSQL.
>
> Actually I didn't describe my whole story:
>
> In fact, parsing this SQL dialect would just be the first step, as the
> annotations within the query induce an ordering of the result set.

Huh, what is this supposed to be able to do that you can't do with the
already existing ORDER BY clause?

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: Modifying SQL parser with extensions?

От
Tom Lane
Дата:
=?ISO-8859-1?Q?Matthias_L=FCdtke?= <matthias-luedtke@gmx.de> writes:
> In fact, parsing this SQL dialect would just be the first step, as the
> annotations within the query induce an ordering of the result set.
> So I need this extra information in the query to accomplish the
> subsequent task of sorting the result set in a certain way before the
> result is returned to the client. I'll have to use some hand crafted
> internal data structures to do this sorting.

Seems like you could save a large amount of work if you can express what
you want to do as ORDER BY a user-defined operator.

If you insist on bolting it on as new SQL syntax, changing the parser
will be only the tip of the iceberg --- you'll likely need planner and
executor changes as well.  You could get a rough idea of what's involved
in adding a new kind of query clause by looking at the last patch that
did so:
http://archives.postgresql.org/pgsql-committers/2006-08/msg00251.php

            regards, tom lane

Re: Modifying SQL parser with extensions?

От
Matthias Luedtke
Дата:
Alvaro Herrera wrote:
>> In fact, parsing this SQL dialect would just be the first step, as the
>> annotations within the query induce an ordering of the result set.
>
> Huh, what is this supposed to be able to do that you can't do with the
> already existing ORDER BY clause?

Basically, conditional statements are annotated with integers that
represent weights, like

(...)WHERE (foo = 'a')[42] OR (bar = 'b')[20]

In the result set those entries that fulfill both conditions yield score
62, i.e. 42+20, and are ranked top, whereas entries that fulfill only
one of the conditions yield scores 42 and 20 respectively and are
therefore ranked lower.

Honestly, I'm only the poor student who has to implement what the
smarter ones have thought out. ;)

And, yes, it is possible to accomplish the desired sorting with the
ORDER BY clause, but as conditions become more complex, the self cooked
dialect is considered to be more readable and intuitive to the end user.

Regards,
Matthias

Re: Modifying SQL parser with extensions?

От
Matthias Luedtke
Дата:
Tom,

Tom Lane wrote:
>> In fact, parsing this SQL dialect would just be the first step, as the
>> annotations within the query induce an ordering of the result set.
>> So I need this extra information in the query to accomplish the
>> subsequent task of sorting the result set in a certain way before the
>> result is returned to the client. I'll have to use some hand crafted
>> internal data structures to do this sorting.
>
> Seems like you could save a large amount of work if you can express what
> you want to do as ORDER BY a user-defined operator.

thanks for your reply.

I'll take a look at them. Unfortunately, I'm only the poor chap that has
to implement a proof of concept for the solution that others worked out.

> If you insist on bolting it on as new SQL syntax, changing the parser
> will be only the tip of the iceberg --- you'll likely need planner and
> executor changes as well.  You could get a rough idea of what's involved
> in adding a new kind of query clause by looking at the last patch that
> did so:

So, part of the fun will be digging in PostgreSQL and searching for the
easiest way to implement our solution.

 > http://archives.postgresql.org/pgsql-committers/2006-08/msg00251.php

Seems like I should prepare for a long journey. So, while we're at it:

Where can I find the Definitive Design Guide with valuable in-depth
information for pgsql hackers? Is there any information apart from the
official manual

http://www.postgresql.org/docs/8.0/interactive/server-programming.html

and the source itself that you can recommend? I'm especially interested
in how the typical implementation scheme for RDBMS found in textbooks
maps to pgsql. You see, I'm looking for the K&R for pgsql.

Regards,
Matthias

Re: Modifying SQL parser with extensions?

От
"Dawid Kuroczko"
Дата:
On 10/29/06, Matthias Luedtke <matthias-luedtke@gmx.de> wrote:
>
> Alvaro Herrera wrote:
> >> In fact, parsing this SQL dialect would just be the first step, as the
> >> annotations within the query induce an ordering of the result set.
> >
> > Huh, what is this supposed to be able to do that you can't do with the
> > already existing ORDER BY clause?
>
> Basically, conditional statements are annotated with integers that
> represent weights, like
>
> (...)WHERE (foo = 'a')[42] OR (bar = 'b')[20]
>
> In the result set those entries that fulfill both conditions yield score
> 62, i.e. 42+20, and are ranked top, whereas entries that fulfill only
> one of the conditions yield scores 42 and 20 respectively and are
> therefore ranked lower.

So, basically you're giving sets of three parameters:
column value, your value, score for that column
and your query should return score for the sum of all
those values.  I'll assume you only use '=' -- if you use
other conditions, feel free to modify!

First, your example data:

qnex=# CREATE TABLE blah (foo text, bar text);
qnex=# INSERT INTO blah VALUES ('a','a');
qnex=# INSERT INTO blah VALUES ('a','b');
qnex=# INSERT INTO blah VALUES ('b','b');
qnex=# INSERT INTO blah VALUES ('c','c');

Second, a user defined scorecounter:

CREATE OR REPLACE FUNCTION scorecounter(colval text[], yourval text[],
score int[]) RETURNS int AS $$
  DECLARE
    i int DEFAULT 1;
    retscore int DEFAULT 0;
  BEGIN
    WHILE score[i] IS NOT NULL
    LOOP
      IF colval[i] = yourval[i] THEN
        retscore := retscore + score[i];
      END IF;
      i := i+1;
    END LOOP;
    RETURN retscore;
  END $$ LANGUAGE PLpgSQL;

I used PL/pgSQL but you may prefer to user perl instead -- the idea
stays the same.  And now for the grand finalle:

SELECT *, scorecounter(ARRAY[foo,bar], ARRAY['a','b'], ARRAY[42,20]) from blah;
 foo | bar | scorecounter
-----+-----+--------------
 a   | a   |           42
 a   | b   |           62
 b   | b   |           20
 c   | c   |            0

SELECT * FROM blah ORDER BY scorecounter(ARRAY[foo,bar],
ARRAY['a','b'], ARRAY[42,20]) DESC;
 foo | bar
-----+-----
 a   | b
 a   | a
 b   | b
 c   | c


Note that you should add some error checking into the function,
and if you prefer, you may user other syntax for arrays, I used
ARRAY[...] because it felt self explanatory.

Regards,
     Dawid

Re: Modifying SQL parser with extensions?

От
Alvaro Herrera
Дата:
Matthias Luedtke wrote:
>
> Alvaro Herrera wrote:
> >>In fact, parsing this SQL dialect would just be the first step, as the
> >>annotations within the query induce an ordering of the result set.
> >
> >Huh, what is this supposed to be able to do that you can't do with the
> >already existing ORDER BY clause?
>
> Basically, conditional statements are annotated with integers that
> represent weights, like
>
> (...)WHERE (foo = 'a')[42] OR (bar = 'b')[20]

Hummm, doesn't this sound suspiciously close to the "skyline" algorithm
type stuff?  AFAIU skyline allowed you to specify conditions and weights
for each one, and redefined ORDER BY to work following those.  In that
case, I may as well point out that there's a couple of gals working on
this.  They hang out on the pgsql-es-ayuda list (in spanish).  They have
the grammar part working, at least.

I think this particular example could be made work by using CASE
constructs.  I'm not sure how efficient or cumbersome that would turn
out to be.  (That particular notation has the drawback that the [n]
would be interpreted as an array index though.)

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: Modifying SQL parser with extensions?

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Matthias Luedtke wrote:
>> Basically, conditional statements are annotated with integers that
>> represent weights, like
>>
>> (...)WHERE (foo = 'a')[42] OR (bar = 'b')[20]

> (That particular notation has the drawback that the [n]
> would be interpreted as an array index though.)

Indeed --- unless you are prepared to rip out array support, you had
better choose some other notation.

            regards, tom lane