Обсуждение: Modifying SQL parser with extensions?
Hi everyone, I am searching for the easiest way to let PostgreSQL parse a special dialect of SQL. The dialect I want to parse augments common SQL in a way that enables expressions within a WHERE clause to be annotated and is thus not compatible with the standard SQL syntax anymore. I am totally new to PostgreSQL extension development and skimming through the Server Programming chapter I didn't notice an obvious way to get down to the parser internals. Is there any way of using existing extension techniques or do I have to compile my own special version of PostgreSQL in order to modify the parser? Any suggestions are appreciated. Regards, Matthias
Matthias Lüdtke wrote: > Hi everyone, > > I am searching for the easiest way to let PostgreSQL parse a special > dialect of SQL. The dialect I want to parse augments common SQL in a way > that enables expressions within a WHERE clause to be annotated and is > thus not compatible with the standard SQL syntax anymore. No, there's no mechanism for that. You'd have to preprocess the query before passing it to PostgreSQL. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Sun, 29 Oct 2006, Alvaro Herrera wrote: > > Matthias Lüdtke wrote: > > Hi everyone, > > > > I am searching for the easiest way to let PostgreSQL parse a special > > dialect of SQL. The dialect I want to parse augments common SQL in a way > > that enables expressions within a WHERE clause to be annotated and is > > thus not compatible with the standard SQL syntax anymore. > > No, there's no mechanism for that. You'd have to preprocess the query > before passing it to PostgreSQL. > Right, no mechanism within PostgreSql, however there are packages that can do it for you. I don't know of any in the OpenSource world, but my company, as but one example, has software that lets you run any dialect of SQL against nearly any RDBMS - unique and embeded features excluded (they're just passed through). Our package lets you pass individual statements or entire files full of SQL with embeded comments as it's not that uncommon among SQL dialects - they're just stripped out before getting to the engine, as Alvaro suggested. Regards, Richard -- Richard Troy, Chief Scientist Science Tools Corporation 510-924-1363 or 202-747-1263 rtroy@ScienceTools.com, http://ScienceTools.com/
Alvaro Herrera wrote: >> I am searching for the easiest way to let PostgreSQL parse a special >> dialect of SQL. The dialect I want to parse augments common SQL in a way >> that enables expressions within a WHERE clause to be annotated and is >> thus not compatible with the standard SQL syntax anymore. > > No, there's no mechanism for that. You'd have to preprocess the query > before passing it to PostgreSQL. Actually I didn't describe my whole story: In fact, parsing this SQL dialect would just be the first step, as the annotations within the query induce an ordering of the result set. So I need this extra information in the query to accomplish the subsequent task of sorting the result set in a certain way before the result is returned to the client. I'll have to use some hand crafted internal data structures to do this sorting. It seems to me that at least part of the code that needs to be written might be implemented with the existing extension mechanism, though the parser does not belong to that category. Regards, Matthias
Richard Troy wrote: > Our package lets you pass individual statements or entire files full > of SQL with embeded comments as it's not that uncommon among SQL > dialects - they're just stripped out before getting to the engine, as > Alvaro suggested. Unfortunately I need the information in the statement to sort the result set in a certain way later on. Sorry, I should have mentioned that from the beginning. This whole thing I want to implement was already written as a proxy JDBC driver - from parser to result sorter - and works quite fine. I am now investigating if it's possible to implement it directly in an RDBMS, i.e. PostgreSQL. Regards, Matthias
Matthias, what you want might be a type of collation sequence. You should be able to code that within Postgres, I'd think, but I wouldn't think that passing arguments as comments is appropriate. RT On Sun, 29 Oct 2006, Matthias Lüdtke wrote: > Date: Sun, 29 Oct 2006 18:26:30 +0100 > From: Matthias Lüdtke <matthias-luedtke@gmx.de> > To: pgsql-general@postgresql.org > Subject: Re: [GENERAL] Modifying SQL parser with extensions? > > > Richard Troy wrote: > > Our package lets you pass individual statements or entire files full > > of SQL with embeded comments as it's not that uncommon among SQL > > dialects - they're just stripped out before getting to the engine, as > > Alvaro suggested. > > Unfortunately I need the information in the statement to sort the result > set in a certain way later on. Sorry, I should have mentioned that from > the beginning. > > This whole thing I want to implement was already written as a proxy JDBC > driver - from parser to result sorter - and works quite fine. I am now > investigating if it's possible to implement it directly in an RDBMS, > i.e. PostgreSQL. > > Regards, > Matthias > > ---------------------------(end of broadcast)--------------------------- > TIP 2: Don't 'kill -9' the postmaster > -- Richard Troy, Chief Scientist Science Tools Corporation 510-924-1363 or 202-747-1263 rtroy@ScienceTools.com, http://ScienceTools.com/
Matthias Lüdtke wrote: > Alvaro Herrera wrote: > >>I am searching for the easiest way to let PostgreSQL parse a special > >>dialect of SQL. The dialect I want to parse augments common SQL in a way > >>that enables expressions within a WHERE clause to be annotated and is > >>thus not compatible with the standard SQL syntax anymore. > > > >No, there's no mechanism for that. You'd have to preprocess the query > >before passing it to PostgreSQL. > > Actually I didn't describe my whole story: > > In fact, parsing this SQL dialect would just be the first step, as the > annotations within the query induce an ordering of the result set. Huh, what is this supposed to be able to do that you can't do with the already existing ORDER BY clause? -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
=?ISO-8859-1?Q?Matthias_L=FCdtke?= <matthias-luedtke@gmx.de> writes: > In fact, parsing this SQL dialect would just be the first step, as the > annotations within the query induce an ordering of the result set. > So I need this extra information in the query to accomplish the > subsequent task of sorting the result set in a certain way before the > result is returned to the client. I'll have to use some hand crafted > internal data structures to do this sorting. Seems like you could save a large amount of work if you can express what you want to do as ORDER BY a user-defined operator. If you insist on bolting it on as new SQL syntax, changing the parser will be only the tip of the iceberg --- you'll likely need planner and executor changes as well. You could get a rough idea of what's involved in adding a new kind of query clause by looking at the last patch that did so: http://archives.postgresql.org/pgsql-committers/2006-08/msg00251.php regards, tom lane
Alvaro Herrera wrote: >> In fact, parsing this SQL dialect would just be the first step, as the >> annotations within the query induce an ordering of the result set. > > Huh, what is this supposed to be able to do that you can't do with the > already existing ORDER BY clause? Basically, conditional statements are annotated with integers that represent weights, like (...)WHERE (foo = 'a')[42] OR (bar = 'b')[20] In the result set those entries that fulfill both conditions yield score 62, i.e. 42+20, and are ranked top, whereas entries that fulfill only one of the conditions yield scores 42 and 20 respectively and are therefore ranked lower. Honestly, I'm only the poor student who has to implement what the smarter ones have thought out. ;) And, yes, it is possible to accomplish the desired sorting with the ORDER BY clause, but as conditions become more complex, the self cooked dialect is considered to be more readable and intuitive to the end user. Regards, Matthias
Tom, Tom Lane wrote: >> In fact, parsing this SQL dialect would just be the first step, as the >> annotations within the query induce an ordering of the result set. >> So I need this extra information in the query to accomplish the >> subsequent task of sorting the result set in a certain way before the >> result is returned to the client. I'll have to use some hand crafted >> internal data structures to do this sorting. > > Seems like you could save a large amount of work if you can express what > you want to do as ORDER BY a user-defined operator. thanks for your reply. I'll take a look at them. Unfortunately, I'm only the poor chap that has to implement a proof of concept for the solution that others worked out. > If you insist on bolting it on as new SQL syntax, changing the parser > will be only the tip of the iceberg --- you'll likely need planner and > executor changes as well. You could get a rough idea of what's involved > in adding a new kind of query clause by looking at the last patch that > did so: So, part of the fun will be digging in PostgreSQL and searching for the easiest way to implement our solution. > http://archives.postgresql.org/pgsql-committers/2006-08/msg00251.php Seems like I should prepare for a long journey. So, while we're at it: Where can I find the Definitive Design Guide with valuable in-depth information for pgsql hackers? Is there any information apart from the official manual http://www.postgresql.org/docs/8.0/interactive/server-programming.html and the source itself that you can recommend? I'm especially interested in how the typical implementation scheme for RDBMS found in textbooks maps to pgsql. You see, I'm looking for the K&R for pgsql. Regards, Matthias
On 10/29/06, Matthias Luedtke <matthias-luedtke@gmx.de> wrote: > > Alvaro Herrera wrote: > >> In fact, parsing this SQL dialect would just be the first step, as the > >> annotations within the query induce an ordering of the result set. > > > > Huh, what is this supposed to be able to do that you can't do with the > > already existing ORDER BY clause? > > Basically, conditional statements are annotated with integers that > represent weights, like > > (...)WHERE (foo = 'a')[42] OR (bar = 'b')[20] > > In the result set those entries that fulfill both conditions yield score > 62, i.e. 42+20, and are ranked top, whereas entries that fulfill only > one of the conditions yield scores 42 and 20 respectively and are > therefore ranked lower. So, basically you're giving sets of three parameters: column value, your value, score for that column and your query should return score for the sum of all those values. I'll assume you only use '=' -- if you use other conditions, feel free to modify! First, your example data: qnex=# CREATE TABLE blah (foo text, bar text); qnex=# INSERT INTO blah VALUES ('a','a'); qnex=# INSERT INTO blah VALUES ('a','b'); qnex=# INSERT INTO blah VALUES ('b','b'); qnex=# INSERT INTO blah VALUES ('c','c'); Second, a user defined scorecounter: CREATE OR REPLACE FUNCTION scorecounter(colval text[], yourval text[], score int[]) RETURNS int AS $$ DECLARE i int DEFAULT 1; retscore int DEFAULT 0; BEGIN WHILE score[i] IS NOT NULL LOOP IF colval[i] = yourval[i] THEN retscore := retscore + score[i]; END IF; i := i+1; END LOOP; RETURN retscore; END $$ LANGUAGE PLpgSQL; I used PL/pgSQL but you may prefer to user perl instead -- the idea stays the same. And now for the grand finalle: SELECT *, scorecounter(ARRAY[foo,bar], ARRAY['a','b'], ARRAY[42,20]) from blah; foo | bar | scorecounter -----+-----+-------------- a | a | 42 a | b | 62 b | b | 20 c | c | 0 SELECT * FROM blah ORDER BY scorecounter(ARRAY[foo,bar], ARRAY['a','b'], ARRAY[42,20]) DESC; foo | bar -----+----- a | b a | a b | b c | c Note that you should add some error checking into the function, and if you prefer, you may user other syntax for arrays, I used ARRAY[...] because it felt self explanatory. Regards, Dawid
Matthias Luedtke wrote: > > Alvaro Herrera wrote: > >>In fact, parsing this SQL dialect would just be the first step, as the > >>annotations within the query induce an ordering of the result set. > > > >Huh, what is this supposed to be able to do that you can't do with the > >already existing ORDER BY clause? > > Basically, conditional statements are annotated with integers that > represent weights, like > > (...)WHERE (foo = 'a')[42] OR (bar = 'b')[20] Hummm, doesn't this sound suspiciously close to the "skyline" algorithm type stuff? AFAIU skyline allowed you to specify conditions and weights for each one, and redefined ORDER BY to work following those. In that case, I may as well point out that there's a couple of gals working on this. They hang out on the pgsql-es-ayuda list (in spanish). They have the grammar part working, at least. I think this particular example could be made work by using CASE constructs. I'm not sure how efficient or cumbersome that would turn out to be. (That particular notation has the drawback that the [n] would be interpreted as an array index though.) -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera <alvherre@commandprompt.com> writes: > Matthias Luedtke wrote: >> Basically, conditional statements are annotated with integers that >> represent weights, like >> >> (...)WHERE (foo = 'a')[42] OR (bar = 'b')[20] > (That particular notation has the drawback that the [n] > would be interpreted as an array index though.) Indeed --- unless you are prepared to rip out array support, you had better choose some other notation. regards, tom lane