Обсуждение: syntax for reaching into records, specifically ts_stat results
Dear Postgres Folk, In working with tsvectors (added in 8.3), I've come to a place where my syntax-fu has failed me. I've resorted to turning a result set of records into strings so that I can regexp out the record fields I need, rather than access them directly, as I'm sure it's possible to do with the right syntactic formulation. Although my solution works, I'm sure it's much less efficient than it could be, and hope someone on the list can help do this the right way. Basically, I would like to transpose a series of tsvectors (one per row) into columns. E.g., as tsvects, I have this: test=# select * from tsvects; sentence_id | tsvect -------------+------------------------------ 1 | 'fox':3 'brown':2 'quick':1 2 | 'lazi':1 'eleph':3 'green':2 Instead I want this: sentence_id | word | freq -------------+-------+------ 1 | fox | 1 1 | brown | 1 1 | quick | 1 2 | lazi | 1 2 | eleph | 1 2 | green | 1 I am able to generate this with the following view, but the problem is that to create it, I must first cast the ts_stat results to a string, and then regexp out the pertinent pieces: create or replace view words as select sentence_id, substring(stat from '^\\(([^,]+),') as word, substring(stat from ',([^,]+)\\)$') as freq from (select sentence_id, ts_stat('select tsvect from tsvects where sentence_id = ' || tsvects.sentence_id)::text as stat from tsvects ) as foo; It seems like there should be a way to access fields in the records returned from ts_stat directly, but I can't figure out how. Here's the result of the subquery: test=# select sentence_id, ts_stat('select tsvect from tsvects where sentence_id = ' || tsvects.sentence_id)::text as stat from tsvects; sentence_id | stat -------------+------------- 1 | (fox,1,1) 1 | (brown,1,1) 1 | (quick,1,1) 2 | (lazi,1,1) 2 | (eleph,1,1) 2 | (green,1,1) (6 rows) If I try to get at the elements (which I believe are named 'word', 'ndoc', 'nentry'), I get a variety of syntax errors: test=# select sentence_id, test-# stat['word'], test-# stat['nentry'] test-# from (select sentence_id, test(# ts_stat('select tsvect from tsvects where sentence_id = ' || test(# tsvects.sentence_id) as stat test(# from tsvects test(# ) as foo; ERROR: cannot subscript type record because it is not an array If I say stat.word (instead of subscripting), I get 'missing FROM- clause entry for table "stat"'. If I say foo.stat.word, I get 'ERROR: schema "foo" does not exist'. Any ideas on how to get into these records with resorting to text parsing? Thanks, Dan
try select * from ts_stat(....) btw, performance of ts_stat() was greatly improved in 8.4. Oleg On Tue, 9 Dec 2008, Dan Chak wrote: > Dear Postgres Folk, > > In working with tsvectors (added in 8.3), I've come to a place where my > syntax-fu has failed me. I've resorted to turning a result set of records > into strings so that I can regexp out the record fields I need, rather than > access them directly, as I'm sure it's possible to do with the right > syntactic formulation. Although my solution works, I'm sure it's much less > efficient than it could be, and hope someone on the list can help do this the > right way. > > Basically, I would like to transpose a series of tsvectors (one per row) into > columns. E.g., as tsvects, I have this: > > test=# select * from tsvects; > sentence_id | tsvect > -------------+------------------------------ > 1 | 'fox':3 'brown':2 'quick':1 > 2 | 'lazi':1 'eleph':3 'green':2 > > Instead I want this: > > sentence_id | word | freq > -------------+-------+------ > 1 | fox | 1 > 1 | brown | 1 > 1 | quick | 1 > 2 | lazi | 1 > 2 | eleph | 1 > 2 | green | 1 > > I am able to generate this with the following view, but the problem is that > to create it, I must first cast the ts_stat results to a string, and then > regexp out the pertinent pieces: > > create or replace view words as > select sentence_id, > substring(stat from '^\\(([^,]+),') as word, > substring(stat from ',([^,]+)\\)$') as freq > from (select sentence_id, > ts_stat('select tsvect from tsvects where sentence_id = ' || > tsvects.sentence_id)::text as stat > from tsvects > ) as foo; > > It seems like there should be a way to access fields in the records returned > from ts_stat directly, but I can't figure out how. Here's the result of the > subquery: > > test=# select sentence_id, > ts_stat('select tsvect from tsvects where sentence_id = ' || > tsvects.sentence_id)::text as stat > from tsvects; > sentence_id | stat > -------------+------------- > 1 | (fox,1,1) > 1 | (brown,1,1) > 1 | (quick,1,1) > 2 | (lazi,1,1) > 2 | (eleph,1,1) > 2 | (green,1,1) > (6 rows) > > If I try to get at the elements (which I believe are named 'word', 'ndoc', > 'nentry'), I get a variety of syntax errors: > > test=# select sentence_id, > test-# stat['word'], > test-# stat['nentry'] > test-# from (select sentence_id, > test(# ts_stat('select tsvect from tsvects where sentence_id = > ' || > test(# tsvects.sentence_id) as stat > test(# from tsvects > test(# ) as foo; > ERROR: cannot subscript type record because it is not an array > > If I say stat.word (instead of subscripting), I get 'missing FROM-clause > entry for table "stat"'. If I say foo.stat.word, I get 'ERROR: schema "foo" > does not exist'. > > Any ideas on how to get into these records with resorting to text parsing? > > Thanks, > Dan > > Regards, Oleg _____________________________________________________________ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83
Dan Chak <chak@MIT.EDU> writes: > If I say stat.word (instead of subscripting), I get 'missing FROM- > clause entry for table "stat"'. If I say foo.stat.word, I get > 'ERROR: schema "foo" does not exist'. I think the syntax you need is (stat).word etc. See "Field Selection" here: http://www.postgresql.org/docs/8.3/static/sql-expressions.html#AEN1679 The reason for the parens is exactly to distinguish whether the leading word is a table or column name. regards, tom lane
Oleg, This syntax works fine until I also want to get the "sentence_id" column in there as well, so that I can differentiate one set of ts_stat results from another. With the syntax where ts_stat is treated like a table, it isn't possible to run ts_stat separately on multiple tsvectors as I'm doing below. Is there some generic record access syntax that I can use? Thanks, Dan On Dec 9, 2008, at 3:04 PM, Oleg Bartunov wrote: > try select * from ts_stat(....) > btw, performance of ts_stat() was greatly improved in 8.4. > > Oleg > On Tue, 9 Dec 2008, Dan Chak wrote: > >> Dear Postgres Folk, >> >> In working with tsvectors (added in 8.3), I've come to a place >> where my syntax-fu has failed me. I've resorted to turning a >> result set of records into strings so that I can regexp out the >> record fields I need, rather than access them directly, as I'm sure >> it's possible to do with the right syntactic formulation. Although >> my solution works, I'm sure it's much less efficient than it could >> be, and hope someone on the list can help do this the right way. >> >> Basically, I would like to transpose a series of tsvectors (one per >> row) into columns. E.g., as tsvects, I have this: >> >> test=# select * from tsvects; >> sentence_id | tsvect >> -------------+------------------------------ >> 1 | 'fox':3 'brown':2 'quick':1 >> 2 | 'lazi':1 'eleph':3 'green':2 >> >> Instead I want this: >> >> sentence_id | word | freq >> -------------+-------+------ >> 1 | fox | 1 >> 1 | brown | 1 >> 1 | quick | 1 >> 2 | lazi | 1 >> 2 | eleph | 1 >> 2 | green | 1 >> >> I am able to generate this with the following view, but the problem >> is that to create it, I must first cast the ts_stat results to a >> string, and then regexp out the pertinent pieces: >> >> create or replace view words as >> select sentence_id, >> substring(stat from '^\\(([^,]+),') as word, >> substring(stat from ',([^,]+)\\)$') as freq >> from (select sentence_id, >> ts_stat('select tsvect from tsvects where sentence_id = >> ' || >> tsvects.sentence_id)::text as stat >> from tsvects >> ) as foo; >> >> It seems like there should be a way to access fields in the records >> returned from ts_stat directly, but I can't figure out how. Here's >> the result of the subquery: >> >> test=# select sentence_id, >> ts_stat('select tsvect from tsvects where sentence_id = >> ' || >> tsvects.sentence_id)::text as stat >> from tsvects; >> sentence_id | stat >> -------------+------------- >> 1 | (fox,1,1) >> 1 | (brown,1,1) >> 1 | (quick,1,1) >> 2 | (lazi,1,1) >> 2 | (eleph,1,1) >> 2 | (green,1,1) >> (6 rows) >> >> If I try to get at the elements (which I believe are named 'word', >> 'ndoc', 'nentry'), I get a variety of syntax errors: >> >> test=# select sentence_id, >> test-# stat['word'], >> test-# stat['nentry'] >> test-# from (select sentence_id, >> test(# ts_stat('select tsvect from tsvects where >> sentence_id = ' || >> test(# tsvects.sentence_id) as stat >> test(# from tsvects >> test(# ) as foo; >> ERROR: cannot subscript type record because it is not an array >> >> If I say stat.word (instead of subscripting), I get 'missing FROM- >> clause entry for table "stat"'. If I say foo.stat.word, I get >> 'ERROR: schema "foo" does not exist'. >> >> Any ideas on how to get into these records with resorting to text >> parsing? >> >> Thanks, >> Dan >> >> > > Regards, > Oleg > _____________________________________________________________ > Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), > Sternberg Astronomical Institute, Moscow University, Russia > Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ > phone: +007(495)939-16-83, +007(495)939-23-83
That works perfectly! Thanks, Dan On Dec 9, 2008, at 3:13 PM, Tom Lane wrote: > Dan Chak <chak@MIT.EDU> writes: >> If I say stat.word (instead of subscripting), I get 'missing FROM- >> clause entry for table "stat"'. If I say foo.stat.word, I get >> 'ERROR: schema "foo" does not exist'. > > I think the syntax you need is (stat).word etc. See "Field Selection" > here: > http://www.postgresql.org/docs/8.3/static/sql-expressions.html#AEN1679 > The reason for the parens is exactly to distinguish whether the > leading > word is a table or column name. > > regards, tom lane
On Tue, 9 Dec 2008, Dan Chak wrote: > Oleg, > > This syntax works fine until I also want to get the "sentence_id" column in > there as well, so that I can differentiate one set of ts_stat results from > another. With the syntax where ts_stat is treated like a table, it isn't > possible to run ts_stat separately on multiple tsvectors as I'm doing below. > > Is there some generic record access syntax that I can use? write function > > Thanks, > Dan > > On Dec 9, 2008, at 3:04 PM, Oleg Bartunov wrote: > >> try select * from ts_stat(....) >> btw, performance of ts_stat() was greatly improved in 8.4. >> >> Oleg >> On Tue, 9 Dec 2008, Dan Chak wrote: >> >>> Dear Postgres Folk, >>> >>> In working with tsvectors (added in 8.3), I've come to a place where my >>> syntax-fu has failed me. I've resorted to turning a result set of records >>> into strings so that I can regexp out the record fields I need, rather >>> than access them directly, as I'm sure it's possible to do with the right >>> syntactic formulation. Although my solution works, I'm sure it's much >>> less efficient than it could be, and hope someone on the list can help do >>> this the right way. >>> >>> Basically, I would like to transpose a series of tsvectors (one per row) >>> into columns. E.g., as tsvects, I have this: >>> >>> test=# select * from tsvects; >>> sentence_id | tsvect >>> -------------+------------------------------ >>> 1 | 'fox':3 'brown':2 'quick':1 >>> 2 | 'lazi':1 'eleph':3 'green':2 >>> >>> Instead I want this: >>> >>> sentence_id | word | freq >>> -------------+-------+------ >>> 1 | fox | 1 >>> 1 | brown | 1 >>> 1 | quick | 1 >>> 2 | lazi | 1 >>> 2 | eleph | 1 >>> 2 | green | 1 >>> >>> I am able to generate this with the following view, but the problem is >>> that to create it, I must first cast the ts_stat results to a string, and >>> then regexp out the pertinent pieces: >>> >>> create or replace view words as >>> select sentence_id, >>> substring(stat from '^\\(([^,]+),') as word, >>> substring(stat from ',([^,]+)\\)$') as freq >>> from (select sentence_id, >>> ts_stat('select tsvect from tsvects where sentence_id = ' || >>> tsvects.sentence_id)::text as stat >>> from tsvects >>> ) as foo; >>> >>> It seems like there should be a way to access fields in the records >>> returned from ts_stat directly, but I can't figure out how. Here's the >>> result of the subquery: >>> >>> test=# select sentence_id, >>> ts_stat('select tsvect from tsvects where sentence_id = ' || >>> tsvects.sentence_id)::text as stat >>> from tsvects; >>> sentence_id | stat >>> -------------+------------- >>> 1 | (fox,1,1) >>> 1 | (brown,1,1) >>> 1 | (quick,1,1) >>> 2 | (lazi,1,1) >>> 2 | (eleph,1,1) >>> 2 | (green,1,1) >>> (6 rows) >>> >>> If I try to get at the elements (which I believe are named 'word', 'ndoc', >>> 'nentry'), I get a variety of syntax errors: >>> >>> test=# select sentence_id, >>> test-# stat['word'], >>> test-# stat['nentry'] >>> test-# from (select sentence_id, >>> test(# ts_stat('select tsvect from tsvects where >>> sentence_id = ' || >>> test(# tsvects.sentence_id) as stat >>> test(# from tsvects >>> test(# ) as foo; >>> ERROR: cannot subscript type record because it is not an array >>> >>> If I say stat.word (instead of subscripting), I get 'missing FROM-clause >>> entry for table "stat"'. If I say foo.stat.word, I get 'ERROR: schema >>> "foo" does not exist'. >>> >>> Any ideas on how to get into these records with resorting to text parsing? >>> >>> Thanks, >>> Dan >>> >>> >> >> Regards, >> Oleg >> _____________________________________________________________ >> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), >> Sternberg Astronomical Institute, Moscow University, Russia >> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ >> phone: +007(495)939-16-83, +007(495)939-23-83 > > > Regards, Oleg _____________________________________________________________ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83
ok, here is a function ( credits to Teodor ) CREATE OR REPLACE FUNCTION ts_stat(tsvector, OUT word text, OUT ndoc integer, OUT nentry integer) RETURNS SETOF record AS $$ SELECT ts_stat('SELECT ' || quote_literal( $1::text ) || '::tsvector'); $$ LANGUAGE SQL RETURNS NULL ON NULL INPUT IMMUTABLE; use it like select id, (ts_stat(fts)).* from apod where id=1; Oleg On Tue, 9 Dec 2008, Oleg Bartunov wrote: > On Tue, 9 Dec 2008, Dan Chak wrote: > >> Oleg, >> >> This syntax works fine until I also want to get the "sentence_id" column in >> there as well, so that I can differentiate one set of ts_stat results from >> another. With the syntax where ts_stat is treated like a table, it isn't >> possible to run ts_stat separately on multiple tsvectors as I'm doing >> below. >> >> Is there some generic record access syntax that I can use? > > write function > >> >> Thanks, >> Dan >> >> On Dec 9, 2008, at 3:04 PM, Oleg Bartunov wrote: >> >>> try select * from ts_stat(....) >>> btw, performance of ts_stat() was greatly improved in 8.4. >>> >>> Oleg >>> On Tue, 9 Dec 2008, Dan Chak wrote: >>> >>>> Dear Postgres Folk, >>>> >>>> In working with tsvectors (added in 8.3), I've come to a place where my >>>> syntax-fu has failed me. I've resorted to turning a result set of >>>> records into strings so that I can regexp out the record fields I need, >>>> rather than access them directly, as I'm sure it's possible to do with >>>> the right syntactic formulation. Although my solution works, I'm sure >>>> it's much less efficient than it could be, and hope someone on the list >>>> can help do this the right way. >>>> >>>> Basically, I would like to transpose a series of tsvectors (one per row) >>>> into columns. E.g., as tsvects, I have this: >>>> >>>> test=# select * from tsvects; >>>> sentence_id | tsvect >>>> -------------+------------------------------ >>>> 1 | 'fox':3 'brown':2 'quick':1 >>>> 2 | 'lazi':1 'eleph':3 'green':2 >>>> >>>> Instead I want this: >>>> >>>> sentence_id | word | freq >>>> -------------+-------+------ >>>> 1 | fox | 1 >>>> 1 | brown | 1 >>>> 1 | quick | 1 >>>> 2 | lazi | 1 >>>> 2 | eleph | 1 >>>> 2 | green | 1 >>>> >>>> I am able to generate this with the following view, but the problem is >>>> that to create it, I must first cast the ts_stat results to a string, and >>>> then regexp out the pertinent pieces: >>>> >>>> create or replace view words as >>>> select sentence_id, >>>> substring(stat from '^\\(([^,]+),') as word, >>>> substring(stat from ',([^,]+)\\)$') as freq >>>> from (select sentence_id, >>>> ts_stat('select tsvect from tsvects where sentence_id = ' || >>>> tsvects.sentence_id)::text as stat >>>> from tsvects >>>> ) as foo; >>>> >>>> It seems like there should be a way to access fields in the records >>>> returned from ts_stat directly, but I can't figure out how. Here's the >>>> result of the subquery: >>>> >>>> test=# select sentence_id, >>>> ts_stat('select tsvect from tsvects where sentence_id = ' || >>>> tsvects.sentence_id)::text as stat >>>> from tsvects; >>>> sentence_id | stat >>>> -------------+------------- >>>> 1 | (fox,1,1) >>>> 1 | (brown,1,1) >>>> 1 | (quick,1,1) >>>> 2 | (lazi,1,1) >>>> 2 | (eleph,1,1) >>>> 2 | (green,1,1) >>>> (6 rows) >>>> >>>> If I try to get at the elements (which I believe are named 'word', >>>> 'ndoc', 'nentry'), I get a variety of syntax errors: >>>> >>>> test=# select sentence_id, >>>> test-# stat['word'], >>>> test-# stat['nentry'] >>>> test-# from (select sentence_id, >>>> test(# ts_stat('select tsvect from tsvects where >>>> sentence_id = ' || >>>> test(# tsvects.sentence_id) as stat >>>> test(# from tsvects >>>> test(# ) as foo; >>>> ERROR: cannot subscript type record because it is not an array >>>> >>>> If I say stat.word (instead of subscripting), I get 'missing FROM-clause >>>> entry for table "stat"'. If I say foo.stat.word, I get 'ERROR: schema >>>> "foo" does not exist'. >>>> >>>> Any ideas on how to get into these records with resorting to text >>>> parsing? >>>> >>>> Thanks, >>>> Dan >>>> >>>> >>> >>> Regards, >>> Oleg >>> _____________________________________________________________ >>> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), >>> Sternberg Astronomical Institute, Moscow University, Russia >>> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ >>> phone: +007(495)939-16-83, +007(495)939-23-83 >> >> >> > > Regards, > Oleg > _____________________________________________________________ > Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), > Sternberg Astronomical Institute, Moscow University, Russia > Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ > phone: +007(495)939-16-83, +007(495)939-23-83 > > Regards, Oleg _____________________________________________________________ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83