Обсуждение: question about window function in C

Поиск
Список
Период
Сортировка

question about window function in C

От
Dan S
Дата:
Hi !

I'm trying to write a window function in C .

In the function I'm using a tuplesort to sort the window data and then do some processing.
Now when I'm running the function I get this: 'WARNING:  temporary file leak: File 43 still referenced'
The warning comes from my failure to call tuplesort_end at the appropriate time.
From the beginning I thought that calling tuplesort_end when current_pos in the window approched WinGetPartitionRowCount would do it.
That seemed to work at first but when I put a limit-clause in the query that doesn't work.
So I think I need to use some callback mechanism to get that to work.
I've found some reference in the postgres source code to RegisterExprContextCallback which seemed promising but I have no idea how to get the right expression context to put in as the first argument to that function.

So my question is, how do I shut down the tuplesort properly after the last call to my window function ?

I'm running "PostgreSQL 9.3.5 on i686-pc-linux-gnu, compiled by gcc-4.4.real (Debian 4.4.5-8) 4.4.5, 32-bit"

Best Regards
Dan S

Re: question about window function in C

От
Tom Lane
Дата:
Dan S <strd911@gmail.com> writes:
> I'm trying to write a window function in C .
> In the function I'm using a tuplesort to sort the window data and then do
> some processing.

Hmm ... why do you feel you need to do that?  The window function's input
should already be sorted according to the window specification.

> Now when I'm running the function I get this: 'WARNING:  temporary file
> leak: File 43 still referenced'
> The warning comes from my failure to call tuplesort_end at the appropriate
> time.

Unsurprising.

> So I think I need to use some callback mechanism to get that to work.
> I've found some reference in the postgres source code to
> RegisterExprContextCallback which seemed promising but I have no idea how
> to get the right expression context to put in as the first argument to that
> function.

I don't think there is one :-(.  WindowAgg has a per-input-tuple econtext,
and a per-output-tuple econtext, but what you'd need for this is a
partition-lifespan econtext, which doesn't exist.

It's possible that we could promote the "partcontext" memory context into
a full econtext so as to support this sort of behavior.  But I'd want to
see a reasonably convincing use-case for it.

            regards, tom lane


Re: question about window function in C

От
Dan S
Дата:
Well I'm trying to implement a window-function that works on range_types and produces 'atomic ranges' for each input range.
Let's say I have a set of ranges some overlapping some not, and I want to split each range at every boundary of every overlapping range and return those.
So for each range r I want to return an array of ranges that consists of range r split at every overlapping range boundary.
I need to consider both upper and lower boundaries and to be able to do this in one pass over the data I need to sort both the upper and lower boundaries and also sort the original ranges so I can loop over the ranges and boundaries in lockstep to produce the arrays.
As a last step I sort back the arrays of atomic ranges in the original order in the window so I can read out the tuplesort in that order and return each array to its corresponding range r.

(The result can be used to answer questions like what are the maximum number of simultaneously overlapping ranges and at which ranges the maximum occurs)

Best Regards
Dan S



2014-12-26 18:57 GMT+01:00 Tom Lane <tgl@sss.pgh.pa.us>:
Dan S <strd911@gmail.com> writes:
> I'm trying to write a window function in C .
> In the function I'm using a tuplesort to sort the window data and then do
> some processing.

Hmm ... why do you feel you need to do that?  The window function's input
should already be sorted according to the window specification.

> Now when I'm running the function I get this: 'WARNING:  temporary file
> leak: File 43 still referenced'
> The warning comes from my failure to call tuplesort_end at the appropriate
> time.

Unsurprising.

> So I think I need to use some callback mechanism to get that to work.
> I've found some reference in the postgres source code to
> RegisterExprContextCallback which seemed promising but I have no idea how
> to get the right expression context to put in as the first argument to that
> function.

I don't think there is one :-(.  WindowAgg has a per-input-tuple econtext,
and a per-output-tuple econtext, but what you'd need for this is a
partition-lifespan econtext, which doesn't exist.

It's possible that we could promote the "partcontext" memory context into
a full econtext so as to support this sort of behavior.  But I'd want to
see a reasonably convincing use-case for it.

                        regards, tom lane

Re: question about window function in C

От
Merlin Moncure
Дата:
On Fri, Dec 26, 2014 at 1:19 PM, Dan S <strd911@gmail.com> wrote:
> Well I'm trying to implement a window-function that works on range_types and
> produces 'atomic ranges' for each input range.
> Let's say I have a set of ranges some overlapping some not, and I want to
> split each range at every boundary of every overlapping range and return
> those.
> So for each range r I want to return an array of ranges that consists of
> range r split at every overlapping range boundary.
> I need to consider both upper and lower boundaries and to be able to do this
> in one pass over the data I need to sort both the upper and lower boundaries
> and also sort the original ranges so I can loop over the ranges and
> boundaries in lockstep to produce the arrays.
> As a last step I sort back the arrays of atomic ranges in the original order
> in the window so I can read out the tuplesort in that order and return each
> array to its corresponding range r.
>
> (The result can be used to answer questions like what are the maximum number
> of simultaneously overlapping ranges and at which ranges the maximum occurs)
>
> Best Regards
> Dan S
>
>
>
> 2014-12-26 18:57 GMT+01:00 Tom Lane <tgl@sss.pgh.pa.us>:
>>
>> Dan S <strd911@gmail.com> writes:
>> > I'm trying to write a window function in C .
>> > In the function I'm using a tuplesort to sort the window data and then
>> > do
>> > some processing.
>>
>> Hmm ... why do you feel you need to do that?  The window function's input
>> should already be sorted according to the window specification.
>>
>> > Now when I'm running the function I get this: 'WARNING:  temporary file
>> > leak: File 43 still referenced'
>> > The warning comes from my failure to call tuplesort_end at the
>> > appropriate
>> > time.
>>
>> Unsurprising.
>>
>> > So I think I need to use some callback mechanism to get that to work.
>> > I've found some reference in the postgres source code to
>> > RegisterExprContextCallback which seemed promising but I have no idea
>> > how
>> > to get the right expression context to put in as the first argument to
>> > that
>> > function.
>>
>> I don't think there is one :-(.  WindowAgg has a per-input-tuple econtext,
>> and a per-output-tuple econtext, but what you'd need for this is a
>> partition-lifespan econtext, which doesn't exist.
>>
>> It's possible that we could promote the "partcontext" memory context into
>> a full econtext so as to support this sort of behavior.  But I'd want to
>> see a reasonably convincing use-case for it.
>>
>>                         regards, tom lane
>
>


Re: question about window function in C

От
Merlin Moncure
Дата:
On Fri, Dec 26, 2014 at 11:41 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Fri, Dec 26, 2014 at 1:19 PM, Dan S <strd911@gmail.com> wrote:
>> Well I'm trying to implement a window-function that works on range_types and
>> produces 'atomic ranges' for each input range.
>> Let's say I have a set of ranges some overlapping some not, and I want to
>> split each range at every boundary of every overlapping range and return
>> those.
>> So for each range r I want to return an array of ranges that consists of
>> range r split at every overlapping range boundary.
>> I need to consider both upper and lower boundaries and to be able to do this
>> in one pass over the data I need to sort both the upper and lower boundaries
>> and also sort the original ranges so I can loop over the ranges and
>> boundaries in lockstep to produce the arrays.
>> As a last step I sort back the arrays of atomic ranges in the original order
>> in the window so I can read out the tuplesort in that order and return each
>> array to its corresponding range r.
>>
>> (The result can be used to answer questions like what are the maximum number
>> of simultaneously overlapping ranges and at which ranges the maximum occurs)

shoot -- sorry for empty mail -- misclick.

anyways, can you give some precise examples of what you want to occur?
 for example, a set of sql taking actual inputs and the output you
would like to see...

merlin


Re: question about window function in C

От
Dan S
Дата:

2014-12-27 6:43 GMT+01:00 Merlin Moncure <mmoncure@gmail.com>:
On Fri, Dec 26, 2014 at 11:41 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Fri, Dec 26, 2014 at 1:19 PM, Dan S <strd911@gmail.com> wrote:
>> Well I'm trying to implement a window-function that works on range_types and
>> produces 'atomic ranges' for each input range.
>> Let's say I have a set of ranges some overlapping some not, and I want to
>> split each range at every boundary of every overlapping range and return
>> those.
>> So for each range r I want to return an array of ranges that consists of
>> range r split at every overlapping range boundary.
>> I need to consider both upper and lower boundaries and to be able to do this
>> in one pass over the data I need to sort both the upper and lower boundaries
>> and also sort the original ranges so I can loop over the ranges and
>> boundaries in lockstep to produce the arrays.
>> As a last step I sort back the arrays of atomic ranges in the original order
>> in the window so I can read out the tuplesort in that order and return each
>> array to its corresponding range r.
>>
>> (The result can be used to answer questions like what are the maximum number
>> of simultaneously overlapping ranges and at which ranges the maximum occurs)

shoot -- sorry for empty mail -- misclick.

anyways, can you give some precise examples of what you want to occur?
 for example, a set of sql taking actual inputs and the output you
would like to see...

merlin

Say I have theses ranges (in a table or as output of a sql statement):

[2000-01-01,2000-05-01)
[2000-02-01,2003-07-01)
[2002-06-01,2003-03-01)
[2003-04-01,2004-08-01)

then for the first range [2000-01-01,2000-05-01) I want the array output :
{[2000-01-01,2000-02-01),[2000-02-01,2000-05-01)}

for the next range [2000-02-01,2003-07-01) I want the array output:
{[2000-02-01,2000-05-01),[2000-05-01,2002-06-01),[2002-06-01,2003-03-01),[2003-03-01,2003-04-01),[2003-04-01,2003-07-01)}

and for the third range [2002-06-01,2003-03-01) I want the output:
{[2002-06-01,2003-03-01)}

and for the last range [2003-04-01,2004-08-01)  the output should be:
{[2003-04-01,2003-07-01),[2003-07-01,2004-08-01)}

As you can see each input range is split up at each boundary where it is overlapped by the other ranges in the set.
I have written a window function which does this.
On the first call into the window function it calculates all the split up ranges and stores the resulting arrays into a tuplesort and sort these back into the original order in the window function.
On subsequent calls into the window function it pulls one array of ranges at a time from the tuplesort and returns that as output.
Then on the last call into the window function I need to release the tuplesort so to not leak resources.

The problem is that I can't determine which is the last call into the function so I can't release the tuplesort at the appropriate time.
I need a way to determine when to release the tuplesort, for example a way to register a callback at appropriate time.

Sql wise the table can look like this and say I have the 4 ranges above in the table.

create table t1 (id int, dr daterange);

and the query with my window function: select id, dr, window_range_split(dr) over () from t1
this query will exhaust the whole window and I can determine when to release the tuplesort, I release it when I pull out the last tuple in the window.

However this query won't work: select id, dr, window_range_split(dr) over () from t1 limit 2
the number of tuples in the window is 4 and I have no idea that there are only going to be 2 calls into the window function and so I fail release the tuplesort on the second call.

I should also mention that it's not specifically dateranges I'm targeting. the window function takes anyrange as argument type and returns anyarray.

Best Regards
Dan S