Обсуждение: C function migration from 9.2 to 9.5

Поиск
Список
Период
Сортировка

C function migration from 9.2 to 9.5

От
Michael Omotayo Akinde
Дата:
Hi,

We've been having a Postgresql database with some custom C functionality happily running for many years now. It's been running on 9.2, and we wish to upgrade this to the latest version. However, we're seeing some issues with the database process crashing each time.

A simplified, minimal example of the stuff that seems to get us into trouble:

SQL definitions:
----

CREATE TYPE sessionData AS ( a int4, b int4, c int4 ); 

CREATE OR REPLACE FUNCTION 
__WCI_SCHEMA__.getSessionData
()
RETURNS sessionData AS
'db_libdir/db_lib', 'session_get'
LANGUAGE 'C' STABLE;

----
Function definition:
----

PG_FUNCTION_INFO_V1 (session_get);
Datum session_get(PG_FUNCTION_ARGS) {
  Datum ret = packSessionData( 0, 0, 0, fcinfo);
  return ret;
}

Datum packSessionData( int a, int b, int c, FunctionCallInfo fcinfo )
{
    TupleDesc td;
    if ( get_call_result_type( fcinfo, NULL, & td ) != TYPEFUNC_COMPOSITE )
    {
        ereport( ERROR,
                 ( errcode( ERRCODE_DATA_EXCEPTION ),
                   errmsg( "\'packSessionData\': Function returning record called in context that cannot accept type record" ) ) );
    }
    td = BlessTupleDesc( td );

    Datum * ret = ( Datum * ) palloc( 3 * sizeof( Datum ) );
    bool isNull[ 3 ] = {false, false, false};

    ret[ 0 ] = Int32GetDatum( a );
    ret[ 1 ] = Int32GetDatum( b );
    ret[ 2 ] = Int32GetDatum( c );

    HeapTuple ht = ( HeapTuple ) heap_form_tuple( td, ret, isNull );
    return HeapTupleGetDatum( ht );
}

----

Running "select getSessionData()" with this code crashes the database. Pretty much identical code worked fine in 9.2; it's only from 9.3+ we run into trouble. I'm sure we're overseeing something completely basic that has changed in 9.3, but having trouble seeing what it is. A little help would be appreciated.

Regards,

Michael A.

Re: C function migration from 9.2 to 9.5

От
Pavel Stehule
Дата:
Hi

2016-03-03 15:12 GMT+01:00 Michael Omotayo Akinde <michaeloa@met.no>:
Hi,

We've been having a Postgresql database with some custom C functionality happily running for many years now. It's been running on 9.2, and we wish to upgrade this to the latest version. However, we're seeing some issues with the database process crashing each time.

A simplified, minimal example of the stuff that seems to get us into trouble:

It is strange. I tested your code, and it is working without any problems

compiled by gcc on Linux 64bit

Regards

Pavel
 

SQL definitions:
----

CREATE TYPE sessionData AS ( a int4, b int4, c int4 ); 

CREATE OR REPLACE FUNCTION 
__WCI_SCHEMA__.getSessionData
()
RETURNS sessionData AS
'db_libdir/db_lib', 'session_get'
LANGUAGE 'C' STABLE;

----
Function definition:
----

PG_FUNCTION_INFO_V1 (session_get);
Datum session_get(PG_FUNCTION_ARGS) {
  Datum ret = packSessionData( 0, 0, 0, fcinfo);
  return ret;
}

Datum packSessionData( int a, int b, int c, FunctionCallInfo fcinfo )
{
    TupleDesc td;
    if ( get_call_result_type( fcinfo, NULL, & td ) != TYPEFUNC_COMPOSITE )
    {
        ereport( ERROR,
                 ( errcode( ERRCODE_DATA_EXCEPTION ),
                   errmsg( "\'packSessionData\': Function returning record called in context that cannot accept type record" ) ) );
    }
    td = BlessTupleDesc( td );

    Datum * ret = ( Datum * ) palloc( 3 * sizeof( Datum ) );
    bool isNull[ 3 ] = {false, false, false};

    ret[ 0 ] = Int32GetDatum( a );
    ret[ 1 ] = Int32GetDatum( b );
    ret[ 2 ] = Int32GetDatum( c );

    HeapTuple ht = ( HeapTuple ) heap_form_tuple( td, ret, isNull );
    return HeapTupleGetDatum( ht );
}

----

Running "select getSessionData()" with this code crashes the database. Pretty much identical code worked fine in 9.2; it's only from 9.3+ we run into trouble. I'm sure we're overseeing something completely basic that has changed in 9.3, but having trouble seeing what it is. A little help would be appreciated.

Regards,

Michael A.


Re: C function migration from 9.2 to 9.5

От
Pavel Stehule
Дата:


2016-03-03 16:06 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>:
Hi

2016-03-03 15:12 GMT+01:00 Michael Omotayo Akinde <michaeloa@met.no>:
Hi,

We've been having a Postgresql database with some custom C functionality happily running for many years now. It's been running on 9.2, and we wish to upgrade this to the latest version. However, we're seeing some issues with the database process crashing each time.

A simplified, minimal example of the stuff that seems to get us into trouble:

It is strange. I tested your code, and it is working without any problems

try to recheck your development environment, maybe you mix new and old libraries, caches, ...
 

compiled by gcc on Linux 64bit

Regards

Pavel
 

Re: C function migration from 9.2 to 9.5

От
Tom Lane
Дата:
Michael Omotayo Akinde <michaeloa@met.no> writes:
> We've been having a Postgresql database with some custom C functionality
> happily running for many years now. It's been running on 9.2, and we wish
> to upgrade this to the latest version. However, we're seeing some issues
> with the database process crashing each time.

Like Pavel, I can't see anything wrong with that code --- it's not quite
according to PG project style, but it certainly looks like it does what
it needs to.  I think he's right to suspect some inconsistency in your
coding environment.  One concrete idea worth considering is that maybe
you are compiling with headers that postdate commit 3f8c8e3c6 et al and
trying to use the code in a database that predates that.  That'd result
in successfully compiling a call to a nonexistent core function, which
might end up as a crash depending on what your dynamic linker does
about it.

What exactly does the crash look like --- anything interesting in the
postmaster log?  (If your logging setup fails to capture postmaster
stderr, now would be a good time to fix that.)  Have you tried to
get a back-trace with gdb?

            regards, tom lane

PS: for reference, this is the patch I'm wondering about:

Author: Tom Lane <tgl@sss.pgh.pa.us>
Branch: master Release: REL9_4_BR [3f8c8e3c6] 2014-05-01 15:19:06 -0400
Branch: REL9_3_STABLE Release: REL9_3_5 [b72e90bc3] 2014-05-01 15:19:10 -0400
Branch: REL9_2_STABLE Release: REL9_2_9 [8c43980a1] 2014-05-01 15:19:14 -0400
Branch: REL9_1_STABLE Release: REL9_1_14 [db1fdc945] 2014-05-01 15:19:17 -0400
Branch: REL9_0_STABLE Release: REL9_0_18 [7a4f114f3] 2014-05-01 15:19:20 -0400
Branch: REL8_4_STABLE Release: REL8_4_22 [70debcf09] 2014-05-01 15:19:23 -0400

    Fix failure to detoast fields in composite elements of structured types.


Re: C function migration from 9.2 to 9.5

От
Michael Omotayo Akinde
Дата:
Thanks for the responses.

I currently have Postgres installed through apt-get, so I don't think it is old libraries/header files. Just to double check, I spun up a "blank" VM with Ubuntu 14.04 (Trusty), which is what we're currently running in our dev environment, and apt-get installed postgresql-9.3 and postgresql-server-dev-9.3 (v 9.3.11). I've tried this with 9.5 (from the PGDG repos) as well, with the same result. 

Copied over the two attached files (test.c and test.sql) and basically ran:

> createdb test

> gcc -I/usr/include/postgresql/9.5/server -fPIC -c test.c

This throws a compiler warning on the cast from heap_form_tuple to HeapTuple, but IIRC it's always done that so not an error? 

> gcc -shared -o test.so test.o

> psql -d test -f test.sql
CREATE TYPE
CREATE FUNCTION
psql:test.sql:9: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
psql:test.sql:9: connection to server was lost

The logs don't seem all that useful:
2016-03-03 23:03:13 UTC STATEMENT:  CREATE TYPE sessionType AS ( a int4, b int4, c int4);
2016-03-03 23:03:13 UTC LOG:  server process (PID 9988) was terminated by signal 11: Segmentation fault
2016-03-03 23:03:13 UTC DETAIL:  Failed process was running: SELECT sessionData();
2016-03-03 23:03:13 UTC LOG:  terminating any other active server processes
2016-03-03 23:03:13 UTC WARNING:  terminating connection because of crash of another server process
2016-03-03 23:03:13 UTC DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2016-03-03 23:03:13 UTC HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2016-03-03 23:03:13 UTC LOG:  all server processes terminated; reinitializing

I haven't attempted to run the stack backtrace with gdb yet (requires a bit more setup), but that will be the next step if you guys can't see any obvious mistakes that I've made.

Regards,

Michael A.

On Thu, Mar 3, 2016 at 5:50 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Michael Omotayo Akinde <michaeloa@met.no> writes:
> We've been having a Postgresql database with some custom C functionality
> happily running for many years now. It's been running on 9.2, and we wish
> to upgrade this to the latest version. However, we're seeing some issues
> with the database process crashing each time.

Like Pavel, I can't see anything wrong with that code --- it's not quite
according to PG project style, but it certainly looks like it does what
it needs to.  I think he's right to suspect some inconsistency in your
coding environment.  One concrete idea worth considering is that maybe
you are compiling with headers that postdate commit 3f8c8e3c6 et al and
trying to use the code in a database that predates that.  That'd result
in successfully compiling a call to a nonexistent core function, which
might end up as a crash depending on what your dynamic linker does
about it.

What exactly does the crash look like --- anything interesting in the
postmaster log?  (If your logging setup fails to capture postmaster
stderr, now would be a good time to fix that.)  Have you tried to
get a back-trace with gdb?

                        regards, tom lane

PS: for reference, this is the patch I'm wondering about:

Author: Tom Lane <tgl@sss.pgh.pa.us>
Branch: master Release: REL9_4_BR [3f8c8e3c6] 2014-05-01 15:19:06 -0400
Branch: REL9_3_STABLE Release: REL9_3_5 [b72e90bc3] 2014-05-01 15:19:10 -0400
Branch: REL9_2_STABLE Release: REL9_2_9 [8c43980a1] 2014-05-01 15:19:14 -0400
Branch: REL9_1_STABLE Release: REL9_1_14 [db1fdc945] 2014-05-01 15:19:17 -0400
Branch: REL9_0_STABLE Release: REL9_0_18 [7a4f114f3] 2014-05-01 15:19:20 -0400
Branch: REL8_4_STABLE Release: REL8_4_22 [70debcf09] 2014-05-01 15:19:23 -0400

    Fix failure to detoast fields in composite elements of structured types.

Вложения

Re: C function migration from 9.2 to 9.5

От
Tom Lane
Дата:
Michael Omotayo Akinde <michaeloa@met.no> writes:
> This throws a compiler warning on the cast from heap_form_tuple to
> HeapTuple, but IIRC it's always done that so not an error?

Uh, that's *awfully* fishy, because heap_form_tuple certainly returns
HeapTuple.  I wondered why you had that cast there; it should not be
necessary.  If the compiler is warning about it, that sets off all kinds
of alarm bells.

I notice your test program fails to #include "access/htup_details.h",
which is where heap_form_tuple() is declared these days.  I wonder if
your problem boils down to "no visible declaration of function, so
compiler thinks it returns int"?  In that case the real difference
from what worked to what didn't probably had less to do with any PG
version change and more to do with moving from 32-bit to 64-bit.
(Or I guess we might've relocated the declaration of heap_form_tuple
somewhere along the line.)

A general tip for getting C code to work is to turn on as many
compiler warnings as you can, and never ignore any of them.

            regards, tom lane


Re: C function migration from 9.2 to 9.5

От
Michael Omotayo Akinde
Дата:
Nailed it. Ensuring heap_form_tuple was properly defined resolved the issue.

Thanks a lot. Without your help, I would probably have spent more time looking at the 32 <=> 64 bit thing first, since I was pretty certain that we had this warning in the old compile. I suspect I am wrong about that, though, and that the actual problem is the heap_form_tuple declaration being relocated, as you suggest. In either case, going through the code and ensuring that we had that include everywhere in the code seems to have cleared up the problems (all tests running OK).

Regards,

Michael A.

On Fri, Mar 4, 2016 at 1:52 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Michael Omotayo Akinde <michaeloa@met.no> writes:
> This throws a compiler warning on the cast from heap_form_tuple to
> HeapTuple, but IIRC it's always done that so not an error?

Uh, that's *awfully* fishy, because heap_form_tuple certainly returns
HeapTuple.  I wondered why you had that cast there; it should not be
necessary.  If the compiler is warning about it, that sets off all kinds
of alarm bells.

I notice your test program fails to #include "access/htup_details.h",
which is where heap_form_tuple() is declared these days.  I wonder if
your problem boils down to "no visible declaration of function, so
compiler thinks it returns int"?  In that case the real difference
from what worked to what didn't probably had less to do with any PG
version change and more to do with moving from 32-bit to 64-bit.
(Or I guess we might've relocated the declaration of heap_form_tuple
somewhere along the line.)

A general tip for getting C code to work is to turn on as many
compiler warnings as you can, and never ignore any of them.

                        regards, tom lane