Обсуждение: BUG #10888: application is getting hanged in the poll() function of libpq.so.

Поиск
Список
Период
Сортировка

BUG #10888: application is getting hanged in the poll() function of libpq.so.

От
mitu.verma@ericsson.com
Дата:
The following bug has been logged on the website:

Bug reference:      10888
Logged by:          Mitu
Email address:      mitu.verma@ericsson.com
PostgreSQL version: 9.1.3
Operating system:   linux
Description:

Application gets hanged while making a query to the PostgreSql database.

I reproduced the issues on our testbed by making a simple C program which
makes a synchronous connection with the DB.

Pstack is showing the following error when the application is hang (see
below)

Now as it looks that it is getting hanged in the poll() function of
libpq.so.
There are other queries which are running successful on the same server. I
am clueless on how to get it resolved.
I have been advised to make asynchronous connection with the DB for this.
It will be really helpful if you can provide me some inputs on how to solve
this.

>ps -eaf | grep emitver | grep a.out
emitver   7918 26252  0 12:39 pts/21   00:00:00 ./a.out
emitver   9348 22957  0 12:54 pts/20   00:00:00 grep a.out

%sekax144:/workarea/MM/gerrit_clones/emitver/HS64570

>pstack 7918
#0  0x00000031c82dd1a8 in poll () from /lib64/libc.so.6
#1  0x00007f7f520596ef in pqSocketCheck () from
/opt/PostgreSQL/9.1/lib/libpq.so.5
#2  0x00007f7f520597c0 in pqWaitTimed () from
/opt/PostgreSQL/9.1/lib/libpq.so.5
#3  0x00007f7f52058d8e in PQgetResult () from
/opt/PostgreSQL/9.1/lib/libpq.so.5
#4  0x00007f7f52058f5c in PQexecFinish () from
/opt/PostgreSQL/9.1/lib/libpq.so.5
#5  0x0000000000400f77 in main

Re: BUG #10888: application is getting hanged in the poll() function of libpq.so.

От
Tom Lane
Дата:
mitu.verma@ericsson.com writes:
> Application gets hanged while making a query to the PostgreSql database.

> I reproduced the issues on our testbed by making a simple C program which
> makes a synchronous connection with the DB.

> Pstack is showing the following error when the application is hang (see
> below)

> Now as it looks that it is getting hanged in the poll() function of
> libpq.so.

That looks like it's just waiting for the server to respond.  You sure
you didn't simply issue a very long-running query?  Check to see if the
connected backend process is accumulating CPU time and/or I/O activity.

            regards, tom lane

Re: BUG #10888: application is getting hanged in the poll() function of libpq.so.

От
Tom Lane
Дата:
Mitu Verma <mitu.verma@ericsson.com> writes:
> I am running a single query at this moment.

> INSERT INTO eventLogEntry (object, method, bgwUser, time, realUser, host, application, tableIndex  ) VALUES
(E'Server',E'Start', E'bgw', E'20140512122404', NULL, NULL, NULL, 539  ); 

> So not able to understand whats wrong with the database that it is unable to process this simple insert query.
> At the same time if drop the index and recreate the table then this issue is resolved. But obviously this is not a
feasiblesolution in live environment. Plz see if you can provide some help. 

>> pstack 16950
> #0  0x00000000006dc1da in hash_search ()
> #1  0x000000000060e430 in LockAcquireExtended ()
> #2  0x000000000060c46e in XactLockTableWait ()
> #3  0x0000000000476bc6 in _bt_doinsert ()
> #4  0x0000000000479882 in btinsert ()

It'd appear that the INSERT is blocked because it's waiting to see if some
other transaction that has inserted a conflicting key value into a unique
index will commit or not.  You could probably learn more about what's
going on by looking into pg_locks and pg_stat_activity.  But at this point
I see no reason whatsoever to think there's a Postgres bug involved.

            regards, tom lane

Re: BUG #10888: application is getting hanged in the poll() function of libpq.so.

От
Mitu Verma
Дата:
Hi Tom,=20

Thanks for your response. But at server side it is showing the following be=
havior .
I am running a single query at this moment.

INSERT INTO eventLogEntry (object, method, bgwUser, time, realUser, host, a=
pplication, tableIndex  ) VALUES (E'Server', E'Start', E'bgw', E'2014051212=
2404', NULL, NULL, NULL, 539  );

So not able to understand whats wrong with the database that it is unable t=
o process this simple insert query.
At the same time if drop the index and recreate the table then this issue i=
s resolved. But obviously this is not a feasible solution in live environme=
nt. Plz see if you can provide some help.

>pstack 16950
#0  0x00000000006dc1da in hash_search ()
#1  0x000000000060e430 in LockAcquireExtended ()
#2  0x000000000060c46e in XactLockTableWait ()
#3  0x0000000000476bc6 in _bt_doinsert ()
#4  0x0000000000479882 in btinsert ()
#5  0x00000000006d6d25 in FunctionCall6Coll ()
#6  0x00000000004743c7 in index_insert ()
#7  0x0000000000563375 in ExecInsertIndexTuples ()
#8  0x000000000056eba3 in ExecModifyTable ()
#9  0x000000000055a863 in ExecProcNode ()
#10 0x0000000000559871 in standard_ExecutorRun ()
#11 0x000000000062037f in ProcessQuery ()
#12 0x00000000006205a7 in PortalRunMulti ()
#13 0x0000000000621068 in PortalRun ()
#14 0x000000000061dc5b in exec_simple_query ()
#15 0x000000000061e464 in PostgresMain ()
#16 0x00000000005e38dc in ServerLoop ()
#17 0x00000000005e45dc in PostmasterMain ()
#18 0x0000000000588dbe in main ()

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]=20
Sent: Monday, July 07, 2014 8:24 PM
To: Mitu Verma
Cc: pgsql-bugs@postgresql.org
Subject: Re: [BUGS] BUG #10888: application is getting hanged in the poll()=
 function of libpq.so.

mitu.verma@ericsson.com writes:
> Application gets hanged while making a query to the PostgreSql database.

> I reproduced the issues on our testbed by making a simple C program=20
> which makes a synchronous connection with the DB.

> Pstack is showing the following error when the application is hang=20
> (see
> below)

> Now as it looks that it is getting hanged in the poll() function of=20
> libpq.so.

That looks like it's just waiting for the server to respond.  You sure you =
didn't simply issue a very long-running query?  Check to see if the connect=
ed backend process is accumulating CPU time and/or I/O activity.

            regards, tom lane

Re: BUG #10888: application is getting hanged in the poll() function of libpq.so.

От
Mitu Verma
Дата:
I am not sure if it is a postgres issue or not, so need your help in that.=
=20
I have checked pg_stat and pg_lock, (see the output below). But is is not s=
howing any conflicts.
---------------------------------------------------------------------------=
---------------------------------------------------------------------------=
--
  locktype    | database | relation | page | tuple | virtualxid | transacti=
onid | classid | objid | objsubid | virtualtra
nsaction |  pid  |       mode       | granted
---------------+----------+----------+------+-------+------------+---------=
------+---------+-------+----------+-----------
---------+-------+------------------+---------
 transactionid |          |          |      |       |            |       26=
50178 |         |       |          | 2/2
         | 20714 | ExclusiveLock    | t
 relation      |    16384 |    16501 |      |       |            |         =
      |         |       |          | 2/2
         | 20714 | RowExclusiveLock | t
 virtualxid    |          |          |      |       | 4/41       |         =
      |         |       |          | 4/41
         | 10626 | ExclusiveLock    | t
 relation      |    16384 |    16400 |      |       |            |         =
      |         |       |          | 2/2
         | 20714 | RowExclusiveLock | t
 relation      |    16384 |    11000 |      |       |            |         =
      |         |       |          | 4/41
         | 10626 | AccessShareLock  | t
 relation      |    16384 |    16400 |      |       |            |         =
      |         |       |          | 3/2
         | 20819 | RowExclusiveLock | t
 virtualxid    |          |          |      |       | 2/2        |         =
      |         |       |          | 2/2
         | 20714 | ExclusiveLock    | t
 transactionid |          |          |      |       |            |       26=
50179 |         |       |          | 3/2
         | 20819 | ExclusiveLock    | t
 virtualxid    |          |          |      |       | 3/2        |         =
      |         |       |          | 3/2
         | 20819 | ExclusiveLock    | t
 relation      |    16384 |    16501 |      |       |            |         =
      |         |       |          | 3/2
         | 20819 | RowExclusiveLock | t
(10 rows)


---------------------------------------------------------------------------=
------------------------------------------

fm_db_CollectCCN=3D# select * from pg_stat_activity;
DEBUG:  StartTransactionCommand
DEBUG:  StartTransaction
DEBUG:  name: unnamed; blockState:       DEFAULT; state: INPROGR, xid/subid=
/cid: 0/1/0, nestlvl: 1, children:
DEBUG:  CommitTransactionCommand
DEBUG:  CommitTransaction
DEBUG:  name: unnamed; blockState:       STARTED; state: INPROGR, xid/subid=
/cid: 0/1/0, nestlvl: 1, children:
 datid |     datname      | procpid | usesysid | usename | application_name=
 | client_addr | client_hostname | client_port
|         backend_start         |          xact_start           |          =
query_start          | waiting |
                                                                      curre=
nt_query

-------+------------------+---------+----------+---------+-----------------=
-+-------------+-----------------+-------------
+-------------------------------+-------------------------------+----------=
---------------------+---------+---------------
---------------------------------------------------------------------------=
-----------------------------------------------
----------------------------------------------
 16384 | fm_db_CollectCCN |   20714 |    16391 | mmsuper |                 =
 | ::1         |                 |       19540
| 2014-07-07 19:10:09.295565+02 | 2014-07-07 19:10:09.311701+02 | 2014-07-0=
7 19:10:09.311701+02 | f       | INSERT INTO ev
entLogEntry (object, method, bgwUser, time, realUser, host, application, ta=
bleIndex  ) VALUES (E'Server', E'Start', E'bgw'
, E'20140512122404', NULL, NULL, NULL, 539  )
 16384 | fm_db_CollectCCN |   20819 |    16391 | mmsuper |                 =
 | ::1         |                 |       19602
| 2014-07-07 19:10:45.18688+02  | 2014-07-07 19:10:45.202903+02 | 2014-07-0=
7 19:10:45.202903+02 | f       | INSERT INTO ev
entLogEntry (object, method, bgwUser, time, realUser, host, application, ta=
bleIndex  ) VALUES (E'Server', E'Start', E'bgw'
, E'20140512122404', NULL, NULL, NULL, 539  )
 16384 | fm_db_CollectCCN |   10626 |    16391 | mmsuper | psql.bin        =
 | ::1         |                 |       29388
| 2014-07-07 19:29:13.560262+02 | 2014-07-07 19:32:05.993869+02 | 2014-07-0=
7 19:32:05.993869+02 | f       | select * from
pg_stat_activity;
(3 rows)

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]=20
Sent: Monday, July 07, 2014 10:57 PM
To: Mitu Verma
Cc: pgsql-bugs@postgresql.org
Subject: Re: [BUGS] BUG #10888: application is getting hanged in the poll()=
 function of libpq.so.

Mitu Verma <mitu.verma@ericsson.com> writes:
> I am running a single query at this moment.

> INSERT INTO eventLogEntry (object, method, bgwUser, time, realUser,=20
> host, application, tableIndex  ) VALUES (E'Server', E'Start', E'bgw',=20
> E'20140512122404', NULL, NULL, NULL, 539  );

> So not able to understand whats wrong with the database that it is unable=
 to process this simple insert query.
> At the same time if drop the index and recreate the table then this issue=
 is resolved. But obviously this is not a feasible solution in live environ=
ment. Plz see if you can provide some help.

>> pstack 16950
> #0  0x00000000006dc1da in hash_search ()
> #1  0x000000000060e430 in LockAcquireExtended ()
> #2  0x000000000060c46e in XactLockTableWait ()
> #3  0x0000000000476bc6 in _bt_doinsert ()
> #4  0x0000000000479882 in btinsert ()

It'd appear that the INSERT is blocked because it's waiting to see if some =
other transaction that has inserted a conflicting key value into a unique i=
ndex will commit or not.  You could probably learn more about what's going =
on by looking into pg_locks and pg_stat_activity.  But at this point I see =
no reason whatsoever to think there's a Postgres bug involved.

            regards, tom lane

Re: BUG #10888: application is getting hanged in the poll() function of libpq.so.

От
Tom Lane
Дата:
Mitu Verma <mitu.verma@ericsson.com> writes:
>  I am not sure if it is a postgres issue or not, so need your help in that.

Well, it looks like you've got two concurrent transactions trying to
insert the exact same data:

>  16384 | fm_db_CollectCCN |   20714 |    16391 | mmsuper |                  | ::1         |                 |
19540
> | 2014-07-07 19:10:09.295565+02 | 2014-07-07 19:10:09.311701+02 | 2014-07-07 19:10:09.311701+02 | f       | INSERT
INTOev 
> entLogEntry (object, method, bgwUser, time, realUser, host, application, tableIndex  ) VALUES (E'Server', E'Start',
E'bgw'
> , E'20140512122404', NULL, NULL, NULL, 539  )
>  16384 | fm_db_CollectCCN |   20819 |    16391 | mmsuper |                  | ::1         |                 |
19602
> | 2014-07-07 19:10:45.18688+02  | 2014-07-07 19:10:45.202903+02 | 2014-07-07 19:10:45.202903+02 | f       | INSERT
INTOev 
> entLogEntry (object, method, bgwUser, time, realUser, host, application, tableIndex  ) VALUES (E'Server', E'Start',
E'bgw'
> , E'20140512122404', NULL, NULL, NULL, 539  )

Presumably this is falling foul of a unique index on the table.
The second one is going to block until the first one either
commits or rolls back.

            regards, tom lane

Re: BUG #10888: application is getting hanged in the poll() function of libpq.so.

От
Mitu Verma
Дата:
Hi Tom,

I have reproduced the issue on our test bed.

When I run a single query to insert the data into table , then the query is=
 getting hanged.

 I debugged this issue by compiling the postgres code and using the databas=
e where the customer is facing problem. (see the gdb below)
Then the coredump is coming with the assertion getting failed for transacti=
on id as 0.

Following are my observations.

1. Even if I run a single query, (i.e no other transaction is in place) , t=
hen also following code is returning valid value of xwait. I checked this u=
sing gdb.(see the gdb below)

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX=
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
File nbtinsert.c=20

Bool _bt_doinsert(Relation rel, IndexTuple itup,IndexUniqueCheck checkUniqu=
e, Relation heapRel)
{
-----------------------------
-----------------------------
xwait =3D _bt_check_unique(rel, itup, heapRel, buf, offset, itup_scankey ch=
eckUnique, &is_unique);

 /* As per my understanding if I am running a single transaction then tis v=
alue should retun 0, but it seems that this value is being set some where=20
In postgres code.
-----------------------------------
}
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX=
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
now because of the valid value of this xwait, it assumes that there is some=
 other transaction which is already going on and so the
current transaction has to wait till the old transaction is either committe=
d or rolled backed.
So the current transaction waits in infinite loop for the completion of the=
 other transaction( which is not there actually ).
So the current transaction just keeps on waiting forever.

2. I checked inside bt_check_unique and following is the code which is hit =
in our case and which suggests that there is an already existing transactio=
n
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXiXXXXXXXXXXX=
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

/* If this tuple is being updated by other transaction then we have to wait=
 for its commit/abort.*/
xwait =3D (TransactionIdIsValid(SnapshotDirty.xmin)) ?SnapshotDirty.xmin : =
SnapshotDirty.xmax;

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX=
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

3. This issue is coming only with that table where table index is defined.

4. Issue is resolved if the index is dropped and we recreate the table.

It looks to me that postgres stores the transaction (SnapshotDirty ?) somew=
here and even if there is no other transaction going on this is=20
Showing some abrupt value which is giving conflicts.

If anyone can help me with this piece of postgresql code then it will be re=
ally helpful.

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX=
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
(gdb) where
#0  0x00000031c8232a45 in raise () from /lib64/libc.so.6
#1  0x00000031c8234225 in abort () from /lib64/libc.so.6
#2  0x000000000072248d in ExceptionalCondition (conditionName=3D<value opti=
mized out>, errorType=3D<value optimized out>, fileName=3D<value optimized =
out>, lineNumber=3D<value optimized out>)
    at assert.c:57
#3  0x000000000064ee48 in XactLockTableWait (xid=3D0) at lmgr.c:471
#4  0x00000000004837fb in _bt_doinsert (rel=3D0x7f616e9e3dc8, itup=3D0x221a=
b30, checkUnique=3DUNIQUE_CHECK_YES, heapRel=3D0x7f616e9df3a8) at nbtinsert=
.c:169
#5  0x0000000000487f21 in btinsert (fcinfo=3D<value optimized out>) at nbtr=
ee.c:261
#6  0x0000000000728c14 in FunctionCall6Coll (flinfo=3D<value optimized out>=
, collation=3D<value optimized out>, arg1=3D<value optimized out>, arg2=3D<=
value optimized out>, arg3=3D<value optimized out>,
    arg4=3D<value optimized out>, arg5=3D140056444400552, arg6=3D1) at fmgr=
.c:1439
#7  0x000000000047f746 in index_insert (indexRelation=3D0x7f616e9e3dc8, val=
ues=3D0x7fff36fd71f0, isnull=3D0x7fff36fd72f0 "", heap_t_ctid=3D0x221a9dc, =
heapRelation=3D0x7f616e9df3a8,
    checkUnique=3DUNIQUE_CHECK_YES) at indexam.c:215
#8  0x0000000000591c15 in ExecInsertIndexTuples (slot=3D0x2217918, tupleid=
=3D0x221a9dc, estate=3D0x2215a18) at execUtils.c:1086
#9  0x000000000059fab5 in ExecInsert (node=3D0x2219a40) at nodeModifyTable.=
c:247
#10 ExecModifyTable (node=3D0x2219a40) at nodeModifyTable.c:847
#11 0x00000000005878d8 in ExecProcNode (node=3D0x2219a40) at execProcnode.c=
:371
#12 0x0000000000586526 in ExecutePlan (queryDesc=3D0x2215608, direction=3D<=
value optimized out>, count=3D0) at execMain.c:1440
#13 standard_ExecutorRun (queryDesc=3D0x2215608, direction=3D<value optimiz=
ed out>, count=3D0) at execMain.c:314
#14 0x0000000000667abf in ProcessQuery (plan=3D0x21c5920,
    sourceText=3D0x21c3a08 "INSERT INTO eventLogEntry (object, method, bgwU=
ser, time, realUser, host, application, tableIndex ) VALUES ('Server', 'Sta=
rt', 'bgw', '20140512122404', NULL, NULL, NULL, 539);", params=3D0x0, dest=
=3D<value optimized out>, completionTag=3D0x7fff36fd76d0 "") at pquery.c:18=
7
#15 0x0000000000667d65 in PortalRunMulti (portal=3D0x21272e8, isTopLevel=3D=
1 '\001', dest=3D0x220f580, altdest=3D0x220f580, completionTag=3D0x7fff36fd=
76d0 "") at pquery.c:1276
#16 0x0000000000668592 in PortalRun (portal=3D0x21272e8, count=3D9223372036=
854775807, isTopLevel=3D1 '\001', dest=3D0x220f580, altdest=3D0x220f580, co=
mpletionTag=3D0x7fff36fd76d0 "") at pquery.c:813
#17 0x0000000000664949 in exec_simple_query (
    query_string=3D0x21c3a08 "INSERT INTO eventLogEntry (object, method, bg=
wUser, time, realUser, host, application, tableIndex ) VALUES ('Server', 'S=
tart', 'bgw', '20140512122404', NULL, NULL, NULL, 539);") at postgres.c:102=
0
#18 0x0000000000665880 in PostgresMain (argc=3D<value optimized out>, argv=
=3D<value optimized out>, username=3D<value optimized out>) at postgres.c:3=
968
#19 0x000000000061c691 in BackendRun () at postmaster.c:3617
#20 BackendStartup () at postmaster.c:3302
#21 ServerLoop () at postmaster.c:1466
#22 0x000000000061eff2 in PostmasterMain (argc=3D<value optimized out>, arg=
v=3D<value optimized out>) at postmaster.c:1127
#23 0x00000000005b8bd0 in main (argc=3D3, argv=3D0x2118e70) at main.c:199

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX=
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX=
XXX
-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]=20
Sent: Tuesday, July 08, 2014 12:05 AM
To: Mitu Verma
Cc: pgsql-bugs@postgresql.org
Subject: Re: [BUGS] BUG #10888: application is getting hanged in the poll()=
 function of libpq.so.

Mitu Verma <mitu.verma@ericsson.com> writes:
>  I am not sure if it is a postgres issue or not, so need your help in tha=
t.=20

Well, it looks like you've got two concurrent transactions trying to insert=
 the exact same data:

>  16384 | fm_db_CollectCCN |   20714 |    16391 | mmsuper |               =
   | ::1         |                 |       19540
> | 2014-07-07 19:10:09.295565+02 | 2014-07-07 19:10:09.311701+02 | 2014-07=
-07 19:10:09.311701+02 | f       | INSERT INTO ev
> entLogEntry (object, method, bgwUser, time, realUser, host, application, =
tableIndex  ) VALUES (E'Server', E'Start', E'bgw'
> , E'20140512122404', NULL, NULL, NULL, 539  )
>  16384 | fm_db_CollectCCN |   20819 |    16391 | mmsuper |               =
   | ::1         |                 |       19602
> | 2014-07-07 19:10:45.18688+02  | 2014-07-07 19:10:45.202903+02 | 2014-07=
-07 19:10:45.202903+02 | f       | INSERT INTO ev
> entLogEntry (object, method, bgwUser, time, realUser, host, application, =
tableIndex  ) VALUES (E'Server', E'Start', E'bgw'
> , E'20140512122404', NULL, NULL, NULL, 539  )

Presumably this is falling foul of a unique index on the table.
The second one is going to block until the first one either commits or roll=
s back.

            regards, tom lane

Re: BUG #10888: application is getting hanged in the poll() function of libpq.so.

От
Mitu Verma
Дата:
Hi tom,

I have few more observations regarding the issue mentioned below -

Few points=20
1. Database dump has been taken from the customer site from where the issue=
 has been reported, we are using postgreSQL 9.1.3

2.Schema of the table is  -
   Column    |         Type          | Modifiers
-------------+-----------------------+-----------
 tableindex  | integer               |
 object      | character varying(80) |
 method      | character varying(80) |
 bgwuser     | character varying(80) |
 time        | character(23)         |
 realuser    | character varying(80) |
 host        | character varying(80) |
 application | character varying(80) |
Indexes:
    "ind1_eventlogentry" UNIQUE, btree (tableindex), tablespace "mmindex"
Tablespace: "mmdata"

3. On the  database which we have taken from customer site  when we were ru=
nning the following query , query is hung.
INSERT INTO eventLogEntry (object, method, bgwUser, time, realUser, host, a=
pplication, tableIndex  ) VALUES (E'Server', E'Start', E'bgw', E'2014051212=
2404', NULL, NULL, NULL, 539  );

4. But when we used index after 586 like, then the insert is happening corr=
ectly on the same database.
INSERT INTO eventLogEntry (object, method, bgwUser, time, realUser, host, a=
pplication, tableIndex  ) VALUES (E'Server', E'Start', E'bgw', E'2014051212=
2404', NULL, NULL, NULL, 587 );

5. When we did debugging , then we found that postgreSQL has stored some ot=
her transaction id and it asks for the current transaction to wait till the=
 time previous transaction
Either gets committed or rolled back.

6. It seems that somehow the indexes are corrupted and postgreSQL has store=
d some transaction id somewhere, and so it is not allowing the current tran=
saction to happen=20
For that particular index range.

7. When the index has been dropped and table has been recreated then everyt=
hing works fine.


Plz help me with this issue by providing answers to the following queries

A) why postgreSQL is not allowing the insertion to happen for a particular =
index range and allowing it after that particular range?
B) where does the post greSQL stores the transaction ids and why it is not =
getting refreshed ?


Regards
Mitu
-----Original Message-----
From: Mitu Verma=20
Sent: Wednesday, July 09, 2014 12:31 PM
To: 'Tom Lane'
Cc: pgsql-bugs@postgresql.org
Subject: RE: [BUGS] BUG #10888: application is getting hanged in the poll()=
 function of libpq.so.

Hi Tom,

I have reproduced the issue on our test bed.

When I run a single query to insert the data into table , then the query is=
 getting hanged.

 I debugged this issue by compiling the postgres code and using the databas=
e where the customer is facing problem. (see the gdb below) Then the coredu=
mp is coming with the assertion getting failed for transaction id as 0.

Following are my observations.

1. Even if I run a single query, (i.e no other transaction is in place) , t=
hen also following code is returning valid value of xwait. I checked this u=
sing gdb.(see the gdb below)

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX=
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
File nbtinsert.c=20

Bool _bt_doinsert(Relation rel, IndexTuple itup,IndexUniqueCheck checkUniqu=
e, Relation heapRel) {
-----------------------------
-----------------------------
xwait =3D _bt_check_unique(rel, itup, heapRel, buf, offset, itup_scankey ch=
eckUnique, &is_unique);

 /* As per my understanding if I am running a single transaction then tis v=
alue should retun 0, but it seems that this value is being set some where I=
n postgres code.
-----------------------------------
}
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX=
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
now because of the valid value of this xwait, it assumes that there is some=
 other transaction which is already going on and so the current transaction=
 has to wait till the old transaction is either committed or rolled backed.
So the current transaction waits in infinite loop for the completion of the=
 other transaction( which is not there actually ).
So the current transaction just keeps on waiting forever.

2. I checked inside bt_check_unique and following is the code which is hit =
in our case and which suggests that there is an already existing transactio=
n XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXiXXXXXXXXX=
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

/* If this tuple is being updated by other transaction then we have to wait=
 for its commit/abort.*/ xwait =3D (TransactionIdIsValid(SnapshotDirty.xmin=
)) ?SnapshotDirty.xmin : SnapshotDirty.xmax;

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX=
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

3. This issue is coming only with that table where table index is defined.

4. Issue is resolved if the index is dropped and we recreate the table.

It looks to me that postgres stores the transaction (SnapshotDirty ?) somew=
here and even if there is no other transaction going on this is Showing som=
e abrupt value which is giving conflicts.

If anyone can help me with this piece of postgresql code then it will be re=
ally helpful.

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX=
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
(gdb) where
#0  0x00000031c8232a45 in raise () from /lib64/libc.so.6
#1  0x00000031c8234225 in abort () from /lib64/libc.so.6
#2  0x000000000072248d in ExceptionalCondition (conditionName=3D<value opti=
mized out>, errorType=3D<value optimized out>, fileName=3D<value optimized =
out>, lineNumber=3D<value optimized out>)
    at assert.c:57
#3  0x000000000064ee48 in XactLockTableWait (xid=3D0) at lmgr.c:471
#4  0x00000000004837fb in _bt_doinsert (rel=3D0x7f616e9e3dc8, itup=3D0x221a=
b30, checkUnique=3DUNIQUE_CHECK_YES, heapRel=3D0x7f616e9df3a8) at nbtinsert=
.c:169
#5  0x0000000000487f21 in btinsert (fcinfo=3D<value optimized out>) at nbtr=
ee.c:261
#6  0x0000000000728c14 in FunctionCall6Coll (flinfo=3D<value optimized out>=
, collation=3D<value optimized out>, arg1=3D<value optimized out>, arg2=3D<=
value optimized out>, arg3=3D<value optimized out>,
    arg4=3D<value optimized out>, arg5=3D140056444400552, arg6=3D1) at fmgr=
.c:1439
#7  0x000000000047f746 in index_insert (indexRelation=3D0x7f616e9e3dc8, val=
ues=3D0x7fff36fd71f0, isnull=3D0x7fff36fd72f0 "", heap_t_ctid=3D0x221a9dc, =
heapRelation=3D0x7f616e9df3a8,
    checkUnique=3DUNIQUE_CHECK_YES) at indexam.c:215
#8  0x0000000000591c15 in ExecInsertIndexTuples (slot=3D0x2217918, tupleid=
=3D0x221a9dc, estate=3D0x2215a18) at execUtils.c:1086
#9  0x000000000059fab5 in ExecInsert (node=3D0x2219a40) at nodeModifyTable.=
c:247
#10 ExecModifyTable (node=3D0x2219a40) at nodeModifyTable.c:847
#11 0x00000000005878d8 in ExecProcNode (node=3D0x2219a40) at execProcnode.c=
:371
#12 0x0000000000586526 in ExecutePlan (queryDesc=3D0x2215608, direction=3D<=
value optimized out>, count=3D0) at execMain.c:1440
#13 standard_ExecutorRun (queryDesc=3D0x2215608, direction=3D<value optimiz=
ed out>, count=3D0) at execMain.c:314
#14 0x0000000000667abf in ProcessQuery (plan=3D0x21c5920,
    sourceText=3D0x21c3a08 "INSERT INTO eventLogEntry (object, method, bgwU=
ser, time, realUser, host, application, tableIndex ) VALUES ('Server', 'Sta=
rt', 'bgw', '20140512122404', NULL, NULL, NULL, 539);", params=3D0x0, dest=
=3D<value optimized out>, completionTag=3D0x7fff36fd76d0 "") at pquery.c:18=
7
#15 0x0000000000667d65 in PortalRunMulti (portal=3D0x21272e8, isTopLevel=3D=
1 '\001', dest=3D0x220f580, altdest=3D0x220f580, completionTag=3D0x7fff36fd=
76d0 "") at pquery.c:1276
#16 0x0000000000668592 in PortalRun (portal=3D0x21272e8, count=3D9223372036=
854775807, isTopLevel=3D1 '\001', dest=3D0x220f580, altdest=3D0x220f580, co=
mpletionTag=3D0x7fff36fd76d0 "") at pquery.c:813
#17 0x0000000000664949 in exec_simple_query (
    query_string=3D0x21c3a08 "INSERT INTO eventLogEntry (object, method, bg=
wUser, time, realUser, host, application, tableIndex ) VALUES ('Server', 'S=
tart', 'bgw', '20140512122404', NULL, NULL, NULL, 539);") at postgres.c:102=
0
#18 0x0000000000665880 in PostgresMain (argc=3D<value optimized out>, argv=
=3D<value optimized out>, username=3D<value optimized out>) at postgres.c:3=
968
#19 0x000000000061c691 in BackendRun () at postmaster.c:3617
#20 BackendStartup () at postmaster.c:3302
#21 ServerLoop () at postmaster.c:1466
#22 0x000000000061eff2 in PostmasterMain (argc=3D<value optimized out>, arg=
v=3D<value optimized out>) at postmaster.c:1127
#23 0x00000000005b8bd0 in main (argc=3D3, argv=3D0x2118e70) at main.c:199

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX=
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX=
XXX
-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Tuesday, July 08, 2014 12:05 AM
To: Mitu Verma
Cc: pgsql-bugs@postgresql.org
Subject: Re: [BUGS] BUG #10888: application is getting hanged in the poll()=
 function of libpq.so.

Mitu Verma <mitu.verma@ericsson.com> writes:
>  I am not sure if it is a postgres issue or not, so need your help in tha=
t.=20

Well, it looks like you've got two concurrent transactions trying to insert=
 the exact same data:

>  16384 | fm_db_CollectCCN |   20714 |    16391 | mmsuper |               =
   | ::1         |                 |       19540
> | 2014-07-07 19:10:09.295565+02 | 2014-07-07 19:10:09.311701+02 | 2014-07=
-07 19:10:09.311701+02 | f       | INSERT INTO ev
> entLogEntry (object, method, bgwUser, time, realUser, host, application, =
tableIndex  ) VALUES (E'Server', E'Start', E'bgw'
> , E'20140512122404', NULL, NULL, NULL, 539  )
>  16384 | fm_db_CollectCCN |   20819 |    16391 | mmsuper |               =
   | ::1         |                 |       19602
> | 2014-07-07 19:10:45.18688+02  | 2014-07-07 19:10:45.202903+02 | 2014-07=
-07 19:10:45.202903+02 | f       | INSERT INTO ev
> entLogEntry (object, method, bgwUser, time, realUser, host, application, =
tableIndex  ) VALUES (E'Server', E'Start', E'bgw'
> , E'20140512122404', NULL, NULL, NULL, 539  )

Presumably this is falling foul of a unique index on the table.
The second one is going to block until the first one either commits or roll=
s back.

            regards, tom lane