[SERVER-4617] make _pinValue error message when killCursors called during getMore better Created: 04/Jan/12  Updated: 11/Jul/16  Resolved: 01/Mar/12

Status: Closed
Project: Core Server
Component/s: Usability
Affects Version/s: 2.0.2
Fix Version/s: 2.1.1

Type: Bug Priority: Major - P3
Reporter: Aaron Staple Assignee: Aaron Staple
Resolution: Done Votes: 0
Labels: pv1
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File pinval_repro.php     File start_test_cluster.sh    
Operating System: ALL
Participants:

 Description   

The following assertion has cropped up in several cases:

Sat Sep 24 17:51:53 [conn18] Assertion failure cc->_pinValue < 100
db/clientcursor.h 309
0x10007d703 0x100087eec 0x1002d64da 0x1003de782 0x1003e272a
0x100603394 0x100148d8c 0x10060a554 0x7fff830d18bf 0x7fff830d4b75
0 mongod 0x000000010007d703
_ZN5mongo12sayDbContextEPKc + 195
1 mongod 0x0000000100087eec
_ZN5mongo8assertedEPKcS1_j + 300
2 mongod 0x00000001002d64da
_ZN5mongo12ClientCursor5eraseEiPx + 938
3 mongod 0x00000001003de782
_ZN5mongo19receivedKillCursorsERNS_7MessageE + 274
4 mongod 0x00000001003e272a
_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE
+ 794
5 mongod 0x0000000100603394
_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE
+ 148
6 mongod 0x0000000100148d8c
_ZN5mongo3pms9threadRunEPNS_13MessagingPortE + 700
7 mongod 0x000000010060a554
thread_proxy + 132
8 libsystem_c.dylib 0x00007fff830d18bf
_pthread_start + 335
9 libsystem_c.dylib 0x00007fff830d4b75
thread_start + 13
Sat Sep 24 17:51:53 [conn18] killcursors exception: assertion db/
clientcursor.h:309 321ms

FREE-3022
FREE-3393
CS-1730
CS-1938
probably more, in the logs

Simplest explanation is that a cursor is being killed while in use by a get more (the get more may or may not be yielding). We might want to make the assertion a little more readable.

Also, three of the four cases above seemed to occur with the php driver and in the remaining case I couldn't tell which driver was used. So the cause could be a client side issue, either in the driver or the way the driver is being used.



 Comments   
Comment by auto [ 01/Mar/12 ]

Author:

{u'login': u'astaple', u'name': u'Aaron', u'email': u'aaron@10gen.com'}

Message: SERVER-4617 more descriptive error message on attempt to kill a cursor while in use
Branch: master
https://github.com/mongodb/mongo/commit/cdf7e46a11fda538cb74b8abf7a2bb1a751c2626

Comment by Aaron Staple [ 16/Feb/12 ]

@eliot - Would it make sense to change the mongod error message to be more readable and not have a stack trace?

Comment by Eliot Horowitz (Inactive) [ 16/Feb/12 ]

This is actually a driver bug.

Comment by Andreas Bernhardsson [ 15/Feb/12 ]

Ok I see, I've posted some more info in SERVER-4649 I think that one is more in line with my issues.
There is really no reason for the server to start going slowly, it's extremely oversized and is only used for MongoDB, and I can't really find anything else in the oplog that could explain the problem.

Comment by Kristina Chodorow (Inactive) [ 15/Feb/12 ]

The _pinValue assertion is caused by a driver bug that is triggered when the server is slow. There have already been bugs filed against the PHP and Python drivers about this assertion. However, the assertion does not cause the server to slow down, it is just a symptom of server slowness. Do you have any monitoring that indicates why your servers are responding slowly?

Comment by Andreas Bernhardsson [ 13/Feb/12 ]

I'm also experiencing this issue, it usually happens during high peak times and I have to restart the primary DB for the problem to go away. I'm using 2.0.2 (no sharding) and the 1.2.7 PHP driver, I even tried to switch to 1.2.8dev to see if the PHP-313 fix would solve the issue but the driver caused the server to become terrible slow and stop replicating.

Is there any workaround to avoid this issue until this bug is resolved?

Comment by Ben Becker [ 03/Feb/12 ]

Steps to reproduce:

Set the max open fds to a low number (e.g. ulimit -n 256).

Start a sharded cluster w/ replication (see start_test_cluster.sh).

Enable replication and sharding on ns 'shardtest.shardcol'.

Run the attached pinval_repro.php script about 300 times, e.g.:

#!/bin/bash
for i in {1..300}
do
    php /path/to/php_test.php &
    sleep 1
done

The PHP script simply insert()s, find()s, and exhausts the cursor. Exceptions are ignored and reconnection is attempted. v2.0.2 will generally exhibit the _pinValue bug within one or two minutes, and the mongos instances will crash (likely due to SERVER-4699).

Generated at Thu Feb 08 03:06:30 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.