[SERVER-31120] Invariant failure remotesExhausted_inlock() || _lifecycleState == kKillComplete Created: 17/Sep/17  Updated: 30/Oct/23  Resolved: 20/Sep/17

Status: Closed
Project: Core Server
Component/s: Internal Code
Affects Version/s: 3.2.19
Fix Version/s: 3.6.0-rc0

Type: Bug Priority: Major - P3
Reporter: A. Jesse Jiryu Davis Assignee: Mira Carey
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File mongos.log    
Issue Links:
Backports
Related
is related to SERVER-31138 ClusterClientCursorImpl isn't a safe ... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v3.4
Sprint: Platforms 2017-10-02

 Description   

Triggered by PyMongo's test suite, on my branch where I'm developing sessions. This is a mongos error from a sharded cluster with auth enabled, raised when PyMongo calls "getMore" on an aggregation cursor with an "lsid":

2017-09-17T14:19:16.577-0400 F -        [conn187] Invariant failure remotesExhausted_inlock() || _lifecycleState == kKillComplete src/mongo/s/query/async_results_merger.cpp 84
 
 mongos(_ZN5mongo15invariantFailedEPKcS1_j+0x2E6) [0x10e848a46]
 mongos(_ZN5mongo18AsyncResultsMergerD2Ev+0x196) [0x10dfed976]
 mongos(_ZN5mongo16RouterStageMergeD0Ev+0x1C) [0x10dfea69c]
 mongos(_ZN5mongo23ClusterClientCursorImplD0Ev+0xB6) [0x10dfe93c6]
 mongos(_ZN5mongo20ClusterCursorManager14checkOutCursorERKNS_15NamespaceStringExPNS_16OperationContextE+0x3D3) [0x10e1b0d93]
 mongos(_ZN5mongo11ClusterFind10runGetMoreEPNS_16OperationContextERKNS_14GetMoreRequestE+0x4A) [0x10dfe30ea]
 mongos(_ZN5mongo12_GLOBAL__N_117ClusterGetMoreCmd3runEPNS_16OperationContextERKNSt3__112basic_stringIcNS4_11char_traitsIcEENS4_9allocatorIcEEEERKNS_7BSONObjERNS_14BSONObjBuilderE+0x116) [0x10df87ab6]
 mongos(_ZN5mongo12BasicCommand11enhancedRunEPNS_16OperationContextERKNS_12OpMsgRequestERNS_14BSONObjBuilderE+0x77) [0x10e2f0037]
 mongos(_ZN5mongo7Command9publicRunEPNS_16OperationContextERKNS_12OpMsgRequestERNS_14BSONObjBuilderE+0x20) [0x10e2ee530]
 mongos(_ZN5mongo12_GLOBAL__N_110runCommandEPNS_16OperationContextERKNS_12OpMsgRequestEONS_14BSONObjBuilderE+0xC8F) [0x10dfc6a7f]
 mongos(_ZN5mongo8Strategy13clientCommandEPNS_16OperationContextERKNS_7MessageE+0x341) [0x10dfc32d1]
 mongos(_ZN5mongo23ServiceEntryPointMongos13handleRequestEPNS_16OperationContextERKNS_7MessageE+0x2E5) [0x10df21c25]
 mongos(_ZN5mongo19ServiceStateMachine15_processMessageERNS0_11ThreadGuardE+0x18A) [0x10df2967a]
 mongos(_ZN5mongo19ServiceStateMachine15_runNextInGuardERNS0_11ThreadGuardE+0x175) [0x10df28b35]
 mongos(_ZN5mongo19ServiceStateMachine7runNextEv+0x38) [0x10df294a8]

Log attached. PyMongo was executing:

            # Use batchSize to ensure multiple getMore messages
            cursor = db.test.aggregate(
                [{'$project': {'_id': '$_id'}}],
                batchSize=5)
 
            self.assertEqual(
                expected_sum,
                sum(doc['_id'] for doc in cursor))
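Read bottom-up, the stack trace shows ClusterCursorManager::checkOutCursor destroying a ClusterClientCursorImpl, whose RouterStageMerge destroys its AsyncResultsMerger, and that destructor is where the invariant fires. Below is a minimal Python model of the check, assuming only what the invariant text states: the merger may be destroyed once every remote (shard) cursor is exhausted or a kill has completed. `MergerModel`, `InvariantFailure`, and the method names are hypothetical and do not mirror the real C++ class.

```python
class InvariantFailure(Exception):
    """Stands in for mongo::invariantFailed, which aborts mongos."""


class MergerModel:
    """Toy model of the destructor check at async_results_merger.cpp:84.

    Hypothetical: mirrors only the logic of the invariant, not the real
    AsyncResultsMerger implementation.
    """

    def __init__(self, num_remotes):
        # Shard cursors that have not yet reported exhaustion.
        self.open_remotes = num_remotes
        # Stand-in for `_lifecycleState == kKillComplete`.
        self.kill_complete = False

    def remotes_exhausted(self):
        return self.open_remotes == 0

    def exhaust_one_remote(self):
        self.open_remotes = max(0, self.open_remotes - 1)

    def kill(self):
        # Assumption: the kill is modelled as completing immediately; in the
        # server it finishes asynchronously once shard cursors are cleaned up.
        self.kill_complete = True

    def destroy(self):
        # invariant(remotesExhausted_inlock() || _lifecycleState == kKillComplete)
        if not (self.remotes_exhausted() or self.kill_complete):
            raise InvariantFailure(
                "remotesExhausted_inlock() || _lifecycleState == kKillComplete")


# Mid-aggregation, shards still hold open cursors and no kill has run, so
# destroying the merger at this point fails the invariant, as in the trace.
merger = MergerModel(num_remotes=2)
try:
    merger.destroy()
except InvariantFailure as exc:
    print("invariant failure:", exc)
```

In this model the only safe ways to tear down a merger are to drain every remote or to kill it first. The crashing path destroyed it while shard cursors were still open.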



 Comments   
Comment by Githook User [ 20/Sep/17 ]

Author:

{'email': 'jcarey@argv.me', 'name': 'Jason Carey', 'username': 'hanumantmk'}

Message: SERVER-31120 fix invalid session getMore invariant

When passing the wrong lsid to a cursor (not the lsid used to create it)
we invariant in sharding. This appears to be about poor lifetime issues
in mongos cursors.

This papers over the bad api and adds a test for the fix.
Branch: master
https://github.com/mongodb/mongo/commit/b4fa6b5c46612b7943230dc1a4b24ce7867aa681

Comment by A. Jesse Jiryu Davis [ 19/Sep/17 ]

That's great! Very helpful for driver testing if the server uasserts when getMore doesn't have the right lsid.

I think the PyMongo code I was testing at the time did send a different lsid with getMore than with aggregate; that's fixed in my code now.
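The client-side rule implied here is that every getMore on a cursor must carry the same lsid as the command that created it. Below is a minimal sketch under that assumption. It builds command documents as plain dicts and makes no server round trip. `PinnedSessionCursor`, `new_lsid`, `aggregate_command`, and the cursor id `12345` are hypothetical and are not PyMongo's API.

```python
import uuid


def new_lsid():
    # A logical session id is a document of the form {"id": <UUID>}; a
    # Python uuid.UUID stands in for the BSON binary subtype 4 value here.
    return {"id": uuid.uuid4()}


def aggregate_command(collection, pipeline, lsid, batch_size):
    """Build the aggregate command that opens the cursor, tagged with lsid."""
    return {
        "aggregate": collection,
        "pipeline": pipeline,
        "cursor": {"batchSize": batch_size},
        "lsid": lsid,
    }


class PinnedSessionCursor:
    """Hypothetical driver-side cursor that captures the lsid of the command
    that created it and resends that same lsid on every getMore."""

    def __init__(self, collection, cursor_id, lsid, batch_size):
        self.collection = collection
        self.cursor_id = cursor_id
        self.lsid = lsid  # fixed at creation; never swapped for a new session
        self.batch_size = batch_size

    def get_more_command(self):
        return {
            "getMore": self.cursor_id,
            "collection": self.collection,
            "batchSize": self.batch_size,
            "lsid": self.lsid,
        }


lsid = new_lsid()
agg = aggregate_command("test", [{"$project": {"_id": "$_id"}}], lsid, 5)
# 12345 is a placeholder; a real cursor id comes from the aggregate reply.
cursor = PinnedSessionCursor("test", 12345, agg["lsid"], 5)
assert cursor.get_more_command()["lsid"] == agg["lsid"]
```

Capturing the lsid once at cursor creation means a driver cannot accidentally start a fresh implicit session for each getMore. Per the comment above, that mismatch is the one the PyMongo branch produced.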

Comment by Mira Carey [ 18/Sep/17 ]

I was able to reproduce this by issuing a getMore with a different lsid than the lsid used to create the cursor. It may also happen if a getMore is issued without an lsid for a cursor that was created with one.

The fix will clean that up (so that a helpful uassert shows up instead of an invariant failure).
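This comment suggests the general shape of the fix: validate the session before the cursor is handed out, and report a mismatch as a user error rather than tearing the cursor down. Below is a minimal sketch of that check, assuming lsids compare by equality and that a getMore without a session is represented as `None`. `SketchCursorManager`, `CursorEntry`, `UserAssertion`, and the error messages are hypothetical and are not the server's code or wording. The commit message itself describes the actual change as papering over the API.

```python
class UserAssertion(Exception):
    """Stands in for a uassert: an error reply to the client, not a crash."""


class CursorEntry:
    def __init__(self, cursor_id, lsid):
        self.cursor_id = cursor_id
        self.lsid = lsid  # None when the cursor was created without a session
        self.checked_out = False


class SketchCursorManager:
    """Toy model (hypothetical, not ClusterCursorManager's code) of checking
    a getMore's lsid before the cursor is handed out. On a mismatch it raises
    UserAssertion and leaves the cursor registered and untouched, instead of
    destroying it mid-flight as in the reported crash."""

    def __init__(self):
        self._cursors = {}

    def register(self, cursor_id, lsid):
        self._cursors[cursor_id] = CursorEntry(cursor_id, lsid)

    def check_out(self, cursor_id, request_lsid):
        entry = self._cursors.get(cursor_id)
        if entry is None:
            raise UserAssertion(f"cursor {cursor_id} not found")
        # Covers both reported cases: a different lsid, and no lsid (None)
        # for a cursor created with one. Assumption: the reverse case, an
        # lsid sent for a session-less cursor, is rejected as well.
        if entry.lsid != request_lsid:
            raise UserAssertion(
                f"cursor {cursor_id} was not created in the requesting session")
        entry.checked_out = True
        return entry


manager = SketchCursorManager()
manager.register(42, {"id": "session-a"})
for bad_lsid in ({"id": "session-b"}, None):
    try:
        manager.check_out(42, bad_lsid)
    except UserAssertion as exc:
        print("getMore rejected:", exc)
print(manager.check_out(42, {"id": "session-a"}).checked_out)
```

Rejecting the request before any state changes keeps the cursor usable by the session that owns it. In this model, that is the difference between a recoverable uassert and the invariant failure above.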

Generated at Thu Feb 08 04:26:03 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.