[SERVER-5999] page fault exception in ClientCursor::aboutToDelete() can leave another Cursor / ClientCursor in an inconsistent state Created: 04/Jun/12  Updated: 11/Jul/16  Resolved: 07/Jun/12

Status: Closed
Project: Core Server
Component/s: Concurrency
Affects Version/s: None
Fix Version/s: 2.1.2

Type: Bug Priority: Major - P3
Reporter: Aaron Staple Assignee: Eliot Horowitz (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-5970 QueryOptimizerCursorTests core test f... Closed
Related
related to SERVER-6052 Investigate killop in ClientCursor::a... Closed
Operating System: ALL
Participants:

 Description   

Before a document is deleted, all cursors pointing at it are advanced until they do not point at it:

            c->recoverFromYield();
            DiskLoc tmp1 = c->refLoc();
            if ( tmp1 != dl ) {
                // This might indicate a failure to call ClientCursor::prepareToYield() but it can
                // also happen during correct operation, see SERVER-2009.
                problem() << "warning: cursor loc " << tmp1 << " does not match byLoc position " << dl << " !" << endl;
            }
            else {
                c->advance();
            }
            while (!c->eof() && c->refLoc() == dl) {
                /* We don't delete at EOF because we want to return "no more results" rather than "no such cursor".
                 * The loop is to handle MultiKey indexes where the deleted record is pointed to by multiple adjacent keys.
                 * In that case we need to advance until we get to the next distinct record or EOF.
                 * SERVER-4154
                 * SERVER-5198
                 * But see SERVER-5725.
                 */
                c->advance();
            }
            cc->updateLocation();

Some of the functions above may generate page fault exceptions for certain types of Cursors. For example an inconsistency can arise if:

  • A cursor is advanced but an exception is thrown before cc->updateLocation() is called, causing the cursor's _lastLoc to be inconsistent with its actual location.
  • A page fault exception is thrown within advance(), leaving the cursor in an inconsistent state.

The impact of these issues is somewhat mitigated by the fact that a non capped BasicCursor tends to perform few data accesses, and that a BtreeCursor probably does not need to be handled in aboutToDelete() anyway (see SERVER-5725). I have not checked all cases with these cursor types, and with the potential for page fault exceptions in this code there are a lot of cases to check. For BtreeCursor there at least seems to be a problematic case in noteLocation() where page fault exceptions are simulated in debug mode and keyAtKeyOfs is updated but locAtKeyOfs is not updated due to a page fault exception. This could cause the cursor to skip matching documents.

I am attaching a test case for one issue where a MultiCusor is left in an inconsistent state due to a page fault exception in advance(). I believe there is also a case where a MultiCusor can be left pointing to a bad document, but it is difficult to write a test for that case.

This test will fail sporadically on debug builds, where a page fault is simulated probabilistically:

 
diff --git a/src/mongo/dbtests/cursortests.cpp b/src/mongo/dbtests/cursortests.cpp
index 38d1197..738f62b 100644
--- a/src/mongo/dbtests/cursortests.cpp
+++ b/src/mongo/dbtests/cursortests.cpp
@@ -298,7 +298,51 @@ namespace CursorTests {
                 client.dropCollection( ns() );
             }
         };        
-        
+
+        /**
+         * A cursor is advanced when the document at its current iterate is removed.
+         */
+        class HandleDelete : public Base {
+        public:
+            void run() {
+                for( int i = 0; i < 150; ++i ) {
+                    client.insert( ns(), BSON( "_id" << i ) );
+                }
+
+                boost::shared_ptr<Cursor> cursor;
+                ClientCursor::Holder clientCursor;
+                ClientCursor::YieldData yieldData;
+
+                {
+                    Client::ReadContext ctx( ns() );
+                    // The query will utilize the _id index for both the first and second clauses.
+                    cursor = NamespaceDetailsTransient::getCursor
+                            ( ns(), fromjson( "{$or:[{_id:{$gte:0,$lte:148}},{_id:149}]}" ) );
+                    clientCursor.reset( new ClientCursor( QueryOption_NoCursorTimeout, cursor,
+                                                          ns() ) );
+                    // Advance to the last iterate of the first clause.
+                    ASSERT( cursor->ok() );
+                    while( cursor->current()[ "_id" ].number() != 148 ) {
+                        ASSERT( cursor->advance() );
+                    }
+                    clientCursor->prepareToYield( yieldData );
+                }
+
+                // Remove the document at the cursor's current position, which will cause the
+                // cursor to be advanced.
+                client.remove( ns(), BSON( "_id" << 148 ) );
+
+                {
+                    Client::ReadContext ctx( ns() );
+                    clientCursor->recoverFromYield( yieldData );
+                    // Verify that the cursor has another iterate, _id:149, after it is advanced due
+                    // to _id:148's removal.
+                    ASSERT( cursor->ok() );
+                    ASSERT_EQUALS( 149, cursor->current()[ "_id" ].number() );
+                }
+            }
+        };
+
         /**
          * ClientCursor::aboutToDelete() advances a ClientCursor with a refLoc() matching the
          * document to be deleted.
@@ -470,6 +514,7 @@ namespace CursorTests {
             add<BtreeCursor::RangeEq>();
             add<BtreeCursor::RangeIn>();
             add<BtreeCursor::AbortImplicitScan>();
+            add<ClientCursor::HandleDelete>();
             add<ClientCursor::AboutToDelete>();
             add<ClientCursor::AboutToDeleteDuplicate>();
             add<ClientCursor::AboutToDeleteDuplicateNextClause>();



 Comments   
Comment by auto [ 07/Jun/12 ]

Author:

{u'login': u'erh', u'name': u'Eliot Horowitz', u'email': u'eliot@10gen.com'}

Message: test for SERVER-5999
Branch: master
https://github.com/mongodb/mongo/commit/94f51ccf64f73e7bc711c1ab8f9d0ad062c2e006

Comment by auto [ 07/Jun/12 ]

Author:

{u'login': u'erh', u'name': u'Eliot Horowitz', u'email': u'eliot@10gen.com'}

Message: SERVER-5999 turn off page fault exception in ClientCursor::aboutToDelete
Branch: master
https://github.com/mongodb/mongo/commit/9e7c382aef5b696fe425a07ba7635e2051e85479

Comment by auto [ 07/Jun/12 ]

Author:

{u'login': u'erh', u'name': u'Eliot Horowitz', u'email': u'eliot@10gen.com'}

Message: SERVER-5999 option to turn of page fault exception in critical sections
Branch: master
https://github.com/mongodb/mongo/commit/10c508566d0b90482644d23f1e37769f3537597b

Generated at Thu Feb 08 03:10:29 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.