[SERVER-31255] certain sequence of collection and index operations can cause dropIndex rollback to fail. Created: 25/Sep/17  Updated: 30/Oct/23  Resolved: 13/Oct/17

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 3.5.13
Fix Version/s: 3.6.0-rc1

Type: Bug Priority: Major - P3
Reporter: Robert Guo (Inactive) Assignee: Katherine Walker (Inactive)
Resolution: Fixed Votes: 0
Labels: rbfz
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File index_builder.js    
Issue Links:
Related
related to SERVER-31062 Add more detailed diagnostics for rol... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Repl 2017-10-02, Repl 2017-10-23
Participants:

 Description   

Certain sequences of collection and index operations can cause a dropIndex to be rolled back as an index creation on a collection that doesn't yet exist, causing rollback to fassert.

Please see the attached file for a repro.

est:index_builder] 2017-09-25T19:07:02.443-0400 d20013| 2017-09-25T19:07:02.443-0400 I ROLLBACK [rsBackgroundSync] Finding the Common Point
[js_test:index_builder] 2017-09-25T19:07:02.444-0400 d20013| 2017-09-25T19:07:02.444-0400 I ROLLBACK [rsBackgroundSync] our last optime:   Sep 25 19:06:46:9
[js_test:index_builder] 2017-09-25T19:07:02.444-0400 d20013| 2017-09-25T19:07:02.444-0400 I ROLLBACK [rsBackgroundSync] their last optime: Sep 25 19:06:57:2
[js_test:index_builder] 2017-09-25T19:07:02.444-0400 d20013| 2017-09-25T19:07:02.444-0400 I ROLLBACK [rsBackgroundSync] diff in end of log times: -11 seconds
[js_test:index_builder] 2017-09-25T19:07:02.445-0400 d20013| 2017-09-25T19:07:02.445-0400 I ROLLBACK [rsBackgroundSync] Rollback common point is { ts: Timestamp 1506380800000|8, t: 1 }
[js_test:index_builder] 2017-09-25T19:07:02.445-0400 d20013| 2017-09-25T19:07:02.445-0400 I ROLLBACK [rsBackgroundSync] Starting refetching documents
[js_test:index_builder] 2017-09-25T19:07:02.445-0400 d20013| 2017-09-25T19:07:02.445-0400 I ROLLBACK [rsBackgroundSync] Finished refetching documents. Total size of documents refetched: 0
[js_test:index_builder] 2017-09-25T19:07:02.445-0400 d20013| 2017-09-25T19:07:02.445-0400 I ROLLBACK [rsBackgroundSync] Checking the RollbackID and updating the MinValid if necessary
[js_test:index_builder] 2017-09-25T19:07:02.446-0400 d20013| 2017-09-25T19:07:02.446-0400 I ROLLBACK [rsBackgroundSync] Setting minvalid to { ts: Timestamp 1506380817000|2, t: 2 }
[js_test:index_builder] 2017-09-25T19:07:02.447-0400 d20013| 2017-09-25T19:07:02.446-0400 I ROLLBACK [rsBackgroundSync] Dropping collections to roll back create operations
[js_test:index_builder] 2017-09-25T19:07:02.457-0400 d20013| 2017-09-25T19:07:02.456-0400 I ROLLBACK [rsBackgroundSync] Dropped collection with UUID: c2a33d6e-4b99-4729-8604-02233cb2b409 and nss: test.system.drop.1506380806i9t1.two
[js_test:index_builder] 2017-09-25T19:07:02.457-0400 d20013| 2017-09-25T19:07:02.456-0400 I ROLLBACK [rsBackgroundSync] Rolling back renameCollection commands and collection drop commands.
[js_test:index_builder] 2017-09-25T19:07:02.462-0400 d20013| 2017-09-25T19:07:02.461-0400 I ROLLBACK [rsBackgroundSync] Renamed collection from test.system.drop.1506380806i4t1.twoto test.three with uuid: f3b6102f-c359-4652-916d-66c45ef1d8a0
[js_test:index_builder] 2017-09-25T19:07:02.462-0400 d20013| 2017-09-25T19:07:02.461-0400 I ROLLBACK [rsBackgroundSync] Rolling back collections pending being dropped: Removing them from the list of drop-pending collections in the DropPendingCollectionReaper.
[js_test:index_builder] 2017-09-25T19:07:02.462-0400 d20013| 2017-09-25T19:07:02.462-0400 I REPL     [rsBackgroundSync] Rolling back collection drop for test.system.drop.1506380806i4t1.two with drop OpTime { ts: Timestamp 1506380806000|4, t: 1 } to namespace test.two
[js_test:index_builder] 2017-09-25T19:07:02.462-0400 d20013| 2017-09-25T19:07:02.462-0400 I REPL     [rsBackgroundSync] Rolling back collection drop for test.system.drop.1506380806i9t1.two with drop OpTime { ts: Timestamp 1506380806000|9, t: 1 } to namespace test.two
[js_test:index_builder] 2017-09-25T19:07:02.462-0400 d20013| 2017-09-25T19:07:02.462-0400 I ROLLBACK [rsBackgroundSync] Resyncing collection metadata, uuid: f3b6102f-c359-4652-916d-66c45ef1d8a0
[js_test:index_builder] 2017-09-25T19:07:02.463-0400 d20013| 2017-09-25T19:07:02.463-0400 I ROLLBACK [rsBackgroundSync] Rechecking the Rollback ID and minValid
[js_test:index_builder] 2017-09-25T19:07:02.464-0400 d20013| 2017-09-25T19:07:02.464-0400 I ROLLBACK [rsBackgroundSync] Setting minvalid to { ts: Timestamp 1506380817000|2, t: 2 }
[js_test:index_builder] 2017-09-25T19:07:02.465-0400 d20013| 2017-09-25T19:07:02.464-0400 I ROLLBACK [rsBackgroundSync] Rolling back createIndexes commands.
[js_test:index_builder] 2017-09-25T19:07:02.465-0400 d20013| 2017-09-25T19:07:02.464-0400 I ROLLBACK [rsBackgroundSync] Rolling back dropIndexes commands.
[js_test:index_builder] 2017-09-25T19:07:02.465-0400 d20013| 2017-09-25T19:07:02.464-0400 F -        [rsBackgroundSync] Fatal Assertion 40409 at src/mongo/db/index_builder.cpp 131
[js_test:index_builder] 2017-09-25T19:07:02.465-0400 d20013| 2017-09-25T19:07:02.465-0400 F -        [rsBackgroundSync]
[js_test:index_builder] 2017-09-25T19:07:02.465-0400 d20013|
[js_test:index_builder] 2017-09-25T19:07:02.465-0400 d20013| ***aborting after fassert() failure

stack trace:

mongo::fassertFailedWithLocation(int, char const*, unsigned int)
mongo::fassertWithLocation(int, bool, char const*, unsigned int)
mongo::IndexBuilder::_build(mongo::OperationContext*, mongo::Database*, bool, mongo::Lock::DBLock*) const
mongo::IndexBuilder::buildInForeground(mongo::OperationContext*, mongo::Database*) const
mongo::repl::createIndexForApplyOps(mongo::OperationContext*, mongo::BSONObj const&, mongo::NamespaceString const&, std::__1::function<void ()>)
mongo::repl::(anonymous namespace)::rollbackDropIndexes(mongo::OperationContext*, mongo::UUID, std::__1::map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, mongo::BSONObj, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const, mongo::BSONObj> > >)
mongo::repl::rollback_internal::syncFixUp(mongo::OperationContext*, mongo::repl::rollback_internal::FixUpInfo const&, mongo::repl::RollbackSource const&, mongo::repl::ReplicationCoordinator*, mongo::repl::ReplicationProcess*)
mongo::repl::(anonymous namespace)::_syncRollback(mongo::OperationContext*, mongo::repl::OplogInterface const&, mongo::repl::RollbackSource const&, int, mongo::repl::ReplicationCoordinator*, mongo::repl::ReplicationProcess*)
mongo::repl::syncRollback(mongo::OperationContext*, mongo::repl::OplogInterface const&, mongo::repl::RollbackSource const&, int, mongo::repl::ReplicationCoordinator*, mongo::repl::ReplicationProcess*)
mongo::repl::rollback(mongo::OperationContext*, mongo::repl::OplogInterface const&, mongo::repl::RollbackSource const&, int, mongo::repl::ReplicationCoordinator*, mongo::repl::ReplicationProcess*, std::__1::function<void (int)>)
mongo::repl::BackgroundSync::_fallBackOnRollbackViaRefetch(mongo::OperationContext*, mongo::HostAndPort const&, int, mongo::repl::OplogInterface*, bool, std::__1::function<mongo::DBClientBase* ()>)
mongo::repl::BackgroundSync::_runRollback(mongo::OperationContext*, mongo::Status const&, mongo::HostAndPort const&, int, mongo::repl::StorageInterface*)
mongo::repl::BackgroundSync::_produce(mongo::OperationContext*)
mongo::repl::BackgroundSync::_runProducer()



 Comments   
Comment by Githook User [ 13/Oct/17 ]

Author:

{'email': 'katherine.walker@mongodb.com', 'name': 'Katherine Walker', 'username': 'kvwalker'}

Message: SERVER-31255 Prevent dropIndex rollback from failing
Branch: master
https://github.com/mongodb/mongo/commit/991293423d3a252948957504ad91adbc33bdac82

Comment by Katherine Walker (Inactive) [ 29/Sep/17 ]

I made the indexSpec change and now rollback does not fassert, but the test fails with a dbhash mismatch between primary and secondary.

[js_test:index_builder] 2017-09-29T12:11:33.046-0400 Dumping the latest 100 documents that match { } from the oplog oplog.rs of MacBook-Pro-22.local:20012
[js_test:index_builder] 2017-09-29T12:11:33.046-0400 assert failed : dbhash mismatch between primary and secondary
[js_test:index_builder] 2017-09-29T12:11:33.046-0400 doassert@src/mongo/shell/assert.js:18:14
[js_test:index_builder] 2017-09-29T12:11:33.047-0400 assert@src/mongo/shell/assert.js:35:5
[js_test:index_builder] 2017-09-29T12:11:33.047-0400 checkDBHashesForReplSet@src/mongo/shell/replsettest.js:1443:13
[js_test:index_builder] 2017-09-29T12:11:33.047-0400 ReplSetTest/this.checkReplicaSet@src/mongo/shell/replsettest.js:1198:13
[js_test:index_builder] 2017-09-29T12:11:33.047-0400 ReplSetTest/this.checkReplicatedDataHashes@src/mongo/shell/replsettest.js:1446:9
[js_test:index_builder] 2017-09-29T12:11:33.047-0400 RollbackTest/this.stop@jstests/replsets/libs/rollback_test.js:174:9
[js_test:index_builder] 2017-09-29T12:11:33.047-0400 @jstests/replsets/index_builder.js:89:5
[js_test:index_builder] 2017-09-29T12:11:33.047-0400 @jstests/replsets/index_builder.js:88:1
[js_test:index_builder] 2017-09-29T12:11:33.047-0400
[js_test:index_builder] 2017-09-29T12:11:33.047-0400 d20013| 2017-09-29T12:11:33.046-0400 I COMMAND  [conn36] command: unlock requested
[js_test:index_builder] 2017-09-29T12:11:33.048-0400 d20013| 2017-09-29T12:11:33.046-0400 I COMMAND  [conn36] fsyncUnlock completed. mongod is now unlocked and free to accept writes
[js_test:index_builder] 2017-09-29T12:11:33.048-0400 2017-09-29T12:11:33.046-0400 E QUERY    [thread1] Error: assert failed : dbhash mismatch between primary and secondary :
[js_test:index_builder] 2017-09-29T12:11:33.048-0400 doassert@src/mongo/shell/assert.js:18:14
[js_test:index_builder] 2017-09-29T12:11:33.048-0400 assert@src/mongo/shell/assert.js:35:5
[js_test:index_builder] 2017-09-29T12:11:33.048-0400 checkDBHashesForReplSet@src/mongo/shell/replsettest.js:1443:13
[js_test:index_builder] 2017-09-29T12:11:33.048-0400 ReplSetTest/this.checkReplicaSet@src/mongo/shell/replsettest.js:1198:13
[js_test:index_builder] 2017-09-29T12:11:33.048-0400 ReplSetTest/this.checkReplicatedDataHashes@src/mongo/shell/replsettest.js:1446:9
[js_test:index_builder] 2017-09-29T12:11:33.048-0400 RollbackTest/this.stop@jstests/replsets/libs/rollback_test.js:174:9
[js_test:index_builder] 2017-09-29T12:11:33.048-0400 @jstests/replsets/index_builder.js:89:5
[js_test:index_builder] 2017-09-29T12:11:33.048-0400 @jstests/replsets/index_builder.js:88:1
[js_test:index_builder] 2017-09-29T12:11:33.048-0400 failed to load: jstests/replsets/index_builder.js

Comment by Judah Schvimer [ 26/Sep/17 ]

The bug is that BSONObj::addField adds the field to a copy and does not mutate the original BSONObj. This line should be indexSpec = indexSpec.addField(updatedNssObj.firstElement()); instead of simply indexSpec.addField(updatedNssObj.firstElement());

Comment by Judah Schvimer [ 26/Sep/17 ]

The problem is that the "ns" field in the index spec does not match the namespace of the collection due to the rename.

[js_test:server-31255] 2017-09-26T10:02:27.968-0400 d20013| 2017-09-26T10:02:27.967-0400 I ROLLBACK [rsBackgroundSync] Creating index in rollback: collection = test.three, index = ☃_1 spec { v: 2, key: { ￿: 1.0 }, name: "☃_1", ns: "test.two", sparse: true, expireAfterSeconds: 590.0 }
[js_test:server-31255] 2017-09-26T10:02:27.968-0400 d20013| 2017-09-26T10:02:27.967-0400 F -        [rsBackgroundSync] Fatal Assertion 40409 at src/mongo/db/index_builder.cpp 131

Generated at Thu Feb 08 04:26:27 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.