[SERVER-29807] RangeDeleter should log when it's about to wait for majority replication Created: 23/Jun/17  Updated: 30/Oct/23  Resolved: 02/Mar/18

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 3.2.14, 3.4.4
Fix Version/s: 3.6.4, 3.7.3

Type: Improvement Priority: Major - P3
Reporter: Kevin Pulo Assignee: Kevin Pulo
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Related
is related to SERVER-29812 RangeDeleter unnecessarily waits for ... Closed
Backwards Compatibility: Fully Compatible
Backport Requested:
v3.6, v3.4
Sprint: Sharding 2017-12-04, Sharding 2018-01-01, Sharding 2017-12-18, Sharding 2018-02-26, Sharding 2018-03-12
Participants:

 Description   

The RangeDeleter not only deletes the necessary documents, but also waits for a majority of replica set members to catch up. In 3.4 this wait happens unconditionally, whereas in master it happens only if any documents were actually deleted.

In 3.4, the w:majority wait happens after the (apparently final) "rangeDeleter deleted X documents" message, whereas in master, it happens before the final "Deleted X documents in DB.COLL range ..." message.

This ticket requests log message(s), at the default log level, to indicate when the RangeDeleter is spending time waiting for replication.
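
For illustration only, the requested change might look like the following minimal sketch (numDeleted, opCtx and waitForMajorityReplication are placeholder names, not the actual code; concrete patches against 3.4 and master are proposed in the comments below):

 // Log before and after the existing w:majority wait, at the default log
 // level, so that time spent waiting for replication is visible in the logs.
 log() << "rangeDeleter waiting for majority replication of " << numDeleted
       << " deleted documents";
 Status status = waitForMajorityReplication(opCtx);  // existing wait, unchanged (placeholder name)
 log() << "rangeDeleter finished waiting for majority replication";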



 Comments   
Comment by Githook User [ 19/Mar/18 ]

Author:

{'email': 'kevin.pulo@mongodb.com', 'name': 'Kevin Pulo', 'username': 'devkev'}

Message: SERVER-29807 Make CollectionRangeDeleter wait for w:majority after each range

That is, wait once after the range has finished being deleted - rather than
waiting after each batch of ~100 document deletions (while yielding).

Also log before and after waiting for majority replication.

(cherry picked from commit a945fb4697af6e2360ddcdd4eee2b73a0f42469f)
Branch: v3.6
https://github.com/mongodb/mongo/commit/bd0208763e5196bbf0514d58e8351b64a15e84f4

Comment by Githook User [ 02/Mar/18 ]

Author:

{'email': 'kevin.pulo@mongodb.com', 'name': 'Kevin Pulo', 'username': 'devkev'}

Message: SERVER-29807 Make CollectionRangeDeleter wait for w:majority after each range

That is, wait once after the range has finished being deleted - rather than
waiting after each batch of ~100 document deletions (while yielding).

Also log before and after waiting for majority replication.
Branch: master
https://github.com/mongodb/mongo/commit/a945fb4697af6e2360ddcdd4eee2b73a0f42469f
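
In rough pseudocode, the behaviour change described in this commit message is the following (illustrative sketch only; moreDocumentsInRange, deleteBatch and waitForMajorityReplication are placeholder names, not actual MongoDB functions):

 // Before: a w:majority wait after every batch of ~100 document deletions
 while (moreDocumentsInRange()) {
     deleteBatch(maxToDelete);
     waitForMajorityReplication();
 }

 // After: delete all batches in the range first, then wait (and log) once per range
 while (moreDocumentsInRange()) {
     deleteBatch(maxToDelete);
 }
 log() << "Waiting for majority replication of local deletions in range";
 waitForMajorityReplication();
 log() << "Finished deleting documents in range";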

Comment by Kevin Pulo [ 21/Feb/18 ]

Yes, SERVER-29812 obviates the need for this in 3.4 — but it could still be useful to add the corresponding logging in master.

Comment by Kaloian Manassiev [ 02/Nov/17 ]

kevin.pulo, I think that if SERVER-29812 is fixed, then we won't need to do this logging. I am assigning it to you to look at when you are working on that ticket.

Comment by Kevin Pulo [ 23/Jun/17 ]

For example, in 3.4 something like:

db/range_deleter.cpp

 bool RangeDeleter::deleteNow(OperationContext* txn,
                              const RangeDeleterOptions& options,
                              string* errMsg) {
 
     // ...
 
     taskDetails.stats.deleteStartTS = jsTime();
     bool result = _env->deleteRange(txn, taskDetails, &taskDetails.stats.deletedDocCount, errMsg);
 
     taskDetails.stats.deleteEndTS = jsTime();
 
     if (result) {
         taskDetails.stats.waitForReplStartTS = jsTime();
+        log() << "rangeDeleter waiting for majority replication of "
+              << taskDetails.stats.deletedDocCount << " deleted documents for "
+              << taskDetails.options.range.ns << " from "
+              << redact(taskDetails.options.range.minKey) << " -> "
+              << redact(taskDetails.options.range.maxKey);
         result = _waitForMajority(txn, errMsg);
+        log() << "rangeDeleter finished waiting for majority replication";
         taskDetails.stats.waitForReplEndTS = jsTime();
     }
 
     // ...

Comment by Kevin Pulo [ 23/Jun/17 ]

For example, in master something like:

db/s/collection_range_deleter.cpp

 auto CollectionRangeDeleter::cleanUpNextRange(OperationContext* opCtx,
                                               NamespaceString const& nss,
                                               Action action,
                                               int maxToDelete,
                                               CollectionRangeDeleter* forTestOnly) -> Action {
 
             // ...
 
             try {
                 auto keyPattern = scopedCollectionMetadata->getKeyPattern();
 
                 wrote = self->_doDeletion(opCtx, collection, keyPattern, *range, maxToDelete);
             } catch (const DBException& e) {
                 wrote = e.toStatus();
                 warning() << e.what();
             }
             if (!wrote.isOK() || wrote.getValue() == 0) {
                 if (wrote.isOK()) {
                     log() << "No documents remain to delete in " << nss << " range "
                           << redact(range->toString());
                 }
                 stdx::lock_guard<stdx::mutex> scopedLock(css->_metadataManager->_managerLock);
                 self->_pop(wrote.getStatus());
                 return Action::kWriteOpLog;
             }
         }  // drop scopedCollectionMetadata
     }      // drop autoColl
 
     invariant(range);
     invariantOK(wrote.getStatus());
     invariant(wrote.getValue() > 0);
 
+    log() << "Waiting for majority replication of " << wrote.getValue() << " local deletions in "
+          << nss.ns() << " range " << redact(range->toString());
+
     repl::ReplClientInfo::forClient(opCtx->getClient()).setLastOpToSystemLastOpTime(opCtx);
     const auto clientOpTime = repl::ReplClientInfo::forClient(opCtx->getClient()).getLastOp();
 
     // Wait for replication outside the lock
     WriteConcernResult unusedWCResult;
     Status status = Status::OK();
     try {
         status = waitForWriteConcern(opCtx, clientOpTime, kMajorityWriteConcern, &unusedWCResult);
     } catch (const DBException& e) {
         status = e.toStatus();
     }
     if (!status.isOK()) {
         log() << "Error when waiting for write concern after removing " << nss << " range "
               << redact(range->toString()) << " : " << redact(status.reason());
 
         AutoGetCollection autoColl(opCtx, nss, MODE_IX);
         auto* css = CollectionShardingState::get(opCtx, nss);
         stdx::lock_guard<stdx::mutex> scopedLock(css->_metadataManager->_managerLock);
         auto* self = &css->_metadataManager->_rangesToClean;
         // if range were already popped (e.g. by dropping nss during the waitForWriteConcern above)
         // its notification would have been triggered, so this check suffices to ensure that it is
         // safe to pop the range here.
         if (!notification.ready()) {
             invariant(!self->isEmpty() && self->_orphans.front().notification == notification);
             log() << "Abandoning deletion of latest range in " << nss.ns() << " after "
                   << wrote.getValue() << " local deletions because of replication failure";
             self->_pop(status);
         }
     } else {
-        log() << "Deleted " << wrote.getValue() << " documents in " << nss.ns() << " range "
+        log() << "Finished deleting " << wrote.getValue() << " documents in " << nss.ns() << " range "
               << redact(range->toString());
     }
 
     notification.abandon();
     return Action::kMore;
 }
