[SERVER-32150] Only increment scanAndOrder once per logical operation Created: 01/Dec/17  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: Aggregation Framework, Diagnostics
Affects Version/s: 3.4.0, 3.6.0
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Charlie Swanson Assignee: Backlog - Query Execution
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Query Execution
Participants:
Case:

 Description   

We currently increment the scanAndOrder metric once per operation, where a getMore counts as its own operation. This leads to the following strange behaviors:

  • The number of times this metric is incremented depends on the batchSize used: a smaller batch size means more getMores, and the metric is incremented once per getMore.
  • In a sharded aggregation which performs a blocking sort, this metric will be incremented at least twice: once for the 'cursor establishment' portion of the aggregate (sent with batchSize: 0), and once for the actual computation of the first batch, which happens in a getMore.
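As a rough sketch of the behavior described above (a Python model written for this ticket, not the actual mongod code; the helper name and its simplifications are mine), the count depends entirely on how many commands the client ends up issuing:

```python
import math

def scan_and_order_increments(num_results: int, batch_size: int) -> int:
    """Model of the CURRENT behavior: the counter is bumped at most once
    per command, but every getMore is its own command, so the total for
    one logical sorted query depends on the batch size."""
    if batch_size == 0:
        # batchSize: 0 only establishes the cursor; the first real batch
        # then arrives via a getMore (assumed here to drain the cursor).
        return 1 + 1  # initial aggregate + one getMore
    # The initial command returns the first batch; each additional batch
    # requires one getMore. (A trailing empty getMore is ignored here.)
    extra_batches = max(math.ceil(num_results / batch_size) - 1, 0)
    return 1 + extra_batches

# 4 matching documents, batchSize 2: initial command + 1 getMore -> 2 bumps.
# Same query with a large batch size: everything fits in one batch -> 1 bump.
```

This matches the shell experiment further down: the same aggregation bumps the counter more times when run with a small batchSize than when everything fits in the first batch.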

Original Description

Currently each operation that reports having a sort stage will increment the global serverStatus counter "metrics.operation.scanAndOrder". When an aggregation is issued against a sharded cluster, this number can be incremented twice: once for the shards' half of the pipeline, and once for the merging half (see also SERVER-32014).

This arguably makes sense, since from the mongod's perspective there are two operations performing a blocking sort. On the other hand, from the client's perspective there was only one operation that performed a blocking sort. For one last perspective, we did technically scan and order documents twice for that operation. If this metric is tracking the number of times the server performed a blocking sort, then we should increment it once per blocking $sort stage, not once per operation.
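Under the once-per-blocking-$sort proposal, the increment would be a property of the pipeline itself rather than of how the client paginates. A minimal sketch (in Python; the helper is hypothetical and deliberately ignores the real server's ability to satisfy a $sort with an index, which Asya's comment below says should not count at all):

```python
def blocking_sort_stages(pipeline: list) -> int:
    """Count the $sort stages in an aggregation pipeline, as a stand-in
    for 'number of blocking sorts'. Simplification: treats every $sort
    as blocking, ignoring index-satisfied sorts."""
    return sum(1 for stage in pipeline if "$sort" in stage)

pipeline = [
    {"$match": {"A": {"$in": [343, 467]}}},
    {"$sort": {"C": 1}},
    {"$group": {"_id": "$A", "n": {"$sum": 1}}},
    {"$sort": {"n": -1}},
]
# Proposed behavior: scanAndOrder would go up by 2 for this logical
# operation (one per $sort stage), regardless of batchSize or how many
# getMores the client issues.
```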



 Comments   
Comment by Charlie Swanson [ 15/Dec/17 ]

Oh - yeah, I found an easier way to test my above hypothesis: if I play with the batch size I can force more getMores to happen. Looks like this is just being incremented on each getMore, but at most once per operation. Note this is tested on 3.6, which for operations like the following (without allowDiskUse) performs the merge on mongos, so there is no merging operation happening on either shard.

// Before any operation:
mongo localhost:20000 --quiet --eval 'print("scanAndOrder = " + db.serverStatus().metrics.operation.scanAndOrder + " (shard0000)")'; mongo localhost:20001 --quiet --eval 'print("scanAndOrder = " + db.serverStatus().metrics.operation.scanAndOrder + " (shard0001)")'
scanAndOrder = 8 (shard0000)
scanAndOrder = 8 (shard0001)
// Use a smaller batch size to force multiple batches.
db.sharded.aggregate([{$match: {A: {$in: [343, 467, 64903, 45698]}}}, {$sort: {C: 1}}], {cursor: {batchSize: 2}})
{ "_id" : 1324, "A" : 343, "B" : 3253, "C" : 3134, "D" : 4143 }
{ "_id" : 459, "A" : 64903, "B" : 6049, "C" : 4096, "D" : 1345 }
{ "_id" : 698, "A" : 45698, "B" : 589, "C" : 4981, "D" : 20956 }
{ "_id" : 1212, "A" : 467, "B" : 305, "C" : 6908243, "D" : 394058 }
...
mongo localhost:20000 --quiet --eval 'print("scanAndOrder = " + db.serverStatus().metrics.operation.scanAndOrder + " (shard0000)")'; mongo localhost:20001 --quiet --eval 'print("scanAndOrder = " + db.serverStatus().metrics.operation.scanAndOrder + " (shard0001)")'
scanAndOrder = 11 (shard0000)  // Up by 3; this shard must have had more than one batch's worth of documents (>2).
scanAndOrder = 10 (shard0001)  // Up by 2, presumably because the initial aggregate simply opened a cursor, and the getMore returned the 1 matching document.

Compared to running without specifying a batch size:

// Before operation:
mongo localhost:20000 --quiet --eval 'print("scanAndOrder = " + db.serverStatus().metrics.operation.scanAndOrder + " (shard0000)")'; mongo localhost:20001 --quiet --eval 'print("scanAndOrder = " + db.serverStatus().metrics.operation.scanAndOrder + " (shard0001)")'
scanAndOrder = 11 (shard0000)
scanAndOrder = 10 (shard0001)
// No batch size, everything should fit in first batch.
mongos> db.sharded.aggregate([{$match: {A: {$in: [343, 467, 64903, 45698]}}}, {$sort: {C: 1}}])
{ "_id" : 1324, "A" : 343, "B" : 3253, "C" : 3134, "D" : 4143 }
{ "_id" : 459, "A" : 64903, "B" : 6049, "C" : 4096, "D" : 1345 }
{ "_id" : 698, "A" : 45698, "B" : 589, "C" : 4981, "D" : 20956 }
{ "_id" : 1212, "A" : 467, "B" : 305, "C" : 6908243, "D" : 394058 }
...
mongo localhost:20000 --quiet --eval 'print("scanAndOrder = " + db.serverStatus().metrics.operation.scanAndOrder + " (shard0000)")'; mongo localhost:20001 --quiet --eval 'print("scanAndOrder = " + db.serverStatus().metrics.operation.scanAndOrder + " (shard0001)")'
scanAndOrder = 13 (shard0000)  // Each shard increments by 2.
scanAndOrder = 12 (shard0001)

I wouldn't really consider this conclusive evidence, but combined with my above comment and the priority of 'Backlog', I'm going to stop there. We can reconsider when this ticket is scheduled in the future.

Comment by Charlie Swanson [ 15/Dec/17 ]

So Akira's reproduction script definitely is accurate; it looks like we're incrementing it twice for the aggregation in that example. I can't figure out why, though. The only place we ever touch the scanAndOrderCounter is in recordCurOpMetrics(), and that will only do it one time if the operation reports having a sort stage at all; we don't ask how many sorts were performed by the operation (maybe we should?). In any case, it looks like we're somehow calling recordCurOpMetrics() multiple times, which is strange because the only callers of that are from within write commands and right before we return the response from ServiceEntryPointMongod::handleRequest(). I'm investigating how and why this is called twice, but I'm going to have to dig a little bit and set up a debug build with a breakpoint. Will post back if I find anything. My initial hypothesis is that it's being incremented once on the cursor establishment, and again on the getMore.

Comment by Asya Kamsky [ 04/Dec/17 ]

From the user's POV, it seems most important to never increment when the sort is satisfied by an index, or is a simple merge.

Second in importance is not over-counting actual in-memory sorts that should be counted.

Comment by Charlie Swanson [ 01/Dec/17 ]

david.storch I put this in "Needs Triage" so that we can discuss what we believe the correct answer to be. We discussed earlier that the number of times this counter increments is not as important as whether it increments at all, so we could also just move this to the backlog and revisit later.

Generated at Thu Feb 08 04:29:20 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.