[SERVER-84358] Investigate whether inserts in a batch insert are processed serially when sharding by hashed _id
Created: 20/Dec/23  Updated: 22/Jan/24  Resolved: 22/Jan/24

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task
Priority: Major - P3
Reporter: Damian Wasilewicz
Assignee: Lamont Nelson
Resolution: Won't Fix
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Cluster Scalability
Participants:

 Description   

There is a genny workload, CreateBigIndex, whose third phase (an InsertData phase) bulk inserts 10 million documents in batches of 1000. This is done on one thread. In a sharded environment (specifically the shard-lite build, which has one mongos and two mongod shards), this operation times out. We are sharding on hashed _id. The mongod nodes each have an average latency of ~2 milliseconds per write and are inserting 500 documents per second, while the mongos has an average latency of about 1 second. This suggests that the documents may currently be processed serially. We should investigate whether this is the case, and whether we can change the workload to run with unordered batches. This was discovered as a result of BF-31192.
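
As a rough sanity check, the figures above can be combined in a few lines of Python. This is only back-of-envelope arithmetic, not a measurement: it assumes that hashed _id sharding splits each batch evenly across the two shards and that every batch costs the observed mongos latency.

```python
# Back-of-envelope check using only the figures quoted in the description.
TOTAL_DOCS = 10_000_000
BATCH_SIZE = 1_000
NUM_SHARDS = 2
MONGOS_LATENCY_S = 1.0          # ~1 second average per batch at the mongos
MONGOD_WRITE_LATENCY_S = 0.002  # ~2 ms average per write at each mongod

num_batches = TOTAL_DOCS // BATCH_SIZE

# Assumption: hashed _id spreads each batch evenly over the shards.
docs_per_shard_per_batch = BATCH_SIZE // NUM_SHARDS

# Time for one shard to apply its share of a batch one write at a time.
serial_shard_time_s = docs_per_shard_per_batch * MONGOD_WRITE_LATENCY_S

# Total wall-clock time if every batch costs the observed mongos latency.
total_hours = num_batches * MONGOS_LATENCY_S / 3600

print(num_batches, serial_shard_time_s, round(total_hours, 1))
```

Under these assumptions, 500 writes at ~2 ms each add up to about 1 second per batch on each shard, which lines up with the ~1 second mongos latency and is what one would expect if the documents were applied one at a time. At that rate the 10,000 batches need several hours of wall-clock time.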

The temporary fix for the attached BF was to prevent the workload from running in a sharded environment.

It is also worth noting that the InsertData phase can currently be allowed to finish by adding a LoggingActor in phase 2 that periodically logs to prevent the timeout; in that case the InsertData phase finishes in about two hours. Once it does, setting the server parameter maxIndexBuildMemoryUsageMegabytes to 100 on the mongos in sharded builds causes the workload to fail with an 'unrecognized parameter' error. Any fix for this workload should also address this issue. One solution could be to run the server command that lowers the memory usage threshold only when the task is not run in a sharded environment.
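
One possible shape for that guard, sketched in Python rather than in the genny workload itself: the helper name `lower_index_build_memory` is hypothetical, and `client` is assumed to behave like a pymongo `MongoClient`. The check relies on a mongos reporting `msg: "isdbgrid"` in its `hello` reply.

```python
def lower_index_build_memory(client, megabytes=100):
    """Lower maxIndexBuildMemoryUsageMegabytes unless connected to a mongos.

    `client` is assumed to behave like a pymongo MongoClient. Returns True
    if the parameter was set and False if the call was skipped.
    """
    hello = client.admin.command("hello")
    # A mongos identifies itself with msg == "isdbgrid" in the hello reply.
    if hello.get("msg") == "isdbgrid":
        return False
    client.admin.command(
        {"setParameter": 1, "maxIndexBuildMemoryUsageMegabytes": megabytes}
    )
    return True
```

In a sharded build the helper returns False without issuing setParameter, so the 'unrecognized parameter' error would not be triggered; against a standalone or replica set node it sets the parameter as before.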



 Comments   
Comment by Lamont Nelson [ 22/Jan/24 ]

We determined that the serial behavior is due to the ordered insert flag being included on the command generated by genny.
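
For illustration only (this is not the command genny generates): in pymongo, the same flag is the `ordered` argument of `insert_many`, which defaults to True. A minimal sketch of batched inserts with the flag exposed, where `insert_in_batches` is a hypothetical helper and `collection` is assumed to behave like a pymongo `Collection`:

```python
def insert_in_batches(collection, documents, batch_size=1000, ordered=False):
    """Insert `documents` in fixed-size batches and return the batch count.

    `collection` is assumed to behave like a pymongo Collection.
    """
    batches = 0
    for start in range(0, len(documents), batch_size):
        batch = documents[start:start + batch_size]
        # ordered=True (the driver default) inserts serially in the given
        # order and stops at the first error; ordered=False lifts the
        # ordering requirement and attempts every document.
        collection.insert_many(batch, ordered=ordered)
        batches += 1
    return batches
```

Whether unordered batches would remove the timeout for this workload is not established in this ticket.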

Generated at Thu Feb 08 06:54:47 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.