[CSHARP-586] MongoCollection.InsertBatch with empty IEnumerable Created: 02/Oct/12  Updated: 11/Mar/19  Resolved: 06/Apr/15

Status: Closed
Project: C# Driver
Component/s: None
Affects Version/s: 1.6
Fix Version/s: None

Type: Improvement Priority: Minor - P4
Reporter: Manuel Warum Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

MongoCollection.InsertBatch inserts any number of documents into a collection; however, if the provided IEnumerable is empty, this causes an Assertion to be thrown on the server side, "Message contains no documents". While this has no side-effects that I know of, I suggest a minor improvement: the driver should not dispatch a message to the server at all if the provided IEnumerable is empty.

While it may be a rare and/or obscure scenario, it helps to remove probably unnecessary IO between driver and server and keeps the logs clean. I notice this "issue" happens occassionally in one of my ETL-based applications. While I can address this in my application's code, I think it would also be a nice little improvement on the driver's part to perform a check on its own.



 Comments   
Comment by Manuel Warum [ 15/Oct/12 ]

I think putting the responsability to keep batches below a threshold into the developer's hands can be problematic in some use cases. Consider ETL jobs, or any kind of InsertBatch scenario using yield-return constructs. This would mean that the developer in question either ignores the new limitation (or does not know about it because it never came up during development but may very well come up in production), or needs to re-implement what MongoCollection.InsertBatch already does: splitting it up and dispatching it one by one.

Moreover, no longer creating sub-batches as necessary by the driver breaks behavioral compatibility with existing applications. Calls to InsertBatch would then fail.

I would suggest to keep InsertBatch as-is with my suggestion concerning empty sets not causing empty messages/roundtrips.

Comment by Robert Stam [ 12/Oct/12 ]

There are a couple of other ideas regarding InsertBatch that might affect the return value.

One idea is that it should only return a single SafeModeResult: the final one. All the intermediate results will have implicitly succeeded since no exception was thrown.

Another idea is that InsertBatch should NOT automatically split a batch into smaller sub-batches. While this behavior is possibly convenient, it can be considered surprising. The method name InsertBatch implies there is a single batch. If this were adopted it would be up to the caller not to submit a batch that was too large to fit into a single message (an exception would be thrown if they did). This might actually be a good thing since batches 16MB or larger are unnecessary. You only need 10-100 documents per batch to get almost all of the performance gains possible.

Comments?

Comment by Craig Wilson [ 12/Oct/12 ]

Hmm... Interesting option. I wasn't thinking about it like that, just returning an empty enumerable. I think that would work very well.

Comment by Manuel Warum [ 12/Oct/12 ]

What about a fifth option? Would it be a breaking change to not include a SafeModeResult? Technically, no change in the state space should taken place (as far as this operation is concerned).

Personally, I wouldn't even go so far as to connect to the server. Between preconditions and the actual operation, I would attempt to test if the provided enumerable is empty (which can be non-trivial for IEnumerables, but if the provided argument is an ICollection<>, there should be no trouble). If the provided documents argument is empty, nothing needs to be done, new SafeModeResults[0] would be returned.

Comment by Craig Wilson [ 11/Oct/12 ]

Manuel,

Since we return a SafeModeResult object from InsertBatch, what should the SafeModeResult be in this particular case? Let me clarify the issue a little: When SafeMode=false, the result is null. However, when SafeMode = true, or W = 3, or J = true, etc... we return a SafeModeResult indicating the success of the operation. mongod and mongos return differently structured responses for different types of SafeModes. Hence, if you InsertBatch with an empty enumerable with W = majority, the result will look slightly different from if you used J = true. In addition, different versions of the server could alter these responses as required.

So with that being said, we can do 1 of 4 things.

  • return null. This seems unexpected because if you have specified a SafeMode, then you might be expecting a SafeModeResult, which always exists unless you happen to pass in an empty enumerable.
  • fake a bare-bones response. Simply set "ok" to true and move on. This also seems unexpected because if J = true, you might be looking for a specific field which always exists unless you happen to pass in an empty enumerable.
  • fake a real-response. This would mean we emulate exactly what the server does. We'd have to know the server version you are talking to, whether it's mongos or mongod, and then we'd have to keep up with it every server release to make sure we are faking the response accurately. This seems like overkill and lots of maintainability issues solely for this feature. In addition, knowing some of these items (like the server version) is impossible in some situations, e.g. if auth is enabled.
  • send the empty batch to the server. This also doesn't feel right for obvious reasons.

We don't really have any great solutions that are going to appease every use case, so we'd like to know your thoughts.

Generated at Wed Feb 07 21:37:16 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.