[SERVER-21956] applyOps does not correctly propagate operation cancellation exceptions
Created: 18/Dec/15  Updated: 21/Nov/16  Resolved: 23/Dec/15

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 3.2.3, 3.3.0

Type: Bug
Priority: Major - P3
Reporter: Kaloian Manassiev
Assignee: Kaloian Manassiev
Resolution: Done
Votes: 0
Labels: code-only
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-21050 Add a failover workload to cause CSRS... Closed
Backwards Compatibility: Minor Change
Operating System: ALL
Backport Completed:
Participants:

 Description   

This was discovered while running the sharding suite with the continuous primary stepdown thread enabled. The applyOps command uses DBDirectClient, so if a stepdown happens just as the operation is about to start and the thread is interrupted, DBDirectClient ends up returning error 13106 instead of the interruption error (11601).

Here are some excerpts from the verbose logs:

[js_test:balance_repl] 2015-12-18T18:02:13.249+0000 c20514| 2015-12-18T18:02:12.936+0000 D -        [conn32] User Assertion: 11601:operation was interrupted
[js_test:balance_repl] 2015-12-18T18:02:13.250+0000 c20514| 2015-12-18T18:02:12.936+0000 I QUERY    [conn32] assertion 11601 operation was interrupted ns:config.chunks query:{ query: { ns: "test.foo" }, orderby: { lastmod: -1 } }
[js_test:balance_repl] 2015-12-18T18:02:13.250+0000 c20514| 2015-12-18T18:02:12.936+0000 I QUERY    [conn32]  ntoskip:0 ntoreturn:1
[js_test:balance_repl] 2015-12-18T18:02:13.250+0000 c20514| 2015-12-18T18:02:12.936+0000 I QUERY    [conn32] query config.chunks query: { query: { ns: "test.foo" }, orderby: { lastmod: -1 } } ntoreturn:1 ntoskip:0 keyUpdates:0 writeConflicts:0 exception: operation was interrupted code:11601 numYields:0 reslen:71 locks:{ Global: { acquireCount: { r: 3, W: 1 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } 0ms
[js_test:balance_repl] 2015-12-18T18:02:13.251+0000 c20514| 2015-12-18T18:02:12.936+0000 D -        [conn32] User Assertion: 13106:nextSafe(): { $err: "operation was interrupted", code: 11601 }
[js_test:balance_repl] 2015-12-18T18:02:13.252+0000 c20514| 2015-12-18T18:02:12.936+0000 D COMMAND  [conn32] assertion while executing command 'applyOps' on database 'config' with arguments '{ applyOps: [ { op: "u", b: true, ns: "config.chunks", o: { _id: "test.foo-_id_600.0", lastmod: Timestamp 1000|15, lastmodEpoch: ObjectId('56744a23fc2e02a76c6d8248'), ns: "test.foo", min: { _id: 600.0 }, max: { _id: 700.0 }, shard: "test-rs0" }, o2: { _id: "test.foo-_id_600.0" } }, { op: "u", b: true, ns: "config.chunks", o: { _id: "test.foo-_id_700.0", lastmod: Timestamp 1000|16, lastmodEpoch: ObjectId('56744a23fc2e02a76c6d8248'), ns: "test.foo", min: { _id: 700.0 }, max: { _id: MaxKey }, shard: "test-rs0" }, o2: { _id: "test.foo-_id_700.0" } } ], preCondition: [ { ns: "config.chunks", q: { query: { ns: "test.foo" }, orderby: { lastmod: -1 } }, res: { lastmod: Timestamp 1000|14 } } ], maxTimeMS: 30000 }' and metadata '{ $replData: 1 }': 13106 nextSafe(): { $err: "operation was interrupted", code: 11601 }
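
The sequence in the logs above (an 11601 interruption on the config.chunks query, which matches the preCondition in the applyOps arguments, re-raised as 13106 by nextSafe()) can be modeled with a small sketch. This is an illustrative Python model only, not server code: the class and function names are hypothetical, and only the two error codes and the "$err" reply shape are taken from the logs.

```python
# Illustrative model only; names are hypothetical, not MongoDB server code.

INTERRUPTED = 11601        # "operation was interrupted" (code from the logs)
NEXT_SAFE_FAILURE = 13106  # code raised by nextSafe() (code from the logs)


class UserAssertion(Exception):
    """Minimal stand-in for an assertion that carries a numeric code."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def next_safe_masking(reply):
    """Model of the reported behavior: any $err reply is re-raised as 13106."""
    if "$err" in reply:
        raise UserAssertion(NEXT_SAFE_FAILURE, f"nextSafe(): {reply}")
    return reply


def run_precondition_query(reply):
    """Hypothetical caller: returns the error code it observes, or None."""
    try:
        next_safe_masking(reply)
    except UserAssertion as exc:
        return exc.code
    return None


error_reply = {"$err": "operation was interrupted", "code": INTERRUPTED}
seen_code = run_precondition_query(error_reply)
# The caller sees 13106, so a check for the interruption code never matches.
was_interrupted = seen_code == INTERRUPTED
```

In this model the original code survives only inside the message text, which is why a caller that reacts to interruption by code cannot recognize it.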

Putting this ticket in the sharding bucket, because sharding is the main consumer of applyOps.



 Comments   
Comment by Githook User [ 13/Jan/16 ]

Author: Kaloian Manassiev (kaloianm) <kaloian.manassiev@mongodb.com>

Message: SERVER-21956 DBClientCursor should propagate the correct exception code
Branch: v3.2
https://github.com/mongodb/mongo/commit/b2820de69b15c120c04406aa1448cbc0aa3fde66

Comment by Githook User [ 23/Dec/15 ]

Author: Kaloian Manassiev (kaloianm) <kaloian.manassiev@mongodb.com>

Message: SERVER-21956 DBClientCursor should propagate the correct exception code
Branch: master
https://github.com/mongodb/mongo/commit/551e33cd86e8fcb6c87050d0249bac6fc8342534

Comment by Githook User [ 22/Dec/15 ]

Author: Kaloian Manassiev (kaloianm) <kaloian.manassiev@mongodb.com>

Message: Revert "SERVER-21956 DBClientCursor should propagate the correct exception code"

This reverts commit a4c73dacca8a88a1cbf7b729b65ebb8e47f3f3c0.
Branch: master
https://github.com/mongodb/mongo/commit/ebfbeb9a05f13d72bd4e535b4388ed8b3a7a39b5

Comment by Kaloian Manassiev [ 22/Dec/15 ]

Setting the Backwards Compatibility field to "Minor Change" because, with this fix, applyOps will start reporting the actual error code that caused a query to fail instead of code 13106.
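
As a rough illustration of the behavior change described in this comment, the sketch below shows one way a cursor helper could surface the code carried in an error reply rather than always raising 13106. It is a Python model under assumptions, not the actual patch: the function and class names are hypothetical, and falling back to 13106 when the reply has no code is an assumption made for the example.

```python
# Illustrative model only; names are hypothetical, not the actual fix.

FALLBACK_CODE = 13106  # assumed fallback when the error reply carries no code


class QueryError(Exception):
    """Minimal stand-in for an exception that carries a numeric code."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code


def next_safe(reply):
    """Raise with the reply's own error code when one is present."""
    if "$err" in reply:
        code = reply.get("code", FALLBACK_CODE)
        raise QueryError(code, f"nextSafe(): {reply['$err']}")
    return reply


def error_code_of(reply):
    """Hypothetical caller: returns the error code it observes, or None."""
    try:
        next_safe(reply)
    except QueryError as exc:
        return exc.code
    return None


propagated = error_code_of({"$err": "operation was interrupted", "code": 11601})
no_code = error_code_of({"$err": "some failure without a code"})
```

Under this model, a caller that previously matched on 13106 for every query failure would instead see the underlying code (11601 for an interruption), which is the compatibility difference noted above.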

Comment by Githook User [ 22/Dec/15 ]

Author: Kaloian Manassiev (kaloianm) <kaloian.manassiev@mongodb.com>

Message: SERVER-21956 DBClientCursor should propagate the correct exception code
Branch: master
https://github.com/mongodb/mongo/commit/a4c73dacca8a88a1cbf7b729b65ebb8e47f3f3c0
