[SERVER-29517] Data race with ViewGraph::_idCounter can corrupt the in-memory ViewGraph Created: 08/Jun/17  Updated: 30/Oct/23  Resolved: 08/Jun/17

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: 3.4.4
Fix Version/s: 3.4.5, 3.5.9

Type: Bug Priority: Critical - P2
Reporter: David Storch Assignee: David Storch
Resolution: Fixed Votes: 0
Labels: read-only-views
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v3.4
Sprint: Query 2017-06-19
Participants:
Linked BF Score: 0

 Description   

The ViewGraph is an in-memory directed acyclic graph data structure in which nodes represent view definitions and edges represent "view-on" relationships. This structure assigns a unique unsigned 64-bit number to each node in the graph, using ViewGraph::_idCounter:

https://github.com/mongodb/mongo/blob/6f7fd7318d61bd145bb75a9a0a5d35387d2a6b9f/src/mongo/db/views/view_graph.h#L182

The intention is that concurrent access to this counter is prevented by the ViewCatalog's mutex. However, _idCounter is a static data member. There is one ViewCatalog per database, each owning and synchronizing access to a separate ViewGraph instance. Because _idCounter is static, all ViewGraph instances share the same counter, while each ViewCatalog's mutex guards only its own graph. Operations on different ViewGraphs can therefore access the counter simultaneously without synchronization. This leads to the assignment of invalid node ids, which in turn corrupts the in-memory graph. We have seen this manifest as a process-fatal invariant failure, or as an unexpected failure of a view catalog operation (e.g. a view drop, modify, or create).
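As a rough illustration of why a static counter defeats per-instance locking, here is a minimal single-threaded sketch. The class names SharedCounterGraph and PerInstanceGraph and the helper allocationsInterfere are hypothetical and are not taken from the MongoDB source; only the member name _idCounter mirrors the ticket. The sketch shows only that the counter is shared between instances. In the real bug the accesses happen concurrently under different mutexes, which is what makes it a data race.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the buggy layout: one counter for the whole process.
class SharedCounterGraph {
public:
    std::uint64_t allocateId() { return _idCounter++; }

private:
    static std::uint64_t _idCounter;  // shared by every instance
};
std::uint64_t SharedCounterGraph::_idCounter = 0;

// Hypothetical stand-in for the fixed layout: one counter per graph.
class PerInstanceGraph {
public:
    std::uint64_t allocateId() { return _idCounter++; }

private:
    std::uint64_t _idCounter = 0;  // owned by this instance only
};

// Returns true if allocating an id from graph `a` changes the next id
// handed out by an unrelated graph `b`.
template <typename Graph>
bool allocationsInterfere() {
    Graph a, b;
    std::uint64_t first = b.allocateId();
    a.allocateId();  // touches only `a`, at least in principle
    std::uint64_t second = b.allocateId();
    return second != first + 1;
}
```

With the static member, an allocation on one graph is visible through the other, so a lock held on behalf of one graph does nothing to protect the counter from the other graph's users. With the non-static member, each graph's counter is covered by whatever lock already guards that graph.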



 Comments   
Comment by Githook User [ 08/Jun/17 ]

Author: David Storch (dstorch) <david.storch@10gen.com>

Message: SERVER-29517 Fix data race by making ViewGraph::_idCounter non-static.

(cherry picked from commit e2376ccbb43d3fb2579995a55ebf82f7c16fcb4f)
Branch: v3.4
https://github.com/mongodb/mongo/commit/520b8f3092c48d934f0cd78ab5f40fe594f96863

Comment by David Storch [ 08/Jun/17 ]

The invariant failure associated with this problem that we've observed in testing looks like this:

[MongoDFixture:job0] 2017-06-07T21:20:45.163+0000 I -        [conn1516] Invariant failure node->children.empty() src/mongo/db/views/view_graph.cpp 130
...
[MongoDFixture:job0] 
[MongoDFixture:job0] ***aborting after invariant() failure
[MongoDFixture:job0] 
[MongoDFixture:job0] 
[MongoDFixture:job0] 2017-06-07T21:20:45.166+0000 I COMMAND  [conn1580] CMD: drop db167.view_catalog_70
[MongoDFixture:job0] 2017-06-07T21:20:45.168+0000 I COMMAND  [conn1572] command db163.coll163 appName: "MongoDB Shell" command: find { find: "coll163", filter: { x: 259.0, tid: 25.0 } } planSummary: IXSCAN { tid: 1 } keysExamined:1200 docsExamined:1200 cursorExhausted:1 numYields:0 nreturned:1 reslen:135 locks:{ Global: { acquireCount: { r: 2 }, acquireWaitCount: { r: 1 }, timeAcquiringMicros: { r: 267304 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } protocol:op_command 272ms
[MongoDFixture:job0] 2017-06-07T21:20:45.168+0000 I COMMAND  [conn1568] command db163.coll163 appName: "MongoDB Shell" command: find { find: "coll163", filter: { x: 80.0, tid: 27.0 } } planSummary: IXSCAN { tid: 1 } keysExamined:1200 docsExamined:1200 cursorExhausted:1 numYields:0 nreturned:1 reslen:135 locks:{ Global: { acquireCount: { r: 2 }, acquireWaitCount: { r: 1 }, timeAcquiringMicros: { r: 267300 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } protocol:op_command 272ms
[MongoDFixture:job0] 2017-06-07T21:20:45.176+0000 F -        [conn1516] Got signal: 6 (Abort trap: 6).
[MongoDFixture:job0] 
[MongoDFixture:job0]  0x10e2cb27a 0x10e2caaf0 0x7fff91dbef1a 0x110eac000 0x7fff8c4b0b73 0x10e25d348 0x10e0960c9 0x10e08fea8 0x10e08f580 0x10e08f08c 0x10e0913f9 0x10d94fa85 0x10d951453 0x10d945a39 0x10d9a8496 0x10d9a523d 0x10d9a4365 0x10def2e69 0x10db54096 0x10d80f2aa 0x10d80fb78 0x10e2574de 0x10e257af1 0x7fff914d72fc 0x7fff914d7279 0x7fff914d54b1
[MongoDFixture:job0] ----- BEGIN BACKTRACE -----
[MongoDFixture:job0] {"backtrace":[{"b":"10D801000","o":"ACA27A","s":"_ZN5mongo15printStackTraceERNSt3__113basic_ostreamIcNS0_11char_traitsIcEEEE"},{"b":"10D801000","o":"AC9AF0","s":"_ZN5mongo12_GLOBAL__N_110abruptQuitEi"},{"b":"7FFF91DBA000","o":"4F1A","s":"_sigtramp"},{"b":"0","o":"110EAC000"},{"b":"7FFF8C453000","o":"5DB73","s":"abort"},{"b":"10D801000","o":"A5C348","s":"_ZN5mongo15invariantFailedEPKcS1_j"},{"b":"10D801000","o":"8950C9","s":"_ZN5mongo9ViewGraph23insertWithoutValidatingERKNS_14ViewDefinitionERKNSt3__16vectorINS_15NamespaceStringENS4_9allocatorIS6_EEEEi"},{"b":"10D801000","o":"88EEA8","s":"_ZZN5mongo11ViewCatalog16_upsertIntoGraphEPNS_16OperationContextERKNS_14ViewDefinitionEENK3$_3clES5_b"},{"b":"10D801000","o":"88E580","s":"_ZN5mongo11ViewCatalog16_upsertIntoGraphEPNS_16OperationContextERKNS_14ViewDefinitionE"},{"b":"10D801000","o":"88E08C","s":"_ZN5mongo11ViewCatalog26_createOrUpdateView_inlockEPNS_16OperationContextERKNS_15NamespaceStringES5_RKNS_9BSONArrayENSt3__110unique_ptrINS_17CollatorInterfaceENS9_14default_deleteISB_EEEE"},{"b":"10D801000","o":"8903F9","s":"_ZN5mongo11ViewCatalog10createViewEPNS_16OperationContextERKNS_15NamespaceStringES5_RKNS_9BSONArrayERKNS_7BSONObjE"},{"b":"10D801000","o":"14EA85","s":"_ZN5mongo8Database10createViewEPNS_16OperationContextENS_10StringDataERKNS_17CollectionOptionsE"},{"b":"10D801000","o":"150453","s":"_ZN5mongo12userCreateNSEPNS_16OperationContextEPNS_8DatabaseENS_10StringDataENS_7BSONObjEbRKS5_"},{"b":"10D801000","o":"144A39","s":"_ZN5mongo16createCollectionEPNS_16OperationContextERKNSt3__112basic_stringIcNS2_11char_traitsIcEENS2_9allocatorIcEEEERKNS_7BSONObjESD_"},{"b":"10D801000","o":"1A7496","s":"_ZN5mongo9CmdCreate3runEPNS_16OperationContextERKNSt3__112basic_stringIcNS3_11char_traitsIcEENS3_9allocatorIcEEEERNS_7BSONObjEiRS9_RNS_14BSONObjBuilderE"},{"b":"10D801000","o":"1A423D","s":"_ZN5mongo7Command3runEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS3_21ReplyBuilderInterfaceE"},{"b":"10D801000","o":"1A3365","s":"_ZN5mongo7Command11execCommandEPNS_16OperationContextEPS0_RKNS_3rpc16RequestInterfaceEPNS4_21ReplyBuilderInterfaceE"},{"b":"10D801000","o":"6F1E69","s":"_ZN5mongo11runCommandsEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS2_21ReplyBuilderInterfaceE"},{"b":"10D801000","o":"353096","s":"_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE"},{"b":"10D801000","o":"E2AA","s":"_ZN5mongo23ServiceEntryPointMongod12_sessionLoopERKNSt3__110shared_ptrINS_9transport7SessionEEE"},{"b":"10D801000","o":"EB78","s":"_ZNSt3__110__function6__funcIZN5mongo23ServiceEntryPointMongod12startSessionENS_10shared_ptrINS2_9transport7SessionEEEE3$_0NS_9allocatorIS8_EEFvRKS7_EEclESC_"},

Comment by Githook User [ 08/Jun/17 ]

Author: David Storch (dstorch) <david.storch@10gen.com>

Message: SERVER-29517 Fix data race by making ViewGraph::_idCounter non-static.
Branch: master
https://github.com/mongodb/mongo/commit/e2376ccbb43d3fb2579995a55ebf82f7c16fcb4f
