[SERVER-17119] Verify fails during AND_HASH query stage's readFirstChild Created: 29/Jan/15  Updated: 18/Sep/15  Resolved: 30/Mar/15

Status: Closed
Project: Core Server
Component/s: Concurrency, Storage, WiredTiger
Affects Version/s: 3.0.0-rc7
Fix Version/s: 3.0.2, 3.1.1

Type: Bug Priority: Major - P3
Reporter: Charlie Swanson Assignee: David Storch
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File patch.patch     Text File yield_and_hashed.patch    
Issue Links:
Depends
depends on SERVER-17062 Make query execution robust to WriteC... Closed
Related
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Completed:
Steps To Reproduce:

Happens every time you run fsm_all with the workloads in the patch on MCI. If you don't want to run all of them, it should reproduce with just the yield_and_hashed.js workload.

Sprint: Quint Iteration 3.1.1
Participants:

 Description   

Error occurs while running an FSM test which intersperses yielding AND_HASH queries with updates/removes. Logs linked; the important part of the logs is here:

2015-01-29T20:14:32.898+0000 I -        [conn836] Assertion failure _dataMap.end() == _dataMap.find(member->loc) src/mongo/db/exec/and_hash.cpp 291
2015-01-29T20:14:32.905+0000 I CONTROL  [conn836] 
 0xf3a819 0xee4961 0xec973e 0x9d46fe 0x9d540b 0x9e1d3d 0xbb9cd3 0xa0e779 0x9b71a4 0x9b80e3 0x9b8cdb 0xb87f95 0xa99f88 0x7e6730 0xef88eb 0x2ae5723dc83d 0x2ae573266fcd
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"B3A819"},{"b":"400000","o":"AE4961"},{"b":"400000","o":"AC973E"},{"b":"400000","o":"5D46FE"},{"b":"400000","o":"5D540B"},{"b":"400000","o":"5E1D3D"},{"b":"400000","o":"7B9CD3"},{"b":"400000","o":"60E779"},{"b":"400000","o":"5B71A4"},{"b":"400000","o":"5B80E3"},{"b":"400000","o":"5B8CDB"},{"b":"400000","o":"787F95"},{"b":"400000","o":"699F88"},{"b":"400000","o":"3E6730"},{"b":"400000","o":"AF88EB"},{"b":"2AE5723D6000","o":"683D"},{"b":"2AE573192000","o":"D4FCD"}],"processInfo":{ "mongodbVersion" : "3.1.0-pre-", "gitVersion" : "746d9b99a8dc0301c1593bd4dc62fc8433435c7e", "uname" : { "sysname" : "Linux", "release" : "2.6.18-194.el5xen", "version" : "#1 SMP Tue Mar 16 22:01:26 EDT 2010", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000" }, { "b" : "2AE5723D6000", "path" : "/lib64/libpthread.so.0", "elfType" : 3 }, { "b" : "2AE5725F2000", "path" : "/lib64/librt.so.1", "elfType" : 3 }, { "b" : "2AE5727FB000", "path" : "/lib64/libdl.so.2", "elfType" : 3 }, { "b" : "2AE572A00000", "path" : "/usr/lib64/libstdc++.so.6", "elfType" : 3 }, { "b" : "2AE572D00000", "path" : "/lib64/libm.so.6", "elfType" : 3 }, { "b" : "2AE572F83000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3 }, { "b" : "2AE573192000", "path" : "/lib64/libc.so.6", "elfType" : 3 }, { "b" : "2AE5721B8000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3 } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x29) [0xf3a819]
 mongod(_ZN5mongo10logContextEPKc+0xE1) [0xee4961]
 mongod(_ZN5mongo12verifyFailedEPKcS1_j+0xCE) [0xec973e]
 mongod(_ZN5mongo12AndHashStage14readFirstChildEPm+0x2EE) [0x9d46fe]
 mongod(_ZN5mongo12AndHashStage4workEPm+0x11B) [0x9d540b]
 mongod(_ZN5mongo10FetchStage4workEPm+0xCD) [0x9e1d3d]
 mongod(_ZN5mongo12PlanExecutor7getNextEPNS_7BSONObjEPNS_8RecordIdE+0xA3) [0xbb9cd3]
 mongod(_ZN5mongo13StageDebugCmd3runEPNS_16OperationContextERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x559) [0xa0e779]
 mongod(_ZN5mongo12_execCommandEPNS_16OperationContextEPNS_7CommandERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x34) [0x9b71a4]
 mongod(_ZN5mongo7Command11execCommandEPNS_16OperationContextEPS0_iPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0xC13) [0x9b80e3]
 mongod(_ZN5mongo12_runCommandsEPNS_16OperationContextEPKcRNS_7BSONObjERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x28B) [0x9b8cdb]
 mongod(_ZN5mongo8runQueryEPNS_16OperationContextERNS_7MessageERNS_12QueryMessageERKNS_15NamespaceStringERNS_5CurOpES3_b+0x755) [0xb87f95]
 mongod(_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortEb+0xAC8) [0xa99f88]
 mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0xE0) [0x7e6730]
 mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x32B) [0xef88eb]
 libpthread.so.0(+0x683D) [0x2ae5723dc83d]
 libc.so.6(clone+0x6D) [0x2ae573266fcd]
-----  END BACKTRACE  -----
2015-01-29T20:14:32.940+0000 I NETWORK  [conn836] end connection 127.0.0.1:55969 (5 connections now open)
2015-01-29T20:14:33.150+0000 I NETWORK  [conn833] end connection 127.0.0.1:55966 (4 connections now open)
2015-01-29T20:14:33.201+0000 I NETWORK  [conn835] end connection 127.0.0.1:55968 (3 connections now open)
2015-01-29T20:14:33.251+0000 I NETWORK  [conn832] end connection 127.0.0.1:55965 (2 connections now open)
----
jstests/concurrency/fsm_workloads/yield_and_hashed.js: Workload completed in 550 ms
----
2015-01-29T20:14:33.254+0000 E QUERY    Error: 1 thread threw
        Error: command failed: {
        	"results" : [ ],
        	"errmsg" : "exception: assertion src/mongo/db/exec/and_hash.cpp:291",
        	"code" : 0,
        	"ok" : 0
        } : undefined
            at quietlyDoAssert (jstests/concurrency/fsm_libs/assert.js:53:15)
            at Function.assert.commandWorked (src/mongo/shell/assert.js:254:5)
            at wrapAssertFn (jstests/concurrency/fsm_libs/assert.js:60:16)
            at Function.assertWithLevel.(anonymous function) [as commandWorked] (jstests/concurrency/fsm_libs/assert.js:99:13)
2015-01-29T20:14:33.252+0000 I NETWORK  [conn834] end connection 127.0.0.1:55967 (1 connection now open)
            at Object.andHash (jstests/concurrency/fsm_workloads/yield_and_hashed.js:34:22)
            at Object.runFSM [as run] (jstests/concurrency/fsm_libs/fsm.js:19:16)
            at <anonymous>:8:13
            at Object.main (jstests/concurrency/fsm_libs/worker_thread.js:68:17)
            at ____MongoToV8_newFunction_temp (<anonymous>:5:25)
            at ____MongoToV8_newFunction_temp (<anonymous>:3:24)
    at throwError (jstests/concurrency/fsm_libs/runner.js:261:23)
    at jstests/concurrency/fsm_libs/runner.js:390:17
    at Array.forEach (native)
    at runWorkloads (jstests/concurrency/fsm_libs/runner.js:345:22)
    at serial (jstests/concurrency/fsm_libs/runner.js:406:13)
    at /data/mci/shell/src/jstests/concurrency/fsm_all.js:29:1
    at jstests/concurrency/fsm_libs/runner.js:273
failed to load: /data/mci/shell/src/jstests/concurrency/fsm_all.js



 Comments   
Comment by Githook User [ 31/Mar/15 ]

Author:

{u'username': u'dstorch', u'name': u'David Storch', u'email': u'david.storch@10gen.com'}

Message: SERVER-17119 fix yield_and_hashed.js to test a correct plan for WT

(cherry picked from commit 20cc174108c3ecf5040ebd64655b0cf39804e886)
Branch: v3.0
https://github.com/mongodb/mongo/commit/52aeb98fefd624a2494cd909c785aa3b912f71c2

Comment by Githook User [ 31/Mar/15 ]

Author:

{u'username': u'dstorch', u'name': u'David Storch', u'email': u'david.storch@10gen.com'}

Message: SERVER-17119 fix yield_and_hashed.js to test a correct plan for WT
Branch: master
https://github.com/mongodb/mongo/commit/20cc174108c3ecf5040ebd64655b0cf39804e886

Comment by Githook User [ 30/Mar/15 ]

Author:

{u'username': u'dstorch', u'name': u'David Storch', u'email': u'david.storch@10gen.com'}

Message: SERVER-17119 relax no-dups invariant in AND_HASH stage to accommodate WT

Since WT does not issue invalidations, docs will not be removed from the hash table when they
are updated.

(cherry picked from commit 36240ff99233a3f454bbd914fc691f5598283f10)
Branch: v3.0
https://github.com/mongodb/mongo/commit/b78fcf61ee88dd4536ac040fccea503e0aef639f

Comment by Githook User [ 30/Mar/15 ]

Author:

{u'username': u'dstorch', u'name': u'David Storch', u'email': u'david.storch@10gen.com'}

Message: SERVER-17119 relax no-dups invariant in AND_HASH stage to accommodate WT

Since WT does not issue invalidations, docs will not be removed from the hash table when they
are updated.
Branch: master
https://github.com/mongodb/mongo/commit/36240ff99233a3f454bbd914fc691f5598283f10

Comment by David Storch [ 29/Jan/15 ]

The AND_HASH stage is reading results from its first child index scan and hashing those results based on their RecordId. The particular invariant that trips ensures that the RecordId from the child has not been seen yet, i.e. that the child index scan did not return a duplicate RecordId.

However, it is possible to get a duplicate RecordId back when running a server configured with the WiredTiger storage engine. Consider this scenario:

  • Child 0 of the AND_HASH stage is scanning an index on field a from 5 to 10 inclusive.
  • It gets the RecordIds for keys 5 and 6 and puts them in the hash table.
  • The plan yields.
  • During the yield, another thread updates the document with key 6, giving it a key value of 9.
  • After resuming, the AND_HASH stage encounters key 9 and notices that it has already seen this particular RecordId.

This is impossible with MMAPv1 because of its invalidation system, but invalidation notifications for updated RecordIds are not issued with WiredTiger.
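The scenario above can be modeled with a small toy sketch (hypothetical names throughout; this is not the actual mongod code): an index scan that buffers RecordIds in a hash table will see the same RecordId twice once a concurrent update moves a document's key forward into the not-yet-scanned range and no invalidation removes the stale entry.

```python
def scan_record_ids(index, lo, hi, yield_after, on_yield):
    """Toy model of the AND_HASH first-child scan (hypothetical helper).

    `index` maps key -> RecordId. Entries with lo <= key <= hi are
    visited in key order. After `yield_after` results the plan "yields":
    `on_yield` may mutate the index, then the scan resumes from the next
    key, as a restored storage-engine cursor would. Returns RecordIds in
    scan order, duplicates included.
    """
    out = []
    cursor = lo
    yielded = False
    while True:
        candidates = [k for k in index if cursor <= k <= hi]
        if not candidates:
            return out
        key = min(candidates)
        out.append(index[key])
        cursor = key + 1
        if not yielded and len(out) == yield_after:
            yielded = True
            on_yield(index)

# Index on field a: key -> RecordId.
index = {5: "rid5", 6: "rid6", 7: "rid7"}

def concurrent_update(ix):
    # Another thread updates the doc with key 6, giving it key 9.
    # No invalidation is issued, so "rid6" stays in the hash table.
    ix[9] = ix.pop(6)

rids = scan_record_ids(index, 5, 10, yield_after=2, on_yield=concurrent_update)
print(rids)  # rid6 appears twice: the no-dups verify() would trip here
```

Under MMAPv1 the invalidation notification for the update would have evicted "rid6" from the hash table before the scan resumed, so the duplicate could never be observed; the fix relaxes the no-dups invariant for engines that do not issue such invalidations.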

Generated at Thu Feb 08 03:43:21 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.