[SERVER-15539] Invariant failure keyOffset >= 0 during 16 thread write command remove operation Created: 06/Oct/14  Updated: 30/Jan/17  Resolved: 07/Jan/15

Status: Closed
Project: Core Server
Component/s: Concurrency, Storage
Affects Version/s: 2.7.5
Fix Version/s: 2.8.0-rc5

Type: Bug Priority: Critical - P2
Reporter: Quentin Conner Assignee: Geert Bosch
Resolution: Done Votes: 0
Labels: 28qa
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File mongod-invariant-btree-20141122a.log.gz     File mongod-invariant-btree-20141122b.log.gz     Text File mongoperf.log     File utils.js    
Issue Links:
Depends
Duplicate
is duplicated by SERVER-15784 mmapv1 ASSERT failure advancing btree... Closed
Related
related to SERVER-17280 Geo near must pass invalidation messa... Closed
related to SERVER-27857 Rewrite the remove_during_mr.js test ... Closed
Backwards Compatibility: Fully Compatible
Operating System: Windows
Steps To Reproduce:

Use mongod 78a778 from a green MCI build.

> db.serverBuildInfo()
{
"version" : "2.7.7-pre-",
"gitVersion" : "78a778b23bed9018e56b8573b052ee2117beac46",
"OpenSSLVersion" : "",
"sysInfo" : "Linux build8.nj1.10gen.cc 2.6.32-431.3.1.el6.x86_64 #1 SMP Fri Jan 3 21:39:27 UTC 2014 x86_64 BOOST_LIB_VERSION=1_49",
"loaderFlags" : "-fPIC -pthread -Wl,-z,now -rdynamic",
"compilerFlags" : "-Wnon-virtual-dtor -Woverloaded-virtual -std=c++11 -fPIC -fno-strict-aliasing -ggdb -pthread -Wall -Wsign-compare -Wno-unknown-pragmas -Winvalid-pch -pipe -Werror -O3 -Wno-unused-local-typedefs -Wno-unused-function -Wno-deprecated-declarations -Wno-unused-but-set-variable -fno-builtin-memcmp -std=c99",
"allocator" : "tcmalloc",
"versionArray" : [
2,
7,
7,
-100
],
"javascriptEngine" : "V8",
"bits" : 64,
"debug" : false,
"maxBsonObjectSize" : 16777216,
"ok" : 1
}

Start mongod:
./mongod --dbpath /mnt/db2 --smallfiles --fork --logpath /tmp/server.log

Use a mongo shell with version 2.7.7 (hand compiled).

Obtain util/utils.js from the mongo-perf project (mongodb/mongo-perf master branch).

Enter this into mongo a few times until the issue reproduces:

var tests = []
load('/insert-path-to-mongo-perf-here/util/utils.js')
tests.push( { name: "Remove.v2.IntNonIdNoIndex",
              pre: function( collection ) {
                  collection.drop();
                  for ( var i = 0; i < 1000; i++ ) {
                      collection.insert( { x : i } );
                  }
              },
              ops: [
                  { op: "let", target: "x", value: {"#RAND_INT_PLUS_THREAD": [0,1000]}},
                  { op:  "insert",
                    doc: { x : { "#VARIABLE" : "x" } } },
                  { op:  "remove",
                    query: { x : { "#VARIABLE" : "x" } } }
              ] } );
 
runTests([16], 4, 5, 100, 'reproIsIntermittent')

Use the following block to reproduce. The "alternative" is less unwieldy.

ALTERNATIVELY:

from the mongo-perf top level directory:
edit util/benchrun_daemon.sh
set RUNUSER and MPERFBASE if needed
set RHOST and RPORT
uncomment FETCHMCI='TRUE'

The following will run single db first, then multi db second:

$ util/benchrun_daemon.sh 78a778b23bed9018e56b8573b052ee2117beac46

The failure occurred in the second run of the test suite, with multi database (4 databases active).

Participants:

 Description   

Hit an Invariant failure in mmap_v1/btree/btree_logic.cpp at line 2059. The operation that encountered the error was a remove, sent via the write command. 16 threads were executing the mongo-perf Remove.v2.IntNonIdIndex test.

The issue may be intermittent and difficult to reproduce. It may depend on preconditions such as the loading of other data or execution of other tests in the mongo-perf suite.

In other words, the issue may not reproduce unless the mongo-perf suite is run. I was not able to reproduce with a single invocation of benchRun() or runTests in util/utils.js.

mongod is commit 78a778b23bed9018e56b8573b052ee2117beac46
and mongo is r2.7.7

This occurred while running mongo-perf against the Remove.v2.IntNonIdIndex test. The test ran a few trials before the server died:
16 28727.915427512453
16 28967.207063672307
16 28903.757846425404
16 28614.076587610252
16 28621.215665117536

mongod log file shows the following:

2014-10-06T19:26:20.571+0000 I -        [conn25082] Invariant failure keyOffset >= 0 src/mongo/db/storage/mmap_v1/btree/btree_logic.cpp 2059
2014-10-06T19:26:20.597+0000 I -        [conn25082] 
 0xeed369 0xe99031 0xe7faa3 0xcd740e 0xccb6ec 0xa615c3 0x9d953f 0x9da268 0x9d0cf7 0x9cdd84 0xbaf77d 0xad2395 0x98850b 0x98a7b5 0x98adeb 0x98d449 0x9a7d84 0x9a8c11 0x9a9680 0xb9d5ea 0xa7fd03 0x7ed730 0xeac3d1 0x7f16a6625e9a 0x7f16a573473d
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"AED369"},{"b":"400000","o":"A99031"},{"b":"400000","o":"A7FAA3"},{"b":"400000","o":"8D740E"},{"b":"400000","o":"8CB6EC"},{"b":"400000","o":"6615C3"},{"b":"400000","o":"5D953F"},{"b":"400000","o":"5DA268"},{"b":"400000","o":"5D0CF7"},{"b":"400000","o":"5CDD84"},{"b":"400000","o":"7AF77D"},{"b":"400000","o":"6D2395"},{"b":"400000","o":"58850B"},{"b":"400000","o":"58A7B5"},{"b":"400000","o":"58ADEB"},{"b":"400000","o":"58D449"},{"b":"400000","o":"5A7D84"},{"b":"400000","o":"5A8C11"},{"b":"400000","o":"5A9680"},{"b":"400000","o":"79D5EA"},{"b":"400000","o":"67FD03"},{"b":"400000","o":"3ED730"},{"b":"400000","o":"AAC3D1"},{"b":"7F16A661E000","o":"7E9A"},{"b":"7F16A5640000","o":"F473D"}],"processInfo":{ "mongodbVersion" : "2.7.7-pre-", "gitVersion" : "78a778b23bed9018e56b8573b052ee2117beac46", "uname" : { "sysname" : "Linux", "release" : "3.2.0-67-virtual", "version" : "#101-Ubuntu SMP Tue Jul 15 17:58:37 UTC 2014", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000" }, { "b" : "7FFF69D51000", "elfType" : 3 }, { "b" : "7F16A661E000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3 }, { "b" : "7F16A6416000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3 }, { "b" : "7F16A6212000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3 }, { "b" : "7F16A5F12000", "path" : "/usr/lib/x86_64-linux-gnu/libstdc++.so.6", "elfType" : 3 }, { "b" : "7F16A5C16000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3 }, { "b" : "7F16A5A00000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3 }, { "b" : "7F16A5640000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3 }, { "b" : "7F16A683B000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3 } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x29) [0xeed369]
 mongod(_ZN5mongo10logContextEPKc+0xE1) [0xe99031]
 mongod(_ZN5mongo15invariantFailedEPKcS1_j+0xD3) [0xe7faa3]
 mongod(_ZNK5mongo10BtreeLogicINS_13BtreeLayoutV1EE6getKeyEPNS_16OperationContextERKNS_7DiskLocEi+0xBE) [0xcd740e]
 mongod(_ZNK5mongo18BtreeInterfaceImplINS_13BtreeLayoutV1EE6Cursor6getKeyEv+0x1C) [0xccb6ec]
 mongod(_ZNK5mongo16BtreeIndexCursor6getKeyEv+0x13) [0xa615c3]
 mongod(_ZN5mongo9IndexScan8checkEndEv+0x7F) [0x9d953f]
 mongod(_ZN5mongo9IndexScan4workEPm+0xA8) [0x9da268]
 mongod(_ZN5mongo10FetchStage4workEPm+0x57) [0x9d0cf7]
 mongod(_ZN5mongo11DeleteStage4workEPm+0x64) [0x9cdd84]
 mongod(_ZN5mongo12PlanExecutor11executePlanEv+0x2D) [0xbaf77d]
 mongod(_ZN5mongo14DeleteExecutor7executeEPNS_8DatabaseE+0x75) [0xad2395]
 mongod(_ZN5mongo18WriteBatchExecutor10execRemoveERKNS_12BatchItemRefEPPNS_16WriteErrorDetailE+0x38B) [0x98850b]
 mongod(_ZN5mongo18WriteBatchExecutor11bulkExecuteERKNS_21BatchedCommandRequestEPSt6vectorIPNS_19BatchedUpsertDetailESaIS6_EEPS4_IPNS_16WriteErrorDetailESaISB_EE+0xF5) [0x98a7b5]
 mongod(_ZN5mongo18WriteBatchExecutor12executeBatchERKNS_21BatchedCommandRequestEPNS_22BatchedCommandResponseE+0x38B) [0x98adeb]
 mongod(_ZN5mongo8WriteCmd3runEPNS_16OperationContextERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x169) [0x98d449]
 mongod(_ZN5mongo12_execCommandEPNS_16OperationContextEPNS_7CommandERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x34) [0x9a7d84]
 mongod(_ZN5mongo7Command11execCommandEPNS_16OperationContextEPS0_iPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0xC51) [0x9a8c11]
 mongod(_ZN5mongo12_runCommandsEPNS_16OperationContextEPKcRNS_7BSONObjERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x1F0) [0x9a9680]
 mongod(_ZN5mongo11newRunQueryEPNS_16OperationContextERNS_7MessageERNS_12QueryMessageERNS_5CurOpES3_b+0x136A) [0xb9d5ea]
 mongod(_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortEb+0xD93) [0xa7fd03]
 mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0xE0) [0x7ed730]
 mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x421) [0xeac3d1]
 libpthread.so.0(+0x7E9A) [0x7f16a6625e9a]
 libc.so.6(clone+0x6D) [0x7f16a573473d]
-----  END BACKTRACE  -----
2014-10-06T19:26:20.597+0000 I -        [conn25082] 
 
***aborting after invariant() failure
 
 
2014-10-06T19:26:20.608+0000 F -        [conn25082] Got signal: 6 (Aborted).



 Comments   
Comment by Githook User [ 22/Jan/15 ]

Author:

{u'username': u'kaloianm', u'name': u'Kaloian Manassiev', u'email': u'kaloian.manassiev@mongodb.com'}

Message: SERVER-15539 Fix a typo and possible concurrency issue
Branch: master
https://github.com/mongodb/mongo/commit/c2756268ceba90afbc05e26875899c6e468e854b

Comment by Githook User [ 07/Jan/15 ]

Author:

{u'username': u'GeertBosch', u'name': u'Geert Bosch', u'email': u'geert@mongodb.com'}

Message: SERVER-15539: Reimplement saved cursor invalidation for MMAPv1 yielding
Branch: master
https://github.com/mongodb/mongo/commit/de54755e568481d1bdef37339d899403e3b04d86

Comment by Githook User [ 09/Dec/14 ]

Author:

{u'username': u'GeertBosch', u'name': u'Geert Bosch', u'email': u'geert@mongodb.com'}

Message: SERVER-15539: Properly save/restore state when yielding in M/R

This fixes a regression introduced in commit 29a700ad.
Branch: master
https://github.com/mongodb/mongo/commit/23fb35b7a52d0d50829a6bb9a47954374ab2b61f

Comment by Quentin Conner [ 08/Dec/14 ]

Confirmed with a second test. 230490 causes the BTree invariant.

Comment by Quentin Conner [ 08/Dec/14 ]

The mongo-perf git bisect is (again) complete with 2304904687c5d29e228d86da95244682dc62caa1 identified as reintroducing the BTree invariant. I'm running a second test on 489690cfbccc24cf69b6ae7848581303ec8f4b0e to check for a false negative.

commit 2304904687c5d29e228d86da95244682dc62caa1
Author: Kaloian Manassiev <kaloian.manassiev@mongodb.com>
Date:   Tue Nov 18 11:35:22 2014 -0500
 
    SERVER-16194 Add a lockGlobalBegin capability for the global lock
 
    This allows replication step down to queue itself for acquiring the global
    lock and then go and kill all owners and wait afterwards.

Comment by Quentin Conner [ 03/Dec/14 ]

Sorry, 4c221b5... tested bad after testing good. Bisect is going to carry on a little longer. Currently retesting the Previous GOOD.

Here is my bisect log so far:

commit 609a1492d2895ea55a9202815590636d1934a9ca  BAD
commit 0a02891e3f155e5a187ee14c774d42cbdb290652
commit 8bfae19e291e6ee6e963d2ccc8aa2535ff7c86c4
commit ba6a6919605104afac5153279f7c3a7ef80d66b0
commit 41024fa4a63255a8502732257ba1642b8d445087
commit 8e0e873320a0962460c7a8e821495a325efe176b
commit f56ea73672e1eebde4d2e9cef2f8fca0d84f0956
commit e7f6c56327afa51847a95d9d5bc6399209856c10
commit e5644e2ca12a60df677cb8e8dfc70f19a9423b0a
commit 534263f1d83cdeb142c27f0ea5a1ecffc5b7526a
commit 59487e24591a505acb8eb03d5c5d0652098e0841
commit 5d1f0d27992da53eb1c546f413e5b34e78d8c439
 
commit 5d53a26ce33320b5b91ff5a8fcd2b69be1367367  BAD
commit b2122dd20ae8cd761312301e90e0e7ca429ec8db
commit a932b12ac03366e9336ffc92e7710ddb5c3b3a39
commit 4dda322fdb688a53c243fe7c0df778f5b3f0a0cd  BAD
commit d10ee8adc8265451d9fcbaae9669709618751311  
commit e8c4f9b33b1b1b8b500e94da331a8eb9e0b9ad05  BAD
commit 9ae1dd37b3fa0a0325d65dc1b6c8101902c1c2a3  BAD after GOOD
commit 5c9b4a636753e98de0ba7d9d518bfb6516f843d7  BAD
commit 4c221b5ce50c3eaabc0348432b6df6c41aeabee5  BAD after GOOD
commit 429dc5819eb37e21d9e5c4573aae8421efd50ed7
commit 1b204eba0dc80f4007cb88b995b761a8bc0987fc
 
commit 2ac3b5b8c79726c73e08bdac732f8f09d846d72f  BAD after GOOD
commit 10cf936a3635a72ee61714631050cf54466410eb
commit 13577a48b51202aabd2e55ef95404439aaa4a0c3
commit 42d802af32f99663bc1c66456b2d57749010bed5
commit def8f54bf6162317cc8b345e81c6e698d618ad96
commit acc09c2ec26b27c6d201f5f98a2a9c7b4215b1ae
commit 442d1dc06fd8d04e27a2838995f1eef8bf87d27a
commit e6ff142766cf8cbfa04a9c6acc5cb8ced3ea9fb3
commit 74cd9523a6b9d81dc60bdf6af05f6fa2e8739159
commit eac11632002de066d03936b5246d1d6162e7d240
commit 65ab6bcc319d1d4a1fe2f19615e646cac6dd3f7e
commit 0f8898e95d6096b306fe655ef7141a01f1e71100
 
commit f27fba8fddbb50d0e7ad2522a0a3b766b456b29d  BAD
commit fe6cc50395b76494cae644c98cbc9ec265cf78b1
commit 2304904687c5d29e228d86da95244682dc62caa1  BAD
commit 489690cfbccc24cf69b6ae7848581303ec8f4b0e  GOOD
commit 365cca0c47566d192ca847f0b077cedef4b3430e
commit 217151b66aefddca0a62e92aa095bb4f27dba574
commit 2e2091d502d959f52640de209470ecd3180d7269  GOOD
commit f4c67e1f3e5c0a63567539a5b97c32b22cb693f7
commit c6653bd908a98c3b98ca14a6769cb52da1ff0d19
commit b1be46d80c999a4a583dd55dff5da42b28480f15
commit 42d25fc0fd25d59fb2b2f29f79f09cb4f7f623c6
commit c0a623cf2bba01c4fbb78bda4a7b2e67c1c8d2d8  GOOD

And some notes on conditions to reproduce:

--> mod mongo-perf to turn off FETCHMCI mode, compile from source
--> mod mongo-perf to optionally run single DB tests
--> mod mongo-perf to run only remove tests, would not reproduce
--> tried running only multiDB, would not reproduce
--> needed single DB tests to run as a precondition

Comment by Quentin Conner [ 02/Dec/14 ]

Bisect is complete. Issue reintroduced with 5c9b4a6...

commit 5c9b4a636753e98de0ba7d9d518bfb6516f843d7
Author: Jason Rassi <rassi@10gen.com>
Date: Mon Oct 20 17:09:18 2014 -0400

SERVER-15675 Stages should clear their OperationContext in saveState()

While state is saved, threads pass in their own OperationContext to
invalidate(). When state is restored, the stage will resume with the
OperationContext passed in to restoreState().

I am re-running 4c221b5ce50c3eaabc0348432b6df6c41aeabee5 to double-check and ensure this previous commit is not a false negative.

Comment by Quentin Conner [ 24/Nov/14 ]

On Windows, mongod was started like this (note the generous syncdelay):
mongod.exe --dbpath E:\db --logpath mongoperf.log --storageEngine=mmapv1 --syncdelay 14400

The Invariant occurs during MULTI DB testing with mongo-perf. This is where WriteCommand Remove ops are dispatched to multiple dbs concurrently.

We can reproduce consistently (mongo-perf cannot complete due to this) with 609a149 by running the Remove.v3.IntNonIdIndex test in mongo-perf. In the run I captured, mongod failed at 28 concurrent worker threads.

Remove.v3.IntNonIdIndex^M
1       2808.0912377715867^M
2       5000.489871653627^M
4       9452.950600309854^M
6       12854.041522689231^M
8       15541.64337529379^M
12      17739.166529702445^M
16      18794.576399393347^M
20      18986.472948873343^M
24      19454.18839195589^M

Lastly, the mongo shell invocation by the mongo-perf test harness looks like this:

load('util/utils.js')^M
^M
load('testcases/complex_insert.js')^M
^M
load('testcases/complex_update.js')^M
^M
load('testcases/long_multi_update.js')^M
^M
load('testcases/mixed_small.js')^M
^M
load('testcases/simple_commands.js')^M
^M
load('testcases/simple_geo_2dsphere_index.js')^M
^M
load('testcases/simple_geo_2d_index.js')^M
^M
load('testcases/simple_insert.js')^M
^M
load('testcases/simple_multi_update.js')^M
^M
load('testcases/simple_query.js')^M
^M
load('testcases/simple_remove.js')^M
^M
load('testcases/simple_update.js')^M
^M
load('testcases/simple_update_mms.js')^M
^M
mongoPerfRunTests([1, 2, 4, 6, 8, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48], 8, 1, 5, 1, 'Windows-cap-win-1-2K12R2-c8171e7f969519af8b87a43425ae291ee69a0191-mmapv1-multidb8', 'sani
ty', 'mongo-perf/mongo-perf-db-1.vpc3.10gen.cc,mongo-perf-db-2.vpc3.10gen.cc', '27017', '2014-11-24 03:30:43.621000', 0, {"writeCmdMode": "true", "writeConcernW": 0, "safeGLE": 
"false", "writeConcernJ": "false"}, {"server_git_hash": "c8171e7f969519af8b87a43425ae291ee69a0191", "server_storage_engine": "mmapv1", "harness": {"git_hash": "unknown", "client
": {"git_hash": "0e9cb3b20498b9f167afaff7a5c4a4d1da7e06a2", "version": "2.7.8", "name": "mongo shell"}, "name": "mongo-perf", "version": "unknown"}, "server_git_commit_date": "2
014-11-23 01:24:53", "server_version": "2.8.0-rc1"});^M
^M

Comment by Quentin Conner [ 23/Nov/14 ]

2.8.0-rc2 MMAPv1 storage engine invariant failure during mongo-perf. Two examples.

Comment by Quentin Conner [ 23/Nov/14 ]

This invariant has shown up again as of 2.8.0-rc1 (609a149).

I have a couple of failed mongo-perf runs from 22 November 2014, with log files attached. This happens on our 64-core Windows 2012 R2 test slave.

Comment by Quentin Conner [ 20/Nov/14 ]

still good at c0a623c

Comment by Quentin Conner [ 10/Nov/14 ]

I noticed at git hash f91d3efd69a154d46a73caf2e6b5a5f632b56061 (in master) that we aren't seeing the assert any longer.

I have not bisected to see which checkin fixed it, but have several complete (good) mongo-perf runs with MMAPv1 (same two-socket hardware).

Comment by Quentin Conner [ 20/Oct/14 ]

This keyOffset >=0 assert has now been observed on an EC2 box running Windows 2008. The box is believed to be a single socket, unlike the other environment (Linux).

Comment by Quentin Conner [ 10/Oct/14 ]

Found in Version should be 2.7.5.
$ git describe
r2.7.4-18-g9b5f950

Comment by Quentin Conner [ 10/Oct/14 ]

git bisect is complete. It appears the "Use correct algorithm in BtreeLogic::Builder" commit is to blame. Here is the git bisect log:

# bad: [2966c35b20416d55c3d481e5f254a9befb1b2338] BUMP 2.7.5
# good: [4357cef5c06a42ec17e0cf8b5ada46a46f05866c] BUMP 2.7.0
git bisect start '2966c35b20416d55c3d481e5f254a9befb1b2338' '4357cef5c06a42ec17e0cf8b5ada46a46f05866c'
# good: [d9c3bebb6b5b48dea69d5548b2d75db009812ee5] SERVER-14277 Clean up some cases of throwing strings when parsing replset configs
git bisect good d9c3bebb6b5b48dea69d5548b2d75db009812ee5
# good: [308af33f76d1e15c105f04ea762f8951b47c5168] SERVER-11118 add dateToString aggregation operator
git bisect good 308af33f76d1e15c105f04ea762f8951b47c5168
# bad: [2b17cf09a524f3f99ce811019543b9bb89b16c78] SERVER-13635: make profile tests work for generic storage engines
git bisect bad 2b17cf09a524f3f99ce811019543b9bb89b16c78
# bad: [f58fbec3e4418adc6a266f41ad2872451fab760a] SERVER-6086 fixed shell to handle zero-width and double-width characters
git bisect bad f58fbec3e4418adc6a266f41ad2872451fab760a
# good: [cdcce7b8a9599d59c76c4255bd1aa62821b9b17d] SERVER-14378: clean up cloner changes for no system namespaces
git bisect good cdcce7b8a9599d59c76c4255bd1aa62821b9b17d
# bad: [78d2f38aa445ef1658300e66e1db14b9f1eceba8] SERVER-13961 Pass through OperationContext in the JS framework
git bisect bad 78d2f38aa445ef1658300e66e1db14b9f1eceba8
# good: [ece23f5a892d74360038f4f11f54c11e43f432cd] SERVER-13635 Use StringData and StringMap in DBHolder
git bisect good ece23f5a892d74360038f4f11f54c11e43f432cd
# bad: [172d4f6f8bd09d0b2b77d9fb3bf280d01cfb72a1] SERVER-14604: Add replSetGetConfig command
git bisect bad 172d4f6f8bd09d0b2b77d9fb3bf280d01cfb72a1
# bad: [94880490ed1d138be2a6547726ef598511f8311a] SERVER-14584 Clean up BtreeLogic::pushBack/_pushBack naming
git bisect bad 94880490ed1d138be2a6547726ef598511f8311a
# bad: [9b5f950603376b36eb2659ae5b8fdae2ee8714cf] SERVER-14584 Use correct algorithm in BtreeLogic::Builder
git bisect bad 9b5f950603376b36eb2659ae5b8fdae2ee8714cf
# first bad commit: [9b5f950603376b36eb2659ae5b8fdae2ee8714cf] SERVER-14584 Use correct algorithm in BtreeLogic::Builder

And the commit comment:

commit 9b5f950603376b36eb2659ae5b8fdae2ee8714cf
Author: Mathias Stearn <mathias@10gen.com>
Date: Wed Jul 16 11:17:34 2014 -0400

SERVER-14584 Use correct algorithm in BtreeLogic::Builder

Both algorithms cover phases 2 and 3 of forground index building. Importantly
they both take as input pre-sorted data that has already been extracted from
the raw documents (phase 1).

OLD ALGORITHM:
Phase 2:
Add all keys to buckets. When a bucket gets full, create a new bucket and
link it as the original bucket's parent pointer. At the end of the stage
we have a linked-list of leaf nodes starting as the left-most corner.

Phase 3:
Walk the linked produced by phase 2, and build the layer above the leaves.
With each bucket, except for the last (right-most), pop* its highest key
and add it to the bucket that will be it's parent. If the current parent
bucket is full a new one is added to higher-layer linked list as in
phase 1. Phase 3 is repeated on the new higher-layer linked list until you
get a layer with a single bucket which is marked as the root of the tree.

  • Importantly, popping a key is what sets the nextChild pointer on a
    bucket. It is set to what used to be the prevChildNode of the key that is
    popped. A major flaw of this algorithm is that because no key is ever
    popped from the root, its nextChild pointer is never filled in.

NEW ALGORITHM:
Phase 2:
Add all keys to buckets. When a bucket gets full, pop the highest key
(setting the nextChild pointer of the bucket to the prevChild of the
popped key), add the popped key to a parent bucket, and create a new right
sibling bucket to add the new key to. If the parent bucket is full, this
same operation is performed on the parent and all full ancestors. If we
get to the root and it is full, a new root is created above the current
root. When creating a new right sibling, it is set as its parent's
nextChild as all keys in the right sibling will be higher than all keys
currently in the parent.

Phase 3:
Does nothing. This may go away soon.

Non-exhaustive list of benefits to the new algorthm:

  • Generates a valid btree.
  • Doesn't require a second pass over all of the buckets in the index. This can
    be significantly faster for indexes that don't fit in ram.

:040000 040000 819555cfa90cd1a28c8281b4eaa102212a262eaa 4a316f83e664c80c8c33693ae76d15ec21bf7edb M src

Comment by Geert Bosch [ 08/Oct/14 ]

Hi Quentin,

Let's look at this tomorrow. It would be really helpful if we can narrow down the commit around which the error was introduced. Is the r2.7.6 server known-good?

-Geert

Comment by Quentin Conner [ 06/Oct/14 ]

benchRun invocation looks like this:

benchRun( {
        "ops" : [
                {
                        "op" : "let",
                        "target" : "x",
                        "value" : {
                                "#RAND_INT_PLUS_THREAD" : [
                                        0,
                                        1000
                                ]
                        },
                        "ns" : "test0.Remove_v2.IntNonIdNoIndex",
                        "safe" : false,
                        "w" : 0,
                        "j" : false,
                        "writeCmd" : true
                },
                {
                        "op" : "insert",
                        "doc" : {
                                "x" : {
                                        "#VARIABLE" : "x"
                                }
                        },
                        "ns" : "test0.Remove_v2.IntNonIdNoIndex",
                        "safe" : false,
                        "w" : 0,
                        "j" : false,
                        "writeCmd" : true
                },
                {
                        "op" : "remove",
                        "query" : {
                                "x" : {
                                        "#VARIABLE" : "x"
                                }
                        },
                        "ns" : "test0.Remove_v2.IntNonIdNoIndex",
                        "safe" : false,
                        "w" : 0,
                        "j" : false,
                        "writeCmd" : true
                }
        ],
        "seconds" : 5,
        "host" : "127.0.0.1",
        "parallel" : 16
} )

Generated at Thu Feb 08 03:38:18 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.