[SERVER-23526] Replication relies on storage engines reporting a non-zero size for correctness Created: 05/Apr/16  Updated: 22/Nov/16  Resolved: 15/Apr/16

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: None
Fix Version/s: 3.2.6, 3.3.5

Type: Bug
Priority: Major - P3
Reporter: Alexander Gorrod
Assignee: Alexander Gorrod
Resolution: Done
Votes: 0
Labels: code-only
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-23795 master/slave looks at on-disk size on... Closed
is related to WT-2533 Ensure that in-memory tables don't re... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL

 Description   

The new inMemory storage engine does not necessarily return a non-zero storage size for collections, even when they contain data. The replication sub-system assumes that the size will be non-zero when there is content.

Further analysis from sue.loverso:
I think it is the primary looking at the on-disk size to determine what to send to a secondary on a resync.
I believe the code in question is mongo/db/repl/master_slave.cpp:423:forceResync(). The listDatabases() command in mongo/db/commands/list_databases.cpp sets an "empty" boolean to true if the size on disk returned is 0. The forceResync code then calls resyncDrop for any database flagged empty. There are no comments nearby explaining that decision.
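
To make the failure mode concrete, here is a minimal C++ sketch of the decision path described above. The names echo forceResync(), resyncDrop(), and the listDatabases "empty" flag, but the types and bodies are illustrative stubs under assumed behavior, not the actual MongoDB source:

#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

struct DatabaseInfo {
    std::string name;
    int64_t sizeOnDisk;  // as reported by the storage engine
    bool empty;          // listDatabases sets this when sizeOnDisk == 0
};

// Hypothetical stand-in for the real resyncDrop path.
void resyncDrop(const std::string& name) {
    std::cout << "dropping " << name << " before resync\n";
}

// listDatabases flags a database as empty purely from its reported size.
DatabaseInfo listDatabasesEntry(const std::string& name, int64_t sizeOnDisk) {
    return DatabaseInfo{name, sizeOnDisk, sizeOnDisk == 0};
}

// forceResync trusts the "empty" flag, so an engine that reports a zero
// size for a populated database gets that database dropped.
void forceResync(const std::vector<DatabaseInfo>& dbs) {
    for (const auto& db : dbs) {
        if (db.empty)
            resyncDrop(db.name);
    }
}

int main() {
    // An inMemory-engine database that holds documents but reports size 0:
    std::vector<DatabaseInfo> dbs = {listDatabasesEntry("test", 0)};
    forceResync(dbs);  // drops "test" despite its contents
}

Because the inMemory engine can legitimately report a size of zero for a populated collection, every such database looks "empty" here and is dropped on each resync attempt, which matches the endless "syncing from host" loop in the log below.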



 Comments   
Comment by Michael Cahill (Inactive) [ 19/Apr/16 ]

keith.bostic, we'll need a new ticket; this one is fixed by the workaround of never having WiredTiger return a storage size of zero.

Comment by Keith Bostic (Inactive) [ 15/Apr/16 ]

michael.cahill, I thought we were going to use this ticket to consider changing MongoDB to not check for an on-disk size of zero and removing the WiredTiger workaround; is there a purpose to that check?

Comment by Githook User [ 14/Apr/16 ]

Author: Ramon Fernandez <ramon@mongodb.com>

Message: Import wiredtiger-wiredtiger-2.8.0-201-g7ea2631.tar.gz from wiredtiger branch mongodb-3.2

ref: 43e885a..7ea2631

SERVER-23504 Coverity analysis defect 98177: Resource leak
SERVER-23526 Replication relies on storage engines reporting a non-zero size for correctness
SERVER-23588 mongod with WiredTiger won't start on Windows when built with --dbg=on --opt=off
SERVER-23682 WiredTiger changes for MongoDB 3.2.6
WT-2330 in-memory configurations should not create on-disk collection files
WT-2507 Add upgrading documentation in preparation for 2.8 release.
WT-2512 wtperf: MSVC complains about float conversion in throttle code
WT-2513 conversion from 'int64_t' to 'uint32_t'
WT-2517 wtperf uses setvbuf in a way that isn't supported on Windows
WT-2522 Incorrect format code in message
WT-2525 in-memory configurations: miscellaneous cleanups
WT-2527 OS X compile error, missing POSIX_FADV_WILLNEED #define
WT-2528 style error in WiredTiger build
WT-2529 The readonly test case is crashing with a stack overflow
WT-2531 in-memory tables are allocating unnecessary memory
WT-2532 WT_STREAM_APPEND and WT_STREAM_LINE_BUFFER flag overlap
WT-2533 Ensure that in-memory tables don't report a zero size
WT-2534 Invalid transaction snapshots on PowerPC
Branch: v3.2
https://github.com/mongodb/mongo/commit/7ee4e4e493c3785fea489ee3508ca18526709c16

Comment by Githook User [ 07/Apr/16 ]

Author: Alex Gorrod <alexg@wiredtiger.com>

Message: WT-2533 Don't let in-memory tables return a zero size.

Returning a zero size breaks MongoDB replication. Return a non-zero
size for now until SERVER-23526 is resolved.
Branch: mongodb-3.2
https://github.com/wiredtiger/wiredtiger/commit/95ffda62606c136fb571d9e15460a12b9c2d1074
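
The workaround referenced above amounts to never letting a populated in-memory table report zero bytes. A minimal sketch of that pattern, assuming a simple clamp (this is not WiredTiger's actual implementation):

#include <algorithm>
#include <cstdint>

// Clamp the reported size to at least one byte so callers that treat
// "size == 0" as "no data" (such as the forceResync path above) never
// misclassify a populated in-memory table.
int64_t reportedStorageSize(int64_t bytesInMemory) {
    return std::max<int64_t>(bytesInMemory, 1);
}

Clamping on the reporting side keeps the fix contained to the storage engine until the replication-side check can be reconsidered, which is what the comments above discuss.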

Comment by Keith Bostic (Inactive) [ 05/Apr/16 ]

One possible point of confusion here: an on-disk WiredTiger database will never have a zero-sized object; even an empty collection takes up some room on disk.

So any optimization based on zero-sized objects may not be useful, since it will never fire.

Comment by Githook User [ 05/Apr/16 ]

Author: Alex Gorrod <alexg@wiredtiger.com>

Message: WT-2533 Don't let in-memory tables return a zero size.

Returning a zero size breaks MongoDB replication. Return a non-zero
size for now until SERVER-23526 is resolved.
Branch: mongodb-3.4
https://github.com/wiredtiger/wiredtiger/commit/95ffda62606c136fb571d9e15460a12b9c2d1074

Comment by Githook User [ 05/Apr/16 ]

Author: Alex Gorrod <alexg@wiredtiger.com>

Message: WT-2533 Don't let in-memory tables return a zero size.

Returning a zero size breaks MongoDB replication. Return a non-zero
size for now until SERVER-23526 is resolved.
Branch: develop
https://github.com/wiredtiger/wiredtiger/commit/95ffda62606c136fb571d9e15460a12b9c2d1074

Comment by Alexander Gorrod [ 05/Apr/16 ]

To reproduce the failure, back out the workaround committed in WT-2533, and run the replication jstest suite against the inMemory storage engine. The command I use is:

python buildscripts/resmoke.py --storageEngine=inMemory --excludeWithAnyTags=requires_persistence --suite=replication

The repl8.js test fails because it never finishes an initial sync; error output:

[js_test:repl8] 2016-04-05T10:57:17.579+1000 d20010| 2016-04-05T10:57:17.579+1000 I NETWORK  [initandlisten] connection accepted from 127.0.0.1:38184 #2 (2 connections now open)
[js_test:repl8] 2016-04-05T10:57:18.580+1000 d20011| 2016-04-05T10:57:18.580+1000 I REPL     [replslave] sleep 2 sec before next pass
[js_test:repl8] 2016-04-05T10:57:20.581+1000 d20011| 2016-04-05T10:57:20.581+1000 I REPL     [replslave] syncing from host:127.0.0.1:20010
[js_test:repl8] 2016-04-05T10:57:21.581+1000 d20011| 2016-04-05T10:57:21.581+1000 I REPL     [replslave] syncing from host:127.0.0.1:20010
[js_test:repl8] 2016-04-05T10:57:22.581+1000 d20011| 2016-04-05T10:57:22.581+1000 I REPL     [replslave] syncing from host:127.0.0.1:20010
[js_test:repl8] 2016-04-05T10:57:23.582+1000 d20011| 2016-04-05T10:57:23.582+1000 I REPL     [replslave] syncing from host:127.0.0.1:20010
[js_test:repl8] 2016-04-05T10:57:24.582+1000 d20011| 2016-04-05T10:57:24.582+1000 I REPL     [replslave] syncing from host:127.0.0.1:20010
[js_test:repl8] 2016-04-05T10:57:25.583+1000 d20011| 2016-04-05T10:57:25.583+1000 I REPL     [replslave] syncing from host:127.0.0.1:20010
[js_test:repl8] 2016-04-05T10:57:26.583+1000 d20011| 2016-04-05T10:57:26.583+1000 I REPL     [replslave] syncing from host:127.0.0.1:20010
[js_test:repl8] 2016-04-05T10:57:27.584+1000 d20011| 2016-04-05T10:57:27.584+1000 I REPL     [replslave] syncing from host:127.0.0.1:20010
[js_test:repl8] 2016-04-05T10:57:28.584+1000 d20011| 2016-04-05T10:57:28.584+1000 I REPL     [replslave] syncing from host:127.0.0.1:20010
[js_test:repl8] 2016-04-05T10:57:29.585+1000 d20011| 2016-04-05T10:57:29.585+1000 I REPL     [replslave] syncing from host:127.0.0.1:20010
[js_test:repl8] 2016-04-05T10:57:31.352+1000 d20011| 2016-04-05T10:57:31.352+1000 I REPL     [replslave] syncing from host:127.0.0.1:20010
[js_test:repl8] 2016-04-05T10:57:32.353+1000 d20011| 2016-04-05T10:57:32.352+1000 I REPL     [replslave] syncing from host:127.0.0.1:20010
[js_test:repl8] 2016-04-05T10:57:33.353+1000 d20011| 2016-04-05T10:57:33.353+1000 I REPL     [replslave] syncing from host:127.0.0.1:20010
[js_test:repl8] 2016-04-05T10:57:34.353+1000 d20011| 2016-04-05T10:57:34.353+1000 I REPL     [replslave] syncing from host:127.0.0.1:20010
[js_test:repl8] 2016-04-05T10:57:35.354+1000 d20011| 2016-04-05T10:57:35.354+1000 I REPL     [replslave] syncing from host:127.0.0.1:20010
[js_test:repl8] 2016-04-05T10:57:36.354+1000 d20011| 2016-04-05T10:57:36.354+1000 I REPL     [replslave] syncing from host:127.0.0.1:20010
[js_test:repl8] 2016-04-05T10:57:37.355+1000 d20011| 2016-04-05T10:57:37.355+1000 I REPL     [replslave] syncing from host:127.0.0.1:20010
[js_test:repl8] 2016-04-05T10:57:38.355+1000 d20011| 2016-04-05T10:57:38.355+1000 I REPL     [replslave] syncing from host:127.0.0.1:20010
[js_test:repl8] 2016-04-05T10:57:39.356+1000 d20011| 2016-04-05T10:57:39.355+1000 I REPL     [replslave] syncing from host:127.0.0.1:20010
[js_test:repl8] 2016-04-05T10:57:41.352+1000 d20011| 2016-04-05T10:57:41.352+1000 I REPL     [replslave] syncing from host:127.0.0.1:20010
[js_test:repl8] 2016-04-05T10:57:42.353+1000 d20011| 2016-04-05T10:57:42.353+1000 I REPL     [replslave] syncing from host:127.0.0.1:20010
[js_test:repl8] 2016-04-05T10:57:43.353+1000 d20011| 2016-04-05T10:57:43.353+1000 I REPL     [replslave] syncing from host:127.0.0.1:20010
[js_test:repl8] 2016-04-05T10:57:44.353+1000 d20011| 2016-04-05T10:57:44.353+1000 I REPL     [replslave] syncing from host:127.0.0.1:20010
[js_test:repl8] 2016-04-05T10:57:45.354+1000 d20011| 2016-04-05T10:57:45.354+1000 I REPL     [replslave] syncing from host:127.0.0.1:20010
[js_test:repl8] 2016-04-05T10:57:46.354+1000 d20011| 2016-04-05T10:57:46.354+1000 I REPL     [replslave] syncing from host:127.0.0.1:20010
[js_test:repl8] 2016-04-05T10:57:46.731+1000 assert.soon failed: function () {
[js_test:repl8] 2016-04-05T10:57:46.731+1000     return s.getDB(baseName).getCollection("first").isCapped();
[js_test:repl8] 2016-04-05T10:57:46.731+1000 }
[js_test:repl8] 2016-04-05T10:57:46.731+1000 doassert@src/mongo/shell/assert.js:15:14
[js_test:repl8] 2016-04-05T10:57:46.731+1000 assert.soon@src/mongo/shell/assert.js:176:13
[js_test:repl8] 2016-04-05T10:57:46.731+1000 @jstests/repl/repl8.js:14:1
[js_test:repl8] 2016-04-05T10:57:46.731+1000
[js_test:repl8] 2016-04-05T10:57:46.732+1000 2016-04-05T10:57:46.731+1000 E QUERY    [thread1] Error: assert.soon failed: function () {
[js_test:repl8] 2016-04-05T10:57:46.732+1000     return s.getDB(baseName).getCollection("first").isCapped();
[js_test:repl8] 2016-04-05T10:57:46.732+1000 } :
[js_test:repl8] 2016-04-05T10:57:46.732+1000 doassert@src/mongo/shell/assert.js:15:14
[js_test:repl8] 2016-04-05T10:57:46.732+1000 assert.soon@src/mongo/shell/assert.js:176:13
[js_test:repl8] 2016-04-05T10:57:46.732+1000 @jstests/repl/repl8.js:14:1
[js_test:repl8] 2016-04-05T10:57:46.732+1000
[js_test:repl8] 2016-04-05T10:57:46.732+1000 failed to load: jstests/repl/repl8.js
