[SERVER-16495] Map Reduce Output to Collection Occasionally Kills the Server Created: 10/Dec/14  Updated: 25/Apr/16  Resolved: 17/Dec/14

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 2.4.8, 2.6.4, 2.7.8, 2.8.0-rc2
Fix Version/s: 2.8.0-rc3

Type: Bug
Priority: Major - P3
Reporter: Craig Wilson
Assignee: Randolph Tan
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: 24-db27020.log, 27-db27020.log
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

We are running on Windows 2012 R2.
2 mongos, 1 config server, and 1 mongod (shard).

Participants:

 Description   

In the .NET Driver's build matrix, we get sporadic failures when running a test that outputs map/reduce results to a collection. Sometimes our test runs complete without problems, and other times we get 2 or 3 of these failures across 2.4, 2.6, and the 2.7 nightlies. It only happens on sharded systems, and it appears to be completely random. I have managed to grab server log files from the 2.4 runs and the 2.7 nightly runs, but haven't been able to reproduce it with 2.6 yet (although the failure has occurred there).

The following test is what is being run:

        [Test]
        [RequiresServer]
        public async Task ExecuteAsync_should_return_expected_result()
        {
            await DropCollectionAsync();
            await InsertAsync(
                new BsonDocument { { "_id", 1 }, { "x", 1 }, { "v", 1 } },
                new BsonDocument { { "_id", 2 }, { "x", 1 }, { "v", 2 } },
                new BsonDocument { { "_id", 3 }, { "x", 2 }, { "v", 4 } });
 
            var query = new BsonDocument();
            var mapFunction = "function() { emit(this.x, this.v); }";
            var reduceFunction = "function(key, values) { var sum = 0; for (var i = 0; i < values.length; i++) { sum += values[i]; }; return sum; }";
            var subject = new MapReduceOutputToCollectionOperation(_collectionNamespace, _outputCollectionNamespace, mapFunction, reduceFunction, query, _messageEncoderSettings);
            var expectedDocuments = new BsonDocument[]
            {
                new BsonDocument { {"_id", 1 }, { "value", 3 } },
                new BsonDocument { {"_id", 2 }, { "value", 4 } },
            };
 
            var response = await ExecuteOperationAsync(subject);
 
            response["ok"].ToBoolean().Should().BeTrue();
 
            var documents = await ReadAllFromCollectionAsync(_outputCollectionNamespace);
            documents.Should().BeEquivalentTo(expectedDocuments);
        }
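
For reference, the expected documents in the test follow directly from the map and reduce functions: the map step emits (x, v) for each document, and the reduce step sums the values per key. The sketch below reproduces that arithmetic in plain C++. It is only an illustration; `mapReduceSumSketch` is a hypothetical helper, not driver or server code, and each input document is simplified to an (x, v) pair.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Hypothetical helper: mimics emit(this.x, this.v) followed by a summing
// reduce. Each input pair is (x, v); the result maps each x to the sum of
// the v values emitted for it.
std::map<int, int> mapReduceSumSketch(const std::vector<std::pair<int, int>>& docs) {
    // "Map" phase: group emitted values by key.
    std::map<int, std::vector<int>> emitted;
    for (const auto& doc : docs) {
        emitted[doc.first].push_back(doc.second);
    }

    // "Reduce" phase: sum the values collected for each key.
    std::map<int, int> output;
    for (const auto& entry : emitted) {
        int sum = 0;
        for (int value : entry.second) {
            sum += value;
        }
        output[entry.first] = sum;
    }
    return output;
}
```

With the three inserted documents, (x, v) = (1, 1), (1, 2), (2, 4), the output is key 1 with value 3 and key 2 with value 4, matching `expectedDocuments`.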



 Comments   
Comment by Githook User [ 17/Dec/14 ]

Author: Randolph Tan (renctan) <randolph@10gen.com>

Message: SERVER-16495 Map Reduce Output to Collection Occasionally Kills the Server
Branch: master
https://github.com/mongodb/mongo/commit/d9c075230dd40567c2bf66180e200c47f15a35e7

Comment by Eric Milkie [ 10/Dec/14 ]

Locations that need fix:
mr.cpp:1584
commands_public.cpp:179
commands_public.cpp:635
commands_public.cpp:1837
commands_public.cpp:2215
commands_public.cpp:2644
commands_public.cpp:2667

Comment by Eric Milkie [ 10/Dec/14 ]

It looks like grid.getDBConfig(db, false) can return a null boost shared_ptr if 'db' doesn't exist (the 'false' means don't try to create the db, just return null).
There are several places in the code that do not check for null before attempting to call a function on the pointer. MapReduceFinishCommand::run() contains one such instance.
We should go through and correct all call sites to check for null if appropriate.
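
A minimal sketch of the kind of null check described above, under stated assumptions: `DBConfigSketch`, `lookupDBConfig`, and `runFinishSketch` are hypothetical stand-ins (not the actual server types or the committed fix), and `std::shared_ptr` is used in place of the boost shared_ptr mentioned in the comment.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for the per-database config object.
struct DBConfigSketch {
    std::string name;
    virtual ~DBConfigSketch() = default;
    virtual bool isShardingEnabled() const { return true; }
};

using DBConfigMap = std::map<std::string, std::shared_ptr<DBConfigSketch>>;

// Hypothetical stand-in for a lookup like grid.getDBConfig(db, false):
// returns a null pointer when 'db' is unknown instead of creating it.
std::shared_ptr<DBConfigSketch> lookupDBConfig(const DBConfigMap& known,
                                               const std::string& db) {
    auto it = known.find(db);
    if (it == known.end()) {
        return nullptr;
    }
    return it->second;
}

// Call site that checks for null before calling a virtual function on the
// pointer, and reports an error instead of crashing the process.
bool runFinishSketch(const DBConfigMap& known, const std::string& db,
                     std::string& errmsg) {
    std::shared_ptr<DBConfigSketch> conf = lookupDBConfig(known, db);
    if (!conf) {
        errmsg = "database " + db + " not found";
        return false;
    }
    return conf->isShardingEnabled();
}
```

The point of the sketch is only the shape of the check: every caller that passes "don't create" semantics has to treat a null result as a normal outcome and fail the command cleanly.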

Comment by Eric Milkie [ 10/Dec/14 ]

Filed SERVER-16496 to help diagnose these types of problems in the future.
I believe the issue is that we are calling a virtual function on a null boost shared_ptr. I'll see if I can track down why this is happening.
