[CSHARP-144] MongoCollection Can't distinct a large size of data Created: 28/Dec/10  Updated: 02/Apr/15  Resolved: 28/Dec/10

Status: Closed
Project: C# Driver
Component/s: None
Affects Version/s: 0.9
Fix Version/s: 1.0

Type: Bug Priority: Major - P3
Reporter: xuqing Assignee: Robert Stam
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

OS: Win7 x64 Enterprise
VS: VS2010



 Description   

I have a collection containing about 821463 documents in my local MongoDB. I tried to get the distinct TrackID values in a C# program using the Distinct(string key) method of MongoCollection. I got an exception with the message:
Invalid BSONObj spec size: 25354392 (98E08201)
The stack trace is below:
   at MongoDB.Driver.Internal.MongoReplyMessage`1.ReadFrom(BsonBuffer buffer)
   at MongoDB.Driver.Internal.MongoConnection.ReceiveMessage[TDocument]()
   at MongoDB.Driver.MongoCursor`2.MongoCursorEnumerator.GetReply(MongoRequestMessage message)
   at MongoDB.Driver.MongoCursor`2.MongoCursorEnumerator.GetFirst()
   at MongoDB.Driver.MongoCursor`2.MongoCursorEnumerator.MoveNext()
   at System.Linq.Enumerable.FirstOrDefault[TSource](IEnumerable`1 source)
   at MongoDB.Driver.MongoCollection.FindOneAs[TQuery,TDocument](TQuery query)
   at MongoDB.Driver.MongoCollection`1.FindOne[TQuery](TQuery query)
   at MongoDB.Driver.MongoDatabase.RunCommand[TCommand](TCommand command)
   at MongoDB.Driver.MongoCollection.Distinct[TQuery](String key, TQuery query)
   at MongoDB.Driver.MongoCollection.Distinct(String key)
   at MongoWriteSpeed.Program.Main(String[] args) in E:\Program\WindowsProject\MongoWriteSpeed\MongoWriteSpeed\Program.cs:line 27

My program is below:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

using MongoDB.Driver;
using MongoDB.Bson;

namespace MongoWriteSpeed
{
    class Program
    {
        static void Main(string[] args)
        {
            MongoServer _dbServer = MongoServer.Create();

            try
            {
                if ((_dbServer.State == MongoServerState.Disconnected) || (_dbServer.State == MongoServerState.None))
                {
                    _dbServer.Connect();
                    MongoDatabase _db = _dbServer.GetDatabase("music");
                    MongoCollection<MongoDB.Bson.BsonDocument> _document = _db.GetCollection("tTrack");
                    // The exception is thrown by this call.
                    IEnumerable<BsonValue> _trackIdList = _document.Distinct("TrackID");
                }
            }
            catch (Exception ex)
            {
                System.Console.WriteLine(string.Format("Failed! Exception: {0}", ex.Message));
            }
            finally
            {
                _dbServer.Disconnect();
            }
        }
    }
}



 Comments   
Comment by Robert Stam [ 28/Dec/10 ]

Works as designed.

Comment by Robert Stam [ 28/Dec/10 ]

It's not a problem with the C# driver. It's a limitation on the amount of data that can be returned from the server in a single result document. You probably need to investigate other ways of doing this (map/reduce, client side, etc...).

Comment by xuqing [ 28/Dec/10 ]

In my documents, some keys don't only have English values; some values are stored in Chinese or Japanese. So maybe the result is too large. But 821463 documents is only a small part of the whole data set. For now we handle this problem with LINQ. But maybe it's an issue in the C# driver. Can this be fixed in the next version?

Comment by Robert Stam [ 28/Dec/10 ]

I haven't tried to reproduce this yet, but I think you are just running into the limits of a BSON document. The result of a distinct command has to fit in a single BSON document. The error message makes it look like the result document is 25MB long, which is longer than a BSON document can be. If you divide 25354392 by 821463 you get 30.86, so given some overhead of how information is encoded it looks like you just can't return 821463 TrackIDs in a single result document.
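
For reference, the per-value estimate above can be checked against the BSON encoding of an array of strings. The following is a rough sketch, not driver code. It assumes that every TrackID is distinct, that each one is an 18-byte ASCII string like the sample value posted in this ticket, and that the reply has the shape { values: [...], ok: 1.0 }. The class and method names are made up for illustration.

```csharp
using System;
using System.Globalization;

public static class DistinctSizeEstimate
{
    // Size in bytes of one string element inside a BSON array:
    // type byte + array index as a NUL-terminated key
    // + int32 string length + UTF-8 bytes + terminating NUL.
    public static long StringElementSize(long index, int utf8ByteCount)
    {
        int keyBytes = index.ToString(CultureInfo.InvariantCulture).Length + 1;
        return 1 + keyBytes + 4 + utf8ByteCount + 1;
    }

    // Estimated size of a reply document { values: [...], ok: 1.0 }
    // holding valueCount strings of utf8ByteCount bytes each (assumed shape).
    public static long EstimateReplySize(long valueCount, int utf8ByteCount)
    {
        long arraySize = 4 + 1; // int32 length prefix + terminating 0x00
        for (long i = 0; i < valueCount; i++)
        {
            arraySize += StringElementSize(i, utf8ByteCount);
        }

        long valuesElement = 1 + "values".Length + 1 + arraySize; // type + key + array
        long okElement = 1 + "ok".Length + 1 + 8;                 // type + key + double
        return 4 + valuesElement + okElement + 1;                 // length prefix + elements + 0x00
    }

    public static void Main()
    {
        long bytes = EstimateReplySize(821463, 18);
        Console.WriteLine("Estimated reply size: {0} bytes ({1:F2} per value)",
            bytes, (double)bytes / 821463);
    }
}
```

Under those assumptions each value costs between 26 and 31 bytes depending on how many digits its array index has, and the total comes to 25354273 bytes, within about 120 bytes of the 25354392 in the error message. The remainder would be whatever other fields the server adds to the reply.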

Comment by xuqing [ 28/Dec/10 ]

The data in a document looks like the following:

{
"_id": "4d16d843475bdea6f9fe0e12",
"TrackID": "jwadypbo8rhuqcboge",
"TrackName": "NiJigen Gangsta",
"Genre": "Other",
"Duration": "0:02:36",
"ReleaseDate": "2004-12-31T16:00:00.0000000Z",
"LyricUrl": "\n ",
"Language": "",
"AlbumID": "goad1qjz",
"AlbumName": "Nijigen Gangsta E.P",
"ArtistIDList": "ge3ucpj3",
"ArtistNameList": "Amnjk",
"TopicID": ""
}

Comment by Robert Stam [ 28/Dec/10 ]

What element names do the documents in the tTrack collection have? I'll need to create some fake data to try to reproduce this with, and I'd like my fake data to be similar to yours.

Generated at Wed Feb 07 21:35:56 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.