[CSHARP-1626] Reading and writing Streams with more than 2GB of BSON data
Created: 15/Apr/16  Updated: 02/Jan/20  Resolved: 19/Apr/16

Status: Closed
Project: C# Driver
Component/s: BSON, Serialization
Affects Version/s: 2.2.3
Fix Version/s: 2.2.4

Type: Bug
Priority: Minor - P4
Reporter: Marc Simkin
Assignee: Robert Stam
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: C#, .NET 4.6.1
Attachments: CSharp-1626.zip, Override.cs, OverrideRec.cs, Rec.cs

 Description   

I am reading a BSON dump file in order to convert the objects from one format to another. When attempting to write a converted object out, I'm getting an error message from MongoDB.Bson.IO.BsonBinaryWriter.BackpatchSize. The message is "Size 4294967329 is larger than MaxDocumentSize 2147483647.".

Below is the code in question. EanSet and ActiveRecommendations are the two objects involved. EanSet is the object that is in the BSON dump file. ActiveRecommendations is the new object. ActiveRecommendations should be a smaller object than EanSet.

I know that my source data should be greater than 2MB.

My goal is to take the new BSON file that is generated by the below code, and do a MongoRestore to a new collection on our Mongo servers.

Please advise how I can get this working.

Thanks

marc

private static bool ConvertObjects(FileStream sourceFile, FileStream destFileStream)
{
    ActiveRecommendations activeRec = null;
    long totalRecords = 0;
    var loadStart = DateTime.Now;

    try
    {
        using (var writer = new BsonBinaryWriter(destFileStream))
        {
            using (var reader = new BsonBinaryReader(sourceFile))
            {
                while (!reader.IsAtEndOfFile())
                {
                    activeRec = GetActiveRecommendation(BsonSerializer.Deserialize<EanSet>(reader));
                    BsonSerializer.Serialize<ActiveRecommendations>(writer, activeRec);

                    totalRecords++;

                    if ((totalRecords % _reportInterval) == 0)
                    {
                        RenScribe.LogInfo($"Total records converted so far is {totalRecords:N0}.");
                    }
                }
            }
        }
    }
    catch (Exception ex)
    {
        throw;
    }

    RenScribe.LogInfo($"Total records converted {totalRecords:N0} in {DateTime.Now - loadStart}.");

    return true;
}



 Comments   
Comment by Githook User [ 19/Apr/16 ]

Author: rstam (robert@robertstam.org)

Message: CSHARP-1626: Mark test that should only run in 64-bit process.
Branch: v2.2.x
https://github.com/mongodb/mongo-csharp-driver/commit/cc934b8c98b7d14a702ed19fa267865eec5e196c

Comment by Githook User [ 19/Apr/16 ]

Author: rstam (robert@robertstam.org)

Message: CSHARP-1626: BsonBinaryReader should support reading more than 2GB.
Branch: v2.2.x
https://github.com/mongodb/mongo-csharp-driver/commit/a70241b81b686a218244808ec45dd7f85e281a18

Comment by Githook User [ 19/Apr/16 ]

Author: rstam (robert@robertstam.org)

Message: CSHARP-1626: BsonBinaryWriter should support writing more than 2GB.
Branch: v2.2.x
https://github.com/mongodb/mongo-csharp-driver/commit/e0ce68c90f67cef13d8a17111dd0bdc40489045d

Comment by Githook User [ 19/Apr/16 ]

Author: rstam (robert@robertstam.org)

Message: CSHARP-1626: BsonBinaryReader should support reading more than 2GB.
Branch: master
https://github.com/mongodb/mongo-csharp-driver/commit/8f9f28966b95e647dcf01aeaac6ca3a219bb909d

Comment by Githook User [ 19/Apr/16 ]

Author: rstam (robert@robertstam.org)

Message: CSHARP-1626: BsonBinaryWriter should support writing more than 2GB.
Branch: master
https://github.com/mongodb/mongo-csharp-driver/commit/556db91b9ada00881b4154329eb4a7ecd217d336

Comment by Robert Stam [ 18/Apr/16 ]

After reviewing all the IO classes it looks like to fully support binary streams bigger than 2GB the following two classes need changes:

BsonBinaryReaderBookmark (either change type of Position property to long or add new LongPosition property).
 
BsonBinaryWriterContext (change type of StartPosition property to long)

For the first class, strict backward compatibility would require introducing the new property instead of changing the type of the existing property.

The second class is an internal class, so we can change the type of the StartPosition without worrying about backward compatibility.
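The backward-compatible option described for the first class can be sketched as follows. This is a minimal illustration only: ReaderBookmarkSketch is a hypothetical type, not the driver's BsonBinaryReaderBookmark, and throwing from the old int property when the value does not fit is an assumption, not something this ticket specifies.

```csharp
using System;

// Hypothetical bookmark type used only to illustrate the compatibility trade-off.
public sealed class ReaderBookmarkSketch
{
    private readonly long _position;

    public ReaderBookmarkSketch(long position)
    {
        if (position < 0)
        {
            throw new ArgumentOutOfRangeException(nameof(position));
        }
        _position = position;
    }

    // New member: exposes the full 64-bit stream position.
    public long LongPosition
    {
        get { return _position; }
    }

    // Existing member keeps its int type so already-compiled callers still bind to it.
    // Assumption: an OverflowException is preferable to a silently truncated offset.
    public int Position
    {
        get { return checked((int)_position); }
    }
}
```

With this shape, code that only ever handles streams below 2GB keeps working unchanged through Position, while code that needs larger offsets opts in to LongPosition.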

Comment by Robert Stam [ 18/Apr/16 ]

We expect to release 2.2.4 in the next week or two.

Comment by Marc Simkin [ 18/Apr/16 ]

Hi Robert:

Thanks for the update.

What is the rough date for the next release?

-marc

Comment by Robert Stam [ 18/Apr/16 ]

Hi Marc. Thanks for the analysis. Since this doesn't look like it involves changing any public classes, we can probably get this fixed in the next release.

I'll start work on this right away, including looking for any other similar or related issues.

Comment by Marc Simkin [ 18/Apr/16 ]

Craig, that is the issue. I created a private build and changed the following:

  • BsonBinaryWriterContext:
    the _startPosition field and the matching constructor parameter, from int to long.

  • The calls to the BsonBinaryWriterContext constructor in the following methods:
    • WriteStartDocument
    • WriteStartArray
    • WriteJavaScriptWithScope

When I started to write my document # 2,693,377 the file position was already at 2,147,482,853. All I needed to write was 794 bytes to cause the overflow. The document that was being written was bigger than that.

Please change this from a question to a bug.

Please let me know when this has been resolved, so that I can stop using my private build.

Thank you for your help.

-marc
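The 794-byte figure above follows from int.MaxValue (2,147,483,647) minus the reported file position. The sketch below checks that arithmetic and shows the general shape of the private-build change: a context that stores its start position as a long. HeadroomSketch and WriterContextSketch are hypothetical names for illustration and are not the driver's classes.

```csharp
using System;

public static class HeadroomSketch
{
    // Bytes that can still be written before the stream position passes int.MaxValue.
    public static long BytesUntilIntMax(long position)
    {
        return int.MaxValue - position;
    }
}

// Hypothetical stand-in for a writer context that remembers where a document began.
public sealed class WriterContextSketch
{
    public WriterContextSketch(WriterContextSketch parent, long startPosition)
    {
        Parent = parent;
        StartPosition = startPosition;
    }

    public WriterContextSketch Parent { get; }

    // Stored as long, so offsets past 2GB are kept intact.
    public long StartPosition { get; }

    // Size of the document that started at StartPosition, given the current stream position.
    public long SizeAt(long currentPosition)
    {
        return currentPosition - StartPosition;
    }
}
```

For example, HeadroomSketch.BytesUntilIntMax(2147482853L) returns 794, and a context created at that position reports the true document size even when the document ends beyond the 2GB mark.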

Comment by Craig Wilson [ 18/Apr/16 ]

Very likely... I haven't had a chance to look yet at your uploaded code. I'll do that shortly. We had CSHARP-1254 a little while ago, which was fixed in 2.1. This might be something similar.

Comment by Marc Simkin [ 18/Apr/16 ]

Hi Craig:

I looked at the BsonBinaryWriter code. On line 607, inside WriteStartDocument, you are casting _bsonStream.Position from a long to an int in order to pass it to the BsonBinaryWriterContext class.

_context = new BsonBinaryWriterContext(_context, contextType, (int)_bsonStream.Position);

Could that be the issue?

thanks

marc
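The effect of such a cast can be reproduced in isolation. The sketch below assumes that the document size is computed as the current stream position minus the stored start position; that is an assumption about BackpatchSize, not something confirmed in this ticket, and PositionCastSketch is a hypothetical name. In C#, an unchecked cast from long to int keeps only the low 32 bits, so a start position between 2^31 and 2^32 is stored as the true value minus 2^32.

```csharp
using System;

public static class PositionCastSketch
{
    // What an int field would hold after the cast from a long stream position.
    public static int StoredStart(long startPosition)
    {
        return unchecked((int)startPosition); // low 32 bits only
    }

    // Assumed size calculation: current position minus the (truncated) stored start.
    public static long SizeWithIntStart(long startPosition, long endPosition)
    {
        return endPosition - StoredStart(startPosition);
    }
}
```

Under that assumption, a 33-byte document that starts at position 2,147,484,648 gives PositionCastSketch.SizeWithIntStart(2147484648L, 2147484681L) equal to 4294967329, which is 2^32 + 33 and matches the size in the reported error message.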

Comment by Marc Simkin [ 18/Apr/16 ]

Hi Craig, if possible, can I get some idea of what happened and how to resolve this by midday Monday? I have a delivery date to meet, and I will need to decide whether this approach will work or whether I need to choose a different one.

Thanks

marc

Comment by Marc Simkin [ 15/Apr/16 ]

I forgot three files. Also, I didn't give you all the classes that might be referenced in EanSet, as they are not being serialized back out.

Comment by Marc Simkin [ 15/Apr/16 ]

Hi Craig:

Attached please find the source files. I didn't give you a runnable solution, just the CS files.

If you need the dump file, please let me know. The dump file is 0.75 GB compressed.

Thank you for all the help.

-marc

Comment by Craig Wilson [ 15/Apr/16 ]

Ok. So my apologies. Our BSON reader and writer can certainly read and write the mongodump format. I was confused.

So, I wrote a little test program that basically does exactly what you are doing and didn't have any issues. Could you put up the whole program? If not, can I at least see your class definitions?

Craig
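For context on why the reader handles these files: a mongodump .bson file is a sequence of BSON documents written back to back, and each BSON document begins with a little-endian int32 holding its total length in bytes. The sketch below walks that framing with only the standard library, keeping every offset in a long. DumpFramingSketch is a hypothetical helper, not part of the driver, and it assumes a seekable stream.

```csharp
using System;
using System.IO;
using System.Text;

public static class DumpFramingSketch
{
    // Counts back-to-back BSON documents by following their length prefixes,
    // without decoding the document contents.
    public static long CountDocuments(Stream stream)
    {
        long count = 0;
        using (var reader = new BinaryReader(stream, Encoding.UTF8, leaveOpen: true))
        {
            while (stream.Position < stream.Length)
            {
                long start = stream.Position;      // long, so offsets past 2GB are fine
                int length = reader.ReadInt32();   // BinaryReader reads little-endian
                if (length < 5 || start + length > stream.Length)
                {
                    throw new InvalidDataException("Invalid BSON document length at offset " + start);
                }
                stream.Position = start + length;  // skip to the next document
                count++;
            }
        }
        return count;
    }
}
```

Each individual document is still limited to an int-sized length, but the offset of a document within the file is not, which is why positions need a 64-bit type.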

Comment by Craig Wilson [ 15/Apr/16 ]

Well, if it's reading it fine, then I might simply be wrong about the bson dump file format. I'll run some tests and get back to you and let you know.

Comment by Marc Simkin [ 15/Apr/16 ]

Hi Craig,

Yes, I used mongodump to generate the file. I had no issues reading the file and all the data seems to be correct. I used the dump file as the input to a SQL Bulk Loader application.

I used the BsonBinaryReader to read that file.

So, if the BsonBinaryReader class in C# driver library is not officially supported for reading dump files, I guess I need to find a different approach to (1) extracting the data, (2) transforming the data, and (3) loading the transformed data back into Mongo and also into SQL Server.

Either way, I would like to understand the error so I can learn from what I did wrong with the driver.

Is there documentation somewhere on what the MongoImport/MongoExport file formats are?

Thanks

marc

Comment by Craig Wilson [ 15/Apr/16 ]

Hi Marc,

How are you getting the BSON dump file? If you are using one from mongodump, then this is not the same file format that our BSON binary reader understands. Our BsonBinaryReader/Writer classes are built for the wire protocol documents. We do not have a reader/writer for the BSON dump file format. If you'd like to request support for those, we can turn this into a feature ticket.

Craig

Comment by Marc Simkin [ 15/Apr/16 ]

FYI, the GetActiveRecommendations method just converts the EanSet object to an ActiveRecommendations object and applies some business rules.

Thanks

marc

Comment by Marc Simkin [ 15/Apr/16 ]

Here is the full stack dump from the error:

? ex
{"Size 4294967329 is larger than MaxDocumentSize 2147483647."}
    Data: {System.Collections.ListDictionaryInternal}
    HResult: -2146233033
    HelpLink: null
    IPForWatsonBuckets: {90778969}
    InnerException: null
    IsTransient: false
    Message: "Size 4294967329 is larger than MaxDocumentSize 2147483647."
    RemoteStackTrace: null
    Source: "MongoDB.Bson"
    StackTrace: "   at MongoDB.Bson.IO.BsonBinaryWriter.BackpatchSize()\r\n   at MongoDB.Bson.IO.BsonBinaryWriter.WriteEndDocument()\r\n   at MongoDB.Bson.Serialization.BsonClassMapSerializer`1.SerializeClass(BsonSerializationContext context, BsonSerializationArgs args, TClass document)\r\n   at MongoDB.Bson.Serialization.BsonClassMapSerializer`1.Serialize(BsonSerializationContext context, BsonSerializationArgs args, TClass value)\r\n   at MongoDB.Bson.Serialization.IBsonSerializerExtensions.Serialize[TValue](IBsonSerializer`1 serializer, BsonSerializationContext context, TValue value)\r\n   at MongoDB.Bson.Serialization.Serializers.EnumerableSerializerBase`2.Serialize(BsonSerializationContext context, BsonSerializationArgs args, TValue value)\r\n   at MongoDB.Bson.Serialization.Serializers.SerializerBase`1.MongoDB.Bson.Serialization.IBsonSerializer.Serialize(BsonSerializationContext context, BsonSerializationArgs args, Object value)\r\n   at MongoDB.Bson.Serialization.IBsonSerializerExtensions.Serialize(IBsonSerializer serializer, BsonSerializationContext context, Object value)\r\n   at MongoDB.Bson.Serialization.BsonClassMapSerializer`1.SerializeNormalMember(BsonSerializationContext context, Object obj, BsonMemberMap memberMap)\r\n   at MongoDB.Bson.Serialization.BsonClassMapSerializer`1.SerializeClass(BsonSerializationContext context, BsonSerializationArgs args, TClass document)\r\n   at MongoDB.Bson.Serialization.BsonClassMapSerializer`1.Serialize(BsonSerializationContext context, BsonSerializationArgs args, TClass value)\r\n   at MongoDB.Bson.Serialization.BsonSerializer.Serialize[TNominalType](IBsonWriter bsonWriter, TNominalType value, Action`1 configurator, BsonSerializationArgs args)\r\n   at EanSetsToActiveRecsConvert.Program.ConvertObjects(FileStream sourceFile, FileStream destFileStream) in D:\\Work\\BN\\ProductDataSystemsGitRepos\\Tools\\RenRecommendationApps\\EanSetsToActiveRecsConvert\\Program.cs:line 122"
    TargetSite: {Void BackpatchSize()}
    WatsonBuckets: null
    _HResult: -2146233033
    _className: null
    _data: {System.Collections.ListDictionaryInternal}
    _dynamicMethods: null
    _exceptionMethod: {Void BackpatchSize()}
    _exceptionMethodString: null
    _helpURL: null
    _innerException: null
    _ipForWatsonBuckets: {90778969}
    _message: "Size 4294967329 is larger than MaxDocumentSize 2147483647."
    _remoteStackIndex: 0
    _remoteStackTraceString: null
    _safeSerializationManager: {System.Runtime.Serialization.SafeSerializationManager}
    _source: "MongoDB.Bson"
    _stackTrace: {sbyte[384]}
    _stackTraceString: null
    _watsonBuckets: null
    _xcode: -532462766
    _xptrs: {0}
