[CSHARP-1626] Reading and writing Streams with more than 2GB of BSON data Created: 15/Apr/16 Updated: 02/Jan/20 Resolved: 19/Apr/16 |
|
| Status: | Closed |
| Project: | C# Driver |
| Component/s: | BSON, Serialization |
| Affects Version/s: | 2.2.3 |
| Fix Version/s: | 2.2.4 |
| Type: | Bug | Priority: | Minor - P4 |
| Reporter: | Marc Simkin | Assignee: | Robert Stam |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | C#, .NET 4.6.1 |
| Attachments: |
|
| Description |
|
I am reading a BSON dump file in order to convert the objects from one format to another. When attempting to write the converted objects out, I get an error from MongoDB.Bson.IO.BsonBinaryWriter.BackpatchSize. The message is "Size 4294967329 is larger than MaxDocumentSize 2147483647.". Below is the code in question. EanSet and ActiveRecommendations are the two objects involved: EanSet is the object in the BSON dump file, and ActiveRecommendations is the new object. ActiveRecommendations should be smaller than EanSet. I know that my source data should be greater than 2GB. My goal is to take the new BSON file generated by the code below and do a mongorestore to a new collection on our Mongo servers. Please advise how I can get this working. Thanks, marc
|
| Comments |
| Comment by Githook User [ 19/Apr/16 ] |
|
Author: rstam (robert@robertstam.org) |
| Comment by Githook User [ 19/Apr/16 ] |
|
Author: rstam (robert@robertstam.org) |
| Comment by Githook User [ 19/Apr/16 ] |
|
Author: rstam (robert@robertstam.org) |
| Comment by Githook User [ 19/Apr/16 ] |
|
Author: rstam (robert@robertstam.org) |
| Comment by Githook User [ 19/Apr/16 ] |
|
Author: rstam (robert@robertstam.org) |
| Comment by Robert Stam [ 18/Apr/16 ] |
|
After reviewing all the IO classes, it looks like fully supporting binary streams bigger than 2GB requires changes to the following two classes:
For the first class, strict backward compatibility would require introducing a new property rather than changing the type of the existing property. The second class is internal, so we can change the type of its StartPosition without worrying about backward compatibility. |
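A minimal sketch of the backward-compatible pattern Robert describes, assuming a public bookmark-style class whose existing position property is an int; the class and member names here are hypothetical, not the driver's actual API:

```csharp
// Hypothetical public class illustrating the pattern: keep the existing
// int-typed property so current callers still compile, and add a new
// long-typed property that exposes the full 64-bit stream position.
public class SomeBookmark
{
    private readonly long _position;

    public SomeBookmark(long position)
    {
        _position = position;
    }

    // Existing property, kept (and narrowed by a cast) for compatibility.
    // It is only meaningful while the stream is under 2GB.
    public int Position => (int)_position;

    // New property carrying the full position for streams over 2GB.
    public long LongPosition => _position;
}
```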
| Comment by Robert Stam [ 18/Apr/16 ] |
|
We expect to release 2.2.4 in the next week or two. |
| Comment by Marc Simkin [ 18/Apr/16 ] |
|
Hi Robert: Thanks for the update. What is the rough date for the next release? -marc |
| Comment by Robert Stam [ 18/Apr/16 ] |
|
Hi Marc. Thanks for the analysis. Since this doesn't look like it involves changing any public classes, we can probably get this fixed in the next release. I'll start work on it right away, including looking for any similar or related issues. |
| Comment by Marc Simkin [ 18/Apr/16 ] |
|
Craig, that is the issue. I created a private build and changed the following:
The calls to BsonBinaryWriterContext in the following methods:
When I started to write document #2,693,377, the file position was already at 2,147,482,853. I only needed to write 794 more bytes to cause the overflow, and the document being written was bigger than that. Please change this from a question to a bug, and please let me know when it has been resolved so that I can stop using my private build. Thank you for your help. -marc |
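A minimal sketch of the private-build change Marc describes. The constructor shape matches the call site he quotes below, but the rest of the class is an assumed reconstruction, and the actual 2.2.4 fix may differ in detail:

```csharp
using MongoDB.Bson.IO; // for ContextType

// Hypothetical reconstruction: widen the stored start position from
// int to long so positions past 2GB survive intact. The (int) casts
// at the BsonBinaryWriter call sites are then removed as well.
internal class BsonBinaryWriterContext
{
    public BsonBinaryWriterContext(
        BsonBinaryWriterContext parentContext,
        ContextType contextType,
        long startPosition) // was int in 2.2.3
    {
        ParentContext = parentContext;
        ContextType = contextType;
        StartPosition = startPosition;
    }

    public BsonBinaryWriterContext ParentContext { get; }
    public ContextType ContextType { get; }
    public long StartPosition { get; } // was int in 2.2.3
}
```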
| Comment by Craig Wilson [ 18/Apr/16 ] |
|
Very likely... I haven't had a chance to look at your uploaded code yet. I'll do that shortly. We had |
| Comment by Marc Simkin [ 18/Apr/16 ] |
|
Hi Craig: I looked at the BsonBinaryWriter code. On line 607, inside WriteStartDocument, you are casting _bsonStream.Position from a long to an int in order to pass it to the BsonBinaryWriterContext class: _context = new BsonBinaryWriterContext(_context, contextType, (int)_bsonStream.Position); Could that be the issue? Thanks, marc |
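That cast is exactly where the wraparound occurs. The positions in the sketch below are hypothetical, but they are chosen so the arithmetic reproduces the size from the error message: once the stream position passes int.MaxValue, the cast wraps negative, and the end-minus-start computation in BackpatchSize comes out inflated by 2^32.

```csharp
using System;

class CastOverflowDemo
{
    static void Main()
    {
        // Hypothetical positions: the document starts just past the
        // 2GB boundary (int.MaxValue = 2,147,483,647) and is 33 bytes long.
        long realStart = 2147483680;
        long realEnd = realStart + 33;

        // The context stored the start position as an int, so the cast
        // silently wraps around to a large negative number.
        int wrappedStart = (int)realStart;        // -2,147,483,616

        // Backpatching computes size = current position - start position.
        // With the wrapped start, the 33-byte document appears to be
        // 33 + 2^32 bytes, which trips the MaxDocumentSize check.
        long bogusSize = realEnd - wrappedStart;  // 4,294,967,329

        Console.WriteLine(wrappedStart);
        Console.WriteLine(bogusSize);             // matches the error message
    }
}
```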
| Comment by Marc Simkin [ 18/Apr/16 ] |
|
Hi Craig, if possible, can I get some idea of what happened and how to resolve it by midday Monday? I have a delivery date to meet, and I need to decide whether this approach will work or I need to choose a different one. Thanks, marc |
| Comment by Marc Simkin [ 15/Apr/16 ] |
|
I forgot three files. Also, I didn't give you all the classes that might be referenced in EanSet, as they are not being serialized back out. |
| Comment by Marc Simkin [ 15/Apr/16 ] |
|
Hi Craig: Attached please find the source files. I didn't give you a runnable solution, just the CS files. If you need the dump file, please let me know; it is 0.75 GB compressed. Thank you for all the help. -marc |
| Comment by Craig Wilson [ 15/Apr/16 ] |
|
Ok, so my apologies. Our BSON reader and writer can certainly read and write the mongodump format; I was confused. I wrote a little test program that does essentially what you are doing and didn't have any issues. Could you put up the whole program? If not, can I at least see your class definitions? Craig |
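For reference, a mongodump .bson file is simply BSON documents concatenated back to back, so a test program along the lines Craig describes can read and rewrite it document by document. This is a minimal sketch, not Craig's actual program; the file paths are placeholders, and BsonDocument stands in for typed classes such as EanSet:

```csharp
using System.IO;
using MongoDB.Bson;
using MongoDB.Bson.IO;
using MongoDB.Bson.Serialization;

class DumpRoundTrip
{
    static void Main()
    {
        // Placeholder paths; a mongodump .bson file is just a sequence
        // of BSON documents with no extra framing between them.
        using (var input = File.OpenRead("dump.bson"))
        using (var output = File.Create("converted.bson"))
        using (var reader = new BsonBinaryReader(input))
        using (var writer = new BsonBinaryWriter(output))
        {
            while (!reader.IsAtEndOfFile())
            {
                var document = BsonSerializer.Deserialize<BsonDocument>(reader);
                // ... convert the document to the new shape here ...
                BsonSerializer.Serialize(writer, document);
            }
        }
    }
}
```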
| Comment by Craig Wilson [ 15/Apr/16 ] |
|
Well, if it's reading the file fine, then I might simply be wrong about the BSON dump file format. I'll run some tests and let you know. |
| Comment by Marc Simkin [ 15/Apr/16 ] |
|
Hi Craig, Yes, I used mongodump to generate the file. I had no issues reading it, and all the data seems to be correct: I used the dump file as the input to a SQL bulk loader application, reading it with the BsonBinaryReader class. So, if BsonBinaryReader in the C# driver library is not officially supported for reading dump files, I guess I need to find a different approach to (1) extracting the data, (2) transforming the data, and (3) loading the transformed data back into Mongo and also into SQL Server. Either way, I would like to understand the error so I can learn from what I did wrong with the driver. Is there documentation somewhere on what the MongoImport/MongoExport file formats are? Thanks, marc |
| Comment by Craig Wilson [ 15/Apr/16 ] |
|
Hi Marc, How are you getting the BSON dump file? If you are using one from mongodump, then that is not the same file format our BSON binary reader understands. Our BsonBinaryReader/Writer classes are built for the wire protocol documents; we do not have a reader/writer for the BSON dump file format. If you'd like to request support for it, we can turn this into a feature ticket. Craig |
| Comment by Marc Simkin [ 15/Apr/16 ] |
|
FYI, the GetActiveRecommendations method just converts the EanSet object to an ActiveRecommendations object and applies some business rules. Thanks, marc |
| Comment by Marc Simkin [ 15/Apr/16 ] |
|
Here is the full stack dump from the error:
|