[SERVER-82105] Make it easier to read and edit BSON data Created: 12/Oct/23  Updated: 30/Oct/23  Resolved: 15/Oct/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 7.2.0-rc0

Type: Improvement Priority: Major - P3
Reporter: Sulabh Mahajan Assignee: Yury Ershov
Resolution: Fixed Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to WT-11811 Enhance wt utility to operate on tabl... Open
Backwards Compatibility: Fully Compatible
Sprint: Joker - StorEng - 2023-10-17
Participants:
Story Points: 3

 Description   

The WiredTiger team recently extracted data from a MongoDB internal collection and edited it manually to restore the database to a functional state. Because MongoDB stores its values in WiredTiger files as BSON, we had to get the binary bytes out of the file, interpret them as BSON, render them in a human-readable form, edit them as needed, and then convert the result back into binary bytes to be restored into the WiredTiger file.

I am creating this ticket to document how we interpreted the BSON from the binary data and modified it. We could also write some tooling to make this easier for other teams and support to do.
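As an illustration of the "interpret the binary bytes as BSON" step, here is a minimal sketch of a decoder using only the Python standard library. It is not the tooling used for the recovery: the function names are hypothetical, and it handles only a handful of element types (double, string, embedded document, array, boolean, null, int32, int64). The layout it relies on comes from the public BSON specification: a little-endian int32 total size, a list of elements (type byte, NUL-terminated name, value), and a trailing NUL byte.

```python
import struct


def read_cstring(buf, pos):
    """Read a NUL-terminated UTF-8 string; return (text, position after the NUL)."""
    end = buf.index(b"\x00", pos)
    return buf[pos:end].decode("utf-8"), end + 1


def decode_document(buf, pos=0):
    """Decode one BSON document starting at pos; return (dict, position after it)."""
    (size,) = struct.unpack_from("<i", buf, pos)
    end = pos + size
    if size < 5 or end > len(buf) or buf[end - 1] != 0:
        raise ValueError("malformed BSON document")
    pos += 4
    doc = {}
    while pos < end - 1:
        etype = buf[pos]
        name, pos = read_cstring(buf, pos + 1)
        if etype == 0x01:  # 64-bit float
            (value,) = struct.unpack_from("<d", buf, pos)
            pos += 8
        elif etype == 0x02:  # string: int32 length (including NUL), bytes, NUL
            (length,) = struct.unpack_from("<i", buf, pos)
            value = buf[pos + 4:pos + 4 + length - 1].decode("utf-8")
            pos += 4 + length
        elif etype in (0x03, 0x04):  # embedded document or array
            value, pos = decode_document(buf, pos)
            if etype == 0x04:  # arrays are documents keyed "0", "1", ...
                value = list(value.values())
        elif etype == 0x08:  # boolean
            value = buf[pos] != 0
            pos += 1
        elif etype == 0x0A:  # null carries no payload
            value = None
        elif etype == 0x10:  # int32
            (value,) = struct.unpack_from("<i", buf, pos)
            pos += 4
        elif etype == 0x12:  # int64
            (value,) = struct.unpack_from("<q", buf, pos)
            pos += 8
        else:
            raise NotImplementedError(f"element type 0x{etype:02x} is not handled in this sketch")
        doc[name] = value
    return doc, end


# Build a small sample document {"ns": "test.c", "n": 1} by hand.
body = b"\x02ns\x00" + struct.pack("<i", 7) + b"test.c\x00"
body += b"\x10n\x00" + struct.pack("<i", 1)
sample = struct.pack("<i", len(body) + 5) + body + b"\x00"

decoded, consumed = decode_document(sample)
print(decoded)
```

A real tool would need the remaining BSON types (ObjectId, binary, datetime, and so on) and an encoder for the way back, which is why an existing BSON library is the more practical choice.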

Eventually, together with the wt utility, the goal is to make it easier to manipulate a MongoDB database for support and investigation.



 Comments   
Comment by Yury Ershov [ 30/Oct/23 ]

Converting the text output from the wt utility to real binary format:

$ printf "$(pbpaste | perl -npE 's{\\}{\\x}g')"

where pbpaste prints the clipboard contents on macOS.
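For platforms without pbpaste, the same conversion can be sketched in Python. This is an assumption-laden sketch rather than part of the tooling: the function name unescape_wt is hypothetical, and it assumes the wt text output writes each non-printable byte as a backslash followed by two hex digits and a literal backslash as a doubled backslash. Check that against the output of the wt version in use before relying on it.

```python
import re

# A backslash followed by either another backslash or two hex digits.
_ESCAPE = re.compile(rb"\\(\\|[0-9a-fA-F]{2})")


def unescape_wt(text):
    """Convert escaped text (assumed wt dump style) back into raw bytes."""

    def replace(match):
        token = match.group(1)
        if token == b"\\":
            return b"\\"  # doubled backslash -> one literal backslash
        return bytes([int(token.decode("ascii"), 16)])  # \hh -> single byte

    return _ESCAPE.sub(replace, text.encode("utf-8"))


# Hypothetical fragment of escaped output, not taken from a real dump.
raw = unescape_wt(r"\1b\00\00\00\02ns\00")
print(raw)
```

In practice the escaped text would be read from a file or standard input and the resulting bytes written out in binary mode, for example with open(path, "wb").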

Comment by Yury Ershov [ 13/Oct/23 ]

Here's a screencast.

Comment by Yury Ershov [ 13/Oct/23 ]

I used a hex editor to edit the BSON file. It's a risky way of doing it.

Now I have used an open-source Python library for manipulating BSON to make a tool that technical staff can use to view and edit BSON files.

Here's its README.

A short summary of how to use it:

# Download
$ git clone https://github.com/wiredtiger/mongo-tests.git
 
# Convert BSON into a readable and editable form:
$ mongo-tests/bson-tool/bson-dump-py bson-file > bson-gen.py
 
# edit bson-gen.py (e.g. remove entries for missing files)
 
# Generate modified file
$ python3 bson-gen.py
 
# Check the result
$ mongo-tests/bson-tool/bson-dump bson-file > bson-orig
$ mongo-tests/bson-tool/bson-dump bson-file-out > bson-new
$ diff bson-orig bson-new

Generated at Thu Feb 08 06:48:16 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.