[SERVER-1555] Add UTF-8 Validation Option on mongodump and mongorestore Created: 03/Aug/10  Updated: 29/May/12  Resolved: 17/Nov/10

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 1.4.4
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: J. Gray Assignee: Kristina Chodorow (Inactive)
Resolution: Incomplete Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Ubuntu @ EC2


Participants:

 Description   

I recently made a database dump that I was apparently able to reimport, but querying the restored data gave me an "Invalid UTF-8 Data" error. Either the dump or the restoration failed, but I don't know which, because I can't tell whether the dump produced a file of valid data. Similarly, I couldn't tell whether the restoration had succeeded when I restored the data.



 Comments   
Comment by Kristina Chodorow (Inactive) [ 17/Nov/10 ]

Please comment if this is still a problem for you.

Comment by Eliot Horowitz (Inactive) [ 29/Sep/10 ]

Any more info on this?

Comment by Eliot Horowitz (Inactive) [ 03/Aug/10 ]

There is a bsondump utility if you compile from master.
You can use that to view raw BSON files.

It will be included in 1.6.0.

Comment by J. Gray [ 03/Aug/10 ]

The shell is where I'm seeing this error; I've been writing data via php and then reading it at the shell so I can have an idea of what the query should look like before putting it on to the web with php. My understanding is that the php driver that I'm using doesn't allow the insertion of bad data.

What do you think would cause this error if not dump/restore? I asked at 10gen office hours last week if there was any way to view/edit a dump file and got the impression there's not (actually, they did say something about a ruby-based viewer that doesn't allow editing), so I guess I'm looking for suggestions on how to verify that my dump file is good before deleting my db and/or how to verify that a dump file will restore properly. Please let me know if anything comes to mind.

Comment by Eliot Horowitz (Inactive) [ 03/Aug/10 ]

The odds of it being dump/restore are incredibly small.
If you're getting a UTF-8 error, and not a BSON error, there is almost zero chance it has to do with dump/restore.
I know the php driver allowed you to insert bad utf-8 data before.
Have you used the shell to view the data before, or just PHP?
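
To illustrate the distinction drawn above between a UTF-8 error and a BSON error, here is a minimal Python sketch. It hand-builds one BSON document whose framing is valid but whose string value contains a byte sequence that is not valid UTF-8, which is the kind of data a lenient driver could have stored. The helper names (bson_string_element, bson_document, invalid_utf8_keys) are hypothetical and are not part of any MongoDB tool or driver, and the scanner assumes a flat document containing only string elements.

```python
import struct


def bson_string_element(key, raw_value):
    """Encode one BSON string element (type 0x02) from raw, unchecked bytes."""
    return (b"\x02" + key.encode("ascii") + b"\x00"
            + struct.pack("<i", len(raw_value) + 1) + raw_value + b"\x00")


def bson_document(elements):
    """Wrap encoded elements in a length prefix and a trailing NUL."""
    body = b"".join(elements) + b"\x00"
    return struct.pack("<i", len(body) + 4) + body


def invalid_utf8_keys(doc):
    """Return the keys whose string values are not valid UTF-8.

    Assumption: the document holds only string elements (type 0x02).
    """
    (size,) = struct.unpack_from("<i", doc, 0)
    if size != len(doc) or doc[-1] != 0:
        raise ValueError("bad BSON framing")
    bad = []
    pos = 4
    while doc[pos] != 0:
        if doc[pos] != 0x02:
            raise ValueError("sketch only handles string elements")
        key_end = doc.index(b"\x00", pos + 1)
        key = doc[pos + 1:key_end].decode("utf-8", errors="replace")
        (strlen,) = struct.unpack_from("<i", doc, key_end + 1)
        start = key_end + 5
        value = doc[start:start + strlen - 1]  # drop the trailing NUL
        try:
            value.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(key)
        pos = start + strlen
    return bad


doc = bson_document([
    bson_string_element("ok", "héllo".encode("utf-8")),
    bson_string_element("bad", b"caf\xe9"),  # Latin-1 byte, invalid as UTF-8
])
print(invalid_utf8_keys(doc))
```

The document passes the framing check, yet one value fails to decode. Copying such a document byte for byte, as a dump and restore would, preserves the invalid string rather than creating it.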

Comment by J. Gray [ 03/Aug/10 ]

I don't have the original database anymore so cannot confirm it didn't have the same issue, but I was only loading data in with the php driver and never experienced the error message, leading me to speculate that I didn't have this issue prior to the dump. Given that I had the dbpath and dump on the same 10GB EBS volume, it seems most likely that I wrote a dump file that ended unexpectedly or that the dump file restoration ended unexpectedly, and I'm hoping to have a means by which to determine whether it's the dump file or the restoration that leads to the "Invalid UTF-8" error.
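
One way to test the "dump file ended unexpectedly" hypothesis is to walk the .bson file and check document framing. The sketch below assumes the dump file is a plain concatenation of BSON documents, each starting with a little-endian int32 total size and ending with a NUL byte. The function name check_bson_framing is hypothetical. The check only detects truncation or broken framing; it does not validate the contents of any field.

```python
import io
import struct


def check_bson_framing(stream):
    """Walk concatenated BSON documents in a binary stream.

    Returns (documents_read, error), where error is None on a clean end of file.
    """
    count = 0
    while True:
        header = stream.read(4)
        if not header:
            return count, None  # clean end of file
        if len(header) < 4:
            return count, "truncated length prefix"
        (size,) = struct.unpack("<i", header)
        if size < 5:  # smallest legal document: 4-byte size + NUL
            return count, f"implausible document size {size}"
        body = stream.read(size - 4)
        if len(body) < size - 4:
            return count, "document body truncated"
        if body[-1] != 0:
            return count, "missing trailing NUL terminator"
        count += 1


empty_doc = b"\x05\x00\x00\x00\x00"  # the empty BSON document
print(check_bson_framing(io.BytesIO(empty_doc * 2)))
print(check_bson_framing(io.BytesIO(empty_doc + empty_doc[:3])))
```

Against a real dump this could be run as check_bson_framing(open(path, "rb")), where path is a placeholder for a collection's .bson file. A non-None error would suggest the dump itself was cut short, for example by a full volume.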

Comment by Eliot Horowitz (Inactive) [ 03/Aug/10 ]

Are you sure the original database doesn't have the same issue?
Do you still have it?

We don't modify BSON on the way in or out, so I don't think this is a dump/restore issue.

What driver are you using?

Generated at Thu Feb 08 02:57:21 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.