[SERVER-22885] Unable to find some of the fields in documents that were imported by mongoimport. Created: 28/Feb/16  Updated: 11/Mar/16  Resolved: 11/Mar/16

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: 3.2.1
Fix Version/s: None

Type: Question Priority: Major - P3
Reporter: Chaedoo Jun [X] Assignee: Kelsey Schubert
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File 20140104000021-2.json     JPEG File Before apply.JPG     JPEG File Result.JPG     JPEG File Returend1.JPG     JPEG File Select media serial number.JPG     JPEG File after apply.JPG    
Issue Links:
Duplicate
duplicates SERVER-6439 Duplicate fields at the same level sh... Backlog
Participants:

 Description   

I am testing MongoDB to find out whether it could be used to automate a task.

For this task, I have to import around 20,000 XML files into a MongoDB collection, which I think is a good fit for MongoDB.

I tested with MongoDB v3.2 and MongoDB Compass to verify most of the queries I need to use.

While testing, I imported 2 XML files that I had converted to JSON format, and found that some queries return no values while others work.

The following is a successful case:

> a.find ( {"ChangeHolderProfile.SequenceOrderPerMedia":1} ).pretty()
{
        "_id" : ObjectId("56d15af5c1b54590d5dc1f50"),
        "ActionList" : {
                "publishedDateID" : "20140104000021",
                "xsdVersion" : 2,
                "externalVersion" : 487,
                "managerRefNumber" : 2
        },
        "ReloadTPurse" : {
                "CardGenerationNumber" : 0,
                "CardSerialNumber" : NumberLong("1184365732964992"),
                "ExpirationDate" : "2014-03-04T00:00:00.000+13:00",
                "RefNumber" : 284617,
                "SequenceOrderPerMedia" : 4,
                "ValueToAdd" : 2975,
                "FeeValue" : 25,
                "FeeGSTValue" : 0
        },
        "ChangeHolderProfile" : {
                "CardGenerationNumber" : 0,
                "CardSerialNumber" : NumberLong("1201849732246144"),
                "ExpirationDate" : "2014-03-04T00:00:00.000+13:00",
                "RefNumber" : 284616,
                "SequenceOrderPerMedia" : 1,
                "HolderBirthDate" : "1996-03-08+13:00",
                "HolderProfileID" : 0,
                "HolderProfileValidityDate" : "1990-01-01+13:00"
        }
}

The following case returned no values:

 >  db.actionlist1.find( {"ReloadTPurse.CardSerialNumber":1208467063778432})
>

I tested the same case with MongoDB Compass, but the results are the same.

All of the data in this case is test data, so I have attached a sample JSON file here. Please advise me why I do not get any data when I query by CardSerialNumber.



 Comments   
Comment by Kelsey Schubert [ 11/Mar/16 ]

Hi Chaedoo.jun@gmail.com,

SERVER-6439 describes modifying the behavior of MongoDB to not allow duplicate fields at the same level. I believe this ticket encompasses your concerns about how MongoDB handles insertion of data with repeated field names. Please watch SERVER-6439 for updates, and feel free to vote for it.

Additionally, I'd like to clarify that in your example you are not seeing b1 values written into the document because the shell helper is treating {b: {x: 1, y: 1}} as insert options.

Please see the two following examples using db.collection.insert():

> db.testcol.insert({a1: {x:1,y:1}, b1: {x:1,y:1} })
WriteResult({ "nInserted" : 1 })
> db.testcol.find()
{ "_id" : ObjectId("56e1c13aeaa50b47494c51ee"), "a1" : { "x" : 1, "y" : 1 }, "b1" : { "x" : 1, "y" : 1 } }
> 

This inserts a single document that contains two subdocuments a1 and b1.

> db.testcol.insert([{a1: {x:1,y:1} }, {b1: {x:1,y:1} }])
BulkWriteResult({
	"writeErrors" : [ ],
	"writeConcernErrors" : [ ],
	"nInserted" : 2,
	"nUpserted" : 0,
	"nMatched" : 0,
	"nModified" : 0,
	"nRemoved" : 0,
	"upserted" : [ ]
})
> db.testcol.find()
{ "_id" : ObjectId("56e1c1e5eaa50b47494c51ef"), "a1" : { "x" : 1, "y" : 1 } }
{ "_id" : ObjectId("56e1c1e5eaa50b47494c51f0"), "b1" : { "x" : 1, "y" : 1 } }
> 

This does a bulk insert and creates two documents: one has a subdocument a1, and the other has a subdocument b1.
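
The difference between the two calls comes down to argument position. The following is a simplified stand-in written in plain JavaScript, not the shell's actual implementation; `insertSketch` and its return shape are hypothetical names chosen for illustration. It assumes only what is described above: the first argument carries the document or array of documents, and a second argument is taken as options.

```javascript
// Hypothetical stand-in for an insert helper with the signature
// insert(docOrDocs, options). This is NOT the real mongo shell code.
function insertSketch(docOrDocs, options) {
  // Only the first argument is treated as data to insert.
  const docs = Array.isArray(docOrDocs) ? docOrDocs : [docOrDocs];
  // The second argument is interpreted as options, never as a document.
  return { inserted: docs, options: options };
}

// Two separate arguments: the second object is taken as options.
const twoArgs = insertSketch({ a1: { x: 1, y: 1 } }, { b1: { x: 1, y: 1 } });
console.log(twoArgs.inserted.length); // 1

// One array argument: both objects are treated as documents.
const oneArray = insertSketch([{ a1: { x: 1, y: 1 } }, { b1: { x: 1, y: 1 } }]);
console.log(oneArray.inserted.length); // 2
```

In this sketch nothing is rejected or reported when a document lands in the options position, which mirrors the silent result seen in the shell session.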

For guidance about schema design in MongoDB, please consider posting on the mongodb-users group describing your use case.

Kind regards,
Thomas

Comment by Chaedoo Jun [X] [ 10/Mar/16 ]

The example

{"x":1,"x":2}

makes complete sense to me, but please correct me if I am wrong:

when I tested as below, the fields were still treated as the same field name even when I used dot notation like a.x and a.y.

> use test1
switched to db test1
> db.testcol.insert (
... {a: {x:1,y:1} },
... {b: {x:1,y:1} }
... )
WriteResult({ "nInserted" : 1 })
 
> db.testcol.find ()
{ "_id" : ObjectId("56e13bbc58e01890cb16681a"), "a" : { "x" : 1, "y" : 1 } }
 
> db.testcol.insert ( {a1: {x:1,y:1} }, {b1: {x:1,y:1} } )
WriteResult({ "nInserted" : 1 })
 
> db.testcol.find ()
{ "_id" : ObjectId("56e13bbc58e01890cb16681a"), "a" : { "x" : 1, "y" : 1 } }
{ "_id" : ObjectId("56e13c3f58e01890cb16681b"), "a1" : { "x" : 1, "y" : 1 } }
>

It did not even write the b1 values into the document, and no error or warning was displayed (I assumed it would throw an error message about a duplicate field name or something similar).

Comment by Chaedoo Jun [X] [ 10/Mar/16 ]

Thanks for your kind answer; now it makes sense to me.
I also realized that converting all of my documents into MongoDB is not a simple task, and that my current solution (using grep to find card_id in the stored XML files) is a much easier way to achieve what I want, because the structure and number of the files make it almost impossible to rename all of the fields.

It is disappointing that I could not adopt MongoDB for my first project, but I will find another task to use MongoDB for.

Comment by Kelsey Schubert [ 10/Mar/16 ]

Hi Chaedoo.jun@gmail.com,

Thank you for the clarifying information. I understand what is happening now, and can successfully reproduce it on my end. The behavior that you are observing is the result of having more than one field with the same name in a single document. As our documentation mentions, while BSON documents may have more than one field with the same name, most MongoDB interfaces represent MongoDB with a structure (e.g. a hash table) that does not support duplicate field names.

The issue that you are observing is that if field names are set twice in the same document, mongo shell and Compass may display different values for this field. Furthermore, if the values displayed by the shell and Compass differ, only the values displayed by the shell are used to determine whether a document should be returned as part of a query.

As a simple example, please consider the following JSON:

{"x":1,"x":2}

Import it using mongoimport.

The mongo shell will display the following results

> db.collection.find({x:1})
{ "_id" : ObjectId("56e10a9491bee236c22e8fb6"), "x" : 1 }
> db.collection.find({x:2})
> 

However, in Compass this same document will display that x equals 2. As shown above, a query on this equality match (x = 2) will not return any results. This behavior explains why you are seeing "No document found" when you query using the values shown in Compass for some documents.
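
The collapse to a single value can be seen outside MongoDB as well. The following is a minimal sketch in plain JavaScript (Node.js) using only the standard `JSON.parse`. It illustrates how a map-like structure handles a repeated field name; it is an analogy and not a description of how mongoimport, the shell, or Compass are implemented.

```javascript
// The same JSON text as in the example above, with the field "x" set twice.
const text = '{"x":1,"x":2}';

// A JavaScript object can hold only one value per key, so the parser
// keeps a single "x"; for JSON.parse the later value overwrites the earlier one.
const parsed = JSON.parse(text);

console.log(parsed.x);                   // 2
console.log(Object.keys(parsed).length); // 1
```

A tool that reads the document into such a structure shows one value for x, while the stored document still contains both, so the displayed value need not be the one a query matches.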

To resolve this issue on your side, I would recommend modifying your file conversion method to ensure that there are no duplicate field names in your documents.
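
One possible approach, sketched here under stated assumptions, is to collect repeated element names into an array during conversion. The helper `groupRepeatedFields` and its input shape (a list of [name, value] pairs for the children of one XML element) are hypothetical; a real converter would need to adapt this to whatever its XML parser produces.

```javascript
// Hypothetical helper: fold [name, value] pairs from one parent element
// into an object, storing names that occur more than once as arrays.
function groupRepeatedFields(pairs) {
  // First pass: count how often each field name occurs.
  const counts = new Map();
  for (const [name] of pairs) {
    counts.set(name, (counts.get(name) || 0) + 1);
  }
  // Second pass: repeated names become arrays, unique names stay scalar.
  const doc = {};
  for (const [name, value] of pairs) {
    if (counts.get(name) > 1) {
      if (!Object.prototype.hasOwnProperty.call(doc, name)) {
        doc[name] = [];
      }
      doc[name].push(value);
    } else {
      doc[name] = value;
    }
  }
  return doc;
}

// Two ReloadTPurse entries at the same level, as in the converted files.
const doc = groupRepeatedFields([
  ["ActionList", { xsdVersion: 2 }],
  ["ReloadTPurse", { CardSerialNumber: 1184365732964992 }],
  ["ReloadTPurse", { CardSerialNumber: 1208467063778432 }],
]);
console.log(JSON.stringify(doc));
```

With the repeated entries stored as an array, each field name appears only once per level, and a dot-notation query such as {"ReloadTPurse.CardSerialNumber": 1208467063778432} can match against the elements of that array.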

Thank you,
Thomas

Comment by Chaedoo Jun [X] [ 10/Mar/16 ]

Thanks for your support, but I am sorry, I don't understand what you mean by "expected behavior".
I believe the query should return some values but, as shown in the attached image, it returned nothing and said "No document found".

I tested with the customized document that I uploaded the first time, and it returned values as I expected. But for the other files that I converted from XML to JSON, the query returns no values even though they were imported with no error.

As shown in the attached image "before_apply.jpg", there is a record with a card_id, but when I click Apply, no document is returned, as shown in the attached image "after_apply.jpg".
If this is expected behavior, it should be the same every time, but when I used another file MongoDB did return values.
My converted file probably has an issue, but in that case it should cause a problem while it is being imported into MongoDB.
I did not get any error or warning message while importing the files into MongoDB, so I assumed my files have no issues.

Would you please explain why it shows different results?

Comment by Kelsey Schubert [ 09/Mar/16 ]

Hi Chaedoo.jun@gmail.com,

Thank you for uploading these documents. Unfortunately, I have been unable to reproduce the issue you describe where some documents cannot be found.

The image you have privately uploaded shows expected behavior. The document containing the subdocument with the matching CardSerialNumber is returned in its entirety including many fields not related to the card number. For more information, please review our documentation on querying an array of embedded documents.

From my investigation, I do not see anything to indicate a bug in the MongoDB server. For MongoDB-related support discussion please post on the mongodb-users group or Stack Overflow with the mongodb tag. A question like this involving more discussion would be best posted on the mongodb-users group.

Comment by Chaedoo Jun [X] [ 06/Mar/16 ]

Any progress?

Comment by Chaedoo Jun [X] [ 29/Feb/16 ]

Also, all of the data I sent you came from a test system, but please DO NOT SHARE or DISTRIBUTE it to any other places or in any other materials.

Comment by Chaedoo Jun [X] [ 29/Feb/16 ]

Hi Thomas.

Thanks for your feedback.
I have uploaded the file I tested with. It still shows the same result on my side.

By the way, I found a weird query result when I tested with another imported file (the same kind of file, converted from another test platform). The query result included all the other fields, which do not match my query condition.

I will upload additional JSON files (AC20120302113030.json, AC20120302113730.json).
Please try the query db.collection.find({"ChangeHolderProfile.CardSerialNumber":1199932279626880}) and you will get many fields returned that are not related to the card number.
I would like to build a huge repository of these documents (created daily, each containing around 8,000 ~ 20,000 fields) going back to 2012, and it will keep growing.
I need to find details mainly based on card serial numbers.

Comment by Kelsey Schubert [ 29/Feb/16 ]

Hi Chaedoo.jun@gmail.com,

I used mongoimport to import the sample json file, and was able to successfully locate it using the following query:

 db.foo.find({"ReloadTPurse.CardSerialNumber":2184365732962323}).pretty()
{
	"_id" : ObjectId("56d49cc79b09e2ef1841d636"),
	"ActionList" : {
		"publishedDateID" : "2014010100011",
		"xsdVersion" : 2,
		"externalVersion" : 487,
		"managerRefNumber" : 2
	},
	"ReloadTPurse" : {
		"CardGenerationNumber" : 0,
		"CardSerialNumber" : NumberLong("2184365732962323"),
		"ExpirationDate" : "2014-03-04T00:00:00.000+13:00",
		"RefNumber" : 284617,
		"SequenceOrderPerMedia" : 4,
		"ValueToAdd" : 2975,
		"FeeValue" : 25,
		"FeeGSTValue" : 0
	},
	"ChangeHolderProfile" : {
		"CardGenerationNumber" : 0,
		"CardSerialNumber" : NumberLong("2201849732244321"),
		"ExpirationDate" : "2014-03-04T00:00:00.000+13:00",
		"RefNumber" : 284616,
		"SequenceOrderPerMedia" : 1,
		"HolderBirthDate" : "1996-03-08+13:00",
		"HolderProfileID" : 0,
		"HolderProfileValidityDate" : "1990-01-01+13:00"
	}
}

So that we can continue to investigate, can you please provide your complete dataset? I have created a secure upload portal here. Files you upload will only be visible to MongoDB employees investigating this behavior.

Thank you,
Thomas

Comment by Chaedoo Jun [X] [ 28/Feb/16 ]

I just captured the Compass screen because it is an easy way to show my symptom.
I have tested with the mongo shell, Robomongo and MongoDB Compass, and all of the results are the same.
