Input collection:
{"a" : 123, "b" : "abc", "c" : "xxx"}
{"a" : 111, "b" : "aaa", "c" : "yyy"}
{"a" : null, "b" : null, "c" : "zzz"}

After loading this collection (with or without providing a schema), the value of column "b" for the third row is the string "null" instead of null.
You can see it in various ways:
- Calling collectAsList() and inspecting the contents
- testDataset.filter(col("b").isNull()).show() - prints an empty dataset.
- Saving the dataset to another collection
Note that testDataset.filter(col("b").isNotNull()) returns a correct result.
This problem does NOT occur for numeric columns such as "a".
While debugging, I found that the function convertToDataType in MapFunctions returns the string "null" instead of null when the column is of String type and the element is BsonNull.
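
To illustrate the kind of conversion described above, here is a minimal, self-contained Java sketch. It is not the connector's code: FakeBsonNull, convertToStringBuggy and convertToStringFixed are hypothetical names invented for this illustration, and the assumption that the element is stringified without a null check is only a guess at the mechanism, based on the observed "null" string.

```java
// Illustrative model only; none of these names exist in the connector.
class NullConversionSketch {

    // Hypothetical stand-in for a BSON null element.
    static final class FakeBsonNull {
        @Override
        public String toString() {
            return "null";
        }
    }

    static final Object BSON_NULL = new FakeBsonNull();

    // Models the reported behavior: the element is turned into a string
    // without first checking whether it represents a null value.
    static String convertToStringBuggy(Object element) {
        return String.valueOf(element);
    }

    // Models the expected behavior: a null element maps to a real null,
    // so that isNull() style checks on the column can match it.
    static String convertToStringFixed(Object element) {
        if (element == null || element instanceof FakeBsonNull) {
            return null;
        }
        return String.valueOf(element);
    }

    public static void main(String[] args) {
        // The buggy variant yields a non-null string, so a null filter misses it.
        System.out.println(convertToStringBuggy(BSON_NULL) == null); // false
        // The fixed variant yields a real null.
        System.out.println(convertToStringFixed(BSON_NULL) == null); // true
    }
}
```

With a conversion like the buggy variant, a filter on isNull() would find nothing for column "b", which matches the empty result reported above.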