- Type: Task
- Resolution: Done
- Priority: Critical - P2
- Affects Version/s: 2.3.0
- Component/s: Writes
I am connecting to the MongoDB database via PyMongo and achieved the expected result of fetching the data out of the DB in JSON format. My task is to create a Hive table via PySpark, but I found that the JSON produced by MongoDB (RF719) is not supported by Spark: when I try to load the data into a PySpark DataFrame, it shows up as a corrupt record. Please suggest a solution.
I am reading the data in PySpark using the code below:
from pyspark import SparkContext, SparkConf, StorageLevel
from pyspark.sql import HiveContext, Row
from pyspark.sql.functions import *

sc = SparkContext()
hiveContext = HiveContext(sc)

# Read each file as a single string and parse it as multiline JSON.
df = hiveContext.read.option("multiline", "true").json(
    sc.wholeTextFiles('file:/data06/XXXXXXXXX.json').values()
)
This is how it reads the data:

+--------------------+
|     _corrupt_record|
+--------------------+
|"[{\"finalization...|
+--------------------+
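The leading "[{\" in _corrupt_record suggests the file holds a JSON array that was itself serialized as one big JSON string (double-encoded), whereas Spark's JSON reader expects one JSON object per line by default. Below is a minimal sketch of one possible workaround, assuming the export is a single (possibly string-wrapped) JSON array that fits in driver memory; the table name "mongo_export" is hypothetical:

import json
from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext()
hiveContext = HiveContext(sc)

# Read the whole export as one string (wholeTextFiles yields (path, content) pairs).
raw = sc.wholeTextFiles('file:/data06/XXXXXXXXX.json').values().first()

parsed = json.loads(raw)
# If this is not yet a list, the array was wrapped in quotes as a JSON string;
# decode a second time (an assumption based on the "[{\" prefix shown above).
if not isinstance(parsed, list):
    parsed = json.loads(parsed)

# Re-emit one document per line (JSON Lines), the layout Spark expects by default.
df = hiveContext.read.json(sc.parallelize([json.dumps(doc) for doc in parsed]))
df.printSchema()

# Persist the DataFrame as a Hive table ("mongo_export" is a hypothetical name).
df.write.saveAsTable("mongo_export")

If the documents contain MongoDB Extended JSON types such as $oid or $date, they will come through as nested structs in the inferred schema and may need further flattening before the Hive table is usable.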