Type: Task
Resolution: Done
Priority: Critical - P2
Fix Version/s: None
Affects Version/s: 2.3.0
Component/s: Writes
Labels:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Link:
None
Goal Name(s):
None

I am connecting to a MongoDB database via pymongo and have successfully fetched the data out of the database in JSON format. My task, however, is to create a Hive table via PySpark. I found that MongoDB produces JSON (RF719) that Spark does not support: when I try to load the data into a PySpark DataFrame, it shows up as a corrupt record. Please suggest a response.

I am reading the data in PySpark with the code below:

from pyspark import SparkContext, SparkConf, StorageLevel
from pyspark.sql import HiveContext, Row
from pyspark.sql.functions import *

sc = SparkContext()
hiveContext = HiveContext(sc)

# Read the whole file as a single string, then parse that string as JSON.
df = hiveContext.read.option("multiLine", "true").json(
    sc.wholeTextFiles('file:/data06/XXXXXXXXX.json').values()
)

This is how the data is read:

-------------------
|   _corrupt_record|
-------------------
|"[{\"finalization...|
-------------------
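
The leading quote and the escaped quotes in the `_corrupt_record` value above suggest, though the ticket does not confirm it, that the file holds a JSON string literal wrapping the array rather than the array itself (that is, the data was JSON-encoded twice). The following is a minimal sketch under that assumption, using only the Python standard library. The helper names `unwrap_json` and `to_json_lines` and the sample data are hypothetical, not part of the reporter's code.

```python
import json


def unwrap_json(text):
    """Decode `text`, decoding again for as long as the result is still a string.

    Raises json.JSONDecodeError if an intermediate string is not valid JSON.
    """
    value = json.loads(text)
    while isinstance(value, str):
        value = json.loads(value)
    return value


def to_json_lines(records):
    """Serialize a list of dicts as newline-delimited JSON, one object per line."""
    return "\n".join(json.dumps(record) for record in records)


# Hypothetical sample that mimics the shape seen in _corrupt_record.
inner = json.dumps([{"finalization": 1}, {"finalization": 2}])
double_encoded = json.dumps(inner)  # begins with "[{\"finalization

records = unwrap_json(double_encoded)
json_lines = to_json_lines(records)
```

If the assumption holds, the resulting lines could be written to a file and read with `read.json` without the multi-line option, since Spark's default JSON reader expects one object per line. Any MongoDB-specific types in the export would still need separate handling.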

Assignee: Ross Lawley
Reporter: rajaraman
Reviewers: None
Votes: 0
Watchers: 2

Created: Sep 30 2018 05:51:35 AM UTC
Updated: Sep 22 2021 06:46:36 PM UTC
Resolved: Oct 01 2018 09:08:23 AM UTC