Type: Task
Resolution: Done
Priority: Major - P3
Affects Version/s: 2.2.8
Component/s: Configuration, Reads
Environment: Storage: 1 PB
Memory: 1.4 TB
Nodes: 16-node cluster
vCore CPU: 96
Dear,
When we try to load data via the Spark connector, it fails to load the collection's documents completely.
Example: the Employee collection has 500 documents.
When we load this collection into a DataFrame using the Spark connector, we get a different count on each load.
Sometimes it loads completely, and sometimes it misses a few records.
Kindly suggest what I am doing wrong.
Below is the sample command :
spark-shell --master yarn --num-executors 10 --executor-memory 10g --executor-cores 8 --driver-memory 20g --jars /rtmstaging/BINS/CODEBASE/RAW_ZONE/SPARK/JAR/utils/mongo-spark-connector_2.11-2.3.3.jar,/rtmstaging/BINS/CODEBASE/RAW_ZONE/SPARK/JAR/utils/mongo-java-driver-3.12.2.jar
import spark.implicits._
val numLagDays = 2; val numCurrentDays = 1
val yes_dt = spark.sql("select date_sub(current_date()," + numLagDays + ")").as[String].first + "T20:00:00Z"
val Tod_dt = spark.sql("select date_sub(current_date()," + numCurrentDays + ")").as[String].first + "T20:00:00Z"
val pipeline_cdt = "[{ $match: {$or:[{$and:[{'CreatedDate' : {$gte : ISODate('" + yes_dt + "')}},{'CreatedDate' : {$lt : ISODate('" + Tod_dt + "')}}]},{$and:[{'UpdatedDate' : {$gte : ISODate('" + yes_dt + "')}},{'UpdatedDate' : {$lt : ISODate('" + Tod_dt + "')}}]}]} } ]"
val dfr = spark.read.format("mongo").option("pipeline", pipeline_cdt)
val df = dfr.option("uri", "mongodb://BIG_DATA_USER:Sad#12345@10.10.10.10:27017/Reports.employee?authSource=admin&readPreference=secondary&appname=MongoDB%20Compass&ssl=false&replicaSet=Reports").load()
df.show(false)
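For reference, the date boundaries and the pipeline string above can be built without hand-spliced quotes. This is a minimal sketch with hypothetical helper names (`isoBoundary`, `changeWindowPipeline` are not part of the connector); it uses `java.time` instead of a `spark.sql` round-trip and assembles the same `$match` stage as in the session:

```scala
import java.time.LocalDate

// Hypothetical helper: "T20:00:00Z" boundary for the date `daysBack` days ago,
// computed with java.time instead of a spark.sql round-trip.
def isoBoundary(daysBack: Long): String =
  LocalDate.now.minusDays(daysBack).toString + "T20:00:00Z"

// Hypothetical helper: builds the same $match pipeline as above from the two
// boundaries, so every quote and brace is balanced in one place.
def changeWindowPipeline(fromDt: String, toDt: String): String = {
  def range(field: String) =
    s"{$$and:[{'$field':{$$gte:ISODate('$fromDt')}},{'$field':{$$lt:ISODate('$toDt')}}]}"
  s"[{$$match:{$$or:[${range("CreatedDate")},${range("UpdatedDate")}]}}]"
}

val pipeline_cdt = changeWindowPipeline(isoBoundary(numLagDays), isoBoundary(numCurrentDays))
```

The resulting string can then be passed to `.option("pipeline", pipeline_cdt)` exactly as in the session above.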
df.count() varies between loads into the DataFrame, but when we check the counts in MongoDB Compass/Robo 3T they are always the same.
It's an intermittent issue: sometimes we miss records and sometimes we get all of them.
We don't know what causes this.
Regards,
Sadique