bash-4.2# rm -rf prd_data_to_validate.csv
bash-4.2# python3 num_conversion.py --secret=customer-dp-prd-mongodb --srv_connect=yes --exclude_prev_documents=no
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/usr/local/lib/python3.7/site-packages/pyspark/jars/spark-unsafe_2.12-3.1.2.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
:: loading settings :: url = jar:file:/usr/local/lib/python3.7/site-packages/pyspark/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
Ivy Default Cache set to: /root/.ivy2/cache
The jars for the packages stored in: /root/.ivy2/jars
org.mongodb.spark#mongo-spark-connector added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-99de52ef-f635-4b89-9ea2-9f1fac9c6bd4;1.0
    confs: [default]
    found org.mongodb.spark#mongo-spark-connector;10.0.0 in central
    found org.mongodb#mongodb-driver-sync;4.5.1 in central
    [4.5.1] org.mongodb#mongodb-driver-sync;[4.5.0,4.5.99)
    found org.mongodb#bson;4.5.1 in central
    found org.mongodb#mongodb-driver-core;4.5.1 in central
:: resolution report :: resolve 1925ms :: artifacts dl 7ms
    :: modules in use:
    org.mongodb#bson;4.5.1 from central in [default]
    org.mongodb#mongodb-driver-core;4.5.1 from central in [default]
    org.mongodb#mongodb-driver-sync;4.5.1 from central in [default]
    org.mongodb.spark#mongo-spark-connector;10.0.0 from central in [default]
    ---------------------------------------------------------------------
    |                  |            modules            ||   artifacts   |
    |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
    ---------------------------------------------------------------------
    |      default     |   4   |   1   |   0   |   0   ||   4   |   0   |
    ---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-99de52ef-f635-4b89-9ea2-9f1fac9c6bd4
    confs: [default]
    0 artifacts copied, 4 already retrieved (0kB/8ms)
22/04/13 17:57:39 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
22/04/13 17:57:42 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 264.0 B, free 434.4 MiB)
22/04/13 17:57:42 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 466.0 B, free 434.4 MiB)
22/04/13 17:57:42 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 5bcbeccf03ef:41245 (size: 466.0 B, free: 434.4 MiB)
22/04/13 17:57:42 INFO SparkContext: Created broadcast 0 from broadcast at MongoSpark.scala:530
22/04/13 17:57:42 INFO cluster: Cluster created with settings {hosts=[127.0.0.1:27017], srvHost=mongodb-prd.dp.aws.customer.internal, mode=MULTIPLE, requiredClusterType=REPLICA_SET, serverSelectionTimeout='30000 ms', requiredReplicaSetName='prd'}
22/04/13 17:57:42 INFO MongoClientCache: Creating MongoClient: []
22/04/13 17:57:42 INFO cluster: Adding discovered server rs1-prd.dp.aws.customer.internal:27017 to client view of cluster
22/04/13 17:57:42 INFO cluster: Cluster description not yet available. Waiting for 30000 ms before timing out
22/04/13 17:57:42 INFO cluster: Adding discovered server rs2-prd.dp.aws.customer.internal:27017 to client view of cluster
22/04/13 17:57:42 INFO cluster: No server chosen by com.mongodb.client.internal.MongoClientDelegate$1@64823a0e from cluster description ClusterDescription{type=REPLICA_SET, connectionMode=MULTIPLE, serverDescriptions=[ServerDescription{address=rs1-prd.dp.aws.customer.internal:27017, type=UNKNOWN, state=CONNECTING}, ServerDescription{address=rs2-prd.dp.aws.customer.internal:27017, type=UNKNOWN, state=CONNECTING}]}. Waiting for 30000 ms before timing out
22/04/13 17:57:42 INFO connection: Opened connection [connectionId{localValue:1, serverValue:196742}] to rs2-prd.dp.aws.customer.internal:27017
22/04/13 17:57:42 INFO connection: Opened connection [connectionId{localValue:2, serverValue:199570}] to rs1-prd.dp.aws.customer.internal:27017
22/04/13 17:57:42 INFO cluster: Monitor thread successfully connected to server with description ServerDescription{address=rs1-prd.dp.aws.customer.internal:27017, type=REPLICA_SET_PRIMARY, state=CONNECTED, ok=true, minWireVersion=0, maxWireVersion=13, maxDocumentSize=16777216, logicalSessionTimeoutMinutes=30, roundTripTimeNanos=4316593, setName='prd', canonicalAddress=ip-172-19-33-146.eu-central-1.compute.internal:27017, hosts=[ip-172-19-33-146.eu-central-1.compute.internal:27017, ip-172-19-46-161.eu-central-1.compute.internal:27017], passives=[], arbiters=[], primary='ip-172-19-33-146.eu-central-1.compute.internal:27017', tagSet=TagSet{[]}, electionId=7fffffff000000000000000f, setVersion=9, lastWriteDate=Wed Apr 13 17:57:33 UTC 2022, lastUpdateTimeNanos=2870364251456430}
22/04/13 17:57:42 INFO cluster: Adding discovered server ip-172-19-33-146.eu-central-1.compute.internal:27017 to client view of cluster
22/04/13 17:57:42 INFO cluster: Adding discovered server ip-172-19-46-161.eu-central-1.compute.internal:27017 to client view of cluster
22/04/13 17:57:42 INFO cluster: Monitor thread successfully connected to server with description ServerDescription{address=rs2-prd.dp.aws.customer.internal:27017, type=REPLICA_SET_SECONDARY, state=CONNECTED, ok=true, minWireVersion=0, maxWireVersion=13, maxDocumentSize=16777216, logicalSessionTimeoutMinutes=30, roundTripTimeNanos=19717879, setName='prd', canonicalAddress=ip-172-19-46-161.eu-central-1.compute.internal:27017, hosts=[ip-172-19-33-146.eu-central-1.compute.internal:27017, ip-172-19-46-161.eu-central-1.compute.internal:27017], passives=[], arbiters=[], primary='ip-172-19-33-146.eu-central-1.compute.internal:27017', tagSet=TagSet{[]}, electionId=null, setVersion=9, lastWriteDate=Wed Apr 13 17:57:33 UTC 2022, lastUpdateTimeNanos=2870364266313024}
22/04/13 17:57:42 INFO cluster: Server rs1-prd.dp.aws.customer.internal:27017 is no longer a member of the replica set. Removing from client view of cluster.
22/04/13 17:57:42 INFO connection: Opened connection [connectionId{localValue:3, serverValue:199571}] to ip-172-19-33-146.eu-central-1.compute.internal:27017
22/04/13 17:57:42 INFO cluster: Server rs2-prd.dp.aws.customer.internal:27017 is no longer a member of the replica set. Removing from client view of cluster.
22/04/13 17:57:42 INFO cluster: Canonical address ip-172-19-33-146.eu-central-1.compute.internal:27017 does not match server address. Removing rs1-prd.dp.aws.customer.internal:27017 from client view of cluster
22/04/13 17:57:42 INFO cluster: Monitor thread successfully connected to server with description ServerDescription{address=ip-172-19-33-146.eu-central-1.compute.internal:27017, type=REPLICA_SET_PRIMARY, state=CONNECTED, ok=true, minWireVersion=0, maxWireVersion=13, maxDocumentSize=16777216, logicalSessionTimeoutMinutes=30, roundTripTimeNanos=2342756, setName='prd', canonicalAddress=ip-172-19-33-146.eu-central-1.compute.internal:27017, hosts=[ip-172-19-33-146.eu-central-1.compute.internal:27017, ip-172-19-46-161.eu-central-1.compute.internal:27017], passives=[], arbiters=[], primary='ip-172-19-33-146.eu-central-1.compute.internal:27017', tagSet=TagSet{[]}, electionId=7fffffff000000000000000f, setVersion=9, lastWriteDate=Wed Apr 13 17:57:33 UTC 2022, lastUpdateTimeNanos=2870364279574824}
22/04/13 17:57:42 INFO cluster: Setting max election id to 7fffffff000000000000000f from replica set primary ip-172-19-33-146.eu-central-1.compute.internal:27017
22/04/13 17:57:42 INFO cluster: Setting max set version to 9 from replica set primary ip-172-19-33-146.eu-central-1.compute.internal:27017
22/04/13 17:57:42 INFO cluster: Discovered replica set primary ip-172-19-33-146.eu-central-1.compute.internal:27017
22/04/13 17:57:42 INFO connection: Opened connection [connectionId{localValue:4, serverValue:196743}] to ip-172-19-46-161.eu-central-1.compute.internal:27017
22/04/13 17:57:42 INFO cluster: Monitor thread successfully connected to server with description ServerDescription{address=ip-172-19-46-161.eu-central-1.compute.internal:27017, type=REPLICA_SET_SECONDARY, state=CONNECTED, ok=true, minWireVersion=0, maxWireVersion=13, maxDocumentSize=16777216, logicalSessionTimeoutMinutes=30, roundTripTimeNanos=1853749, setName='prd', canonicalAddress=ip-172-19-46-161.eu-central-1.compute.internal:27017, hosts=[ip-172-19-33-146.eu-central-1.compute.internal:27017, ip-172-19-46-161.eu-central-1.compute.internal:27017], passives=[], arbiters=[], primary='ip-172-19-33-146.eu-central-1.compute.internal:27017', tagSet=TagSet{[]}, electionId=null, setVersion=9, lastWriteDate=Wed Apr 13 17:57:33 UTC 2022, lastUpdateTimeNanos=2870364293710798}
22/04/13 17:57:42 INFO connection: Opened connection [connectionId{localValue:5, serverValue:199572}] to ip-172-19-33-146.eu-central-1.compute.internal:27017
22/04/13 17:57:43 INFO SparkContext: Starting job: treeAggregate at MongoInferSchema.scala:88
22/04/13 17:57:43 INFO DAGScheduler: Got job 0 (treeAggregate at MongoInferSchema.scala:88) with 1 output partitions
22/04/13 17:57:43 INFO DAGScheduler: Final stage: ResultStage 0 (treeAggregate at MongoInferSchema.scala:88)
22/04/13 17:57:43 INFO DAGScheduler: Parents of final stage: List()
22/04/13 17:57:43 INFO DAGScheduler: Missing parents: List()
22/04/13 17:57:43 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[4] at treeAggregate at MongoInferSchema.scala:88), which has no missing parents
22/04/13 17:57:43 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 7.3 KiB, free 434.4 MiB)
22/04/13 17:57:43 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 3.6 KiB, free 434.4 MiB)
22/04/13 17:57:43 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 5bcbeccf03ef:41245 (size: 3.6 KiB, free: 434.4 MiB)
22/04/13 17:57:43 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1388
22/04/13 17:57:43 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[4] at treeAggregate at MongoInferSchema.scala:88) (first 15 tasks are for partitions Vector(0))
22/04/13 17:57:43 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks resource profile 0
22/04/13 17:57:43 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0) (5bcbeccf03ef, executor driver, partition 0, ANY, 4610 bytes) taskResourceAssignments Map()
22/04/13 17:57:43 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
22/04/13 17:57:46 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 7801 bytes result sent to driver
22/04/13 17:57:46 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 3057 ms on 5bcbeccf03ef (executor driver) (1/1)
22/04/13 17:57:46 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
22/04/13 17:57:46 INFO DAGScheduler: ResultStage 0 (treeAggregate at MongoInferSchema.scala:88) finished in 3.174 s
22/04/13 17:57:46 INFO DAGScheduler: Job 0 is finished. Cancelling potential speculative or zombie tasks for this job
22/04/13 17:57:46 INFO TaskSchedulerImpl: Killing all running tasks in stage 0: Stage finished
22/04/13 17:57:46 INFO DAGScheduler: Job 0 finished: treeAggregate at MongoInferSchema.scala:88, took 3.227706 s
WARNING:root:Using filer clause: CDC_TIMESTAMP >= "1649841014867"
22/04/13 17:57:48 INFO MongoRelation: requiredColumns: CDC_TIMESTAMP, filters: IsNotNull(CDC_TIMESTAMP), GreaterThanOrEqual(CDC_TIMESTAMP,1649841014867)
22/04/13 17:57:48 INFO CodeGenerator: Code generated in 186.699188 ms
22/04/13 17:57:48 INFO CodeGenerator: Code generated in 20.044051 ms
22/04/13 17:57:48 INFO SparkContext: Starting job: count at NativeMethodAccessorImpl.java:0
22/04/13 17:57:48 INFO DAGScheduler: Registering RDD 9 (count at NativeMethodAccessorImpl.java:0) as input to shuffle 0
22/04/13 17:57:48 INFO DAGScheduler: Got job 1 (count at NativeMethodAccessorImpl.java:0) with 1 output partitions
22/04/13 17:57:48 INFO DAGScheduler: Final stage: ResultStage 2 (count at NativeMethodAccessorImpl.java:0)
22/04/13 17:57:48 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 1)
22/04/13 17:57:48 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 1)
22/04/13 17:57:48 INFO DAGScheduler: Submitting ShuffleMapStage 1 (MapPartitionsRDD[9] at count at NativeMethodAccessorImpl.java:0), which has no missing parents
22/04/13 17:57:48 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 24.2 KiB, free 434.4 MiB)
22/04/13 17:57:48 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 9.7 KiB, free 434.4 MiB)
22/04/13 17:57:48 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on 5bcbeccf03ef:41245 (size: 9.7 KiB, free: 434.4 MiB)
22/04/13 17:57:48 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1388
22/04/13 17:57:48 INFO DAGScheduler: Submitting 1 missing tasks from ShuffleMapStage 1 (MapPartitionsRDD[9] at count at NativeMethodAccessorImpl.java:0) (first 15 tasks are for partitions Vector(0))
22/04/13 17:57:48 INFO TaskSchedulerImpl: Adding task set 1.0 with 1 tasks resource profile 0
22/04/13 17:57:48 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1) (5bcbeccf03ef, executor driver, partition 0, ANY, 4599 bytes) taskResourceAssignments Map()
22/04/13 17:57:48 INFO Executor: Running task 0.0 in stage 1.0 (TID 1)
22/04/13 17:57:49 INFO BlockManagerInfo: Removed broadcast_1_piece0 on 5bcbeccf03ef:41245 in memory (size: 3.6 KiB, free: 434.4 MiB)
22/04/13 17:57:49 INFO CodeGenerator: Code generated in 88.180872 ms
22/04/13 17:57:49 INFO Executor: Finished task 0.0 in stage 1.0 (TID 1). 1984 bytes result sent to driver
22/04/13 17:57:49 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 316 ms on 5bcbeccf03ef (executor driver) (1/1)
22/04/13 17:57:49 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
22/04/13 17:57:49 INFO DAGScheduler: ShuffleMapStage 1 (count at NativeMethodAccessorImpl.java:0) finished in 0.345 s
22/04/13 17:57:49 INFO DAGScheduler: looking for newly runnable stages
22/04/13 17:57:49 INFO DAGScheduler: running: Set()
22/04/13 17:57:49 INFO DAGScheduler: waiting: Set(ResultStage 2)
22/04/13 17:57:49 INFO DAGScheduler: failed: Set()
22/04/13 17:57:49 INFO DAGScheduler: Submitting ResultStage 2 (MapPartitionsRDD[12] at count at NativeMethodAccessorImpl.java:0), which has no missing parents
22/04/13 17:57:49 INFO MemoryStore: Block broadcast_3 stored as values in memory (estimated size 10.1 KiB, free 434.4 MiB)
22/04/13 17:57:49 INFO MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 5.0 KiB, free 434.4 MiB)
22/04/13 17:57:49 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on 5bcbeccf03ef:41245 (size: 5.0 KiB, free: 434.4 MiB)
22/04/13 17:57:49 INFO SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:1388
22/04/13 17:57:49 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 2 (MapPartitionsRDD[12] at count at NativeMethodAccessorImpl.java:0) (first 15 tasks are for partitions Vector(0))
22/04/13 17:57:49 INFO TaskSchedulerImpl: Adding task set 2.0 with 1 tasks resource profile 0
22/04/13 17:57:49 INFO TaskSetManager: Starting task 0.0 in stage 2.0 (TID 2) (5bcbeccf03ef, executor driver, partition 0, NODE_LOCAL, 4453 bytes) taskResourceAssignments Map()
22/04/13 17:57:49 INFO Executor: Running task 0.0 in stage 2.0 (TID 2)
22/04/13 17:57:49 INFO ShuffleBlockFetcherIterator: Getting 1 (60.0 B) non-empty blocks including 1 (60.0 B) local and 0 (0.0 B) host-local and 0 (0.0 B) remote blocks
22/04/13 17:57:49 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 8 ms
22/04/13 17:57:49 INFO Executor: Finished task 0.0 in stage 2.0 (TID 2). 2648 bytes result sent to driver
22/04/13 17:57:49 INFO TaskSetManager: Finished task 0.0 in stage 2.0 (TID 2) in 64 ms on 5bcbeccf03ef (executor driver) (1/1)
22/04/13 17:57:49 INFO TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed, from pool
22/04/13 17:57:49 INFO DAGScheduler: ResultStage 2 (count at NativeMethodAccessorImpl.java:0) finished in 0.076 s
22/04/13 17:57:49 INFO DAGScheduler: Job 1 is finished. Cancelling potential speculative or zombie tasks for this job
22/04/13 17:57:49 INFO TaskSchedulerImpl: Killing all running tasks in stage 2: Stage finished
22/04/13 17:57:49 INFO DAGScheduler: Job 1 finished: count at NativeMethodAccessorImpl.java:0, took 0.476965 s
WARNING:root:The data dataframe count after filter on timestamp: 646
22/04/13 17:57:49 INFO MongoRelation: requiredColumns: CDC_TIMESTAMP, ENV, PK_ID, PROFILE, VAL0000, VAL0015, VAL0030, VAL0045, VAL0100, VAL0115, VAL0130, VAL0145, VAL0200, VAL0215, VAL0230, VAL0245, VAL0300, VAL0315, VAL0330, VAL0345, VAL0400, VAL0415, VAL0430, VAL0445, VAL0500, VAL0515, VAL0530, VAL0545, VAL0600, VAL0615, VAL0630, VAL0645, VAL0700, VAL0715, VAL0730, VAL0745, VAL0800, VAL0815, VAL0830, VAL0845, VAL0900, VAL0915, VAL0930, VAL0945, VAL1000, VAL1015, VAL1030, VAL1045, VAL1100, VAL1115, VAL1130, VAL1145, VAL1200, VAL1215, VAL1230, VAL1245, VAL1300, VAL1315, VAL1330, VAL1345, VAL1400, VAL1415, VAL1430, VAL1445, VAL1500, VAL1515, VAL1530, VAL1545, VAL1600, VAL1615, VAL1630, VAL1645, VAL1700, VAL1715, VAL1730, VAL1745, VAL1800, VAL1815, VAL1830, VAL1845, VAL1900, VAL1915, VAL1930, VAL1945, VAL2000, VAL2015, VAL2030, VAL2045, VAL2100, VAL2115, VAL2130, VAL2145, VAL2200, VAL2215, VAL2230, VAL2245, VAL2300, VAL2315, VAL2330, VAL2345, VALUEDAY, _insertedTS, _modifiedTS, filters: IsNotNull(CDC_TIMESTAMP), GreaterThanOrEqual(CDC_TIMESTAMP,1649841014867)
22/04/13 17:57:49 WARN package: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.sql.debug.maxToStringFields'.
22/04/13 17:57:49 INFO FileOutputCommitter: File Output Committer Algorithm version is 1
22/04/13 17:57:49 INFO FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
22/04/13 17:57:49 INFO SQLHadoopMapReduceCommitProtocol: Using output committer class org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
22/04/13 17:57:49 INFO SparkContext: Starting job: csv at NativeMethodAccessorImpl.java:0
22/04/13 17:57:49 INFO DAGScheduler: Got job 2 (csv at NativeMethodAccessorImpl.java:0) with 1 output partitions
22/04/13 17:57:49 INFO DAGScheduler: Final stage: ResultStage 3 (csv at NativeMethodAccessorImpl.java:0)
22/04/13 17:57:49 INFO DAGScheduler: Parents of final stage: List()
22/04/13 17:57:49 INFO DAGScheduler: Missing parents: List()
22/04/13 17:57:49 INFO DAGScheduler: Submitting ResultStage 3 (MapPartitionsRDD[17] at csv at NativeMethodAccessorImpl.java:0), which has no missing parents
22/04/13 17:57:49 INFO MemoryStore: Block broadcast_4 stored as values in memory (estimated size 295.9 KiB, free 434.1 MiB)
22/04/13 17:57:49 INFO MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 81.8 KiB, free 434.0 MiB)
22/04/13 17:57:49 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory on 5bcbeccf03ef:41245 (size: 81.8 KiB, free: 434.3 MiB)
22/04/13 17:57:49 INFO SparkContext: Created broadcast 4 from broadcast at DAGScheduler.scala:1388
22/04/13 17:57:49 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 3 (MapPartitionsRDD[17] at csv at NativeMethodAccessorImpl.java:0) (first 15 tasks are for partitions Vector(0))
22/04/13 17:57:49 INFO TaskSchedulerImpl: Adding task set 3.0 with 1 tasks resource profile 0
22/04/13 17:57:49 INFO TaskSetManager: Starting task 0.0 in stage 3.0 (TID 3) (5bcbeccf03ef, executor driver, partition 0, ANY, 4610 bytes) taskResourceAssignments Map()
22/04/13 17:57:49 INFO Executor: Running task 0.0 in stage 3.0 (TID 3)
22/04/13 17:57:50 INFO CodeGenerator: Code generated in 100.862642 ms
22/04/13 17:57:50 INFO CodeGenerator: Code generated in 7.497137 ms
22/04/13 17:57:50 INFO FileOutputCommitter: File Output Committer Algorithm version is 1
22/04/13 17:57:50 INFO FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
22/04/13 17:57:50 INFO SQLHadoopMapReduceCommitProtocol: Using output committer class org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
22/04/13 17:57:50 INFO BlockManagerInfo: Removed broadcast_2_piece0 on 5bcbeccf03ef:41245 in memory (size: 9.7 KiB, free: 434.3 MiB)
22/04/13 17:57:50 INFO BlockManagerInfo: Removed broadcast_3_piece0 on 5bcbeccf03ef:41245 in memory (size: 5.0 KiB, free: 434.3 MiB)
22/04/13 17:57:50 INFO CodeGenerator: Code generated in 378.757626 ms
22/04/13 17:57:51 INFO FileOutputCommitter: Saved output of task 'attempt_20220413175749814450369926536383_0003_m_000000_3' to file:/opt/prd_data_to_validate.csv/_temporary/0/task_20220413175749814450369926536383_0003_m_000000
22/04/13 17:57:51 INFO SparkHadoopMapRedUtil: attempt_20220413175749814450369926536383_0003_m_000000_3: Committed
22/04/13 17:57:51 INFO Executor: Finished task 0.0 in stage 3.0 (TID 3). 2429 bytes result sent to driver
22/04/13 17:57:51 INFO TaskSetManager: Finished task 0.0 in stage 3.0 (TID 3) in 1504 ms on 5bcbeccf03ef (executor driver) (1/1)
22/04/13 17:57:51 INFO TaskSchedulerImpl: Removed TaskSet 3.0, whose tasks have all completed, from pool
22/04/13 17:57:51 INFO DAGScheduler: ResultStage 3 (csv at NativeMethodAccessorImpl.java:0) finished in 1.558 s
22/04/13 17:57:51 INFO DAGScheduler: Job 2 is finished. Cancelling potential speculative or zombie tasks for this job
22/04/13 17:57:51 INFO TaskSchedulerImpl: Killing all running tasks in stage 3: Stage finished
22/04/13 17:57:51 INFO DAGScheduler: Job 2 finished: csv at NativeMethodAccessorImpl.java:0, took 1.566253 s
22/04/13 17:57:51 INFO FileFormatWriter: Write Job c060f432-361e-4724-b317-2c103b34be8b committed.
22/04/13 17:57:51 INFO FileFormatWriter: Finished processing stats for write job c060f432-361e-4724-b317-2c103b34be8b.
WARNING:root:Done
WARNING:root:Using filer clause: PK_ID == "04000000000000000000120220402"
22/04/13 17:57:51 INFO MongoRelation: requiredColumns: CDC_TIMESTAMP, ENV, PK_ID, PROFILE, VAL0000, VAL0015, VAL0030, VAL0045, VAL0100, VAL0115, VAL0130, VAL0145, VAL0200, VAL0215, VAL0230, VAL0245, VAL0300, VAL0315, VAL0330, VAL0345, VAL0400, VAL0415, VAL0430, VAL0445, VAL0500, VAL0515, VAL0530, VAL0545, VAL0600, VAL0615, VAL0630, VAL0645, VAL0700, VAL0715, VAL0730, VAL0745, VAL0800, VAL0815, VAL0830, VAL0845, VAL0900, VAL0915, VAL0930, VAL0945, VAL1000, VAL1015, VAL1030, VAL1045, VAL1100, VAL1115, VAL1130, VAL1145, VAL1200, VAL1215, VAL1230, VAL1245, VAL1300, VAL1315, VAL1330, VAL1345, VAL1400, VAL1415, VAL1430, VAL1445, VAL1500, VAL1515, VAL1530, VAL1545, VAL1600, VAL1615, VAL1630, VAL1645, VAL1700, VAL1715, VAL1730, VAL1745, VAL1800, VAL1815, VAL1830, VAL1845, VAL1900, VAL1915, VAL1930, VAL1945, VAL2000, VAL2015, VAL2030, VAL2045, VAL2100, VAL2115, VAL2130, VAL2145, VAL2200, VAL2215, VAL2230, VAL2245, VAL2300, VAL2315, VAL2330, VAL2345, VALUEDAY, _id, _insertedTS, _modifiedTS, filters: IsNotNull(PK_ID), EqualTo(PK_ID,04000000000000000000120220402)
22/04/13 17:57:51 INFO SparkContext: Starting job: toPandas at num_conversion.py:96
22/04/13 17:57:51 INFO DAGScheduler: Got job 3 (toPandas at num_conversion.py:96) with 1 output partitions
22/04/13 17:57:51 INFO DAGScheduler: Final stage: ResultStage 4 (toPandas at num_conversion.py:96)
22/04/13 17:57:51 INFO DAGScheduler: Parents of final stage: List()
22/04/13 17:57:51 INFO DAGScheduler: Missing parents: List()
22/04/13 17:57:51 INFO DAGScheduler: Submitting ResultStage 4 (MapPartitionsRDD[25] at toPandas at num_conversion.py:96), which has no missing parents
22/04/13 17:57:51 INFO MemoryStore: Block broadcast_5 stored as values in memory (estimated size 129.2 KiB, free 433.9 MiB)
22/04/13 17:57:51 INFO MemoryStore: Block broadcast_5_piece0 stored as bytes in memory (estimated size 23.1 KiB, free 433.9 MiB)
22/04/13 17:57:51 INFO BlockManagerInfo: Added broadcast_5_piece0 in memory on 5bcbeccf03ef:41245 (size: 23.1 KiB, free: 434.3 MiB)
22/04/13 17:57:51 INFO SparkContext: Created broadcast 5 from broadcast at DAGScheduler.scala:1388
22/04/13 17:57:51 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 4 (MapPartitionsRDD[25] at toPandas at num_conversion.py:96) (first 15 tasks are for partitions Vector(0))
22/04/13 17:57:51 INFO TaskSchedulerImpl: Adding task set 4.0 with 1 tasks resource profile 0
22/04/13 17:57:51 INFO TaskSetManager: Starting task 0.0 in stage 4.0 (TID 4) (5bcbeccf03ef, executor driver, partition 0, ANY, 4610 bytes) taskResourceAssignments Map()
22/04/13 17:57:51 INFO Executor: Running task 0.0 in stage 4.0 (TID 4)
22/04/13 17:57:51 INFO CodeGenerator: Code generated in 32.669688 ms
22/04/13 17:57:51 INFO CodeGenerator: Code generated in 4.909875 ms
22/04/13 17:57:52 INFO CodeGenerator: Code generated in 207.215637 ms
22/04/13 17:57:52 INFO Executor: Finished task 0.0 in stage 4.0 (TID 4). 2242 bytes result sent to driver
22/04/13 17:57:52 INFO TaskSetManager: Finished task 0.0 in stage 4.0 (TID 4) in 427 ms on 5bcbeccf03ef (executor driver) (1/1)
22/04/13 17:57:52 INFO TaskSchedulerImpl: Removed TaskSet 4.0, whose tasks have all completed, from pool
22/04/13 17:57:52 INFO DAGScheduler: ResultStage 4 (toPandas at num_conversion.py:96) finished in 0.444 s
22/04/13 17:57:52 INFO DAGScheduler: Job 3 is finished. Cancelling potential speculative or zombie tasks for this job
22/04/13 17:57:52 INFO TaskSchedulerImpl: Killing all running tasks in stage 4: Stage finished
22/04/13 17:57:52 INFO DAGScheduler: Job 3 finished: toPandas at num_conversion.py:96, took 0.452463 s
/usr/local/lib/python3.7/site-packages/pyspark/sql/pandas/conversion.py:186: PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling `frame.insert` many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[column_name] = series
+----+-----------------+---------+-------------------------------+--------------------+------------+-------------+-------------+-------------+-------------+-------------+-------------+------------+-------------+------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+------------+-------------+-------------+-------------+-------------+-------------+-------------+------------+--------------------------------------------------------------+----------------------------+----------------------------+
| | CDC_TIMESTAMP | ENV | PK_ID | PROFILE | VAL0000 | VAL0015 | VAL0030 | VAL0045 | VAL0100 | VAL0115 | VAL0130 | VAL0145 | VAL0200 | VAL0215 | VAL0230 | VAL0245 | VAL0300 | VAL0315 | VAL0330 | VAL0345 | VAL0400 | VAL0415 | VAL0430 | VAL0445 | VAL0500 | VAL0515 | VAL0530 | VAL0545 | VAL0600 | VAL0615 | VAL0630 | VAL0645 | VAL0700 | VAL0715 | VAL0730 | VAL0745 | VAL0800 | VAL0815 | VAL0830 | VAL0845 | VAL0900 | VAL0915 | VAL0930 | VAL0945 | VAL1000 | VAL1015 | VAL1030 | VAL1045 | VAL1100 | VAL1115 | VAL1130 | VAL1145 | VAL1200 | VAL1215 | VAL1230 | VAL1245 | VAL1300 | VAL1315 | VAL1330 | VAL1345 | VAL1400 | VAL1415 | VAL1430 | VAL1445 | VAL1500 | VAL1515 | VAL1530 | VAL1545 | VAL1600 | VAL1615 | VAL1630 | VAL1645 | VAL1700 | VAL1715 | VAL1730 | VAL1745 | VAL1800 | VAL1815 | VAL1830 | VAL1845 | VAL1900 | VAL1915 | VAL1930 | VAL1945 | VAL2000 | VAL2015 | VAL2030 | VAL2045 | VAL2100 | VAL2115 | VAL2130 | VAL2145 | VAL2200 | VAL2215 | VAL2230 | VAL2245 | VAL2300 | VAL2315 | VAL2330 | VAL2345 | VALUEDAY | _id | _insertedTS | _modifiedTS |
|----+-----------------+---------+-------------------------------+--------------------+------------+-------------+-------------+-------------+-------------+-------------+-------------+------------+-------------+------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+------------+-------------+-------------+-------------+-------------+-------------+-------------+------------+--------------------------------------------------------------+----------------------------+----------------------------|
| 0 | 1649841751350 | 040 | 04000000000000000000120220402 | 000000000000000001 | 3.5078e+11 | 3.49329e+11 | 3.51114e+11 | 3.50122e+11 | 3.51123e+11 | 3.49107e+11 | 3.49487e+11 | 3.4961e+11 | 3.52268e+11 | 3.5352e+11 | 3.53018e+11 | 3.57894e+11 | 3.56972e+11 | 3.62444e+11 | 3.63301e+11 | 3.69719e+11 | 3.52913e+11 | 3.51929e+11 | 3.51072e+11 | 3.65405e+11 | 3.75505e+11 | 3.71676e+11 | 3.74904e+11 | 3.80856e+11 | 3.85648e+11 | 3.95756e+11 | 4.01078e+11 | 4.07666e+11 | 4.09613e+11 | 4.10564e+11 | 4.12383e+11 | 4.13976e+11 | 4.12317e+11 | 4.08353e+11 | 4.02662e+11 | 3.98999e+11 | 3.98751e+11 | 3.98485e+11 | 3.98365e+11 | 4.15694e+11 | 4.14563e+11 | 4.03958e+11 | 3.94607e+11 | 3.89382e+11 | 3.85125e+11 | 3.78082e+11 | 3.70256e+11 | 3.59053e+11 | 3.56649e+11 | 3.52028e+11 | 3.37124e+11 | 3.35998e+11 | 3.36556e+11 | 3.44414e+11 | 3.47578e+11 | 3.53388e+11 | 3.55992e+11 | 3.71107e+11 | 3.74508e+11 | 3.92785e+11 | 3.90448e+11 | 3.94387e+11 | 3.95385e+11 | 4.09742e+11 | 4.22795e+11 | 4.29898e+11 | 4.38859e+11 | 4.43387e+11 | 4.4764e+11 | 4.54355e+11 | 4.59675e+11 | 4.66461e+11 | 4.66546e+11 | 4.76763e+11 | 4.82065e+11 | 4.80737e+11 | 4.73521e+11 | 4.71775e+11 | 4.65039e+11 | 4.64097e+11 | 4.88754e+11 | 4.95853e+11 | 4.84439e+11 | 4.77569e+11 | 4.66107e+11 | 4.54804e+11 | 4.46853e+11 | 4.40701e+11 | 4.30362e+11 | 4.2785e+11 | 4.18933e+11 | 4.09388e+11 | 4.01405e+11 | 3.89836e+11 | 3.82055e+11 | 3.73036e+11 | 20220402 | Row(PK_ID='04000000000000000000120220402', oid=None) | 2022-04-04 06:16:09.979000 | 2022-04-13 07:22:34.990000 |
+----+-----------------+---------+-------------------------------+--------------------+------------+-------------+-------------+-------------+-------------+-------------+-------------+------------+-------------+------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+------------+-------------+-------------+-------------+-------------+-------------+-------------+------------+--------------------------------------------------------------+----------------------------+----------------------------+
22/04/13 17:57:52 INFO MongoClientCache: Closing MongoClient: [ip-172-19-46-161.eu-central-1.compute.internal:27017,ip-172-19-33-146.eu-central-1.compute.internal:27017]
22/04/13 17:57:52 INFO SparkContext: Invoking stop() from shutdown hook
22/04/13 17:57:52 INFO connection: Closed connection [connectionId{localValue:5, serverValue:199572}] to ip-172-19-33-146.eu-central-1.compute.internal:27017 because the pool has been closed.
22/04/13 17:57:52 INFO SparkUI: Stopped Spark web UI at http://5bcbeccf03ef:4040
bash-4.2# 22/04/13 17:57:52 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
22/04/13 17:57:52 INFO MemoryStore: MemoryStore cleared
22/04/13 17:57:52 INFO BlockManager: BlockManager stopped
22/04/13 17:57:52 INFO BlockManagerMaster: BlockManagerMaster stopped
22/04/13 17:57:52 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
22/04/13 17:57:52 INFO SparkContext: Successfully stopped SparkContext
22/04/13 17:57:52 INFO ShutdownHookManager: Shutdown hook called
22/04/13 17:57:52 INFO ShutdownHookManager: Deleting directory /tmp/spark-238b973a-62ae-41fd-8bbe-e53d6b7d3cf9
22/04/13 17:57:52 INFO ShutdownHookManager: Deleting directory /tmp/spark-238b973a-62ae-41fd-8bbe-e53d6b7d3cf9/pyspark-25d124f8-2be7-454c-a8ab-24a0f9887218
22/04/13 17:57:52 INFO ShutdownHookManager: Deleting directory /tmp/spark-e036480c-8614-403a-a43f-fd5b90bb0389
^C
bash-4.2#
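For reference, the transcript implies the following flow inside num_conversion.py: read the collection through the MongoDB Spark connector (the MongoInferSchema/MongoRelation entries), count rows after a pushed-down `CDC_TIMESTAMP >=` filter, write a single-partition CSV to prd_data_to_validate.csv, then pull one document by PK_ID into pandas via toPandas. The sketch below is a reconstruction under assumptions, not the actual script: the function names, the `mongo` source format and connector coordinates, and the database/collection names are all hypothetical; only the filter strings, the output path, and the timestamp value come from the log.

```python
def build_filter_clause(cdc_timestamp_ms: int) -> str:
    # Mirrors the script's logged message: Using filer clause: CDC_TIMESTAMP >= "..."
    # (the quoted value is compared as a string, which Spark still pushes down
    # as GreaterThanOrEqual in the log).
    return f'CDC_TIMESTAMP >= "{cdc_timestamp_ms}"'


def export_and_inspect(uri: str, database: str, collection: str, since_ms: int) -> None:
    """Hypothetical reconstruction of the flow seen in the log above."""
    # pyspark imported here so the pure helper above works without Spark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        # Connector coordinates are an assumption; the log resolved
        # org.mongodb.spark#mongo-spark-connector via --packages/spark.jars.packages.
        .config("spark.jars.packages",
                "org.mongodb.spark:mongo-spark-connector_2.12:3.0.1")
        .getOrCreate()
    )
    df = (
        spark.read.format("mongo")          # assumed source format
        .option("uri", uri)                 # mongodb+srv:// URI triggers SRV lookup
        .option("database", database)
        .option("collection", collection)
        .load()                             # schema inference = the treeAggregate job
    )

    # Count after the timestamp filter (the log reports 646 matching documents).
    recent = df.filter(build_filter_clause(since_ms))
    print("The data dataframe count after filter on timestamp:", recent.count())

    # Single-partition CSV write; the committer writes under .../_temporary first.
    (recent.coalesce(1)
           .write.mode("overwrite")
           .option("header", True)
           .csv("file:/opt/prd_data_to_validate.csv"))

    # Spot-check one document as a pandas frame ("toPandas at num_conversion.py:96").
    one = df.filter('PK_ID == "04000000000000000000120220402"').toPandas()
    print(one)


# Example (requires a reachable replica set; hostname taken from the log,
# database/collection names are placeholders):
# export_and_inspect("mongodb+srv://mongodb-prd.dp.aws.customer.internal/",
#                    "customer_db", "num_profiles", 1649841014867)
```

The `coalesce(1)` and header option are guesses from the single output task and the directory-style `.csv` path; the real script may partition differently.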