Type: Task
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 8.2.0-rc0
Affects Version/s: None
Component/s: None
Labels:
None

Assigned Teams:

Atlas Streams
Backwards Compatibility:
Fully Compatible
Confidence Status:
None
Work Order:
3
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

Background

The S3EmitOperator writes files to S3 using the following schema for the object key:

<path>/<wall time>-<taskId>-<part number>.<extension>

For example:

myStaticPath/1739652142669-1ac2-0000.json

The wall time has millisecond precision, so two files written within the same millisecond share a wall time value. The part number disambiguates them: the sink writer increments it whenever it sees that it is writing two files with the same wall time.
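As an illustration of the key schema, here is a minimal Python sketch. The function name `format_s3_key` and its parameters are hypothetical, and the four-digit zero padding of the part number is an assumption inferred from the example key above, not something the ticket states.

```python
def format_s3_key(path: str, wall_time_ms: int, task_id: str,
                  part_number: int, extension: str) -> str:
    """Build a key shaped like <path>/<wall time>-<taskId>-<part number>.<extension>.

    Assumption: the part number is zero-padded to four digits, as in the
    example key "myStaticPath/1739652142669-1ac2-0000.json".
    """
    return f"{path}/{wall_time_ms}-{task_id}-{part_number:04d}.{extension}"


# Reproduces the example key from the ticket.
print(format_s3_key("myStaticPath", 1739652142669, "1ac2", 0, "json"))
```

Only the part number differs between two keys generated by the same task within the same millisecond.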

Suggested Implementation

The sink writer should hold a set (or map) that tracks the object keys it has previously uploaded.

When generating an S3 object key, the sink writer should check the set to see whether it has already used that key. If so, it should increment the part number and repeat the check. Once it finds a key that has not previously been uploaded, it records that key in the set and uses it for the upload.
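The check-and-increment loop described above could be sketched as follows. This is a minimal Python illustration, not the actual sink writer: the class name `S3KeyGenerator`, the method `next_key`, and the four-digit part number padding are all assumptions made for the example.

```python
class S3KeyGenerator:
    """Hypothetical helper that never returns the same object key twice."""

    def __init__(self, path: str, task_id: str, extension: str):
        self._path = path
        self._task_id = task_id
        self._extension = extension
        # Keys this writer has already handed out for upload.
        self._used_keys = set()

    def next_key(self, wall_time_ms: int) -> str:
        part_number = 0
        while True:
            # Assumed format: <path>/<wall time>-<taskId>-<part number>.<extension>
            key = (f"{self._path}/{wall_time_ms}-{self._task_id}-"
                   f"{part_number:04d}.{self._extension}")
            if key not in self._used_keys:
                # Record the key before returning so it cannot be reused.
                self._used_keys.add(key)
                return key
            # Collision with an earlier upload: try the next part number.
            part_number += 1


gen = S3KeyGenerator("myStaticPath", "1ac2", "json")
first = gen.next_key(1739652142669)   # first key for this millisecond
second = gen.next_key(1739652142669)  # same millisecond: part number is bumped
later = gen.next_key(1739652142670)   # new millisecond: part number starts over
print(first, second, later, sep="\n")
```

In this sketch the set grows with every upload, so a long-running writer would likely need some way to discard entries for older wall times. The ticket does not specify one.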

Assignee: Andrew Chen
Reporter: Andrew Chen
Participants: Andrew Chen
Votes: 0
Watchers: 1

Created: Feb 28 2025 02:52:30 AM UTC
Updated: Apr 08 2025 03:05:09 PM UTC
Resolved: Apr 08 2025 03:05:06 PM UTC
