- Type: Task
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- Atlas Streams
- Fully Compatible
- Sprint 43, Sprint 44
Currently, the sink stage in a streams pipeline projects the "_stream_meta" field into every document it writes to the sink. The "_stream_meta" field contains information about the source and the window.
For example:
{
  ...
  _stream_meta: {
    sourceType: 'atlas',
    windowStartTimestamp: ISODate("2024-01-19T20:10:04.000Z"),
    windowEndTimestamp: ISODate("2024-01-19T20:10:06.000Z")
  }
}
However, users cannot reference the "_stream_meta" field in expressions within their pipeline. For example, the $project stage below does not work.
[
  ...
  {
    '$tumblingWindow': {
      interval: { size: 2, unit: 'second' },
      pipeline: [
        { '$group': { _id: null, count: { $sum: 1 } } }
      ]
    }
  },
  {
    $project: {
      _id: "$_stream_meta.windowStartTimestamp"
    }
  },
  ...
]
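To make the expected result concrete, here is a minimal sketch in plain JavaScript of what resolving "$_stream_meta.windowStartTimestamp" would yield if the field were visible to the $project stage. The helper resolveFieldPath and the sample count value are hypothetical and exist only for illustration; this is not how the streams engine evaluates expressions.

```javascript
// Hypothetical helper: resolve a "$a.b.c" style field path against a plain object.
// Returns undefined when any segment of the path is missing.
function resolveFieldPath(doc, path) {
  if (!path.startsWith("$")) {
    throw new Error("field path must start with '$'");
  }
  return path
    .slice(1)
    .split(".")
    .reduce((value, key) => (value == null ? undefined : value[key]), doc);
}

// A window output document shaped like the sink example in this ticket.
// The count value is made up for illustration.
const windowOutput = {
  _id: null,
  count: 3,
  _stream_meta: {
    sourceType: "atlas",
    windowStartTimestamp: new Date("2024-01-19T20:10:04.000Z"),
    windowEndTimestamp: new Date("2024-01-19T20:10:06.000Z"),
  },
};

// What the $project in this ticket is trying to compute.
const projected = {
  _id: resolveFieldPath(windowOutput, "$_stream_meta.windowStartTimestamp"),
};

// The same path against a document that carries no "_stream_meta" field.
const withoutMeta = {
  _id: resolveFieldPath({ _id: null, count: 3 }, "$_stream_meta.windowStartTimestamp"),
};
```

In this sketch, projected._id holds the window start timestamp, while withoutMeta._id is undefined because the path cannot be resolved. The second case only mirrors the situation described here, where the metadata is not available to pipeline expressions.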
In this ticket, we need to work with PMs to define the desired behavior for "_stream_meta" and write a brief technical doc describing how we will implement that behavior.
Regarding the behavior, we need to define things like:
Using the pipeline below as an example, should $project1, $match, $project3, and $project4 all be able to use the "_stream_meta" fields in the documents?
[
  $source,
  $project1,
  $tumblingWindow: {
    pipeline: [
      $match: { some expression on _stream_meta.windowStartTimestamp },
      $group: {_id: null, count: {$sum: 1}},
      $project3
    ]
  },
  $project4
]
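One way to frame the discussion with PMs is to tabulate, per stage, which "_stream_meta" fields could even be defined at that point. The sketch below does this in plain JavaScript under a stated assumption: source information is attached at $source, and window bounds exist only once a document has been assigned to a window. The position labels and the function definedMetaFields are hypothetical. This is not a proposed decision, only a way to lay out the question.

```javascript
// Stages from the example pipeline, tagged with where they sit relative to the window.
const stages = [
  { name: "$project1", position: "beforeWindow" },
  { name: "$match", position: "insideWindow" },
  { name: "$project3", position: "insideWindow" },
  { name: "$project4", position: "afterWindow" },
];

// Assumption (not confirmed by this ticket): sourceType is known from $source onward,
// while the window timestamps are known only inside or after the window stage.
function definedMetaFields(position) {
  const fields = ["sourceType"];
  if (position !== "beforeWindow") {
    fields.push("windowStartTimestamp", "windowEndTimestamp");
  }
  return fields;
}

// Map each stage name to the metadata fields that would be defined under the assumption.
const availability = Object.fromEntries(
  stages.map((stage) => [stage.name, definedMetaFields(stage.position)])
);

console.log(availability);
```

Under this assumption, $project1 could see only the source fields, while $match, $project3, and $project4 could also see the window timestamps. Whether metadata survives a $group inside the window pipeline remains an open question that the technical doc would need to settle.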