The currently proposed implementation builds a set of foreign key values for each foreign document and then intersects it with a similar set built from the local document. While this is conceptually correct, we should investigate whether we can avoid materializing the foreign set and instead probe each foreign key value against the local set.
Because the set is built per input document, we are not too worried about excessive memory usage; the main concern is CPU cycles. However, an empty set of keys must be matched to null, and handling that case might require additional stages, so probing can end up being as CPU-heavy as materializing the set.
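To make the trade-off concrete, here is a minimal sketch of the two strategies in Python. It is illustrative only and not the actual implementation: the function names (`key_set`, `matches_by_intersection`, `matches_by_probing`) are hypothetical, and the null rule is modeled under the assumption that an empty key set behaves like a set containing a single null (`None`).

```python
from typing import Hashable, Iterable, Optional, Set

Key = Optional[Hashable]


def key_set(keys: Iterable[Key]) -> Set[Key]:
    """Materialize a key set; an empty set is treated as {None} (assumed null rule)."""
    materialized = set(keys)
    return materialized if materialized else {None}


def matches_by_intersection(local_keys: Iterable[Key], foreign_keys: Iterable[Key]) -> bool:
    """Set-based approach: build both sets, then test whether they intersect."""
    return not key_set(local_keys).isdisjoint(key_set(foreign_keys))


def matches_by_probing(local_set: Set[Key], foreign_keys: Iterable[Key]) -> bool:
    """Probing approach: look up each foreign key in a prebuilt local set."""
    saw_any_key = False
    for key in foreign_keys:
        saw_any_key = True
        if key in local_set:
            return True  # early exit on the first hit, no foreign set is built
    # No foreign keys at all: apply the assumed null rule explicitly.
    return (not saw_any_key) and None in local_set
```

In this sketch the probing variant avoids allocating the foreign set and can exit early, but it needs an extra branch to handle the empty case. That branch stands in for the additional work the null rule may require, which is why probing is not guaranteed to be cheaper.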