The current documentation is not accurate with the aggregation framework's behavior.
When operating on a sharded collection, the aggregation pipeline is split into two parts. The aggregation framework pushes all of the operators up to and including the first $group or $sort to each shard. Then, a second pipeline on the mongos runs. This pipeline consists of the first $group or $sort and any remaining pipeline operators, and runs on the results received from the shards.
The mongos pipeline merges $sort operations from the shards. The $group operator brings in any “sub-totals” from the shards and combines them: in some cases these may be structures. For example, the $avg expression maintains a total and count for each shard; mongos combines these values and then divides.