Reproduced in MongoDB 3.4.16 and 4.0.3.
With a large number of chunks (1+ million), the balancer is observed to spend a large amount of time checking whether each chunk belongs to a tag range. As a result, a balancer round can spend most of its time (e.g. one minute) finding a candidate chunk rather than migrating one, which significantly degrades overall cluster balancing performance.
Below is a repro in which the balancer spends 90% of its time finding a candidate chunk and only 10% of its time moving the chunk.
Off-CPU profiling suggests that the balancer thread is CPU-bound. I have attached a 60-second flame graph of the 3.4.16 CSRS primary process; at that time, the CSRS primary was doing nothing but balancing the cluster.
Most of the CPU time is spent in BSONObj::woCompare().
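To illustrate how per-chunk tag checks can add up to many key comparisons, here is a hypothetical, simplified C++ model. It is not the MongoDB implementation. The integer shard keys, the `TagRange` struct, the comparison counter standing in for `BSONObj::woCompare()` calls, and the assumption that each chunk is checked by scanning the tag ranges linearly are all illustrative. For contrast, the model also performs the same lookup through an ordered index (`std::map::upper_bound`).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical model: shard keys are plain integers instead of BSONObj.
struct TagRange {
    int64_t min;  // inclusive lower bound
    int64_t max;  // exclusive upper bound
    int tag;      // hypothetical tag id
};

// Counts key comparisons; each one stands in for a woCompare() call.
static std::size_t g_comparisons = 0;

static int compareKeys(int64_t a, int64_t b) {
    ++g_comparisons;
    return a < b ? -1 : (a > b ? 1 : 0);
}

// Assumed strategy: test every tag range in order until one contains the chunk.
int findTagLinear(const std::vector<TagRange>& ranges, int64_t chunkMin) {
    for (const TagRange& r : ranges) {
        if (compareKeys(chunkMin, r.min) >= 0 && compareKeys(chunkMin, r.max) < 0) {
            return r.tag;
        }
    }
    return -1;  // chunk belongs to no tag
}

// Ordered index keyed by range min; its comparator also counts comparisons.
struct CountingLess {
    bool operator()(int64_t a, int64_t b) const { return compareKeys(a, b) < 0; }
};
using TagIndex = std::map<int64_t, TagRange, CountingLess>;

int findTagIndexed(const TagIndex& index, int64_t chunkMin) {
    auto it = index.upper_bound(chunkMin);  // first range with min > chunkMin
    if (it == index.begin()) return -1;     // chunk lies below every range
    --it;                                   // last range with min <= chunkMin
    return compareKeys(chunkMin, it->second.max) < 0 ? it->second.tag : -1;
}

// numTags adjacent ranges, each `width` keys wide, starting at key 0.
std::vector<TagRange> makeRanges(int numTags, int64_t width) {
    std::vector<TagRange> ranges;
    for (int t = 0; t < numTags; ++t) {
        ranges.push_back({t * width, (t + 1) * width, t});
    }
    return ranges;
}

// Total comparisons to classify chunks with min keys 0..numChunks-1.
std::size_t comparisonsLinear(const std::vector<TagRange>& ranges, int64_t numChunks) {
    g_comparisons = 0;
    for (int64_t c = 0; c < numChunks; ++c) findTagLinear(ranges, c);
    return g_comparisons;
}

std::size_t comparisonsIndexed(const std::vector<TagRange>& ranges, int64_t numChunks) {
    TagIndex index;
    for (const TagRange& r : ranges) index.emplace(r.min, r);
    g_comparisons = 0;  // count lookups only, not index construction
    for (int64_t c = 0; c < numChunks; ++c) findTagIndexed(index, c);
    return g_comparisons;
}
```

In this model, with T adjacent ranges each holding the same number of chunks, the linear scan averages T + 1 comparisons per chunk (1,000 chunks over 100 ranges gives 101,000 comparisons), while the indexed lookup needs roughly log2(T) + 1. Multiplied by 1+ million chunks per balancer round, the per-chunk cost dominates either way, which is consistent with, but does not prove, the woCompare()-heavy profile above.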