GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding (arXiv:2006.16668)