Transformer scheduler beats dispatching rules on 40x40 to 100x100 jobs

A Transformer trained on small open shop instances generalizes to industrial scale

Researchers trained an encoder-decoder Transformer with multi-head attention on open shop scheduling problem (OSSP) instances ranging from 4x4 to 10x10 (jobs x machines). The only input was the processing-time matrix. The trained policy was then applied, without retraining, to randomly generated instances from 40x40 to 100x100.

On these large unseen instances, the Transformer achieved average gaps of 12.89–15.12% relative to a standard lower bound (per the arXiv paper). It remained competitive with the EST (Earliest Start Time) heuristic and substantially outperformed SPT (Shortest Processing Time) and LPT (Longest Processing Time). On the small training-size instances, it produced feasible schedules typically within 15–30% of best-known values.
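To make the "gap relative to a lower bound" figure concrete, here is a minimal Python sketch. It assumes the bound in question is the usual trivial open shop bound (the larger of the heaviest job load and the heaviest machine load); the article does not say which bound the paper used, and the function names and the 3x3 matrix are illustrative only.

```python
def lower_bound(times):
    """Trivial open shop makespan lower bound (an assumption, see lead-in).

    times[i][j] is the processing time of job i on machine j. No schedule
    can finish before the busiest job or the busiest machine is done.
    """
    job_loads = [sum(row) for row in times]            # total work per job
    machine_loads = [sum(col) for col in zip(*times)]  # total work per machine
    return max(max(job_loads), max(machine_loads))


def gap_percent(makespan, bound):
    """How far a schedule's makespan sits above the bound, in percent."""
    return 100.0 * (makespan - bound) / bound


# Hypothetical 3x3 instance: rows are jobs, columns are machines.
times = [
    [3, 5, 2],
    [4, 1, 6],
    [2, 7, 3],
]
bound = lower_bound(times)  # machine 1 carries 5 + 1 + 7 = 13 units of work
print(bound, round(gap_percent(15, bound), 2))
```

Because the optimal makespan can never fall below this bound, a gap measured this way is an upper limit on the true distance from optimal.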

The key result: a model trained on instances no larger than 10x10 scaled to 100x100 without architectural changes or retraining, suggesting that learned policies can capture scheduling principles general enough to apply across instance sizes.

Generalization across problem scales is rare in combinatorial optimization

Exact methods for OSSP become intractable beyond small instances. Classical dispatching rules such as SPT, LPT, and MWKR (Most Work Remaining) are fast but inflexible, and often require manual tuning per facility. Metaheuristics such as simulated annealing and genetic algorithms maintain quality at scale but demand substantial parameter tuning.

A learned policy that works on 40x40 to 100x100 problems without retraining sidesteps the tuning burden and offers a middle ground: faster inference than metaheuristics, no facility-specific tuning, and performance competitive with or better than hand-tuned rules. For manufacturing and logistics operations where scheduling happens daily and problem sizes vary, this could reduce engineering overhead.

The gap to the lower bound (roughly 13–15%) remains significant, although the gap to the true optimum can be no larger than that. The Transformer is not replacing exact solvers on small instances where exact methods are feasible, nor is it guaranteed to beat every tuned metaheuristic. It is a feature-light alternative that trades some solution quality for speed and generality.

Audit your dispatching rule baseline before deploying learned policies

If your shop uses static rules like SPT or LPT, measure their gaps on your own data and compare them with the reported figures. The Transformer outperformed those rules across the large instances tested, but EST was competitive. Record your current rule's performance as a baseline, then run the learned policy on held-out weeks of historical data. If your existing metaheuristic is already tuned for your facility, the marginal gain from switching may not justify the integration overhead. If you rely on simple dispatching rules and varying problem sizes make tuning difficult, a learned policy trained on similar-sized benchmark instances offers a low-friction option.
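A baseline audit of this kind can be sketched in a few lines of Python. The greedy dispatcher below is one simple reading of SPT, LPT, and EST for open shops, not the paper's implementation; the gap is measured against the trivial job-load/machine-load bound, which is an assumption; and the random instances stand in for your own historical processing-time matrices.

```python
import random

# Each rule maps (processing_time, earliest_start) to a sort key; lowest wins.
RULES = {
    "SPT": lambda p, start: (p, start),   # shortest operation first
    "LPT": lambda p, start: (-p, start),  # longest operation first
    "EST": lambda p, start: (start, p),   # earliest possible start first
}


def lower_bound(times):
    """Trivial bound: the busiest job or the busiest machine."""
    return max(max(sum(r) for r in times), max(sum(c) for c in zip(*times)))


def dispatch_makespan(times, rule):
    """Greedily schedule every (job, machine) operation; return the makespan."""
    job_free = [0] * len(times)         # time at which each job is next idle
    machine_free = [0] * len(times[0])  # time at which each machine is next idle
    pending = {(i, j) for i in range(len(times)) for j in range(len(times[0]))}
    key = RULES[rule]

    def priority(op):
        i, j = op
        # Appending op itself makes tie-breaking deterministic.
        return key(times[i][j], max(job_free[i], machine_free[j])) + op

    while pending:
        i, j = min(pending, key=priority)
        start = max(job_free[i], machine_free[j])
        job_free[i] = machine_free[j] = start + times[i][j]
        pending.remove((i, j))
    return max(job_free)


def audit(instances, rules=("SPT", "LPT", "EST")):
    """Mean percentage gap to the lower bound for each rule."""
    report = {}
    for rule in rules:
        gaps = [
            100.0 * (dispatch_makespan(t, rule) - lower_bound(t)) / lower_bound(t)
            for t in instances
        ]
        report[rule] = sum(gaps) / len(gaps)
    return report


# Placeholder data: five random 10x10 instances with a fixed seed.
rng = random.Random(0)
instances = [
    [[rng.randint(1, 99) for _ in range(10)] for _ in range(10)]
    for _ in range(5)
]
for rule, gap in audit(instances).items():
    print(f"{rule}: mean gap {gap:.2f}%")
```

To compare a learned policy against these baselines, one option is to wrap it in a function with the same shape as dispatch_makespan (processing-time matrix in, makespan out) and average its gaps over the same held-out instances.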
