We investigate scaling language models in data-constrained regimes. We run a large set of experiments varying the extent of data repetition and compute budget, ranging up to 900 billion training ...