Abstract: Modern language models (LMs) increasingly require two critical resources: computational resources and data resources. Data selection techniques can effectively reduce the amount of training ...
Abstract: Scripting languages like Python or JavaScript are extremely popular among developers, in part due to their massive open-source ecosystems that enable smooth code reuse. However, recent work ...