Audience
Developers and users who want a tool to generate, summarize, and autocomplete code.
About CodeT5
CodeT5 is an identifier-aware, unified pre-trained encoder-decoder model for code understanding and generation. This repository is the official PyTorch implementation of the EMNLP 2021 paper from Salesforce Research and provides the code for reproducing the experiments in CodeT5. The model is pre-trained on 8.35M functions in 8 programming languages (Python, Java, JavaScript, PHP, Ruby, Go, C, and C#) and achieves state-of-the-art results on 14 sub-tasks of the CodeXGLUE code intelligence benchmark. The CodeT5-large-ntp-py variant is optimized specifically for Python code generation and serves as the foundation model for CodeRL, which yields new state-of-the-art results on the APPS benchmark for competition-level Python program synthesis. A typical use is generating code from a natural-language description.
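To make "identifier-aware" more concrete, the sketch below masks every identifier in a Python snippet and gives all occurrences of the same name the same placeholder, which is roughly the kind of input a masked-identifier objective works with. It is an illustration built on Python's standard `tokenize` module, not the preprocessing code from this repository. The function name `mask_identifiers` and the `<extra_id_N>` placeholder format (borrowed from the T5 sentinel-token convention) are assumptions made for the example.

```python
import io
import keyword
import tokenize


def mask_identifiers(source: str):
    """Replace each identifier with a sentinel; the same name always maps
    to the same sentinel. Hypothetical helper, for illustration only."""
    lines = io.StringIO(source).readlines()
    sentinels = {}  # identifier -> sentinel, in order of first appearance
    spans = []      # (row index, start column, end column, identifier)

    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # NAME tokens cover both keywords and identifiers; skip keywords.
        # Built-ins such as `print` are treated as ordinary identifiers here.
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            sentinels.setdefault(tok.string, f"<extra_id_{len(sentinels)}>")
            spans.append((tok.start[0] - 1, tok.start[1], tok.end[1], tok.string))

    # Substitute from the end backwards so earlier column offsets stay valid.
    for row, start, end, name in reversed(spans):
        lines[row] = lines[row][:start] + sentinels[name] + lines[row][end:]

    return "".join(lines), sentinels


if __name__ == "__main__":
    masked, mapping = mask_identifiers("def add(a, b):\n    return a + b\n")
    print(masked)
    # def <extra_id_0>(<extra_id_1>, <extra_id_2>):
    #     return <extra_id_1> + <extra_id_2>
    print(mapping)
    # {'add': '<extra_id_0>', 'a': '<extra_id_1>', 'b': '<extra_id_2>'}
```

In this sketch, a model trained to recover the original names from the masked text has to reason about how each identifier is used, rather than treating code as a flat token sequence.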
