ESMC
ESMC is the latest in the ESM family of protein language models, establishing a new frontier in representation learning for protein biology. Trained on billions of evolutionary sequences (up to 6 billion proteins), it learns representations that capture the underlying mechanisms of protein structure and function. The model is built on a transformer architecture and takes protein sequence as its core modality. ESMC is designed for protein science research, including structure prediction, function annotation, protein design, and understanding evolutionary relationships between proteins. It can generate novel proteins from partial sequence, structure, or functional constraints, helping researchers explore new possibilities in protein design and biological discovery. The Biohub Platform provides access to ESMC through the API and the ESM Python package, with quickstart resources for installing the package, creating an API key, and connecting to the platform.
Evo 2
Evo 2 is a genomic foundation model capable of generalist prediction and design tasks across DNA, RNA, and proteins. It uses a frontier deep learning architecture to model biological sequences at single-nucleotide resolution, achieving near-linear scaling of compute and memory relative to context length. With 40 billion parameters and a context length of 1 megabase, Evo 2 was trained on over 9 trillion nucleotides from diverse eukaryotic and prokaryotic genomes. This extensive training enables Evo 2 to perform zero-shot function prediction across these modalities and to generate novel sequences with plausible genomic architecture. The model's capabilities have been demonstrated in tasks such as designing functional CRISPR systems and predicting disease-causing mutations in human genes. Evo 2 is publicly accessible via Arc's GitHub repository and is integrated into the NVIDIA BioNeMo framework.
Biohub
Biohub is an open platform for building on the world model of protein biology. It provides access to the ESM family of models, including ESMC, ESMFold2, and ESM3, along with interactive tools and developer resources for protein science research. ESMC is a state-of-the-art protein language model trained on billions of evolutionary sequences, building representations that capture fundamental mechanisms of protein structure and function. It powers functional analysis, structure prediction, protein design, and the exploration of evolutionary relationships between proteins. ESMFold2 predicts high-resolution, all-atom 3D structures of biomolecular complexes directly from sequence, with optional multiple sequence alignment input for enhanced accuracy on challenging targets. ESM3 jointly models sequence, structure, and function, enabling controllable generation of novel proteins by conditioning on any combination of these modalities.
Profluent
Profluent's platform combines advanced AI with in-house wet-lab capabilities to create proteins that are either inspired by nature or designed from scratch. This integrated approach allows for precise, adaptable, and scalable solutions to complex biological challenges. Profluent's foundation models push protein design beyond the limitations of random discovery: they support optimizing multiple attributes simultaneously, accessing greater sequence diversity, and enabling novel functionalities. By extrapolating into new protein spaces, Profluent offers possibilities beyond natural or patented proteins, making commercial success cheaper, easier, and more feasible for partners. Profluent's capabilities are built on a commitment to scientific rigor, drawing on diverse datasets and advanced AI.