LOGOS: Language of Generative Objects in Science

Overview

LOGOS (Language of Generative Objects in Science) is the first multi-domain generative framework built on a unified scientific grammar. It encodes diverse scientific objects (proteins, antibodies, small molecules, chemical reactions, materials, and their spatial interactions) as token sequences over a shared vocabulary, so that a single autoregressive model can perform generation, prediction, and design across the natural sciences.

Unlike approaches that rely on natural language as an intermediary or require explicit 3D geometric networks, LOGOS operates directly on domain-native representations. Key spatial relationships (e.g., protein pocket–ligand contacts) are discretized and tokenized into the shared grammar, allowing the model to learn complex structural interactions in a purely sequential manner.
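
To make the idea of discretized spatial relationships concrete, here is a minimal sketch. It is not the LOGOS tokenizer: the token format (`<contact:...>`), the distance bin edges, and the 8 angstrom cutoff are illustrative assumptions, since the source does not specify them. The sketch bins pocket-ligand distances and emits one token per contact, so geometry ends up as a flat sequence.

```python
import math
from bisect import bisect_right

# Hypothetical bin edges in angstroms (an assumption, not taken from LOGOS).
BIN_EDGES = [2.0, 4.0, 6.0, 8.0]
CUTOFF = 8.0  # pairs at or beyond this distance emit no token


def contact_tokens(residues, ligand_atoms):
    """Turn pocket-ligand geometry into a flat list of discrete tokens.

    residues:     list of (residue_label, (x, y, z))
    ligand_atoms: list of (atom_label, (x, y, z))
    """
    tokens = []
    for res_label, res_xyz in residues:
        for atom_label, atom_xyz in ligand_atoms:
            d = math.dist(res_xyz, atom_xyz)
            if d >= CUTOFF:
                continue  # too far apart to count as a contact
            # Bin 0 covers [0, 2), bin 1 covers [2, 4), and so on.
            bin_index = bisect_right(BIN_EDGES, d)
            tokens.append(f"<contact:{res_label}|{atom_label}|d{bin_index}>")
    return tokens


# Toy coordinates, invented for illustration only.
pocket = [("ASP25", (0.0, 0.0, 0.0)), ("GLY27", (10.0, 0.0, 0.0))]
ligand = [("C1", (3.0, 0.0, 0.0)), ("O2", (0.0, 5.0, 0.0))]
print(contact_tokens(pocket, ligand))
```

Once contacts are expressed this way, they can be concatenated with sequence tokens and consumed by an ordinary autoregressive model, with no coordinate inputs at inference time.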

[Figure: LOGOS framework overview]

Key Features

Unified Scientific Grammar: A shared representational interface that encodes heterogeneous scientific objects and cross-object relationships into a common discrete token space.
One Model Fits All: A single autoregressive model handles tasks across proteins, small molecules, materials, reactions, antibodies, and their interactions.
No Explicit 3D Geometry Required: Spatial contact and constraint patterns are captured through tokenized representations, without relying on geometric neural networks or explicit coordinates.
Pre-training & Downstream Alignment: The grammar space ensures formal consistency between continued pre-training objectives and downstream task goals.
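
The notion of a common discrete token space can be sketched as follows. This is a toy illustration under stated assumptions, not the actual LOGOS grammar: the domain tags (`<protein>`, `<molecule>`), the `aa:` and `sm:` prefixes, and the `SharedVocab` class are all hypothetical names introduced here. The prefixes show one way to keep symbols such as `C` (cysteine in a protein, carbon in SMILES) distinct inside a single vocabulary.

```python
class SharedVocab:
    """A single token-to-id table used by every domain (hypothetical helper)."""

    def __init__(self):
        self.token_to_id = {}
        self.id_to_token = []

    def add(self, token):
        # Assign the next free id the first time a token is seen.
        if token not in self.token_to_id:
            self.token_to_id[token] = len(self.id_to_token)
            self.id_to_token.append(token)
        return self.token_to_id[token]

    def encode(self, tokens):
        return [self.add(t) for t in tokens]

    def decode(self, ids):
        return [self.id_to_token[i] for i in ids]


def protein_tokens(sequence):
    """Wrap amino-acid letters in a protein span (tag names are assumptions)."""
    return ["<protein>"] + [f"aa:{r}" for r in sequence] + ["</protein>"]


def smiles_tokens(smiles):
    """Character-level split for brevity; multi-character atoms such as
    "Cl" would need a proper SMILES tokenizer."""
    return ["<molecule>"] + [f"sm:{c}" for c in smiles] + ["</molecule>"]


vocab = SharedVocab()
sequence = protein_tokens("ACD") + smiles_tokens("CCO")
ids = vocab.encode(sequence)
print(ids)
```

Because both objects share one id space, a protein and its ligand can appear in the same training sequence, which is the property the feature list attributes to the unified grammar.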

[Figure: Data construction in LOGOS]

Supported Tasks

LOGOS achieves competitive or state-of-the-art performance across six representative downstream tasks:

Task	Domain	Description
Interaction-Aware Ligand Design for Binding Pockets	Drug Discovery	Generate ligands capable of specifically binding to a protein binding pocket
Protein Ligand-Binding Site Identification	Structural Biology	Identify binding pockets from protein sequences
Retrosynthesis Prediction	Chemistry	Predict reactants given a target product
Unconditional Material Generation	Materials Science	Generate novel and valid materials
Protein Editing	Protein Engineering	Edit protein sequences for improved functional properties
Antibody CDR Design	Immunology	Design complementarity-determining regions for antibody engineering

[Figure: Benchmark comparison]

Model Architecture

LOGOS is based on an autoregressive Transformer architecture with continued multi-domain pre-training on a unified scientific grammar. The framework spans a parameter range from 1B to 8B, with stable scaling behavior observed across this range.
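
For a rough sense of what the 1B to 8B range means in practice, the sketch below estimates the memory needed to hold the weights alone. It assumes a storage precision per parameter (2 bytes for bfloat16 or float16, 4 for float32); activations, the KV cache, and framework overhead are not included, so real usage will be higher. The function name and the dtype table are illustrative, not part of LOGOS.

```python
# Bytes used to store one parameter at common precisions.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2, "int8": 1}


def weight_memory_gib(num_params, dtype="bfloat16"):
    """Approximate memory for the model weights only, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


for size in (1e9, 8e9):
    print(f"{size / 1e9:.0f}B params: {weight_memory_gib(size):.1f} GiB in bfloat16")
```

Under these assumptions, the smallest model needs roughly 2 GiB for weights and the largest roughly 15 GiB, which helps decide which checkpoint fits a given accelerator.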

Quick Start

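from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the checkpoint and its matching tokenizer from the Hugging Face Hub.
model = AutoModelForCausalLM.from_pretrained("LOGOS-Hub/LOGOS-8B")
tokenizer = AutoTokenizer.from_pretrained("LOGOS-Hub/LOGOS-8B")

# Replace the placeholder with an input written in the LOGOS scientific grammar.
input_text = "<your_scientific_grammar_input>"
inputs = tokenizer(input_text, return_tensors="pt")

# Generate up to 512 new tokens, then decode the full sequence (prompt plus continuation).
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))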
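
For decoder-only models in `transformers`, the sequence returned by `generate` starts with the prompt tokens, so the Quick Start snippet prints the prompt together with the continuation. The sketch below shows one way to separate the two. The helper name `split_prompt_and_continuation` and the toy ids are hypothetical; it uses plain lists so it runs without downloading a model.

```python
def split_prompt_and_continuation(output_ids, prompt_length):
    """Split a generated id sequence into (prompt part, newly generated part).

    Works on plain lists and on 1-D tensors, since both support slicing.
    """
    if not 0 <= prompt_length <= len(output_ids):
        raise ValueError("prompt_length must lie within the output sequence")
    return output_ids[:prompt_length], output_ids[prompt_length:]


# Toy ids standing in for real tokenizer output (illustrative values only).
prompt_ids = [101, 7, 42]
generated = prompt_ids + [9, 13, 2]
prompt_part, new_part = split_prompt_and_continuation(generated, len(prompt_ids))
print(new_part)
```

With the Quick Start variables, `prompt_length` would be `inputs["input_ids"].shape[1]`, and the continuation could be decoded with `tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)`. If the grammar's structural tokens are registered as special tokens (an assumption; the source does not say), `skip_special_tokens=True` would drop them from the decoded string, so `skip_special_tokens=False` may be the safer choice when those tokens carry meaning.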

Citation

If you find this work useful in your research or applications, please cite our technical report.

@misc{li2026speakinglanguagesciencegeneralpurpose,
      title={Speaking the Language of Science: Toward a General-Purpose Generative Foundation Model for the Natural Sciences}, 
      author={Mingyang Li and Yurou Liu and Jieping Ye and Bing Su and Ji-Rong Wen and Zheng Wang},
      year={2026},
      eprint={2606.16905},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2606.16905}, 
}

License

This project is released under CC BY 4.0.

We welcome collaboration, feedback, and community contributions to advance unified generative modeling for the natural sciences.
