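Hugging Face has released Carbon, an open-source DNA foundation model with open weights, training code, and data pipeline. It is 275x faster than the best model of the same size, can run locally on a laptop, and uses a DNA-native tokenizer to split sequences.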
The future of biology shouldn’t stay behind black-box APIs. Especially when it touches personal health.
Whether you’re @bryan_johnson measuring every biomarker, or @sytses openly sharing and analyzing his own immune-genetics data, you need open, local, transparent AI.
@huggingface wasn’t created to be a biology company. It’s not the most obvious focus for us. But it feels too important not to do something.
That’s why we built and released Carbon 🧬: a frontier DNA base model with open weights, training code and data pipeline, designed to be fine-tuned or continually pretrained for downstream biological tasks.
Carbon is 275x faster than the next best model at its size. Fast enough to run locally on your laptop. Powerful enough to process a whole human genome on a single GPU in less than 2 days.
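To put the "whole human genome in less than 2 days" figure in perspective, here is a back-of-the-envelope sketch of the sustained throughput it implies. The genome length (about 3.1 billion bases, haploid) is an assumption added here, not a number from this post, and the token rate assumes every token covers a full 6-base chunk.

```python
# Rough throughput implied by "a whole human genome on a single GPU in < 2 days".
# Assumption (not from the post): a haploid human genome is ~3.1 billion bases.
GENOME_BASES = 3.1e9
SECONDS_IN_TWO_DAYS = 2 * 24 * 60 * 60   # 172,800 seconds
BASES_PER_TOKEN = 6                      # 6-base chunks, as stated in the post

# "Less than 2 days" makes these lower bounds on the sustained rate.
bases_per_second = GENOME_BASES / SECONDS_IN_TWO_DAYS
tokens_per_second = bases_per_second / BASES_PER_TOKEN

print(f"at least ~{bases_per_second:,.0f} bases/s")
print(f"at least ~{tokens_per_second:,.0f} tokens/s")
```

Under these assumptions that is on the order of 18 thousand bases, or about 3 thousand tokens, per second.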
The technical unlock: a DNA-native tokenizer that splits sequences into 6-base chunks for efficiency, while preserving single-base resolution during training and inference. The goal: more people able to inspect, run, fine-tune, improve, and build on top of the models shaping biology.
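To make the 6-base chunking concrete, here is a minimal sketch of a non-overlapping 6-mer tokenizer. It is not Carbon's actual tokenizer: the vocabulary layout, the single-base fallback for leftover bases, and the names `tokenize`, `encode`, and `detokenize` are all hypothetical. It also does not show how Carbon keeps single-base resolution during training and inference; it only shows that chunking itself loses no bases.

```python
from itertools import product

K = 6          # chunk size in bases, as stated in the post
BASES = "ACGT"

# Hypothetical vocabulary: all 4**6 = 4096 six-base chunks,
# plus the 4 single bases for any tail shorter than K.
VOCAB = {"".join(p): i for i, p in enumerate(product(BASES, repeat=K))}
for base in BASES:
    VOCAB[base] = len(VOCAB)


def tokenize(seq: str, k: int = K) -> list[str]:
    """Split a DNA string into non-overlapping k-base chunks.

    Leftover bases at the end are emitted one base at a time.
    """
    seq = seq.upper()
    full = len(seq) - len(seq) % k           # length covered by whole chunks
    chunks = [seq[i:i + k] for i in range(0, full, k)]
    chunks.extend(seq[full:])                # tail as single-base tokens
    return chunks


def encode(seq: str) -> list[int]:
    """Map a DNA string to token ids (ambiguity codes such as N are not handled)."""
    return [VOCAB[token] for token in tokenize(seq)]


def detokenize(tokens: list[str]) -> str:
    """Concatenate tokens back into the original sequence."""
    return "".join(tokens)


tokens = tokenize("ACGTACGTAC")
print(tokens)                       # ['ACGTAC', 'G', 'T', 'A', 'C']
print(detokenize(tokens))           # ACGTACGTAC
```

Each full chunk becomes one token instead of six, so a sequence shrinks by up to 6x in token count, which is where the efficiency gain of chunking comes from in this sketch.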
Open weights: https://t.co/vgEklL5q4q
Dataset: https://t.co/R960HgOvSP
Demo: https://t.co/tnujkPeaNb
Let's go open AI biology!