Symbiont

The ecosystem for thriving AI-driven biology.

Idea in Biotechnology

Introduction

Symbiont is a marketplace for synthetic biologists, researchers, and startups to monetize their data assets and train models with searchable, verifiable, off-the-shelf biological datasets.


Problem

AI has the power to transform biology more than any other field: biological problems exist in a noisy, high-dimensional solution space whose design rules are still poorly understood and emerge only from vast amounts of data. However, access to high-quality, affordable training data is the bottleneck holding most developers back. While LLMs can draw on free, abundant, high-quality text online, generating biological data requires extensive training and expertise, expensive instrumentation, and considerable time to produce even a few datapoints. Meanwhile, many biological datasets already exist but collect dust behind institutional firewalls, with no clear market on which to unlock their value. Sharing data involves cumbersome background checks and collaboration agreements, and it is often difficult to verify the quality of training data without expensively reproducing entire experiments.


Opportunity

Symbiont is a platform that allows participants to list their data assets and sell "read access" or "train access" (for privacy-preserving federated learning) to other researchers on a searchable exchange. Symbiont also offers simple, templatized, one-click user license agreements for both parties.

Lastly, Symbiont scores all data assets for authenticity by combining three signals: the number of academic citations; our own wetware reproduction of select datapoints (where Symbiont reproduces randomly selected portions of a screen); and decentralized, consensus-based validation by our community, i.e., allowing participants to act as "validators" who anonymously reproduce or test portions of a dataset to refine its authenticity score. This real-world, network-driven approach can help overcome the barriers between cutting-edge research and high-quality data.
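
To make the scoring idea concrete, here is a minimal sketch of how citations, in-house reproduction, and validator results might be blended into a single authenticity score. Everything specific in it is an assumption made for illustration and is not part of the proposal itself: the names (Evidence, authenticity_score), the weights, the saturating citation curve, and the smoothing of reproduction rates are all hypothetical choices.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class Evidence:
    """Hypothetical evidence record for one listed dataset."""
    citations: int = 0            # number of academic citations
    inhouse_reproduced: int = 0   # datapoints reproduced in-house
    inhouse_tested: int = 0       # datapoints attempted in-house
    # One entry per anonymous validator: True if the tested portion reproduced.
    validator_votes: List[bool] = field(default_factory=list)


def smoothed_rate(successes: int, trials: int) -> float:
    """Laplace-smoothed success rate; returns 0.5 when there is no evidence."""
    return (successes + 1) / (trials + 2)


def authenticity_score(
    e: Evidence,
    w_citations: float = 0.2,     # assumed weights, chosen only for illustration
    w_inhouse: float = 0.5,
    w_validators: float = 0.3,
    citation_scale: float = 50.0, # assumed: citations saturate around this scale
) -> float:
    """Blend the three signals into a score between 0 and 1."""
    # Saturating curve so heavily cited work cannot dominate the score.
    citation_signal = 1.0 - math.exp(-e.citations / citation_scale)
    inhouse_signal = smoothed_rate(e.inhouse_reproduced, e.inhouse_tested)
    validator_signal = smoothed_rate(sum(e.validator_votes), len(e.validator_votes))
    return (
        w_citations * citation_signal
        + w_inhouse * inhouse_signal
        + w_validators * validator_signal
    )


if __name__ == "__main__":
    listing = Evidence(
        citations=12,
        inhouse_reproduced=9,
        inhouse_tested=10,
        validator_votes=[True, True, False, True],
    )
    print(round(authenticity_score(listing), 3))
```

In this sketch a dataset with no evidence starts at a neutral reproduction rate, and each new validator result moves the score slightly, which is one way the community could "refine" a score over time. A real system would also need to weigh validator reliability, which is left out here.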