Knowledge graph search for biology, chemistry, and toxicology
Ask in plain English, or write SPARQL against 5 public knowledge graphs.
Endpoints
Wikidata: crowd-sourced knowledge graph with broad biological, chemical, and biomedical entity coverage and cross-references to most major databases.
Default query: CAS numbers for well-known compounds
Look up CAS Registry numbers (P231) for caffeine, aspirin, and glucose.
# CAS numbers for caffeine (Q60235), aspirin (Q60168), glucose (Q47512)
SELECT ?compound ?compoundLabel ?cas WHERE {
  VALUES ?compound { wd:Q60235 wd:Q60168 wd:Q47512 }
  ?compound wdt:P231 ?cas .  # P231 = CAS Registry Number
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
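Before running a CAS lookup, it can help to confirm that each Q-ID in the `VALUES` list is the compound you intend, which is what the featured label-lookup query is for. The sketch below builds that lookup and sends it over the SPARQL 1.1 Protocol. It assumes the public Wikidata Query Service endpoint at `https://query.wikidata.org/sparql` and JSON results. The function names and the `User-Agent` string are hypothetical placeholders, not part of this site.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public Wikidata Query Service endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def label_lookup_query(labels, lang="en"):
    """Build a query returning every item whose label exactly matches one of `labels`."""
    # json.dumps escapes quotes and backslashes in a form SPARQL string literals accept.
    values = " ".join(f"{json.dumps(label)}@{lang}" for label in labels)
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?item ?label WHERE {\n"
        f"  VALUES ?label {{ {values} }}\n"
        "  ?item rdfs:label ?label .\n"
        "}"
    )


def run_query(query, endpoint=WDQS_ENDPOINT, timeout=60):
    """Send the query as a GET request and return the parsed SPARQL JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "application/sparql-results+json",
            # WDQS asks clients to identify themselves; this value is a placeholder.
            "User-Agent": "kg-search-example/0.1 (hypothetical)",
        },
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def bindings_to_rows(results):
    """Flatten SPARQL JSON results into {variable: value} dicts, omitting unbound variables."""
    variables = results["head"]["vars"]
    return [
        {var: binding[var]["value"] for var in variables if var in binding}
        for binding in results["results"]["bindings"]
    ]
```

For example, `bindings_to_rows(run_query(label_lookup_query(["caffeine", "aspirin", "glucose"])))` returns one row per matching item. Exact-label matching is case-sensitive and language-specific, and one label can match several items, so check the rows before copying Q-IDs into the CAS query.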
NCBI PubChem compounds, substances, bioassays, and references served via QLever — fast lookup for chemical structures and identifiers.
Default query: Caffeine — SMILES and core attributes
Pull the canonical SMILES, molecular formula, and other vocab predicates for CID 2519.
# Caffeine (CID 2519) — SMILES, formula, identifiers
PREFIX compound: <http://rdf.ncbi.nlm.nih.gov/pubchem/compound/>
PREFIX vocab: <http://rdf.ncbi.nlm.nih.gov/pubchem/vocabulary#>
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?p ?o WHERE {
  compound:CID2519 ?p ?o .
  FILTER(?p IN (
    dcterms:identifier,
    vocab:connectivity_smiles,
    vocab:isomeric_smiles,
    vocab:covalent_unit_count,
    vocab:defined_atom_stereo_count,
    vocab:exact_mass,
    vocab:molecular_formula
  ))
}
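The default query hard-codes CID 2519. To reuse it for other compounds, a small builder can substitute the CID and group the returned `?p ?o` rows by attribute. This is a minimal Python sketch. The helper names are hypothetical, and it assumes rows have already been flattened to `{"p": ..., "o": ...}` dicts (for example, from SPARQL JSON results). The QLever endpoint URL is not given here, so the sketch stops at building the query.

```python
PUBCHEM_PREFIXES = (
    "PREFIX compound: <http://rdf.ncbi.nlm.nih.gov/pubchem/compound/>\n"
    "PREFIX vocab: <http://rdf.ncbi.nlm.nih.gov/pubchem/vocabulary#>\n"
    "PREFIX dcterms: <http://purl.org/dc/terms/>\n"
)

# Same predicate list as the default caffeine query.
DEFAULT_PREDICATES = (
    "dcterms:identifier",
    "vocab:connectivity_smiles",
    "vocab:isomeric_smiles",
    "vocab:covalent_unit_count",
    "vocab:defined_atom_stereo_count",
    "vocab:exact_mass",
    "vocab:molecular_formula",
)


def compound_attributes_query(cid, predicates=DEFAULT_PREDICATES):
    """Build the default attribute query for any PubChem CID."""
    # Accept only positive integers so user-supplied CIDs cannot inject SPARQL.
    if isinstance(cid, bool) or not isinstance(cid, int) or cid <= 0:
        raise ValueError(f"CID must be a positive integer, got {cid!r}")
    predicate_list = ",\n    ".join(predicates)
    return (
        PUBCHEM_PREFIXES
        + "SELECT ?p ?o WHERE {\n"
        + f"  compound:CID{cid} ?p ?o .\n"
        + f"  FILTER(?p IN (\n    {predicate_list}\n  ))\n"
        + "}"
    )


def local_name(iri):
    """Return the part of an IRI after the last '#' or '/'."""
    return iri.rsplit("#", 1)[-1].rsplit("/", 1)[-1]


def rows_to_attributes(rows):
    """Group {'p': iri, 'o': value} rows into {predicate local name: [values]}."""
    attributes = {}
    for row in rows:
        attributes.setdefault(local_name(row["p"]), []).append(row["o"])
    return attributes
```

Only the CID is validated. Predicate names are expected to come from code, not from user input.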
UniProt: protein sequence and function database combining reviewed (Swiss-Prot) and unreviewed (TrEMBL) entries, with disease, GO, and cross-database annotations.
Default query: Human insulin (INS_HUMAN) details
Look up a protein by its UniProt mnemonic and pull its recommended name and gene.
# Human insulin
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?protein ?name ?gene WHERE {
  ?protein a up:Protein ;
           up:mnemonic "INS_HUMAN" ;
           up:recommendedName/up:fullName ?name ;
           up:encodedBy/skos:prefLabel ?gene .
}
LIMIT 5
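The default query looks up one mnemonic. A `VALUES` block extends it to a batch and keeps each result tied to the mnemonic that produced it. This is a minimal Python sketch that only builds the query string. The function name is hypothetical, and the query can be run against UniProt's public endpoint at `https://sparql.uniprot.org/sparql` with any SPARQL 1.1 HTTP client.

```python
import json


def proteins_by_mnemonic_query(mnemonics, limit=50):
    """Extend the INS_HUMAN lookup to several UniProt mnemonics via VALUES."""
    # json.dumps yields double-quoted, escaped strings that are valid SPARQL literals.
    values = " ".join(json.dumps(m) for m in mnemonics)
    return f"""PREFIX up: <http://purl.uniprot.org/core/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?mnemonic ?protein ?name ?gene WHERE {{
  VALUES ?mnemonic {{ {values} }}
  ?protein a up:Protein ;
           up:mnemonic ?mnemonic ;
           up:recommendedName/up:fullName ?name ;
           up:encodedBy/skos:prefLabel ?gene .
}}
LIMIT {int(limit)}"""
```

Both property paths are required. Entries without a recommended name or an encoding gene are therefore dropped silently, and this is common for unreviewed TrEMBL records, which usually carry only a submitted name. Moving those two patterns into `OPTIONAL` blocks keeps such entries.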
EBI ChEMBL bioactivity database — drugs, targets, assays, and standard activity measurements. Public mirror is intermittently slow.
Default query: Aspirin (CHEMBL25) basic info
ChEMBL's public mirror can be slow — start with a single-molecule lookup. Increase the SPARQL editor timeout if needed.
# Aspirin: label, type, max phase, etc.
PREFIX cco: <http://rdf.ebi.ac.uk/terms/chembl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?p ?o WHERE {
  <http://rdf.ebi.ac.uk/resource/chembl/molecule/CHEMBL25> ?p ?o .
}
LIMIT 25
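Because the ChEMBL mirror is intermittently slow, scripted access benefits from a generous timeout and a few retries with exponential backoff. This is a minimal standard-library Python sketch that sends a form-encoded POST as described in the SPARQL 1.1 Protocol. The endpoint URL stays a parameter because it is not given here. The retry policy is an assumption, not the site's behavior: it retries timeouts, connection errors, and HTTP 5xx responses, and fails immediately on 4xx responses such as query syntax errors. Function names are hypothetical.

```python
import json
import socket
import time
import urllib.error
import urllib.parse
import urllib.request


def backoff_delays(retries, base=2.0, cap=60.0):
    """Seconds to wait before each retry: base, 2*base, 4*base, ..., capped at `cap`."""
    return [min(cap, base * 2 ** i) for i in range(retries)]


def query_with_retry(endpoint, query, timeout=120, retries=3, sleep=time.sleep):
    """POST a query per the SPARQL 1.1 Protocol, retrying transient failures."""
    data = urllib.parse.urlencode({"query": query}).encode("utf-8")
    headers = {
        "Accept": "application/sparql-results+json",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    last_error = None
    # The first attempt runs immediately; later attempts wait per the backoff schedule.
    for delay in [0.0] + backoff_delays(retries):
        if delay:
            sleep(delay)
        try:
            request = urllib.request.Request(endpoint, data=data, headers=headers)
            with urllib.request.urlopen(request, timeout=timeout) as response:
                return json.load(response)
        except urllib.error.HTTPError as exc:
            if exc.code < 500:
                raise  # client errors (e.g. a malformed query) will not fix themselves
            last_error = exc
        except (urllib.error.URLError, socket.timeout) as exc:
            last_error = exc
    raise RuntimeError(f"query failed after {retries + 1} attempts") from last_error
```

With the defaults, a persistently failing request is tried four times, with waits of 2, 4, and 8 seconds between attempts.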
BioBricks-curated knowledge graph — 165M triples across 53 source datasets (ctdbase, tox21, ecotox, consensus-bioactivity, coconut, bindingdb, chembl-smiles, and more). Served on-prem via Virtuoso on ws4.
Default query: Largest source graphs in the KG
Count triples per named graph and list the 20 largest of the 53 BioBricks sources (ctdbase, tox21, ecotox, etc.).
# Top sources by triple count
SELECT ?graph (COUNT(*) AS ?triples)
WHERE { GRAPH ?graph { ?s ?p ?o } }
GROUP BY ?graph
ORDER BY DESC(?triples)
LIMIT 20
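The count query returns raw totals, and converting them to percentages makes the breakdown easier to read. Because the query stops at `LIMIT 20`, percentages computed from its rows alone describe only the top 20 graphs. The sketch therefore accepts an optional overall total, such as the approximate 165M figure quoted for this KG or the result of a separate `COUNT(*)` over all graphs. This is a minimal Python sketch over parsed SPARQL JSON results, and the function name is hypothetical.

```python
def graph_shares(results, total_triples=None):
    """Return (graph IRI, triple count, percent of total) tuples, largest graph first.

    `results` is a parsed SPARQL JSON response with ?graph and ?triples bindings.
    If `total_triples` is omitted, percentages are relative to the returned rows only.
    """
    rows = [
        (binding["graph"]["value"], int(binding["triples"]["value"]))
        for binding in results["results"]["bindings"]
    ]
    total = total_triples if total_triples is not None else sum(count for _, count in rows)
    rows.sort(key=lambda row: row[1], reverse=True)
    return [
        (graph, count, 100.0 * count / total if total else 0.0)
        for graph, count in rows
    ]
```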
Featured queries
Look up CAS Registry numbers (P231) for caffeine, aspirin, and glucose.
Find compounds with a 'medical condition treated' (P2175) link to type 2 diabetes (Q3025883).
Look up Q-IDs by exact English label — useful before querying CAS, InChI, etc.
Drugs and the proteins they bind (P129 = physically interacts with), filtered to a few well-known drugs.
Compounds with an IARC carcinogenicity classification (P5572) and the assigned group.
Pull the canonical SMILES, molecular formula, and other vocab predicates for CID 2519.
List the first compounds linked to the FDA-approved-drugs concept (RO_0000087 = has-role).
Find compound CIDs whose 2D connectivity SMILES matches caffeine's, i.e., records that differ from caffeine only in stereochemistry or isotopic labeling.
Find compounds with exact mass between 180 and 181 Da — quick mass-spec triage.
Look up a protein by its UniProt mnemonic and pull its recommended name and gene.
Reviewed human proteins (taxonomy 9606) annotated with a UniProt disease, plus disease label.
All reviewed UniProt entries whose primary gene symbol is BRCA1, across organisms.
Reviewed human protein serine/threonine kinases (EC 2.7.11.-) that have at least one PDB cross-reference.
Aspirin (CHEMBL25) basic info. ChEMBL's public mirror can be slow, so start with a single-molecule lookup and increase the SPARQL editor timeout if needed.
Bioactivity measurements for target CHEMBL240 (hERG potassium channel). Narrow the VALUES list to keep the response fast.
Count triples per named graph and list the 20 largest of the 53 BioBricks sources (ctdbase, tox21, ecotox, etc.).
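Use the CTDbase named graph to find chemical→disease associations. CTD (the Comparative Toxicogenomics Database) is manually curated at North Carolina State University and focuses on how environmental chemical exposures affect human health.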
Sample compound → assay outcome triples from the Tox21 in-vitro screening dataset (20M triples).
Cross-graph join — find chemicals that have both regulatory annotations (CTDbase) and in-vitro screening data (Tox21).
Discovery — list distinct predicates in any one named graph. Swap the graph URI to explore other sources.
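The exact-mass triage query uses a fixed 180 to 181 Da window. Mass-spec workflows more often search within a tolerance in parts per million around a measured mass, which gives a much narrower window (10 ppm at 200 Da is ±0.002 Da). This is a minimal Python sketch of that conversion plus a query builder. It assumes that PubChem's `vocab:exact_mass` predicate (taken from the default caffeine query) holds numeric literals directly on compound resources. The function names are hypothetical.

```python
# Assumption: vocab:exact_mass holds numeric literals on compound resources.
PUBCHEM_VOCAB = "http://rdf.ncbi.nlm.nih.gov/pubchem/vocabulary#"


def ppm_window(mass, ppm):
    """Return (low, high) bounds in Da for a +/- ppm tolerance around `mass`."""
    delta = mass * ppm / 1e6
    return mass - delta, mass + delta


def exact_mass_query(low, high, limit=100):
    """Build a query for compounds whose exact mass lies in [low, high] Da."""
    if low > high:
        raise ValueError("low must not exceed high")
    # Fixed-point formatting avoids exponent notation in the SPARQL literals.
    return f"""PREFIX vocab: <{PUBCHEM_VOCAB}>
SELECT ?compound ?mass WHERE {{
  ?compound vocab:exact_mass ?mass .
  FILTER(?mass >= {float(low):.6f} && ?mass <= {float(high):.6f})
}}
LIMIT {int(limit)}"""
```

`exact_mass_query(*ppm_window(measured_mass, 5))` yields a 5 ppm search around a measured value, where `measured_mass` is a placeholder for your instrument reading.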