KG

Knowledge graph search for biology, chemistry, and toxicology

Ask in plain English, or write SPARQL against 5 public knowledge graphs.

Wikidata (wikidata), ~16B triples

Crowd-sourced knowledge graph — broad biological, chemical, and biomedical entity coverage with cross-references to most major databases.

Default query: CAS numbers for famous drugs

Look up CAS Registry numbers (P231) for caffeine, aspirin, and glucose.

# CAS numbers for caffeine (Q60235), aspirin (Q18216), glucose (Q37525)
SELECT ?compound ?compoundLabel ?cas WHERE {
  VALUES ?compound { wd:Q60235 wd:Q18216 wd:Q37525 }
  ?compound wdt:P231 ?cas .                          # P231 = CAS Registry Number
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}

PubChem (pubchem), ~120B triples

NCBI PubChem compounds, substances, bioassays, and references served via QLever — fast lookup for chemical structures and identifiers.

Default query: Caffeine — SMILES and core attributes

Pull the canonical SMILES, molecular formula, and other vocab predicates for CID 2519.

# Caffeine (CID 2519) — SMILES, formula, identifiers
PREFIX compound: <http://rdf.ncbi.nlm.nih.gov/pubchem/compound/>
PREFIX vocab: <http://rdf.ncbi.nlm.nih.gov/pubchem/vocabulary#>
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?p ?o WHERE {
  compound:CID2519 ?p ?o .
  FILTER(?p IN (
    dcterms:identifier,
    vocab:connectivity_smiles,
    vocab:isomeric_smiles,
    vocab:covalent_unit_count,
    vocab:defined_atom_stereo_count,
    vocab:exact_mass,
    vocab:molecular_formula
  ))
}

UniProt (uniprot), ~165B triples

Curated protein sequence and function database (Swiss-Prot + TrEMBL) with disease, GO, and cross-database annotations.

Default query: Human insulin (INS_HUMAN) details

Look up a protein by its UniProt mnemonic and pull its recommended name and gene.

# Human insulin
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?protein ?name ?gene WHERE {
  ?protein a up:Protein ;
           up:mnemonic "INS_HUMAN" ;
           up:recommendedName/up:fullName ?name ;
           up:encodedBy/skos:prefLabel ?gene .
}
LIMIT 5

ChEMBL (chembl), ~3B triples

EBI ChEMBL bioactivity database — drugs, targets, assays, and standard activity measurements. Public mirror is intermittently slow.

Default query: Aspirin (CHEMBL25) basic info

ChEMBL's public mirror can be slow — start with a single-molecule lookup. Increase the SPARQL editor timeout if needed.

# Aspirin: label, type, max phase, etc.
PREFIX cco: <http://rdf.ebi.ac.uk/terms/chembl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?p ?o WHERE {
  <http://rdf.ebi.ac.uk/resource/chembl/molecule/CHEMBL25> ?p ?o .
}
LIMIT 25

BioBricks KG, local (biobricks-kg), ~165M triples, 53 graphs

BioBricks-curated knowledge graph — 165M triples across 53 source datasets (ctdbase, tox21, ecotox, consensus-bioactivity, coconut, bindingdb, chembl-smiles, and more). Served on-prem via Virtuoso on ws4.

Default query: Largest source graphs in the KG

Count triples per named graph — shows the breakdown across the 53 biobricks sources (ctdbase, tox21, ecotox, etc.).

# Top sources by triple count
SELECT ?graph (COUNT(*) AS ?triples)
WHERE { GRAPH ?graph { ?s ?p ?o } }
GROUP BY ?graph
ORDER BY DESC(?triples)
LIMIT 20

Featured queries

wikidata: CAS numbers for famous drugs

Look up CAS Registry numbers (P231) for caffeine, aspirin, and glucose.

Tags: identifiers, lookup, chemicals

wikidata: Drugs treating type 2 diabetes

Find compounds with a 'medical condition treated' (P2175) link to type 2 diabetes (Q3025883).

Tags: drugs, diseases
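A minimal sketch of how the type 2 diabetes lookup could be written for the Wikidata Query Service, where the wd:, wdt:, wikibase:, and bd: prefixes are predeclared. The variable names and the LIMIT are illustrative choices, not taken from the featured query itself.

```sparql
# Items with a "medical condition treated" (P2175) statement pointing at type 2 diabetes
SELECT ?drug ?drugLabel WHERE {
  ?drug wdt:P2175 wd:Q3025883 .                      # Q3025883 = type 2 diabetes
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
LIMIT 50
```

Swapping the Q-ID for another condition reuses the same pattern.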
wikidata: Find a chemical's Wikidata entity by name

Look up Q-IDs by exact English label — useful before querying CAS, InChI, etc.

Tags: lookup, identifiers
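One way an exact-label lookup might look on the Wikidata Query Service (prefixes predeclared there). The label "caffeine" is only an example value. Label matching is case-sensitive, and several items can share a label, so the description column is included to help pick the right one.

```sparql
# Resolve an exact English label to candidate Wikidata Q-IDs
SELECT ?item ?itemDescription WHERE {
  ?item rdfs:label "caffeine"@en .                   # exact, case-sensitive match
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
LIMIT 10
```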
wikidata: Drug → biological target

Drugs and the proteins they bind (P129 = physically interacts with), filtered to a few well-known drugs.

Tags: drugs, proteins, drug-discovery
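A sketch of the drug-to-target pattern, assuming the Wikidata Query Service with its predeclared prefixes. Only caffeine (Q60235) is listed here. Adding more Q-IDs to the VALUES block widens the result, and P129 values are not guaranteed to be proteins unless you filter for that.

```sparql
# Entities a drug physically interacts with (P129)
SELECT ?drug ?drugLabel ?target ?targetLabel WHERE {
  VALUES ?drug { wd:Q60235 }                         # caffeine; add further Q-IDs here
  ?drug wdt:P129 ?target .                           # P129 = physically interacts with
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
LIMIT 100
```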
wikidata: IARC carcinogen classifications

Compounds with an IARC carcinogenicity classification (P5572) and the assigned group.

Tags: toxicology, carcinogens, regulatory

pubchem: Caffeine — SMILES and core attributes

Pull the canonical SMILES, molecular formula, and other vocab predicates for CID 2519.

Tags: structure, lookup, identifiers

pubchem: FDA-approved drug compounds

List the first compounds linked to the FDA-approved-drugs concept (RO_0000087 = has-role).

Tags: drugs, regulatory

pubchem: Compounds sharing caffeine's connectivity

Find compound CIDs whose 2D connectivity SMILES matches caffeine's (stereoisomers, salts).

Tags: structure, search

pubchem: Compounds by exact mass range

Find compounds with exact mass between 180 and 181 Da — quick mass-spec triage.

Tags: structure, mass-spec
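A rough sketch of a mass-window filter. It assumes, as in the PubChem default query, that vocab:exact_mass is attached directly to the compound on this endpoint. The xsd:decimal cast is a defensive guess in case the mass is stored as a plain string, and the LIMIT is arbitrary.

```sparql
# Compounds with exact mass in [180, 181) Da
PREFIX vocab: <http://rdf.ncbi.nlm.nih.gov/pubchem/vocabulary#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?compound ?mass WHERE {
  ?compound vocab:exact_mass ?mass .
  # Cast so the comparison is numeric whether the literal is typed or a string
  FILTER(xsd:decimal(?mass) >= 180 && xsd:decimal(?mass) < 181)
}
LIMIT 50
```

Tightening the window (for example 180.06 to 180.07) keeps the result set small on a graph of this size.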
uniprot: Human insulin (INS_HUMAN) details

Look up a protein by its UniProt mnemonic and pull its recommended name and gene.

Tags: proteins, lookup

uniprot: Human proteins with disease annotations

Reviewed human proteins (taxonomy 9606) annotated with a UniProt disease, plus disease label.

Tags: proteins, diseases, annotations
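A minimal sketch of the disease-annotation pattern, using UniProt core vocabulary terms (up:reviewed, up:organism, up:Disease_Annotation, up:disease). The variable names and LIMIT are illustrative.

```sparql
# Reviewed human proteins with a disease annotation, plus the disease label
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX taxon: <http://purl.uniprot.org/taxonomy/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?protein ?disease ?diseaseLabel WHERE {
  ?protein a up:Protein ;
           up:reviewed true ;                        # Swiss-Prot entries only
           up:organism taxon:9606 ;                  # Homo sapiens
           up:annotation ?annotation .
  ?annotation a up:Disease_Annotation ;
              up:disease ?disease .
  ?disease skos:prefLabel ?diseaseLabel .
}
LIMIT 25
```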
uniprot: Find proteins by gene symbol

All reviewed UniProt entries whose primary gene symbol is BRCA1, across organisms.

Tags: proteins, genes, lookup
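A sketch of a gene-symbol lookup that reuses the up:encodedBy/skos:prefLabel path from the insulin default query. The match is exact and case-sensitive, so orthologs recorded with different capitalisation (for example "Brca1") would need their own literal. The selected columns are an illustrative choice.

```sparql
# Reviewed entries whose primary gene symbol is exactly "BRCA1"
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?protein ?mnemonic ?organism WHERE {
  ?protein a up:Protein ;
           up:reviewed true ;
           up:mnemonic ?mnemonic ;
           up:organism ?organism ;
           up:encodedBy/skos:prefLabel "BRCA1" .
}
LIMIT 50
```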
uniprot: Human kinases with PDB structures

Reviewed human serine/threonine protein kinases (EC 2.7.11.-) that have at least one PDB cross-reference.

Tags: proteins, kinases, structure

chembl: Aspirin (CHEMBL25) basic info

ChEMBL's public mirror can be slow — start with a single-molecule lookup. Increase the SPARQL editor timeout if needed.

Tags: drugs, lookup

chembl: Activities for a target protein

Bioactivity measurements where target = CHEMBL_TARGET 240 (5-HT1A receptor). Narrow VALUES to keep response fast.

Tags: bioactivity, drug-discovery

biobricks-kg: Largest source graphs in the KG

Count triples per named graph — shows the breakdown across the 53 biobricks sources (ctdbase, tox21, ecotox, etc.).

Tags: meta, discovery

biobricks-kg: Chemicals associated with a disease (CTD)

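Use the CTDbase named graph to find chemical→disease associations. CTD (the Comparative Toxicogenomics Database) is manually curated from the literature at North Carolina State University, with a focus on environmental exposures and human health.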

Tags: chemicals, diseases, ctd

biobricks-kg: Tox21 assay outcomes by activity

Sample compound → assay outcome triples from the Tox21 in-vitro screening dataset (20M triples).

Tags: toxicity, assays, tox21

biobricks-kg: Compounds in both CTDbase and Tox21

Cross-graph join: find chemicals that have both curated literature annotations (CTDbase) and in-vitro screening data (Tox21).

Tags: chemicals, cross-graph

biobricks-kg: Predicates used in a source graph

Discovery — list distinct predicates in any one named graph. Swap the graph URI to explore other sources.

Tags: meta, discovery
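A sketch of the predicate-discovery pattern. The graph URI below is a hypothetical placeholder, not a real BioBricks graph name. The "Largest source graphs in the KG" query returns the actual graph URIs to substitute.

```sparql
# Distinct predicates used inside one named graph
SELECT DISTINCT ?p WHERE {
  GRAPH <http://example.org/graph/tox21> {           # hypothetical URI; replace with a real graph
    ?s ?p ?o
  }
}
ORDER BY ?p
LIMIT 200
```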