Downloads
Bulk downloads of the SSPsyGene Knowledge Base data for offline analysis. Gene identifiers have been resolved to gene symbols (HGNC for human, MGI for mouse) wherever a mapping exists; see how the gene parser works for details.
Full database
The TSV ZIP archive contains one tab-separated file per dataset, plus a metadata YAML for each table and a manifest. The SQLite database, aimed at advanced users, is the same file the website queries; it also includes the central gene table and the many-to-many link tables.
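For a quick look inside either bulk download without pandas or unzipping anything, a minimal standard-library sketch follows: it lists the tables in the SQLite file by querying sqlite_master, and reads a single TSV straight out of the ZIP. The tables/ path layout comes from the sample loading code; the helper names, the file paths you pass in, and the UTF-8 encoding are assumptions.

```python
import csv
import io
import sqlite3
import zipfile
from contextlib import closing


def list_sqlite_tables(db_path):
    """Return the names of all tables in a SQLite file, sorted alphabetically."""
    # closing() guarantees the connection is closed; sqlite3's own context
    # manager only commits or rolls back.
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


def list_zip_tsvs(zip_path):
    """Return the sorted paths of every .tsv member in the ZIP archive."""
    with zipfile.ZipFile(zip_path) as zf:
        return sorted(n for n in zf.namelist() if n.endswith(".tsv"))


def read_tsv_from_zip(zip_path, member):
    """Read one TSV member (e.g. a hypothetical 'tables/<name>.tsv') as a list of dicts."""
    with zipfile.ZipFile(zip_path) as zf:
        with zf.open(member) as raw:
            # Assumption: the files are UTF-8 encoded.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.DictReader(text, delimiter="\t"))
```

All values come back as strings, so convert numeric columns yourself or switch to pandas once you know which table you need.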
Sample loading code
```r
# In R (paths assume the TSV ZIP was extracted into the working directory)
manifest <- read.delim("manifest.tsv", stringsAsFactors = FALSE)
tbl <- read.delim("tables/SCZ_Risk_Arrayed_RNAseq_supp_1.tsv",
                  stringsAsFactors = FALSE)
head(tbl)
```

```python
# In Python (pandas)
import pandas as pd

manifest = pd.read_csv("manifest.tsv", sep="\t")
tbl = pd.read_csv("tables/SCZ_Risk_Arrayed_RNAseq_supp_1.tsv", sep="\t")
print(tbl.head())
```
Per-dataset downloads
Click Data (TSV) for the full table, Metadata (YAML) for column descriptions, citation, and source links, or, when present, Preprocessing (YAML) for a step-by-step record of how the raw data was cleaned before loading.
About preprocessing provenance
Each dataset's Preprocessing (YAML) file lists, in execution order, every action that the data wrangler's preprocess.py script applied to the raw data: gene-symbol rescues, dropped rows, renamed columns, and custom transforms. Read it to audit how a published table became the table you can search and download here.
Common fields you'll see:
- step: clean_gene_column: gene-symbol resolution for one column.
  - counts.passed_through = rows whose original symbol resolved directly.
  - counts.rescued_excel = rows where Excel-mangled values were repaired (9-Sep → SEPTIN9).
  - counts.rescued_make_unique = rows where suffixes added by R's make.unique were stripped (MATR3.1 → MATR3).
  - counts.rescued_manual_alias = hits in the wrangler-curated map of successor symbols (NOV → CCN3).
  - counts.rescued_ensembl_map = Ensembl IDs (ENSG/ENSMUSG) resolved to symbols.
  - counts.unresolved = rows the cleaner could not resolve; these are kept as-is.
  The first ~10 unresolved values appear in sample_unresolved for inspection. See the gene-parser doc for what each rescue step does.
- step: dropna / step: filter_rows: rows removed by a predicate. rows_before, rows_after, and dropped give the exact counts.
- step: rename / step: drop_columns / step: reorder: schema reshaping.
- step: transform_column: a one-off custom string fix-up; the description field explains what was done.
- step: read_csv / step: write_csv: bookends recording the source filename and the final column list.
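The step records can be audited mechanically. The sketch below assumes the Preprocessing YAML has already been parsed (for example with PyYAML's yaml.safe_load) into a list of mappings, one per step, using the field names listed above; that top-level layout is an assumption, so adjust the indexing if the steps turn out to sit under a key. It prints one line per step in execution order, checks that rows_before - rows_after equals dropped, and totals all rescued_* counts. The helper names are hypothetical.

```python
def rows_dropped_consistent(rec):
    """dropna / filter_rows: does rows_before - rows_after equal dropped?"""
    return rec["rows_before"] - rec["rows_after"] == rec["dropped"]


def rescued_total(counts):
    """clean_gene_column: total rows repaired by any rescued_* category."""
    return sum(n for tag, n in counts.items() if tag.startswith("rescued_"))


def audit_steps(steps):
    """Return one summary line per step record, in execution order."""
    lines = []
    for i, rec in enumerate(steps, start=1):
        name = rec.get("step", "<missing step>")
        if "dropped" in rec:
            # Row-filtering step: report the drop and flag arithmetic mismatches.
            ok = "ok" if rows_dropped_consistent(rec) else "MISMATCH"
            detail = f"dropped {rec['dropped']} rows ({ok})"
        elif "counts" in rec:
            # Gene-cleaning step: passed through vs. rescued vs. unresolved.
            c = rec["counts"]
            detail = (f"{c.get('passed_through', 0)} passed, "
                      f"{rescued_total(c)} rescued, "
                      f"{c.get('unresolved', 0)} unresolved")
        else:
            # Other steps: fall back to the free-text description, if any.
            detail = rec.get("description", "")
        lines.append(f"{i}. {name}" + (f": {detail}" if detail else ""))
    return lines
```

Run it on a parsed file with print("\n".join(audit_steps(steps))) and look for MISMATCH flags or unexpectedly large unresolved counts.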
Each cleaned table also keeps two extra columns of row-level provenance: <gene_col>_raw (the original value before cleaning) and _<gene_col>_resolution (the per-row tag, such as passed_through, rescued_excel, or unresolved). Cross-reference these with the YAML to investigate any specific row. Full walkthrough: how the gene parser works.
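To cross-reference the row-level columns with a step's counts, a sketch like the following tallies the _<gene_col>_resolution tags in a downloaded TSV and collects the raw values of unresolved rows. It uses only the standard library and assumes a UTF-8, tab-separated file with a header row; the helper name is hypothetical, and the gene column name is dataset-specific.

```python
import csv
from collections import Counter


def resolution_report(tsv_path, gene_col):
    """Tally per-row resolution tags and list raw values of unresolved rows."""
    raw_col = f"{gene_col}_raw"              # original value before cleaning
    tag_col = f"_{gene_col}_resolution"      # per-row resolution tag
    tags = Counter()
    unresolved = []
    with open(tsv_path, newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh, delimiter="\t"):
            tag = row[tag_col]
            tags[tag] += 1
            if tag == "unresolved":
                unresolved.append(row[raw_col])
    return tags, unresolved
```

If a dropna or filter_rows step ran after clean_gene_column, these tallies can be smaller than the counts in the YAML, because the dropped rows no longer appear in the final table.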
