Dictionary Data

Dictionary files

The following dictionary files are available for download. By default, the loading scripts and configuration files expect them to be saved under /data/ETL/dictionaries. Using this path avoids having to edit the scripts and configuration files described below.

| Dictionary | File | Load script | Release | Description |
|---|---|---|---|---|
| Entrez | entrez-dictionary-human-mouse.tar.xz | load-entrez.sh | Nov 2016 | Entrez gene ids and names for human and mouse |
| Gene Ontology | geneontology-dictionary.tar.xz | load-go.sh | Nov 2016 | Gene ontology and GOA files for human |
| HMDB | hmdb-dictionary.tar.xz | load-hmdb.sh | 3.6 Nov 2016 | Human metabolome database extracted data, and extraction script |
| KEGG | kegg-dictionary.tar.xz | load-kegg.sh | Nov 2016 | The last public release of KEGG retrieved from an old server. More recent pathway databases should be preferred, and suggestions are welcome |
| MeSH | mesh-dictionary.tar.xz | load-mesh.sh | 2016 | Medline subject headings |
| MiRBase | mirbase-dictionary.tar.xz | load-mirbase.sh | Nov 2016 | MiRBase microRNA database |
| Taxonomy | taxonomy-dictionary.tar.xz | (none) | Nov 2016 | Taxonomy (species name and synonyms) for human and mouse |
| UniProt | uniprot-human-dictionary.tar.xz | load-uniprot.sh | Nov 2016 | Human proteins from SwissProt and TrEMBL |
| Observations | reference-observations-dictionary.tar.xz | load-obs.sh | Nov 2016 | Entrez gene ids and names for human and mouse |
| SNPs for GWAS | reference-vcf-dictionary.tar.xz | load-snp.sh | Nov 2016 | VCF file of SNPs for GWAS MAGIC test data |
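
As a rough sketch, the downloaded archives could be unpacked into the default directory in one pass. The helper name unpack_dictionaries and the download directory are assumptions for illustration. The internal layout of each archive is not described here, so it is worth listing one first with "tar -tJf" to check where its files will land.

```shell
#!/bin/sh
# Sketch only: unpack every downloaded dictionary archive into one directory.
# unpack_dictionaries is a hypothetical helper, not one of the distributed scripts.
unpack_dictionaries() {
    src_dir=$1     # directory holding the downloaded *-dictionary*.tar.xz files
    dest_dir=$2    # target directory, e.g. /data/ETL/dictionaries
    mkdir -p "$dest_dir" || return 1
    for archive in "$src_dir"/*-dictionary*.tar.xz; do
        [ -e "$archive" ] || continue    # the glob matched nothing
        # -C makes tar extract relative to the target directory
        tar -xJf "$archive" -C "$dest_dir" || return 1
    done
}
```

For example, "unpack_dictionaries ~/Downloads /data/ETL/dictionaries" would extract every matching archive found in ~/Downloads.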

Loading scripts

The full set of loading scripts can be downloaded from load-dictionaries.tar.xz.

Unpack (preferably into /data/ETL) with

    tar -xJf load-dictionaries.tar.xz
  

If the dictionary files were not unpacked into /data/ETL/dictionaries, edit all the load*.sh scripts and conf/*.properties files, replacing /data/ETL/dictionaries with the path to your dictionary files.
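
The path replacement can be scripted instead of done by hand. The sketch below assumes it is run from the directory where the loading scripts were unpacked, with conf/ alongside the load*.sh files. The helper name relocate_dictionaries is hypothetical.

```shell
#!/bin/sh
# Sketch only: point the loading scripts and properties files at a
# different dictionary directory. relocate_dictionaries is a hypothetical helper.
relocate_dictionaries() {
    new_path=$1                        # where the dictionaries actually live
    old_path=/data/ETL/dictionaries    # default path used by the scripts
    for f in load*.sh conf/*.properties; do
        [ -f "$f" ] || continue        # skip patterns that matched nothing
        # '|' as the sed delimiter avoids escaping the slashes in the paths;
        # new_path must not itself contain '|' or '&'.
        # A .bak copy of each edited file is kept next to it.
        sed -i.bak "s|$old_path|$new_path|g" "$f" || return 1
    done
}
```

For example, "relocate_dictionaries /srv/dictionaries" rewrites every occurrence of the default path; the .bak copies can be deleted once the result has been checked.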

The scripts expect to find tranSMART-ETL installed in /data/ETL/tranSMART-ETL, built by downloading it and running "mvn package".

If tranSMART-ETL is installed elsewhere, you can create a symlink at the expected path, or simply edit the load*.sh scripts.
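
A minimal sketch of the symlink option, assuming tranSMART-ETL has already been built in some other directory. The helper name link_etl is hypothetical, and its second argument defaults to the path the scripts expect.

```shell
#!/bin/sh
# Sketch only: expose an existing tranSMART-ETL build at the path the
# load*.sh scripts look in. link_etl is a hypothetical helper.
link_etl() {
    actual=$1                                 # real install location
    expected=${2:-/data/ETL/tranSMART-ETL}    # path the scripts expect
    if [ ! -d "$actual" ]; then
        echo "not a directory: $actual" >&2
        return 1
    fi
    # Refuse to overwrite anything already at the expected path
    if [ -e "$expected" ] || [ -L "$expected" ]; then
        echo "already exists: $expected" >&2
        return 1
    fi
    mkdir -p "$(dirname "$expected")" || return 1
    ln -s "$actual" "$expected"
}
```

For example, "link_etl /home/user/src/tranSMART-ETL" creates /data/ETL/tranSMART-ETL as a link to that build directory.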