Reference Module
The SADIE reference module abstracts the underlying reference data used by the AIRR and Numbering modules. Both of these modules use external database files. Their organization (particularly by AIRR, which ports IGBlast) can be extremely complicated. Making a new reference database is a tedious and time-consuming task. This module provides a simple interface for making your own reference databases.
Builtin reference
SADIE ships with a reference database that contains the most common species along with functional genes. The average user will not need to use this module, as the built-in database is comprehensive. You can inspect each entry by looking directly at the paths used: src/sadie/airr/data/ for AIRR and src/sadie/anarci/data for the Numbering module. Another convenient way to inspect the reference database is to view reference.yml. The structure of that file is described under "The reference YAML" below.
Germline Gene Gateway
New germline gene segments are being discovered at a rapid pace. To meet the needs of this changing landscape, SADIE gets all of the germline gene info from a programmatic API called the Germline Gene Gateway. This API is hosted as a free service. It consists of germline genes from IMGT as well as custom genes that have been annotated and cataloged by programs such as IGDiscover. To explore the API, visit the Germline Gene Gateway. This RESTful API conforms to the OpenAPI 3.0 specification.
Examples of how to use the G3 API
The following examples show how to pull genes programmatically using the command-line utilities curl and wget, and the requests library in Python. Each example fetches the first 5 human V-gene segments in IMGT notation.
$ curl -X 'GET' 'https://g3.jordanrwillis.com/api/v1/genes?source=imgt&segment=V&common=human&limit=5' -H 'accept: application/json' -o 'human_v.json'
$ wget 'https://g3.jordanrwillis.com/api/v1/genes?source=imgt&segment=V&common=human&limit=5' -O human_v.json
import json

import requests

from sadie.reference import G3Error

url = "https://g3.jordanrwillis.com/api/v1/genes?source=imgt&segment=V&common=human&limit=5"
response = requests.get(url)
# check the HTTP status before trying to parse the body as JSON
if response.status_code != 200:
    raise G3Error("Error: " + str(response.status_code))
response_json = response.json()
print(json.dumps(response_json, indent=4))
# write the same payload to disk; the context manager closes the file
with open("human_v.json", "w") as handle:
    json.dump(response_json, handle, indent=4)
The output is a JSON file containing the V-gene segments and all the information SADIE needs to write out the databases used by the AIRR and Numbering modules.
human_v.json
[
{
"_id": "608b90908e6710a05b587046",
"source": "imgt",
"common": "human",
"gene": "IGHV1-18*01",
"label": "V-REGION",
"gene_segment": "V",
"receptor": "IG",
"sequence": "CAGGTTCAGCTGGTGCAGTCTGGAGCTGAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTCTCCTGCAAGGCTTCTGGTTACACCTTTACCAGCTATGGTATCAGCTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGATGGATCAGCGCTTACAATGGTAACACAAACTATGCACAGAAGCTCCAGGGCAGAGTCACCATGACCACAGACACATCCACGAGCACAGCCTACATGGAGCTGAGGAGCCTGAGATCTGACGACACGGCCGTGTATTACTGTGCGAGAGA",
"latin": "Homo_sapiens",
"imgt": {
"sequence": "CAGGTTCAGCTGGTGCAGTCTGGAGCTGAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTCTCCTGCAAGGCTTCTGGTTACACCTTTACCAGCTATGGTATCAGCTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGATGGATCAGCGCTTACAATGGTAACACAAACTATGCACAGAAGCTCCAGGGCAGAGTCACCATGACCACAGACACATCCACGAGCACAGCCTACATGGAGCTGAGGAGCCTGAGATCTGACGACACGGCCGTGTATTACTGTGCGAGAGA",
"sequence_gapped": "CAGGTTCAGCTGGTGCAGTCTGGAGCT...GAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTCTCCTGCAAGGCTTCTGGTTACACCTTT............ACCAGCTATGGTATCAGCTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGATGGATCAGCGCTTAC......AATGGTAACACAAACTATGCACAGAAGCTCCAG...GGCAGAGTCACCATGACCACAGACACATCCACGAGCACAGCCTACATGGAGCTGAGGAGCCTGAGATCTGACGACACGGCCGTGTATTACTGTGCGAGAGA",
"sequence_gapped_aa": "QVQLVQSGA.EVKKPGASVKVSCKASGYTF....TSYGISWVRQAPGQGLEWMGWISAY..NGNTNYAQKLQ.GRVTMTTDTSTSTAYMELRSLRSDDTAVYYCAR",
"fwr1": "CAGGTTCAGCTGGTGCAGTCTGGAGCTGAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTCTCCTGCAAGGCTTCT",
"fwr1_aa": "QVQLVQSGAEVKKPGASVKVSCKAS",
"fwr1_start": 0,
"fwr1_end": 74,
"cdr1": "GGTTACACCTTTACCAGCTATGGT",
"cdr1_aa": "GYTFTSYG",
"cdr1_start": 75,
"cdr1_end": 98,
"fwr2": "ATCAGCTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGATGG",
"fwr2_aa": "ISWVRQAPGQGLEWMGW",
"fwr2_start": 99,
"fwr2_end": 149,
"cdr2": "ATCAGCGCTTACAATGGTAACACA",
"cdr2_aa": "ISAYNGNT",
"cdr2_start": 150,
"cdr2_end": 173,
"fwr3": "AACTATGCACAGAAGCTCCAGGGCAGAGTCACCATGACCACAGACACATCCACGAGCACAGCCTACATGGAGCTGAGGAGCCTGAGATCTGACGACACGGCCGTGTATTACTGT",
"fwr3_aa": "NYAQKLQGRVTMTTDTSTSTAYMELRSLRSDDTAVYYC",
"fwr3_start": 174,
"fwr3_end": 287,
"cdr3": "GCGAGAGA",
"cdr3_aa": "AR",
"cdr3_start": 288,
"cdr3_end": 295,
"imgt_functional": "F",
"contrived_functional": "F"
}
},
{
"_id": "608b90908e6710a05b587048",
"source": "imgt",
"common": "human",
"gene": "IGHV1-18*02",
"label": "V-REGION",
"gene_segment": "V",
"receptor": "IG",
"sequence": "CAGGTTCAGCTGGTGCAGTCTGGAGCTGAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTCTCCTGCAAGGCTTCTGGTTACACCTTTACCAGCTATGGTATCAGCTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGATGGATCAGCGCTTACAATGGTAACACAAACTATGCACAGAAGCTCCAGGGCAGAGTCACCATGACCACAGACACATCCACGAGCACAGCCTACATGGAGCTGAGGAGCCTAAGATCTGACGACACGGCC",
"latin": "Homo_sapiens",
"imgt": {
"sequence": "CAGGTTCAGCTGGTGCAGTCTGGAGCTGAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTCTCCTGCAAGGCTTCTGGTTACACCTTTACCAGCTATGGTATCAGCTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGATGGATCAGCGCTTACAATGGTAACACAAACTATGCACAGAAGCTCCAGGGCAGAGTCACCATGACCACAGACACATCCACGAGCACAGCCTACATGGAGCTGAGGAGCCTAAGATCTGACGACACGGCC",
"sequence_gapped": "CAGGTTCAGCTGGTGCAGTCTGGAGCT...GAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTCTCCTGCAAGGCTTCTGGTTACACCTTT............ACCAGCTATGGTATCAGCTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGATGGATCAGCGCTTAC......AATGGTAACACAAACTATGCACAGAAGCTCCAG...GGCAGAGTCACCATGACCACAGACACATCCACGAGCACAGCCTACATGGAGCTGAGGAGCCTAAGATCTGACGACACGGCC",
"sequence_gapped_aa": "QVQLVQSGA.EVKKPGASVKVSCKASGYTF....TSYGISWVRQAPGQGLEWMGWISAY..NGNTNYAQKLQ.GRVTMTTDTSTSTAYMELRSLRSDDTA",
"fwr1": "CAGGTTCAGCTGGTGCAGTCTGGAGCTGAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTCTCCTGCAAGGCTTCT",
"fwr1_aa": "QVQLVQSGAEVKKPGASVKVSCKAS",
"fwr1_start": 0,
"fwr1_end": 74,
"cdr1": "GGTTACACCTTTACCAGCTATGGT",
"cdr1_aa": "GYTFTSYG",
"cdr1_start": 75,
"cdr1_end": 98,
"fwr2": "ATCAGCTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGATGG",
"fwr2_aa": "ISWVRQAPGQGLEWMGW",
"fwr2_start": 99,
"fwr2_end": 149,
"cdr2": "ATCAGCGCTTACAATGGTAACACA",
"cdr2_aa": "ISAYNGNT",
"cdr2_start": 150,
"cdr2_end": 173,
"fwr3": "AACTATGCACAGAAGCTCCAGGGCAGAGTCACCATGACCACAGACACATCCACGAGCACAGCCTACATGGAGCTGAGGAGCCTAAGATCTGACGACACGGCC",
"fwr3_aa": "NYAQKLQGRVTMTTDTSTSTAYMELRSLRSDDTA",
"fwr3_start": 174,
"fwr3_end": 275,
"cdr3": "",
"cdr3_aa": "",
"cdr3_start": null,
"cdr3_end": null,
"imgt_functional": "F",
"contrived_functional": "F"
}
},
{
"_id": "608b90908e6710a05b587049",
"source": "imgt",
"common": "human",
"gene": "IGHV1-18*03",
"label": "V-REGION",
"gene_segment": "V",
"receptor": "IG",
"sequence": "CAGGTTCAGCTGGTGCAGTCTGGAGCTGAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTCTCCTGCAAGGCTTCTGGTTACACCTTTACCAGCTATGGTATCAGCTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGATGGATCAGCGCTTACAATGGTAACACAAACTATGCACAGAAGCTCCAGGGCAGAGTCACCATGACCACAGACACATCCACGAGCACAGCCTACATGGAGCTGAGGAGCCTGAGATCTGACGACATGGCCGTGTATTACTGTGCGAGAGA",
"latin": "Homo_sapiens",
"imgt": {
"sequence": "CAGGTTCAGCTGGTGCAGTCTGGAGCTGAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTCTCCTGCAAGGCTTCTGGTTACACCTTTACCAGCTATGGTATCAGCTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGATGGATCAGCGCTTACAATGGTAACACAAACTATGCACAGAAGCTCCAGGGCAGAGTCACCATGACCACAGACACATCCACGAGCACAGCCTACATGGAGCTGAGGAGCCTGAGATCTGACGACATGGCCGTGTATTACTGTGCGAGAGA",
"sequence_gapped": "CAGGTTCAGCTGGTGCAGTCTGGAGCT...GAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTCTCCTGCAAGGCTTCTGGTTACACCTTT............ACCAGCTATGGTATCAGCTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGATGGATCAGCGCTTAC......AATGGTAACACAAACTATGCACAGAAGCTCCAG...GGCAGAGTCACCATGACCACAGACACATCCACGAGCACAGCCTACATGGAGCTGAGGAGCCTGAGATCTGACGACATGGCCGTGTATTACTGTGCGAGAGA",
"sequence_gapped_aa": "QVQLVQSGA.EVKKPGASVKVSCKASGYTF....TSYGISWVRQAPGQGLEWMGWISAY..NGNTNYAQKLQ.GRVTMTTDTSTSTAYMELRSLRSDDMAVYYCAR",
"fwr1": "CAGGTTCAGCTGGTGCAGTCTGGAGCTGAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTCTCCTGCAAGGCTTCT",
"fwr1_aa": "QVQLVQSGAEVKKPGASVKVSCKAS",
"fwr1_start": 0,
"fwr1_end": 74,
"cdr1": "GGTTACACCTTTACCAGCTATGGT",
"cdr1_aa": "GYTFTSYG",
"cdr1_start": 75,
"cdr1_end": 98,
"fwr2": "ATCAGCTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGATGG",
"fwr2_aa": "ISWVRQAPGQGLEWMGW",
"fwr2_start": 99,
"fwr2_end": 149,
"cdr2": "ATCAGCGCTTACAATGGTAACACA",
"cdr2_aa": "ISAYNGNT",
"cdr2_start": 150,
"cdr2_end": 173,
"fwr3": "AACTATGCACAGAAGCTCCAGGGCAGAGTCACCATGACCACAGACACATCCACGAGCACAGCCTACATGGAGCTGAGGAGCCTGAGATCTGACGACATGGCCGTGTATTACTGT",
"fwr3_aa": "NYAQKLQGRVTMTTDTSTSTAYMELRSLRSDDMAVYYC",
"fwr3_start": 174,
"fwr3_end": 287,
"cdr3": "GCGAGAGA",
"cdr3_aa": "AR",
"cdr3_start": 288,
"cdr3_end": 295,
"imgt_functional": "F",
"contrived_functional": "F"
}
},
{
"_id": "608b90908e6710a05b58704b",
"source": "imgt",
"common": "human",
"gene": "IGHV1-18*04",
"label": "V-REGION",
"gene_segment": "V",
"receptor": "IG",
"sequence": "CAGGTTCAGCTGGTGCAGTCTGGAGCTGAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTCTCCTGCAAGGCTTCTGGTTACACCTTTACCAGCTACGGTATCAGCTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGATGGATCAGCGCTTACAATGGTAACACAAACTATGCACAGAAGCTCCAGGGCAGAGTCACCATGACCACAGACACATCCACGAGCACAGCCTACATGGAGCTGAGGAGCCTGAGATCTGACGACACGGCCGTGTATTACTGTGCGAGAGA",
"latin": "Homo_sapiens",
"imgt": {
"sequence": "CAGGTTCAGCTGGTGCAGTCTGGAGCTGAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTCTCCTGCAAGGCTTCTGGTTACACCTTTACCAGCTACGGTATCAGCTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGATGGATCAGCGCTTACAATGGTAACACAAACTATGCACAGAAGCTCCAGGGCAGAGTCACCATGACCACAGACACATCCACGAGCACAGCCTACATGGAGCTGAGGAGCCTGAGATCTGACGACACGGCCGTGTATTACTGTGCGAGAGA",
"sequence_gapped": "CAGGTTCAGCTGGTGCAGTCTGGAGCT...GAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTCTCCTGCAAGGCTTCTGGTTACACCTTT............ACCAGCTACGGTATCAGCTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGATGGATCAGCGCTTAC......AATGGTAACACAAACTATGCACAGAAGCTCCAG...GGCAGAGTCACCATGACCACAGACACATCCACGAGCACAGCCTACATGGAGCTGAGGAGCCTGAGATCTGACGACACGGCCGTGTATTACTGTGCGAGAGA",
"sequence_gapped_aa": "QVQLVQSGA.EVKKPGASVKVSCKASGYTF....TSYGISWVRQAPGQGLEWMGWISAY..NGNTNYAQKLQ.GRVTMTTDTSTSTAYMELRSLRSDDTAVYYCAR",
"fwr1": "CAGGTTCAGCTGGTGCAGTCTGGAGCTGAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTCTCCTGCAAGGCTTCT",
"fwr1_aa": "QVQLVQSGAEVKKPGASVKVSCKAS",
"fwr1_start": 0,
"fwr1_end": 74,
"cdr1": "GGTTACACCTTTACCAGCTACGGT",
"cdr1_aa": "GYTFTSYG",
"cdr1_start": 75,
"cdr1_end": 98,
"fwr2": "ATCAGCTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGATGG",
"fwr2_aa": "ISWVRQAPGQGLEWMGW",
"fwr2_start": 99,
"fwr2_end": 149,
"cdr2": "ATCAGCGCTTACAATGGTAACACA",
"cdr2_aa": "ISAYNGNT",
"cdr2_start": 150,
"cdr2_end": 173,
"fwr3": "AACTATGCACAGAAGCTCCAGGGCAGAGTCACCATGACCACAGACACATCCACGAGCACAGCCTACATGGAGCTGAGGAGCCTGAGATCTGACGACACGGCCGTGTATTACTGT",
"fwr3_aa": "NYAQKLQGRVTMTTDTSTSTAYMELRSLRSDDTAVYYC",
"fwr3_start": 174,
"fwr3_end": 287,
"cdr3": "GCGAGAGA",
"cdr3_aa": "AR",
"cdr3_start": 288,
"cdr3_end": 295,
"imgt_functional": "F",
"contrived_functional": "F"
}
},
{
"_id": "608b90908e6710a05b587053",
"source": "imgt",
"common": "human",
"gene": "IGHV1-2*01",
"label": "V-REGION",
"gene_segment": "V",
"receptor": "IG",
"sequence": "CAGGTGCAGCTGGTGCAGTCTGGGGCTGAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTCTCCTGCAAGGCTTCTGGATACACCTTCACCGGCTACTATATGCACTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGACGGATCAACCCTAACAGTGGTGGCACAAACTATGCACAGAAGTTTCAGGGCAGGGTCACCAGTACCAGGGACACGTCCATCAGCACAGCCTACATGGAGCTGAGCAGGCTGAGATCTGACGACACGGTCGTGTATTACTGTGCGAGAGA",
"latin": "Homo_sapiens",
"imgt": {
"sequence": "CAGGTGCAGCTGGTGCAGTCTGGGGCTGAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTCTCCTGCAAGGCTTCTGGATACACCTTCACCGGCTACTATATGCACTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGACGGATCAACCCTAACAGTGGTGGCACAAACTATGCACAGAAGTTTCAGGGCAGGGTCACCAGTACCAGGGACACGTCCATCAGCACAGCCTACATGGAGCTGAGCAGGCTGAGATCTGACGACACGGTCGTGTATTACTGTGCGAGAGA",
"sequence_gapped": "CAGGTGCAGCTGGTGCAGTCTGGGGCT...GAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTCTCCTGCAAGGCTTCTGGATACACCTTC............ACCGGCTACTATATGCACTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGACGGATCAACCCTAAC......AGTGGTGGCACAAACTATGCACAGAAGTTTCAG...GGCAGGGTCACCAGTACCAGGGACACGTCCATCAGCACAGCCTACATGGAGCTGAGCAGGCTGAGATCTGACGACACGGTCGTGTATTACTGTGCGAGAGA",
"sequence_gapped_aa": "QVQLVQSGA.EVKKPGASVKVSCKASGYTF....TGYYMHWVRQAPGQGLEWMGRINPN..SGGTNYAQKFQ.GRVTSTRDTSISTAYMELSRLRSDDTVVYYCAR",
"fwr1": "CAGGTGCAGCTGGTGCAGTCTGGGGCTGAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTCTCCTGCAAGGCTTCT",
"fwr1_aa": "QVQLVQSGAEVKKPGASVKVSCKAS",
"fwr1_start": 0,
"fwr1_end": 74,
"cdr1": "GGATACACCTTCACCGGCTACTAT",
"cdr1_aa": "GYTFTGYY",
"cdr1_start": 75,
"cdr1_end": 98,
"fwr2": "ATGCACTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGACGG",
"fwr2_aa": "MHWVRQAPGQGLEWMGR",
"fwr2_start": 99,
"fwr2_end": 149,
"cdr2": "ATCAACCCTAACAGTGGTGGCACA",
"cdr2_aa": "INPNSGGT",
"cdr2_start": 150,
"cdr2_end": 173,
"fwr3": "AACTATGCACAGAAGTTTCAGGGCAGGGTCACCAGTACCAGGGACACGTCCATCAGCACAGCCTACATGGAGCTGAGCAGGCTGAGATCTGACGACACGGTCGTGTATTACTGT",
"fwr3_aa": "NYAQKFQGRVTSTRDTSISTAYMELSRLRSDDTVVYYC",
"fwr3_start": 174,
"fwr3_end": 287,
"cdr3": "GCGAGAGA",
"cdr3_aa": "AR",
"cdr3_start": 288,
"cdr3_end": 295,
"imgt_functional": "F",
"contrived_functional": "F"
}
}
]
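The region coordinates in these records can be used to slice the ungapped sequence. Judging from the sample output above, the `*_start` and `*_end` values appear to be 0-based with an inclusive end, and a missing region is reported as `null`. The sketch below relies on that reading; the `region` helper is hypothetical and not part of SADIE, and the record is a trimmed copy of the first entry shown above.

```python
# Trimmed copy of the IGHV1-18*01 record shown above (only the fields used here).
record = {
    "gene": "IGHV1-18*01",
    "imgt": {
        "sequence": "CAGGTTCAGCTGGTGCAGTCTGGAGCTGAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTCTCCTGCAAGGCTTCTGGTTACACCTTTACCAGCTATGGTATCAGCTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGATGGATCAGCGCTTACAATGGTAACACAAACTATGCACAGAAGCTCCAGGGCAGAGTCACCATGACCACAGACACATCCACGAGCACAGCCTACATGGAGCTGAGGAGCCTGAGATCTGACGACACGGCCGTGTATTACTGTGCGAGAGA",
        "cdr1": "GGTTACACCTTTACCAGCTATGGT",
        "cdr1_start": 75,
        "cdr1_end": 98,
        "cdr3": "GCGAGAGA",
        "cdr3_start": 288,
        "cdr3_end": 295,
    },
}


def region(entry: dict, name: str) -> str:
    """Slice a named region out of the ungapped IMGT sequence.

    Hypothetical helper: assumes 0-based coordinates with an inclusive end,
    and returns an empty string when the coordinates are null (None).
    """
    imgt = entry["imgt"]
    start = imgt.get(f"{name}_start")
    end = imgt.get(f"{name}_end")
    if start is None or end is None:
        return ""
    return imgt["sequence"][start : end + 1]


# The sliced region should agree with the pre-computed field in the record.
print(region(record, "cdr1") == record["imgt"]["cdr1"])
```

Comparing the slice against the stored `cdr1` or `cdr3` field is a quick sanity check that the coordinate convention has been read correctly before relying on it elsewhere.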
G3 API
The G3 API can be explored live through the G3 API Documentation. It is a clean, non-redundant dataset that can be used programmatically in any project. To learn more, explore the source code. SADIE abstracts most connections with G3, so you should not have to interact with the API directly.
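If you do query G3 directly, building the query string from a dictionary avoids hand-editing long URLs. This is a minimal sketch using only the standard library. The endpoint and the parameter names (`source`, `segment`, `common`, `limit`) are taken from the curl example above; `build_genes_url` is a hypothetical helper, and any other parameter values should be checked against the G3 documentation.

```python
from urllib.parse import urlencode

# Endpoint as used in the curl, wget and requests examples above.
G3_GENES_ENDPOINT = "https://g3.jordanrwillis.com/api/v1/genes"


def build_genes_url(source: str, segment: str, common: str, limit: int) -> str:
    """Assemble a G3 genes query URL (hypothetical helper, not part of SADIE)."""
    query = urlencode(
        {"source": source, "segment": segment, "common": common, "limit": limit}
    )
    return f"{G3_GENES_ENDPOINT}?{query}"


# Reproduces the URL used in the examples above.
url = build_genes_url("imgt", "V", "human", 5)
print(url)
```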
Generating AIRR Reference Database
$ sadie reference make -o my_output_database_path -d reference.yml
from sadie.reference import References
reference_path = "reference.yml"
references_object = References.from_yaml(reference_path)
outpath = "my_output_database_path"
germline_path = references_object.make_airr_database(outpath)
The reference YAML
The reference YAML file has the following structure:
name:
  database:
    species:
      - gene1
      - gene2
    species2:
      - gene3
      - gene4
Field | Description | Example |
---|---|---|
name | The name that this reference will be called in SADIE | human, mouse, clk |
database | The database that the gene comes from | IMGT or custom |
species | The name of the species that will be used in the annotation table | human, mouse |
gene | The full gene name | IGHV3-23*01 |
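The four fields in the table nest as name, then database, then species, then a list of genes. The sketch below assumes the YAML has already been loaded into a plain Python dictionary with that shape and flattens it into one row per gene. `iter_genes` and the `ReferenceSpec` alias are hypothetical names for illustration, not part of SADIE.

```python
from typing import Dict, Iterator, List, Tuple

# name -> database -> species -> list of gene names
ReferenceSpec = Dict[str, Dict[str, Dict[str, List[str]]]]


def iter_genes(spec: ReferenceSpec) -> Iterator[Tuple[str, str, str, str]]:
    """Yield (name, database, species, gene) for every gene in the spec."""
    for name, databases in spec.items():
        for database, species_map in databases.items():
            for species, genes in species_map.items():
                for gene in genes:
                    yield name, database, species, gene


# Same shape as the reference YAML, written as a dictionary literal.
spec: ReferenceSpec = {
    "human": {"imgt": {"human": ["IGHV3-23*01", "IGHD3-3*01", "IGHJ6*01"]}}
}

rows = list(iter_genes(spec))
for row in rows:
    print(row)
```

Flattening like this makes it easy to spot a gene filed under the wrong species or database before building a database from the file.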
Why do we allow multiple species?
Most of the time, the name and the species will be the same, e.g.:
human:
  imgt:
    human:
      - IGHV3-23*01
      - IGHD3-3*01
      - IGHJ6*01
However, you may sometimes work with chimeric models in which a transgene is knocked into a model species. Consider the HuGL mouse models from Deli et al. (2020):
hugl18:
  imgt:
    human:
      - IGHV4-59*01
      - IGHD3-3*01
      - IGHJ3*02
    mouse:
      - IGHV1-11*01
      - IGHV1-12*01
      - IGHV1-13*01
      - IGHV1-14*01
      ...
The HuGL18 model has the full mouse background plus three human gene segments knocked in.
Again, a full list of databases, species and genes can be found by exploring the G3 API. Click the Try it out button to run queries.
Generating AIRR database with Reference Class
Rather than generate a pre-configured database, SADIE can also generate a reference file on the fly. This is useful for procedural analysis, where you generate custom genes for multiple species.
import tempfile

from sadie.reference import Reference, References

# create empty reference object
ref_class = Reference()
with tempfile.TemporaryDirectory() as tmpdirectory:
    # Add genes one at a time
    ref_class.add_gene({"species": "human", "gene": "IGHV1-69*01", "source": "imgt"})
    ref_class.add_gene({"species": "human", "gene": "IGHD3-3*01", "source": "imgt"})
    ref_class.add_gene({"species": "human", "gene": "IGHJ6*01", "source": "imgt"})
    # call make_airr_database on a path
    references = References()
    references.add_reference("human", ref_class)
    references.make_airr_database(tmpdirectory)
Alternatively, you can use the YAML file as a template and add more genes:
from sadie.reference import Reference, References
from sadie.reference.yaml import YamlRef

# enter no file to use reference.yml
yml_ref = YamlRef()
# create empty reference object
ref_class = Reference()
# references class
references = References()
# Iterate through YamlRef
for name in yml_ref:
    # these are dictionary entries
    for database in yml_ref[name]:
        species = yml_ref[name][database]
        for single_species in species:
            # genes is a list of gene names
            genes = yml_ref[name][database][single_species]
            # only keep macaque genes from the custom database
            if database == "custom" and single_species == "macaque":
                # only get V genes; the fourth character (index 3) is the segment letter
                only_vs = list(filter(lambda x: x[3] == "V", genes))
                for gene in only_vs[:5]:  # only take the first 5
                    ref_class.add_gene({"gene": gene, "species": single_species, "source": database})
references.add_reference("small_macaque", ref_class)
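The V-gene filter above depends on the naming convention in which the fourth character of a gene name is the segment letter. The sketch below isolates that step so it can be checked without SADIE installed. It assumes names of the form seen on this page, such as IGHV1-69*01; `segment_of` is a hypothetical helper, and names that do not follow the convention would need different handling.

```python
from typing import List


def segment_of(gene: str) -> str:
    """Return the segment letter of a gene name such as IGHV1-69*01.

    Assumes the locus prefix is three characters (e.g. IGH), so the
    segment letter (V, D or J) sits at index 3.
    """
    return gene[3]


genes: List[str] = ["IGHV1-69*01", "IGHD3-3*01", "IGHJ6*01", "IGHV3-23*01"]

# Keep only the V segments, preserving the input order.
only_vs = [gene for gene in genes if segment_of(gene) == "V"]
print(only_vs)
```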
- Copyright © Jordan R. Willis, Troy Sincomb, and Caleb K. Kibet