
Many of the functions and utilities of the dandelion package revolve around the Dandelion class object. The class acts as an intermediary object for storage and flexible interaction with other tools. This section runs through a brief introduction to the Dandelion class.
Import modules
[1]:
import os
import dandelion as ddl

# Change to the working directory that holds the tutorial files.
os.chdir(os.path.expanduser('/Users/kt16/Downloads/dandelion_tutorial/'))
dandelion==0.2.4.dev101 pandas==1.4.2 numpy==1.21.6 matplotlib==3.5.2 networkx==2.8.4 scipy==1.8.1
[2]:
vdj = ddl.read_h5ddl('dandelion_results.h5ddl')
vdj
[2]:
Dandelion class object with n_obs = 2773 and n_contigs = 9005
    data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'v_cigar', 'd_cigar', 'j_cigar', 'stop_codon', 'vj_in_frame', 'locus', 'junction_length', 'np1_length', 'np2_length', 'v_sequence_start', 'v_sequence_end', 'v_germline_start', 'v_germline_end', 'd_sequence_start', 'd_sequence_end', 'd_germline_start', 'd_germline_end', 'j_sequence_start', 'j_sequence_end', 'j_germline_start', 'j_germline_end', 'v_score', 'v_identity', 'v_support', 'd_score', 'd_identity', 'd_support', 'j_score', 'j_identity', 'j_support', 'fwr1', 'fwr2', 'fwr3', 'fwr4', 'cdr1', 'cdr2', 'cdr3', 'cell_id', 'c_call', 'consensus_count', 'duplicate_count', 'v_call_10x', 'd_call_10x', 'j_call_10x', 'junction_10x', 'junction_10x_aa', 'v_call_genotyped', 'germline_alignment_d_mask', 'sample_id', 'j_support_igblastn', 'j_score_igblastn', 'j_call_igblastn', 'j_call_blastn', 'j_identity_blastn', 'j_alignment_length_blastn', 'j_number_of_mismatches_blastn', 'j_number_of_gap_openings_blastn', 'j_sequence_start_blastn', 'j_sequence_end_blastn', 'j_germline_start_blastn', 'j_germline_end_blastn', 'j_support_blastn', 'j_score_blastn', 'j_sequence_alignment_blastn', 'j_germline_alignment_blastn', 'cell_id_blastn', 'j_source', 'd_support_igblastn', 'd_score_igblastn', 'd_call_igblastn', 'd_call_blastn', 'd_identity_blastn', 'd_alignment_length_blastn', 'd_number_of_mismatches_blastn', 'd_number_of_gap_openings_blastn', 'd_sequence_start_blastn', 'd_sequence_end_blastn', 'd_germline_start_blastn', 'd_germline_end_blastn', 'd_support_blastn', 'd_score_blastn', 'd_sequence_alignment_blastn', 'd_germline_alignment_blastn', 'd_source', 'c_sequence_alignment', 'c_germline_alignment', 'c_sequence_start', ..., 'junction_aa_length', 'fwr1_aa', 'fwr2_aa', 'fwr3_aa', 'fwr4_aa', 'cdr1_aa', 'cdr2_aa', 'cdr3_aa', 'sequence_alignment_aa', 'v_sequence_alignment_aa', 'd_sequence_alignment_aa', 'j_sequence_alignment_aa', 'mu_count', 'ambiguous', 'rearrangement_status', 'clone_id', 'changeo_clone_id'
    metadata: 'clone_id', 'clone_id_by_size', 'sample_id', 'locus_VDJ', 'locus_VJ', 'productive_VDJ', 'productive_VJ', 'v_call_genotyped_VDJ', 'd_call_VDJ', 'j_call_VDJ', 'v_call_genotyped_VJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'v_call_genotyped_B_VDJ', 'd_call_B_VDJ', 'j_call_B_VDJ', 'v_call_genotyped_B_VJ', 'j_call_B_VJ', 'c_call_B_VDJ', 'c_call_B_VJ', 'productive_B_VDJ', 'productive_B_VJ', 'duplicate_count_B_VDJ', 'duplicate_count_B_VJ', 'isotype', 'isotype_status', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ', 'changeo_clone_id', 'fwr1_VJ', 'fwr1_VDJ', 'mu_count_VDJ', 'mu_count_VJ', 'mu_count', 'junction_length_VDJ', 'junction_length_VJ', 'junction_aa_length_VDJ', 'junction_aa_length_VJ', 'np1_length_VDJ', 'np1_length_VJ', 'np2_length_VDJ'
    layout: layout for 2773 vertices, layout for 1067 vertices
    graph: networkx graph of 2773 vertices, networkx graph of 1067 vertices
Basically, the object can be summarized in the following figure:
Essentially, the .data slot holds the AIRR contig table, while the .metadata slot holds a collapsed version that is compatible with combining with AnnData's .obs slot. You can retrieve these slots as you would with a typical class object; for example, if I want the metadata:
[3]:
vdj.metadata
[3]:
| | clone_id | clone_id_by_size | sample_id | locus_VDJ | locus_VJ | productive_VDJ | productive_VJ | v_call_genotyped_VDJ | d_call_VDJ | j_call_VDJ | ... | mu_count_VDJ | mu_count_VJ | mu_count | junction_length_VDJ | junction_length_VJ | junction_aa_length_VDJ | junction_aa_length_VJ | np1_length_VDJ | np1_length_VJ | np2_length_VDJ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| sc5p_v2_hs_PBMC_10k_AAACCTGTCATATCGG | None | None | sc5p_v2_hs_PBMC_10k | None | IGK | None | T | None | None | None | ... | NaN | 27.0 | 27.000000 | NaN | 33.0 | NaN | 11.0 | NaN | 2.0 | NaN |
| sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC | B_21_3_2_90_2_2 | 2191 | sc5p_v2_hs_PBMC_10k | IGH | IGK | T | T | IGHV1-69 | IGHD3-22 | IGHJ3 | ... | 0.0 | 0.0 | 0.000000 | 63.0 | 33.0 | 21.0 | 11.0 | 4.0 | 0.0 | 5.0 |
| sc5p_v2_hs_PBMC_10k_AAACCTGTCGAGAACG | B_54_1_2_59_1_1 | 1172 | sc5p_v2_hs_PBMC_10k | IGH | IGL | T | T | IGHV1-2 | None | IGHJ3 | ... | 22.0 | 8.0 | 15.000000 | 42.0 | 33.0 | 14.0 | 11.0 | 18.0 | 0.0 | NaN |
| sc5p_v2_hs_PBMC_10k_AAACCTGTCTTGAGAC | B_168_4_4_18_1_1 | 1086 | sc5p_v2_hs_PBMC_10k | IGH | IGK | T | T | IGHV5-51 | None | IGHJ3 | ... | 0.0 | 0.0 | 0.000000 | 54.0 | 33.0 | 18.0 | 11.0 | 24.0 | 0.0 | NaN |
| sc5p_v2_hs_PBMC_10k_AAACGGGAGCGACGTA | B_73_2_1_14_2_7 | 1398 | sc5p_v2_hs_PBMC_10k | IGH | IGL | T | T | IGHV4-4 | IGHD6-13 | IGHJ3 | ... | 0.0 | 0.5 | 0.333333 | 54.0 | 36.5 | 18.0 | 12.0 | 10.0 | 0.0 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| vdj_v1_hs_pbmc3_TTTCCCTCAGCAATATG | B_94_2_1_71_2_8 | 384 | vdj_v1_hs_pbmc3 | IGH | IGK | T | T | IGHV2-5 | IGHD5/OR15-5b, IGHD5/OR15-5a | IGHJ4, IGHJ5 | ... | 13.0 | 10.0 | 11.000000 | 30.0 | 34.0 | 10.0 | 11.5 | 0.0 | 0.0 | 14.0 |
| vdj_v1_hs_pbmc3_TTTCCTCAGCGCTTAT | B_70_6_1_76_1_3 | 400 | vdj_v1_hs_pbmc3 | IGH | IGK | T | T | IGHV3-30 | IGHD4-17 | IGHJ6 | ... | 0.0 | 0.0 | 0.000000 | 60.0 | 33.0 | 20.0 | 11.0 | 4.0 | 0.0 | 10.0 |
| vdj_v1_hs_pbmc3_TTTCCTCAGGGAAACA | B_27_1_1_129_4_13 | 381 | vdj_v1_hs_pbmc3 | IGH | IGK | T | T | IGHV4-59 | IGHD6-13 | IGHJ2 | ... | 16.0 | 6.0 | 11.000000 | 48.0 | 33.0 | 16.0 | 11.0 | 3.0 | 0.0 | 1.0 |
| vdj_v1_hs_pbmc3_TTTGGCGCCATACCATG | B_77_7_2_157_2_5 | 380 | vdj_v1_hs_pbmc3 | IGH | IGL | T | T | IGHV1-69 | IGHD2-15 | IGHJ6 | ... | 0.0 | 0.0 | 0.000000 | 66.0 | 39.0 | 22.0 | 13.0 | 5.0 | 0.0 | 3.0 |
| vdj_v1_hs_pbmc3_TTTGGTTGTAGGCATG | B_169_5_7_3_3_4 | 379 | vdj_v1_hs_pbmc3 | IGH | IGL | T | T | IGHV3-23 | None | IGHJ4 | ... | 8.0 | 4.0 | 6.000000 | 39.0 | 36.0 | 13.0 | 12.0 | 21.0 | 6.0 | NaN |
2773 rows × 48 columns
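To make the relationship between the contig-level table and the collapsed per-cell table more concrete, here is a minimal sketch in plain Python. It is not dandelion's implementation: the toy rows, the helper name collapse_to_cells, and the rule that IGH contigs count as VDJ while all others count as VJ are assumptions made only for this illustration.

```python
from collections import defaultdict

# Toy contig-level table: one row per contig (hypothetical values).
contigs = [
    {"sequence_id": "cellA_contig_1", "cell_id": "cellA", "locus": "IGH", "v_call": "IGHV1-69"},
    {"sequence_id": "cellA_contig_2", "cell_id": "cellA", "locus": "IGK", "v_call": "IGKV1-8"},
    {"sequence_id": "cellB_contig_1", "cell_id": "cellB", "locus": "IGK", "v_call": "IGKV3-20"},
]


def collapse_to_cells(rows):
    """Collapse contig rows into one record per cell, split by chain type."""
    per_cell = defaultdict(lambda: {"v_call_VDJ": [], "v_call_VJ": []})
    for row in rows:
        # Assumption for this sketch: IGH contigs are VDJ, everything else is VJ.
        key = "v_call_VDJ" if row["locus"] == "IGH" else "v_call_VJ"
        per_cell[row["cell_id"]][key].append(row["v_call"])
    # Join multiple calls with '|'; use None when a chain is absent.
    return {
        cell: {k: "|".join(v) if v else None for k, v in fields.items()}
        for cell, fields in per_cell.items()
    }


cells = collapse_to_cells(contigs)
print(len(contigs), "contigs ->", len(cells), "cells")
print(cells["cellB"])
```

The point of the sketch is only the shape of the result: several contig rows collapse to a single row per cell barcode, with chain-specific columns and missing values where a cell has no contig of that type.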
Slicing
You can slice the Dandelion object via the .data or .metadata slots using their indices, with behavior similar to that of a pandas DataFrame and AnnData.
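As a rough mental model of slicing by either table, the sketch below uses a toy container in plain Python. The class ToyVDJ and its selection rules are assumptions for illustration and are not the Dandelion class itself: a mask over contigs keeps exactly those contigs, while a mask over cells keeps every contig belonging to the selected cells.

```python
class ToyVDJ:
    """Toy container with a contig-level table and a cell-level index."""

    def __init__(self, data):
        self.data = list(data)  # one dict per contig
        # Cell-level view: unique cell ids in order of first appearance.
        self.metadata = list(dict.fromkeys(row["cell_id"] for row in self.data))

    def __getitem__(self, mask):
        mask = list(mask)
        if len(mask) == len(self.data):
            # Boolean mask over contigs: keep exactly the selected contigs.
            rows = [row for row, keep in zip(self.data, mask) if keep]
        elif len(mask) == len(self.metadata):
            # Boolean mask over cells: keep all contigs of the selected cells.
            cells = {c for c, keep in zip(self.metadata, mask) if keep}
            rows = [row for row in self.data if row["cell_id"] in cells]
        else:
            raise IndexError("mask length matches neither table")
        # Rebuilding the container keeps both views consistent.
        return ToyVDJ(rows)


toy = ToyVDJ([
    {"sequence_id": "cellA_contig_1", "cell_id": "cellA", "clone_id": "c1"},
    {"sequence_id": "cellA_contig_2", "cell_id": "cellA", "clone_id": "c1"},
    {"sequence_id": "cellB_contig_1", "cell_id": "cellB", "clone_id": "c2"},
])

by_contig = toy[[row["clone_id"] == "c1" for row in toy.data]]
by_cell = toy[[cell == "cellB" for cell in toy.metadata]]
print(len(by_contig.metadata), len(by_contig.data))
print(len(by_cell.metadata), len(by_cell.data))
```

In either direction the result is a new object whose cell count and contig count are both updated, which is why the printed n_obs and n_contigs change together in the outputs of this section.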
Slicing .data
[4]:
vdj[vdj.data['clone_id'] == 'B_21_3_2_90_2_2']
[4]:
Dandelion class object with n_obs = 1 and n_contigs = 2
    data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'v_cigar', 'd_cigar', 'j_cigar', 'stop_codon', 'vj_in_frame', 'locus', 'junction_length', 'np1_length', 'np2_length', 'v_sequence_start', 'v_sequence_end', 'v_germline_start', 'v_germline_end', 'd_sequence_start', 'd_sequence_end', 'd_germline_start', 'd_germline_end', 'j_sequence_start', 'j_sequence_end', 'j_germline_start', 'j_germline_end', 'v_score', 'v_identity', 'v_support', 'd_score', 'd_identity', 'd_support', 'j_score', 'j_identity', 'j_support', 'fwr1', 'fwr2', 'fwr3', 'fwr4', 'cdr1', 'cdr2', 'cdr3', 'cell_id', 'c_call', 'consensus_count', 'duplicate_count', 'v_call_10x', 'd_call_10x', 'j_call_10x', 'junction_10x', 'junction_10x_aa', 'v_call_genotyped', 'germline_alignment_d_mask', 'sample_id', 'j_support_igblastn', 'j_score_igblastn', 'j_call_igblastn', 'j_call_blastn', 'j_identity_blastn', 'j_alignment_length_blastn', 'j_number_of_mismatches_blastn', 'j_number_of_gap_openings_blastn', 'j_sequence_start_blastn', 'j_sequence_end_blastn', 'j_germline_start_blastn', 'j_germline_end_blastn', 'j_support_blastn', 'j_score_blastn', 'j_sequence_alignment_blastn', 'j_germline_alignment_blastn', 'cell_id_blastn', 'j_source', 'd_support_igblastn', 'd_score_igblastn', 'd_call_igblastn', 'd_call_blastn', 'd_identity_blastn', 'd_alignment_length_blastn', 'd_number_of_mismatches_blastn', 'd_number_of_gap_openings_blastn', 'd_sequence_start_blastn', 'd_sequence_end_blastn', 'd_germline_start_blastn', 'd_germline_end_blastn', 'd_support_blastn', 'd_score_blastn', 'd_sequence_alignment_blastn', 'd_germline_alignment_blastn', 'd_source', 'c_sequence_alignment', 'c_germline_alignment', 'c_sequence_start', ..., 'junction_aa_length', 'fwr1_aa', 'fwr2_aa', 'fwr3_aa', 'fwr4_aa', 'cdr1_aa', 'cdr2_aa', 'cdr3_aa', 'sequence_alignment_aa', 'v_sequence_alignment_aa', 'd_sequence_alignment_aa', 'j_sequence_alignment_aa', 'mu_count', 'ambiguous', 'rearrangement_status', 'clone_id', 'changeo_clone_id'
    metadata: 'clone_id', 'clone_id_by_size', 'sample_id', 'locus_VDJ', 'locus_VJ', 'productive_VDJ', 'productive_VJ', 'v_call_genotyped_VDJ', 'd_call_VDJ', 'j_call_VDJ', 'v_call_genotyped_VJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'v_call_genotyped_B_VDJ', 'd_call_B_VDJ', 'j_call_B_VDJ', 'v_call_genotyped_B_VJ', 'j_call_B_VJ', 'c_call_B_VDJ', 'c_call_B_VJ', 'productive_B_VDJ', 'productive_B_VJ', 'duplicate_count_B_VDJ', 'duplicate_count_B_VJ', 'isotype', 'isotype_status', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ', 'changeo_clone_id', 'fwr1_VJ', 'fwr1_VDJ', 'mu_count_VDJ', 'mu_count_VJ', 'mu_count', 'junction_length_VDJ', 'junction_length_VJ', 'junction_aa_length_VDJ', 'junction_aa_length_VJ', 'np1_length_VDJ', 'np1_length_VJ', 'np2_length_VDJ'
    layout: layout for 1 vertices, layout for 0 vertices
    graph: networkx graph of 1 vertices, networkx graph of 0 vertices
[5]:
vdj[
    vdj.data_names.isin(
        [
            'sc5p_v2_hs_PBMC_10k_AAACCTGTCATATCGG_contig_1',
            'sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC_contig_2',
            'sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC_contig_1',
            'sc5p_v2_hs_PBMC_10k_AAACCTGTCGAGAACG_contig_1',
            'sc5p_v2_hs_PBMC_10k_AAACCTGTCGAGAACG_contig_2',
        ]
    )
]
[5]:
Dandelion class object with n_obs = 3 and n_contigs = 5
    data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'v_cigar', 'd_cigar', 'j_cigar', 'stop_codon', 'vj_in_frame', 'locus', 'junction_length', 'np1_length', 'np2_length', 'v_sequence_start', 'v_sequence_end', 'v_germline_start', 'v_germline_end', 'd_sequence_start', 'd_sequence_end', 'd_germline_start', 'd_germline_end', 'j_sequence_start', 'j_sequence_end', 'j_germline_start', 'j_germline_end', 'v_score', 'v_identity', 'v_support', 'd_score', 'd_identity', 'd_support', 'j_score', 'j_identity', 'j_support', 'fwr1', 'fwr2', 'fwr3', 'fwr4', 'cdr1', 'cdr2', 'cdr3', 'cell_id', 'c_call', 'consensus_count', 'duplicate_count', 'v_call_10x', 'd_call_10x', 'j_call_10x', 'junction_10x', 'junction_10x_aa', 'v_call_genotyped', 'germline_alignment_d_mask', 'sample_id', 'j_support_igblastn', 'j_score_igblastn', 'j_call_igblastn', 'j_call_blastn', 'j_identity_blastn', 'j_alignment_length_blastn', 'j_number_of_mismatches_blastn', 'j_number_of_gap_openings_blastn', 'j_sequence_start_blastn', 'j_sequence_end_blastn', 'j_germline_start_blastn', 'j_germline_end_blastn', 'j_support_blastn', 'j_score_blastn', 'j_sequence_alignment_blastn', 'j_germline_alignment_blastn', 'cell_id_blastn', 'j_source', 'd_support_igblastn', 'd_score_igblastn', 'd_call_igblastn', 'd_call_blastn', 'd_identity_blastn', 'd_alignment_length_blastn', 'd_number_of_mismatches_blastn', 'd_number_of_gap_openings_blastn', 'd_sequence_start_blastn', 'd_sequence_end_blastn', 'd_germline_start_blastn', 'd_germline_end_blastn', 'd_support_blastn', 'd_score_blastn', 'd_sequence_alignment_blastn', 'd_germline_alignment_blastn', 'd_source', 'c_sequence_alignment', 'c_germline_alignment', 'c_sequence_start', ..., 'junction_aa_length', 'fwr1_aa', 'fwr2_aa', 'fwr3_aa', 'fwr4_aa', 'cdr1_aa', 'cdr2_aa', 'cdr3_aa', 'sequence_alignment_aa', 'v_sequence_alignment_aa', 'd_sequence_alignment_aa', 'j_sequence_alignment_aa', 'mu_count', 'ambiguous', 'rearrangement_status', 'clone_id', 'changeo_clone_id'
    metadata: 'clone_id', 'clone_id_by_size', 'sample_id', 'locus_VDJ', 'locus_VJ', 'productive_VDJ', 'productive_VJ', 'v_call_genotyped_VDJ', 'd_call_VDJ', 'j_call_VDJ', 'v_call_genotyped_VJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'v_call_genotyped_B_VDJ', 'd_call_B_VDJ', 'j_call_B_VDJ', 'v_call_genotyped_B_VJ', 'j_call_B_VJ', 'c_call_B_VDJ', 'c_call_B_VJ', 'productive_B_VDJ', 'productive_B_VJ', 'duplicate_count_B_VDJ', 'duplicate_count_B_VJ', 'isotype', 'isotype_status', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ', 'changeo_clone_id', 'fwr1_VJ', 'fwr1_VDJ', 'mu_count_VDJ', 'mu_count_VJ', 'mu_count', 'junction_length_VDJ', 'junction_length_VJ', 'junction_aa_length_VDJ', 'junction_aa_length_VJ', 'np1_length_VDJ', 'np1_length_VJ', 'np2_length_VDJ'
    layout: layout for 3 vertices, layout for 0 vertices
    graph: networkx graph of 3 vertices, networkx graph of 0 vertices
Slicing .metadata
[6]:
vdj[vdj.metadata['productive_VDJ'].isin(['T', 'T|T'])]
[6]:
Dandelion class object with n_obs = 2585 and n_contigs = 6137
    data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'v_cigar', 'd_cigar', 'j_cigar', 'stop_codon', 'vj_in_frame', 'locus', 'junction_length', 'np1_length', 'np2_length', 'v_sequence_start', 'v_sequence_end', 'v_germline_start', 'v_germline_end', 'd_sequence_start', 'd_sequence_end', 'd_germline_start', 'd_germline_end', 'j_sequence_start', 'j_sequence_end', 'j_germline_start', 'j_germline_end', 'v_score', 'v_identity', 'v_support', 'd_score', 'd_identity', 'd_support', 'j_score', 'j_identity', 'j_support', 'fwr1', 'fwr2', 'fwr3', 'fwr4', 'cdr1', 'cdr2', 'cdr3', 'cell_id', 'c_call', 'consensus_count', 'duplicate_count', 'v_call_10x', 'd_call_10x', 'j_call_10x', 'junction_10x', 'junction_10x_aa', 'v_call_genotyped', 'germline_alignment_d_mask', 'sample_id', 'j_support_igblastn', 'j_score_igblastn', 'j_call_igblastn', 'j_call_blastn', 'j_identity_blastn', 'j_alignment_length_blastn', 'j_number_of_mismatches_blastn', 'j_number_of_gap_openings_blastn', 'j_sequence_start_blastn', 'j_sequence_end_blastn', 'j_germline_start_blastn', 'j_germline_end_blastn', 'j_support_blastn', 'j_score_blastn', 'j_sequence_alignment_blastn', 'j_germline_alignment_blastn', 'cell_id_blastn', 'j_source', 'd_support_igblastn', 'd_score_igblastn', 'd_call_igblastn', 'd_call_blastn', 'd_identity_blastn', 'd_alignment_length_blastn', 'd_number_of_mismatches_blastn', 'd_number_of_gap_openings_blastn', 'd_sequence_start_blastn', 'd_sequence_end_blastn', 'd_germline_start_blastn', 'd_germline_end_blastn', 'd_support_blastn', 'd_score_blastn', 'd_sequence_alignment_blastn', 'd_germline_alignment_blastn', 'd_source', 'c_sequence_alignment', 'c_germline_alignment', 'c_sequence_start', ..., 'junction_aa_length', 'fwr1_aa', 'fwr2_aa', 'fwr3_aa', 'fwr4_aa', 'cdr1_aa', 'cdr2_aa', 'cdr3_aa', 'sequence_alignment_aa', 'v_sequence_alignment_aa', 'd_sequence_alignment_aa', 'j_sequence_alignment_aa', 'mu_count', 'ambiguous', 'rearrangement_status', 'clone_id', 'changeo_clone_id'
    metadata: 'clone_id', 'clone_id_by_size', 'sample_id', 'locus_VDJ', 'locus_VJ', 'productive_VDJ', 'productive_VJ', 'v_call_genotyped_VDJ', 'd_call_VDJ', 'j_call_VDJ', 'v_call_genotyped_VJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'v_call_genotyped_B_VDJ', 'd_call_B_VDJ', 'j_call_B_VDJ', 'v_call_genotyped_B_VJ', 'j_call_B_VJ', 'c_call_B_VDJ', 'c_call_B_VJ', 'productive_B_VDJ', 'productive_B_VJ', 'duplicate_count_B_VDJ', 'duplicate_count_B_VJ', 'isotype', 'isotype_status', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ', 'changeo_clone_id', 'fwr1_VJ', 'fwr1_VDJ', 'mu_count_VDJ', 'mu_count_VJ', 'mu_count', 'junction_length_VDJ', 'junction_length_VJ', 'junction_aa_length_VDJ', 'junction_aa_length_VJ', 'np1_length_VDJ', 'np1_length_VJ', 'np2_length_VDJ'
    layout: layout for 2585 vertices, layout for 1067 vertices
    graph: networkx graph of 2585 vertices, networkx graph of 1067 vertices
[7]:
vdj[vdj.metadata_names == 'vdj_v1_hs_pbmc3_TTTCCTCAGCGCTTAT']
[7]:
Dandelion class object with n_obs = 1 and n_contigs = 2
    data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'v_cigar', 'd_cigar', 'j_cigar', 'stop_codon', 'vj_in_frame', 'locus', 'junction_length', 'np1_length', 'np2_length', 'v_sequence_start', 'v_sequence_end', 'v_germline_start', 'v_germline_end', 'd_sequence_start', 'd_sequence_end', 'd_germline_start', 'd_germline_end', 'j_sequence_start', 'j_sequence_end', 'j_germline_start', 'j_germline_end', 'v_score', 'v_identity', 'v_support', 'd_score', 'd_identity', 'd_support', 'j_score', 'j_identity', 'j_support', 'fwr1', 'fwr2', 'fwr3', 'fwr4', 'cdr1', 'cdr2', 'cdr3', 'cell_id', 'c_call', 'consensus_count', 'duplicate_count', 'v_call_10x', 'd_call_10x', 'j_call_10x', 'junction_10x', 'junction_10x_aa', 'v_call_genotyped', 'germline_alignment_d_mask', 'sample_id', 'j_support_igblastn', 'j_score_igblastn', 'j_call_igblastn', 'j_call_blastn', 'j_identity_blastn', 'j_alignment_length_blastn', 'j_number_of_mismatches_blastn', 'j_number_of_gap_openings_blastn', 'j_sequence_start_blastn', 'j_sequence_end_blastn', 'j_germline_start_blastn', 'j_germline_end_blastn', 'j_support_blastn', 'j_score_blastn', 'j_sequence_alignment_blastn', 'j_germline_alignment_blastn', 'cell_id_blastn', 'j_source', 'd_support_igblastn', 'd_score_igblastn', 'd_call_igblastn', 'd_call_blastn', 'd_identity_blastn', 'd_alignment_length_blastn', 'd_number_of_mismatches_blastn', 'd_number_of_gap_openings_blastn', 'd_sequence_start_blastn', 'd_sequence_end_blastn', 'd_germline_start_blastn', 'd_germline_end_blastn', 'd_support_blastn', 'd_score_blastn', 'd_sequence_alignment_blastn', 'd_germline_alignment_blastn', 'd_source', 'c_sequence_alignment', 'c_germline_alignment', 'c_sequence_start', ..., 'junction_aa_length', 'fwr1_aa', 'fwr2_aa', 'fwr3_aa', 'fwr4_aa', 'cdr1_aa', 'cdr2_aa', 'cdr3_aa', 'sequence_alignment_aa', 'v_sequence_alignment_aa', 'd_sequence_alignment_aa', 'j_sequence_alignment_aa', 'mu_count', 'ambiguous', 'rearrangement_status', 'clone_id', 'changeo_clone_id'
    metadata: 'clone_id', 'clone_id_by_size', 'sample_id', 'locus_VDJ', 'locus_VJ', 'productive_VDJ', 'productive_VJ', 'v_call_genotyped_VDJ', 'd_call_VDJ', 'j_call_VDJ', 'v_call_genotyped_VJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'v_call_genotyped_B_VDJ', 'd_call_B_VDJ', 'j_call_B_VDJ', 'v_call_genotyped_B_VJ', 'j_call_B_VJ', 'c_call_B_VDJ', 'c_call_B_VJ', 'productive_B_VDJ', 'productive_B_VJ', 'duplicate_count_B_VDJ', 'duplicate_count_B_VJ', 'isotype', 'isotype_status', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ', 'changeo_clone_id', 'fwr1_VJ', 'fwr1_VDJ', 'mu_count_VDJ', 'mu_count_VJ', 'mu_count', 'junction_length_VDJ', 'junction_length_VJ', 'junction_aa_length_VDJ', 'junction_aa_length_VJ', 'np1_length_VDJ', 'np1_length_VJ', 'np2_length_VDJ'
    layout: layout for 1 vertices, layout for 1 vertices
    graph: networkx graph of 1 vertices, networkx graph of 1 vertices
Copy
You can deep copy the Dandelion object to another variable, which will inherit all slots:
[8]:
vdj2 = vdj.copy()
vdj2.metadata
[8]:
| | clone_id | clone_id_by_size | sample_id | locus_VDJ | locus_VJ | productive_VDJ | productive_VJ | v_call_genotyped_VDJ | d_call_VDJ | j_call_VDJ | ... | mu_count_VDJ | mu_count_VJ | mu_count | junction_length_VDJ | junction_length_VJ | junction_aa_length_VDJ | junction_aa_length_VJ | np1_length_VDJ | np1_length_VJ | np2_length_VDJ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| sc5p_v2_hs_PBMC_10k_AAACCTGTCATATCGG | None | None | sc5p_v2_hs_PBMC_10k | None | IGK | None | T | None | None | None | ... | NaN | 27.0 | 27.000000 | NaN | 33.0 | NaN | 11.0 | NaN | 2.0 | NaN |
| sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC | B_21_3_2_90_2_2 | 2191 | sc5p_v2_hs_PBMC_10k | IGH | IGK | T | T | IGHV1-69 | IGHD3-22 | IGHJ3 | ... | 0.0 | 0.0 | 0.000000 | 63.0 | 33.0 | 21.0 | 11.0 | 4.0 | 0.0 | 5.0 |
| sc5p_v2_hs_PBMC_10k_AAACCTGTCGAGAACG | B_54_1_2_59_1_1 | 1172 | sc5p_v2_hs_PBMC_10k | IGH | IGL | T | T | IGHV1-2 | None | IGHJ3 | ... | 22.0 | 8.0 | 15.000000 | 42.0 | 33.0 | 14.0 | 11.0 | 18.0 | 0.0 | NaN |
| sc5p_v2_hs_PBMC_10k_AAACCTGTCTTGAGAC | B_168_4_4_18_1_1 | 1086 | sc5p_v2_hs_PBMC_10k | IGH | IGK | T | T | IGHV5-51 | None | IGHJ3 | ... | 0.0 | 0.0 | 0.000000 | 54.0 | 33.0 | 18.0 | 11.0 | 24.0 | 0.0 | NaN |
| sc5p_v2_hs_PBMC_10k_AAACGGGAGCGACGTA | B_73_2_1_14_2_7 | 1398 | sc5p_v2_hs_PBMC_10k | IGH | IGL | T | T | IGHV4-4 | IGHD6-13 | IGHJ3 | ... | 0.0 | 0.5 | 0.333333 | 54.0 | 36.5 | 18.0 | 12.0 | 10.0 | 0.0 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| vdj_v1_hs_pbmc3_TTTCCCTCAGCAATATG | B_94_2_1_71_2_8 | 384 | vdj_v1_hs_pbmc3 | IGH | IGK | T | T | IGHV2-5 | IGHD5/OR15-5b, IGHD5/OR15-5a | IGHJ4, IGHJ5 | ... | 13.0 | 10.0 | 11.000000 | 30.0 | 34.0 | 10.0 | 11.5 | 0.0 | 0.0 | 14.0 |
| vdj_v1_hs_pbmc3_TTTCCTCAGCGCTTAT | B_70_6_1_76_1_3 | 400 | vdj_v1_hs_pbmc3 | IGH | IGK | T | T | IGHV3-30 | IGHD4-17 | IGHJ6 | ... | 0.0 | 0.0 | 0.000000 | 60.0 | 33.0 | 20.0 | 11.0 | 4.0 | 0.0 | 10.0 |
| vdj_v1_hs_pbmc3_TTTCCTCAGGGAAACA | B_27_1_1_129_4_13 | 381 | vdj_v1_hs_pbmc3 | IGH | IGK | T | T | IGHV4-59 | IGHD6-13 | IGHJ2 | ... | 16.0 | 6.0 | 11.000000 | 48.0 | 33.0 | 16.0 | 11.0 | 3.0 | 0.0 | 1.0 |
| vdj_v1_hs_pbmc3_TTTGGCGCCATACCATG | B_77_7_2_157_2_5 | 380 | vdj_v1_hs_pbmc3 | IGH | IGL | T | T | IGHV1-69 | IGHD2-15 | IGHJ6 | ... | 0.0 | 0.0 | 0.000000 | 66.0 | 39.0 | 22.0 | 13.0 | 5.0 | 0.0 | 3.0 |
| vdj_v1_hs_pbmc3_TTTGGTTGTAGGCATG | B_169_5_7_3_3_4 | 379 | vdj_v1_hs_pbmc3 | IGH | IGL | T | T | IGHV3-23 | None | IGHJ4 | ... | 8.0 | 4.0 | 6.000000 | 39.0 | 36.0 | 13.0 | 12.0 | 21.0 | 6.0 | NaN |
2773 rows × 48 columns
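For readers less familiar with the term, the difference between a shallow and a deep copy can be seen with Python's standard copy module. This sketch uses a plain nested dictionary as a stand-in for an object with data and metadata slots; it only illustrates what deep copying means and makes no claim about how the copy method is implemented in dandelion.

```python
import copy

# Stand-in for an object with two slots (hypothetical values).
original = {
    "data": [{"sequence_id": "cellA_contig_1", "clone_id": "c1"}],
    "metadata": {"cellA": {"clone_id": "c1"}},
}

shallow = copy.copy(original)   # new outer dict, nested objects are shared
deep = copy.deepcopy(original)  # nested objects are duplicated as well

# Edit the original after copying.
original["metadata"]["cellA"]["clone_id"] = "edited"

print(shallow["metadata"]["cellA"]["clone_id"])  # follows the edit
print(deep["metadata"]["cellA"]["clone_id"])     # unaffected by the edit
```

A deep copy is the safer choice when you want to experiment on a copy, for example by filtering or re-clustering, without touching the object you started from.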
Retrieving entries with update_metadata
The .metadata slot in the Dandelion class is automatically initialized whenever the .data slot is filled. However, it only returns a default, pre-specified set of columns. To retrieve other columns from the .data slot, we can update the metadata with ddl.update_metadata and specify the options retrieve and retrieve_mode.
The following modes determine how the retrieval is completed:
'split and unique only' - returns the retrieval split into VDJ and VJ chains. A | will separate unique elements.
'split and merge' - returns the retrieval split into VDJ and VJ chains. A | will separate every element.
'merge and unique only' - similar to 'split and unique only', but merged into a single column.
'split' - returns the retrieval split into separate columns for each contig.
'merge' - returns the retrieval merged into a single column, where a | will separate every element.
For numerical columns, there are additional options:
'split and sum' - returns the retrieval split into VDJ and VJ chains and sums each separately.
'split and average' - similar to 'split and sum', but averages instead of summing.
'sum' - returns the retrieval summed into a single column.
'average' - returns the retrieval averaged into a single column.
If retrieve_mode is not specified, it defaults to 'split and merge'.
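The behaviour of the retrieval modes can be mimicked for a single cell in a few lines of plain Python. The function below is a toy sketch, not dandelion's code: the name retrieve, the output keys, and the assumption that the modes are spelled 'split and unique only', 'split and merge', 'merge and unique only', 'split', 'merge', 'split and sum', 'split and average', 'sum' and 'average' are all illustrative.

```python
def retrieve(vdj, vj, mode="split and merge"):
    """Toy version of the retrieval modes for one cell.

    vdj, vj: lists of values taken from that cell's VDJ and VJ contigs.
    Returns a dict mapping an output column name to its value.
    """

    def join(values, unique=False):
        if unique:
            values = list(dict.fromkeys(values))  # drop repeats, keep order
        return "|".join(str(v) for v in values) if values else None

    def mean(values):
        return sum(values) / len(values) if values else None

    if mode == "split and unique only":
        return {"VDJ": join(vdj, unique=True), "VJ": join(vj, unique=True)}
    if mode == "split and merge":
        return {"VDJ": join(vdj), "VJ": join(vj)}
    if mode == "merge and unique only":
        return {"merged": join(vdj + vj, unique=True)}
    if mode == "split":
        # One output column per contig, numbered within each chain type.
        named = [("VDJ", vdj), ("VJ", vj)]
        return {f"{chain}_{i}": v for chain, vals in named for i, v in enumerate(vals)}
    if mode == "merge":
        return {"merged": join(vdj + vj)}
    if mode == "split and sum":
        return {"VDJ": sum(vdj), "VJ": sum(vj)}
    if mode == "split and average":
        return {"VDJ": mean(vdj), "VJ": mean(vj)}
    if mode == "sum":
        return {"merged": sum(vdj + vj)}
    if mode == "average":
        return {"merged": mean(vdj + vj)}
    raise ValueError(f"unknown mode: {mode}")


print(retrieve(["IGHJ4", "IGHJ4"], ["IGKJ1"], "split and unique only"))
print(retrieve(["IGHJ4", "IGHJ4"], ["IGKJ1"], "merge"))
print(retrieve([13.0], [10.0, 12.0], "split and average"))
```

The string modes produce |-separated text, while the numeric modes return numbers; mixing the two in one column is what would lead to mixed-type columns.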
Example: Retrieve fwr1 sequences
[9]:
ddl.update_metadata(vdj, retrieve='fwr1')
vdj
[9]:
Dandelion class object with n_obs = 2773 and n_contigs = 9005
    data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'v_cigar', 'd_cigar', 'j_cigar', 'stop_codon', 'vj_in_frame', 'locus', 'junction_length', 'np1_length', 'np2_length', 'v_sequence_start', 'v_sequence_end', 'v_germline_start', 'v_germline_end', 'd_sequence_start', 'd_sequence_end', 'd_germline_start', 'd_germline_end', 'j_sequence_start', 'j_sequence_end', 'j_germline_start', 'j_germline_end', 'v_score', 'v_identity', 'v_support', 'd_score', 'd_identity', 'd_support', 'j_score', 'j_identity', 'j_support', 'fwr1', 'fwr2', 'fwr3', 'fwr4', 'cdr1', 'cdr2', 'cdr3', 'cell_id', 'c_call', 'consensus_count', 'duplicate_count', 'v_call_10x', 'd_call_10x', 'j_call_10x', 'junction_10x', 'junction_10x_aa', 'v_call_genotyped', 'germline_alignment_d_mask', 'sample_id', 'j_support_igblastn', 'j_score_igblastn', 'j_call_igblastn', 'j_call_blastn', 'j_identity_blastn', 'j_alignment_length_blastn', 'j_number_of_mismatches_blastn', 'j_number_of_gap_openings_blastn', 'j_sequence_start_blastn', 'j_sequence_end_blastn', 'j_germline_start_blastn', 'j_germline_end_blastn', 'j_support_blastn', 'j_score_blastn', 'j_sequence_alignment_blastn', 'j_germline_alignment_blastn', 'cell_id_blastn', 'j_source', 'd_support_igblastn', 'd_score_igblastn', 'd_call_igblastn', 'd_call_blastn', 'd_identity_blastn', 'd_alignment_length_blastn', 'd_number_of_mismatches_blastn', 'd_number_of_gap_openings_blastn', 'd_sequence_start_blastn', 'd_sequence_end_blastn', 'd_germline_start_blastn', 'd_germline_end_blastn', 'd_support_blastn', 'd_score_blastn', 'd_sequence_alignment_blastn', 'd_germline_alignment_blastn', 'd_source', 'c_sequence_alignment', 'c_germline_alignment', 'c_sequence_start', ..., 'junction_aa_length', 'fwr1_aa', 'fwr2_aa', 'fwr3_aa', 'fwr4_aa', 'cdr1_aa', 'cdr2_aa', 'cdr3_aa', 'sequence_alignment_aa', 'v_sequence_alignment_aa', 'd_sequence_alignment_aa', 'j_sequence_alignment_aa', 'mu_count', 'ambiguous', 'rearrangement_status', 'clone_id', 'changeo_clone_id'
    metadata: 'clone_id', 'clone_id_by_size', 'sample_id', 'locus_VDJ', 'locus_VJ', 'productive_VDJ', 'productive_VJ', 'v_call_genotyped_VDJ', 'd_call_VDJ', 'j_call_VDJ', 'v_call_genotyped_VJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'v_call_genotyped_B_VDJ', 'd_call_B_VDJ', 'j_call_B_VDJ', 'v_call_genotyped_B_VJ', 'j_call_B_VJ', 'c_call_B_VDJ', 'c_call_B_VJ', 'productive_B_VDJ', 'productive_B_VJ', 'duplicate_count_B_VDJ', 'duplicate_count_B_VJ', 'isotype', 'isotype_status', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ', 'changeo_clone_id', 'fwr1_VJ', 'fwr1_VDJ', 'mu_count_VDJ', 'mu_count_VJ', 'mu_count', 'junction_length_VDJ', 'junction_length_VJ', 'junction_aa_length_VDJ', 'junction_aa_length_VJ', 'np1_length_VDJ', 'np1_length_VJ', 'np2_length_VDJ'
    layout: layout for 2773 vertices, layout for 1067 vertices
    graph: networkx graph of 2773 vertices, networkx graph of 1067 vertices
Note the additional fwr1 VDJ and VJ columns in the metadata slot.
By default, dandelion does not attempt to merge numerical columns, as doing so may create columns of mixed dtype.
There is a new sub-function that tries to retrieve frequently used columns such as np1_length and np2_length:
[10]:
vdj.update_plus()
vdj
/Users/kt16/miniconda3/envs/dandelion/lib/python3.9/site-packages/numpy/core/fromnumeric.py:3440: RuntimeWarning: Mean of empty slice.
/Users/kt16/miniconda3/envs/dandelion/lib/python3.9/site-packages/numpy/core/_methods.py:189: RuntimeWarning: invalid value encountered in double_scalars
[10]:
Dandelion class object with n_obs = 2773 and n_contigs = 9005
    data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'v_cigar', 'd_cigar', 'j_cigar', 'stop_codon', 'vj_in_frame', 'locus', 'junction_length', 'np1_length', 'np2_length', 'v_sequence_start', 'v_sequence_end', 'v_germline_start', 'v_germline_end', 'd_sequence_start', 'd_sequence_end', 'd_germline_start', 'd_germline_end', 'j_sequence_start', 'j_sequence_end', 'j_germline_start', 'j_germline_end', 'v_score', 'v_identity', 'v_support', 'd_score', 'd_identity', 'd_support', 'j_score', 'j_identity', 'j_support', 'fwr1', 'fwr2', 'fwr3', 'fwr4', 'cdr1', 'cdr2', 'cdr3', 'cell_id', 'c_call', 'consensus_count', 'duplicate_count', 'v_call_10x', 'd_call_10x', 'j_call_10x', 'junction_10x', 'junction_10x_aa', 'v_call_genotyped', 'germline_alignment_d_mask', 'sample_id', 'j_support_igblastn', 'j_score_igblastn', 'j_call_igblastn', 'j_call_blastn', 'j_identity_blastn', 'j_alignment_length_blastn', 'j_number_of_mismatches_blastn', 'j_number_of_gap_openings_blastn', 'j_sequence_start_blastn', 'j_sequence_end_blastn', 'j_germline_start_blastn', 'j_germline_end_blastn', 'j_support_blastn', 'j_score_blastn', 'j_sequence_alignment_blastn', 'j_germline_alignment_blastn', 'cell_id_blastn', 'j_source', 'd_support_igblastn', 'd_score_igblastn', 'd_call_igblastn', 'd_call_blastn', 'd_identity_blastn', 'd_alignment_length_blastn', 'd_number_of_mismatches_blastn', 'd_number_of_gap_openings_blastn', 'd_sequence_start_blastn', 'd_sequence_end_blastn', 'd_germline_start_blastn', 'd_germline_end_blastn', 'd_support_blastn', 'd_score_blastn', 'd_sequence_alignment_blastn', 'd_germline_alignment_blastn', 'd_source', 'c_sequence_alignment', 'c_germline_alignment', 'c_sequence_start', ..., 'junction_aa_length', 'fwr1_aa', 'fwr2_aa', 'fwr3_aa', 'fwr4_aa', 'cdr1_aa', 'cdr2_aa', 'cdr3_aa', 'sequence_alignment_aa', 'v_sequence_alignment_aa', 'd_sequence_alignment_aa', 'j_sequence_alignment_aa', 'mu_count', 'ambiguous', 'rearrangement_status', 'clone_id', 'changeo_clone_id'
    metadata: 'clone_id', 'clone_id_by_size', 'sample_id', 'locus_VDJ', 'locus_VJ', 'productive_VDJ', 'productive_VJ', 'v_call_genotyped_VDJ', 'd_call_VDJ', 'j_call_VDJ', 'v_call_genotyped_VJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'v_call_genotyped_B_VDJ', 'd_call_B_VDJ', 'j_call_B_VDJ', 'v_call_genotyped_B_VJ', 'j_call_B_VJ', 'c_call_B_VDJ', 'c_call_B_VJ', 'productive_B_VDJ', 'productive_B_VJ', 'duplicate_count_B_VDJ', 'duplicate_count_B_VJ', 'isotype', 'isotype_status', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ', 'changeo_clone_id', 'fwr1_VJ', 'fwr1_VDJ', 'mu_count_VDJ', 'mu_count_VJ', 'mu_count', 'junction_length_VDJ', 'junction_length_VJ', 'junction_aa_length_VDJ', 'junction_aa_length_VJ', 'np1_length_VDJ', 'np1_length_VJ', 'np2_length_VDJ'
    layout: layout for 2773 vertices, layout for 1067 vertices
    graph: networkx graph of 2773 vertices, networkx graph of 1067 vertices
Concatenating multiple objects
This is a simple function to concatenate (append) two or more Dandelion class objects, or pandas DataFrames. Note that this operates on the .data slot and not the .metadata slot.
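Because concatenation works on the contig-level table, appending copies of the same table multiplies the number of contigs while the number of distinct cell barcodes stays the same. The plain-Python sketch below illustrates just that bookkeeping; the helper concat and the toy rows are hypothetical, and any extra handling that ddl.concat may perform (for example of duplicate identifiers) is deliberately ignored.

```python
def concat(tables):
    """Append the rows of several contig tables into one new table."""
    combined = []
    for table in tables:
        # Copy each row so the input tables are left untouched.
        combined.extend(dict(row) for row in table)
    return combined


# Toy contig table: two cells, three contigs (hypothetical values).
table = [
    {"sequence_id": "cellA_contig_1", "cell_id": "cellA"},
    {"sequence_id": "cellA_contig_2", "cell_id": "cellA"},
    {"sequence_id": "cellB_contig_1", "cell_id": "cellB"},
]

combined = concat([table, table, table])
n_contigs = len(combined)
n_obs = len({row["cell_id"] for row in combined})
print(n_obs, n_contigs)
```

The per-cell summary then has to be rebuilt from the combined contig table rather than stitched together from the individual summaries.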
[11]:
# For example, the original Dandelion class has 2773 unique cell barcodes and 9005 contigs.
vdj
[11]:
Dandelion class object with n_obs = 2773 and n_contigs = 9005
    data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'v_cigar', 'd_cigar', 'j_cigar', 'stop_codon', 'vj_in_frame', 'locus', 'junction_length', 'np1_length', 'np2_length', 'v_sequence_start', 'v_sequence_end', 'v_germline_start', 'v_germline_end', 'd_sequence_start', 'd_sequence_end', 'd_germline_start', 'd_germline_end', 'j_sequence_start', 'j_sequence_end', 'j_germline_start', 'j_germline_end', 'v_score', 'v_identity', 'v_support', 'd_score', 'd_identity', 'd_support', 'j_score', 'j_identity', 'j_support', 'fwr1', 'fwr2', 'fwr3', 'fwr4', 'cdr1', 'cdr2', 'cdr3', 'cell_id', 'c_call', 'consensus_count', 'duplicate_count', 'v_call_10x', 'd_call_10x', 'j_call_10x', 'junction_10x', 'junction_10x_aa', 'v_call_genotyped', 'germline_alignment_d_mask', 'sample_id', 'j_support_igblastn', 'j_score_igblastn', 'j_call_igblastn', 'j_call_blastn', 'j_identity_blastn', 'j_alignment_length_blastn', 'j_number_of_mismatches_blastn', 'j_number_of_gap_openings_blastn', 'j_sequence_start_blastn', 'j_sequence_end_blastn', 'j_germline_start_blastn', 'j_germline_end_blastn', 'j_support_blastn', 'j_score_blastn', 'j_sequence_alignment_blastn', 'j_germline_alignment_blastn', 'cell_id_blastn', 'j_source', 'd_support_igblastn', 'd_score_igblastn', 'd_call_igblastn', 'd_call_blastn', 'd_identity_blastn', 'd_alignment_length_blastn', 'd_number_of_mismatches_blastn', 'd_number_of_gap_openings_blastn', 'd_sequence_start_blastn', 'd_sequence_end_blastn', 'd_germline_start_blastn', 'd_germline_end_blastn', 'd_support_blastn', 'd_score_blastn', 'd_sequence_alignment_blastn', 'd_germline_alignment_blastn', 'd_source', 'c_sequence_alignment', 'c_germline_alignment', 'c_sequence_start', ..., 'junction_aa_length', 'fwr1_aa', 'fwr2_aa', 'fwr3_aa', 'fwr4_aa', 'cdr1_aa', 'cdr2_aa', 'cdr3_aa', 'sequence_alignment_aa', 'v_sequence_alignment_aa', 'd_sequence_alignment_aa', 'j_sequence_alignment_aa', 'mu_count', 'ambiguous', 'rearrangement_status', 'clone_id', 'changeo_clone_id'
    metadata: 'clone_id', 'clone_id_by_size', 'sample_id', 'locus_VDJ', 'locus_VJ', 'productive_VDJ', 'productive_VJ', 'v_call_genotyped_VDJ', 'd_call_VDJ', 'j_call_VDJ', 'v_call_genotyped_VJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'v_call_genotyped_B_VDJ', 'd_call_B_VDJ', 'j_call_B_VDJ', 'v_call_genotyped_B_VJ', 'j_call_B_VJ', 'c_call_B_VDJ', 'c_call_B_VJ', 'productive_B_VDJ', 'productive_B_VJ', 'duplicate_count_B_VDJ', 'duplicate_count_B_VJ', 'isotype', 'isotype_status', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ', 'changeo_clone_id', 'fwr1_VJ', 'fwr1_VDJ', 'mu_count_VDJ', 'mu_count_VJ', 'mu_count', 'junction_length_VDJ', 'junction_length_VJ', 'junction_aa_length_VDJ', 'junction_aa_length_VJ', 'np1_length_VDJ', 'np1_length_VJ', 'np2_length_VDJ'
    layout: layout for 2773 vertices, layout for 1067 vertices
    graph: networkx graph of 2773 vertices, networkx graph of 1067 vertices
[12]:
# Now you have 27015 (9005*3) contigs, and the metadata should also be filled in correctly
vdj_concat = ddl.concat([vdj, vdj, vdj])
vdj_concat
[12]:
Dandelion class object with n_obs = 2773 and n_contigs = 27015
    data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', ..., 'mu_count', 'ambiguous', 'rearrangement_status', 'clone_id', 'changeo_clone_id'
    metadata: ..., 'productive_VDJ', 'productive_VJ', 'v_call_genotyped_VDJ', 'd_call_VDJ', 'j_call_VDJ', 'v_call_genotyped_VJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'v_call_genotyped_B_VDJ', 'd_call_B_VDJ', 'j_call_B_VDJ', 'v_call_genotyped_B_VJ', 'j_call_B_VJ', 'c_call_B_VDJ', 'c_call_B_VJ', 'productive_B_VDJ', 'productive_B_VJ', 'duplicate_count_B_VDJ', 'duplicate_count_B_VJ', 'isotype', 'isotype_status', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ', ...
[13]:
vdj_concat.data[['sequence_id', 'cell_id']].head()
[13]:
sequence_id (index) | sequence_id | cell_id |
---|---|---|
sc5p_v2_hs_PBMC_10k_AAACCTGTCATATCGG_contig_1-0 | sc5p_v2_hs_PBMC_10k_AAACCTGTCATATCGG_contig_1-0 | sc5p_v2_hs_PBMC_10k_AAACCTGTCATATCGG |
sc5p_v2_hs_PBMC_10k_AAACCTGTCATATCGG_contig_1-1 | sc5p_v2_hs_PBMC_10k_AAACCTGTCATATCGG_contig_1-1 | sc5p_v2_hs_PBMC_10k_AAACCTGTCATATCGG |
sc5p_v2_hs_PBMC_10k_AAACCTGTCATATCGG_contig_1-2 | sc5p_v2_hs_PBMC_10k_AAACCTGTCATATCGG_contig_1-2 | sc5p_v2_hs_PBMC_10k_AAACCTGTCATATCGG |
sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC_contig_2-0 | sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC_contig_2-0 | sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC |
sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC_contig_2-1 | sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC_contig_2-1 | sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC |
ddl.concat also lets you add custom prefixes/suffixes to the sequence IDs. If none are specified, it appends -0, -1, etc. as a suffix when it detects that the sequence IDs are not unique, as shown above.
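The suffixing behaviour described above can be illustrated with a small standard-library sketch. This is not dandelion's implementation: the helper name suffix_if_not_unique and its sep argument are hypothetical, and it works on plain lists of IDs rather than on Dandelion objects.

```python
from collections import Counter


def suffix_if_not_unique(id_batches, sep="-"):
    """Flatten batches of IDs; append <sep><batch number> only if IDs collide.

    Hypothetical helper for illustration, not part of dandelion.
    """
    flat = [sid for batch in id_batches for sid in batch]
    if max(Counter(flat).values(), default=1) == 1:
        return flat  # every ID is already unique, so leave them untouched
    # At least one ID repeats: tag every ID with the index of its batch.
    return [f"{sid}{sep}{n}" for n, batch in enumerate(id_batches) for sid in batch]


contigs = ["sc5p_v2_hs_PBMC_10k_AAACCTGTCATATCGG_contig_1"]
merged = suffix_if_not_unique([contigs, contigs, contigs])
for sequence_id in merged:
    print(sequence_id)
```

Concatenating the same batch three times gives IDs ending in -0, -1 and -2, matching the pattern in the table above. Batches whose IDs never collide are returned unchanged.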
Read/write
The Dandelion class can be saved using the .write_h5ddl and .write_pkl functions, with accompanying compression methods. write_h5ddl primarily uses the pandas to_hdf function, and write_pkl just uses pickle. The read_h5ddl and read_pkl functions read the respective file formats accordingly.
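To make the pickle route concrete, here is a minimal sketch of a gzip-compressed pickle round trip using only the standard library. The helper names write_pkl_gz and read_pkl_gz, the toy dictionary and the file name are assumptions made for illustration. The sketch does not reproduce how write_pkl or read_pkl are implemented in dandelion.

```python
import gzip
import os
import pickle
import tempfile


def write_pkl_gz(obj, path):
    """Pickle obj and write it through a gzip stream (hypothetical helper)."""
    with gzip.open(path, "wb") as handle:
        pickle.dump(obj, handle, protocol=pickle.HIGHEST_PROTOCOL)


def read_pkl_gz(path):
    """Read back an object written by write_pkl_gz (hypothetical helper)."""
    with gzip.open(path, "rb") as handle:
        return pickle.load(handle)


# Toy stand-in for a results object; the numbers echo the tutorial's counts.
record = {"n_obs": 2773, "n_contigs": 9005}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_results.pkl.gz")
    write_pkl_gz(record, path)
    restored = read_pkl_gz(path)

print(restored == record)
```

As with any pickle file, only load files from sources you trust, since unpickling can execute arbitrary code.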
[14]:
%time vdj.write_h5ddl('dandelion_results.h5ddl', complib = 'bzip2')
CPU times: user 11.4 s, sys: 437 ms, total: 11.9 s
Wall time: 18 s
If you see any warnings above, they are due to mixed dtypes somewhere in the object. Check the affected columns if you think this will interfere with downstream usage.
[15]:
%time vdj_1 = ddl.read_h5ddl('dandelion_results.h5ddl')
vdj_1
CPU times: user 1.99 s, sys: 258 ms, total: 2.25 s
Wall time: 2.84 s
[15]:
Dandelion class object with n_obs = 2773 and n_contigs = 9005
    data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', ..., 'mu_count', 'ambiguous', 'rearrangement_status', 'clone_id', 'changeo_clone_id'
    metadata: 'clone_id', ..., 'productive_VDJ', 'productive_VJ', ..., 'rearrangement_status_VDJ', 'rearrangement_status_VJ', 'changeo_clone_id', ...
    layout: layout for 2773 vertices, layout for 1067 vertices
    graph: networkx graph of 2773 vertices, networkx graph of 1067 vertices
The read/write times using pickle can be faster or slower, and the file sizes can be smaller or larger, depending on which compression is used.
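The trade-off between speed and file size can be explored with a rough standard-library sketch like the one below. The toy payload, the measure helper and the choice of gzip and bz2 are assumptions for illustration only. Timings will differ between machines and will not match the figures reported in this tutorial.

```python
import bz2
import gzip
import pickle
import time

# Toy payload standing in for a pickled results object (illustrative only).
payload = pickle.dumps(
    [{"sequence_id": f"contig_{i}", "productive": "T"} for i in range(5000)]
)


def measure(name, compress, decompress):
    """Time one compress/decompress round trip and record the compressed size."""
    start = time.perf_counter()
    blob = compress(payload)
    write_seconds = time.perf_counter() - start

    start = time.perf_counter()
    round_trip_ok = decompress(blob) == payload
    read_seconds = time.perf_counter() - start

    return {
        "method": name,
        "bytes": len(blob),
        "write_s": write_seconds,
        "read_s": read_seconds,
        "round_trip_ok": round_trip_ok,
    }


results = [
    measure("none", lambda raw: raw, lambda raw: raw),
    measure("gzip", gzip.compress, gzip.decompress),
    measure("bz2", bz2.compress, bz2.decompress),
]
for row in results:
    print(row)
```

Comparing the "bytes", "write_s" and "read_s" fields across rows shows how each method trades speed against size for this particular payload. Real Dandelion objects may behave differently.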
[16]:
%time vdj.write_pkl('dandelion_results.pkl.gz')
CPU times: user 17.3 s, sys: 175 ms, total: 17.5 s
Wall time: 24.3 s
[17]:
%time vdj_2 = ddl.read_pkl('dandelion_results.pkl.gz')
vdj_2
CPU times: user 379 ms, sys: 44.4 ms, total: 424 ms
Wall time: 490 ms
[17]:
Dandelion class object with n_obs = 2773 and n_contigs = 9005
    data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', ..., 'mu_count', 'ambiguous', 'rearrangement_status', 'clone_id', 'changeo_clone_id'
    metadata: 'clone_id', ..., 'productive_VDJ', 'productive_VJ', ..., 'rearrangement_status_VDJ', 'rearrangement_status_VJ', 'changeo_clone_id', ...
    layout: layout for 2773 vertices, layout for 1067 vertices
    graph: networkx graph of 2773 vertices, networkx graph of 1067 vertices