Dandelion Class - Dandelion Documentation (2023)

Many of the functions and utilities in the dandelion package revolve around the Dandelion class object. The class acts as an intermediary object for storage and flexible interaction with other tools. This section gives a brief introduction to the Dandelion class.

import modules

[1]:
import os
import dandelion as ddl

# change to the tutorial working directory
os.chdir(os.path.expanduser('/Users/kt16/Downloads/dandelion_tutorial/'))
ddl.logging.print_header()
dandelion==0.2.4.dev101 pandas==1.4.2 numpy==1.21.6 matplotlib==3.5.2 networkx==2.8.4 scipy==1.8.1
[2]:
vdj = ddl.read_h5ddl('dandelion_results.h5ddl')
vdj
[2]:
Dandelion class object with n_obs = 2773 and n_contigs = 9005
    data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'v_cigar', 'd_cigar', 'j_cigar', 'stop_codon', 'vj_in_frame', 'locus', 'junction_length', 'np1_length', 'np2_length', 'v_sequence_start', 'v_sequence_end', 'v_germline_start', 'v_germline_end', 'd_sequence_start', 'd_sequence_end', 'd_germline_start', 'd_germline_end', 'j_sequence_start', 'j_sequence_end', 'j_germline_start', 'j_germline_end', 'v_score', 'v_identity', 'v_support', 'd_score', 'd_identity', 'd_support', 'j_score', 'j_identity', 'j_support', 'fwr1', 'fwr2', 'fwr3', 'fwr4', 'cdr1', 'cdr2', 'cdr3', 'cell_id', 'c_call', 'consensus_count', 'duplicate_count', 'v_call_10x', 'd_call_10x', 'j_call_10x', 'junction_10x', 'junction_10x_aa', 'v_call_genotyped', 'germline_alignment_d_mask', 'sample_id', 'j_support_igblastn', 'j_score_igblastn', 'j_call_igblastn', 'j_call_blastn', 'j_identity_blastn', 'j_alignment_length_blastn', 'j_number_of_mismatches_blastn', 'j_number_of_gap_openings_blastn', 'j_sequence_start_blastn', 'j_sequence_end_blastn', 'j_germline_start_blastn', 'j_germline_end_blastn', 'j_support_blastn', 'j_score_blastn', 'j_sequence_alignment_blastn', 'j_germline_alignment_blastn', 'cell_id_blastn', 'j_source', 'd_support_igblastn', 'd_score_igblastn', 'd_call_igblastn', 'd_call_blastn', 'd_identity_blastn', 'd_alignment_length_blastn', 'd_number_of_mismatches_blastn', 'd_number_of_gap_openings_blastn', 'd_sequence_start_blastn', 'd_sequence_end_blastn', 'd_germline_start_blastn', 'd_germline_end_blastn', 'd_support_blastn', 'd_score_blastn', 'd_sequence_alignment_blastn', 'd_germline_alignment_blastn', 'd_source', 'c_sequence_alignment', 'c_germline_alignment', 'c_sequence_start', ..., 'junction_aa_length', 'fwr1_aa', 'fwr2_aa', 'fwr3_aa', 'fwr4_aa', 'cdr1_aa', 'cdr2_aa', 'cdr3_aa', 'sequence_alignment_aa', 'v_sequence_alignment_aa', 'd_sequence_alignment_aa', 'j_sequence_alignment_aa', 'mu_count', 'ambiguous', 'rearrangement_status', 'clone_id', 'changeo_clone_id'
    metadata: 'clone_id', 'clone_id_by_size', 'sample_id', 'locus_VDJ', 'locus_VJ', 'productive_VDJ', 'productive_VJ', 'v_call_genotyped_VDJ', 'd_call_VDJ', 'j_call_VDJ', 'v_call_genotyped_VJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'v_call_genotyped_B_VDJ', 'd_call_B_VDJ', 'j_call_B_VDJ', 'v_call_genotyped_B_VJ', 'j_call_B_VJ', 'c_call_B_VDJ', 'c_call_B_VJ', 'productive_B_VDJ', 'productive_B_VJ', 'duplicate_count_B_VDJ', 'duplicate_count_B_VJ', 'isotype', 'isotype_status', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ', 'changeo_clone_id', 'fwr1_VJ', 'fwr1_VDJ', 'mu_count_VDJ', 'mu_count_VJ', 'mu_count', 'junction_length_VDJ', 'junction_length_VJ', 'junction_aa_length_VDJ', 'junction_aa_length_VJ', 'np1_length_VDJ', 'np1_length_VJ', 'np2_length_VDJ'
    layout: layout for 2773 vertices, layout for 1067 vertices
    graph: networkx graph of 2773 vertices, networkx graph of 1067 vertices

Basically, the object can be summarized in the following figure:

[Figure: Dandelion class]

Essentially, the .data slot holds the AIRR contig table, while the .metadata slot holds a collapsed version that is compatible with combining with AnnData's .obs slot. You can retrieve these slots as you would for a typical class object; for example, to get the metadata:

[3]:
vdj.metadata
[3]:
| | clone_id | clone_id_by_size | sample_id | locus_VDJ | locus_VJ | productive_VDJ | productive_VJ | v_call_genotyped_VDJ | d_call_VDJ | j_call_VDJ | ... | mu_count_VDJ | mu_count_VJ | mu_count | junction_length_VDJ | junction_length_VJ | junction_aa_length_VDJ | junction_aa_length_VJ | np1_length_VDJ | np1_length_VJ | np2_length_VDJ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| sc5p_v2_hs_PBMC_10k_AAACCTGTCATATCGG | None | None | sc5p_v2_hs_PBMC_10k | None | IGK | None | T | None | None | None | ... | NaN | 27.0 | 27.000000 | NaN | 33.0 | NaN | 11.0 | NaN | 2.0 | NaN |
| sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC | B_21_3_2_90_2_2 | 2191 | sc5p_v2_hs_PBMC_10k | IGH | IGK | T | T | IGHV1-69 | IGHD3-22 | IGHJ3 | ... | 0.0 | 0.0 | 0.000000 | 63.0 | 33.0 | 21.0 | 11.0 | 4.0 | 0.0 | 5.0 |
| sc5p_v2_hs_PBMC_10k_AAACCTGTCGAGAACG | B_54_1_2_59_1_1 | 1172 | sc5p_v2_hs_PBMC_10k | IGH | IGL | T | T | IGHV1-2 | None | IGHJ3 | ... | 22.0 | 8.0 | 15.000000 | 42.0 | 33.0 | 14.0 | 11.0 | 18.0 | 0.0 | NaN |
| sc5p_v2_hs_PBMC_10k_AAACCTGTCTTGAGAC | B_168_4_4_18_1_1 | 1086 | sc5p_v2_hs_PBMC_10k | IGH | IGK | T | T | IGHV5-51 | None | IGHJ3 | ... | 0.0 | 0.0 | 0.000000 | 54.0 | 33.0 | 18.0 | 11.0 | 24.0 | 0.0 | NaN |
| sc5p_v2_hs_PBMC_10k_AAACGGGAGCGACGTA | B_73_2_1_14_2_7 | 1398 | sc5p_v2_hs_PBMC_10k | IGH | IGL | T | T | IGHV4-4 | IGHD6-13 | IGHJ3 | ... | 0.0 | 0.5 | 0.333333 | 54.0 | 36.5 | 18.0 | 12.0 | 10.0 | 0.0 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| vdj_v1_hs_pbmc3_TTTCCCTCAGCAATATG | B_94_2_1_71_2_8 | 384 | vdj_v1_hs_pbmc3 | IGH | IGK | T | T | IGHV2-5 | IGHD5/OR15-5b,IGHD5/OR15-5a | IGHJ4,IGHJ5 | ... | 13.0 | 10.0 | 11.000000 | 30.0 | 34.0 | 10.0 | 11.5 | 0.0 | 0.0 | 14.0 |
| vdj_v1_hs_pbmc3_TTTCCTCAGCGCTTAT | B_70_6_1_76_1_3 | 400 | vdj_v1_hs_pbmc3 | IGH | IGK | T | T | IGHV3-30 | IGHD4-17 | IGHJ6 | ... | 0.0 | 0.0 | 0.000000 | 60.0 | 33.0 | 20.0 | 11.0 | 4.0 | 0.0 | 10.0 |
| vdj_v1_hs_pbmc3_TTTCCTCAGGGAAACA | B_27_1_1_129_4_13 | 381 | vdj_v1_hs_pbmc3 | IGH | IGK | T | T | IGHV4-59 | IGHD6-13 | IGHJ2 | ... | 16.0 | 6.0 | 11.000000 | 48.0 | 33.0 | 16.0 | 11.0 | 3.0 | 0.0 | 1.0 |
| vdj_v1_hs_pbmc3_TTTGGCGCCATACCATG | B_77_7_2_157_2_5 | 380 | vdj_v1_hs_pbmc3 | IGH | IGL | T | T | IGHV1-69 | IGHD2-15 | IGHJ6 | ... | 0.0 | 0.0 | 0.000000 | 66.0 | 39.0 | 22.0 | 13.0 | 5.0 | 0.0 | 3.0 |
| vdj_v1_hs_pbmc3_TTTGGTTGTAGGCATG | B_169_5_7_3_3_4 | 379 | vdj_v1_hs_pbmc3 | IGH | IGL | T | T | IGHV3-23 | None | IGHJ4 | ... | 8.0 | 4.0 | 6.000000 | 39.0 | 36.0 | 13.0 | 12.0 | 21.0 | 6.0 | NaN |

2773 rows × 48 columns
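
To make the relationship between the contig-level table and the per-cell metadata more concrete, here is a minimal sketch in plain Python. It is not dandelion's implementation: the toy records, the `collapse` helper, and the rule that only IGH contigs count as VDJ chains are assumptions made for illustration.

```python
from collections import defaultdict

# Toy contig-level records standing in for rows of the .data slot.
# Barcodes and gene calls are made up for illustration.
contigs = [
    {"sequence_id": "cell1_contig_1", "cell_id": "cell1", "locus": "IGH", "v_call": "IGHV1-69"},
    {"sequence_id": "cell1_contig_2", "cell_id": "cell1", "locus": "IGK", "v_call": "IGKV1-8"},
    {"sequence_id": "cell2_contig_1", "cell_id": "cell2", "locus": "IGK", "v_call": "IGKV3-20"},
]

# Assumption: heavy-chain loci map to the _VDJ columns, everything else to _VJ.
HEAVY_LOCI = {"IGH"}


def collapse(contigs):
    """Collapse contig rows into one record per cell with _VDJ/_VJ fields."""
    per_cell = defaultdict(
        lambda: {"locus_VDJ": None, "locus_VJ": None, "v_call_VDJ": None, "v_call_VJ": None}
    )
    for row in contigs:
        suffix = "VDJ" if row["locus"] in HEAVY_LOCI else "VJ"
        cell = per_cell[row["cell_id"]]
        cell[f"locus_{suffix}"] = row["locus"]
        cell[f"v_call_{suffix}"] = row["v_call"]
    return dict(per_cell)


metadata = collapse(contigs)
for cell_id, record in metadata.items():
    print(cell_id, record)
```

A cell without a heavy chain, such as `cell2` here, simply keeps `None` in its `_VDJ` fields, which mirrors the first row of the metadata table above.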

slicing

You can slice the Dandelion object via the .data or .metadata slot through their indices, with behaviour similar to that of a pandas DataFrame or AnnData.

slicing .data

[4]:
vdj[vdj.data['clone_id'] == 'B_21_3_2_90_2_2']
[4]:
Dandelion class object with n_obs = 1 and n_contigs = 2
    data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', ..., 'mu_count', 'ambiguous', 'rearrangement_status', 'clone_id', 'changeo_clone_id'
    metadata: 'clone_id', 'clone_id_by_size', 'sample_id', 'locus_VDJ', 'locus_VJ', 'productive_VDJ', 'productive_VJ', ..., 'junction_aa_length_VDJ', 'junction_aa_length_VJ', 'np1_length_VDJ', 'np1_length_VJ', 'np2_length_VDJ'
    layout: layout for 1 vertices, layout for 0 vertices
    graph: networkx graph of 1 vertices, networkx graph of 0 vertices
[5]:
vdj[
    vdj.data_names.isin(
        [
            'sc5p_v2_hs_PBMC_10k_AAACCTGTCATATCGG_contig_1',
            'sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC_contig_2',
            'sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC_contig_1',
            'sc5p_v2_hs_PBMC_10k_AAACCTGTCGAGAACG_contig_1',
            'sc5p_v2_hs_PBMC_10k_AAACCTGTCGAGAACG_contig_2',
        ]
    )
]
[5]:
Dandelion class object with n_obs = 3 and n_contigs = 5
    data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', ..., 'mu_count', 'ambiguous', 'rearrangement_status', 'clone_id', 'changeo_clone_id'
    metadata: 'clone_id', 'clone_id_by_size', 'sample_id', 'locus_VDJ', 'locus_VJ', 'productive_VDJ', 'productive_VJ', ..., 'junction_aa_length_VDJ', 'junction_aa_length_VJ', 'np1_length_VDJ', 'np1_length_VJ', 'np2_length_VDJ'
    layout: layout for 3 vertices, layout for 0 vertices
    graph: networkx graph of 3 vertices, networkx graph of 0 vertices

slicing .metadata

[6]:
vdj[vdj.metadata['productive_VDJ'].isin(['T','T|T'])]
[6]:
Dandelion class object with n_obs = 2585 and n_contigs = 6137
    data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', ..., 'mu_count', 'ambiguous', 'rearrangement_status', 'clone_id', 'changeo_clone_id'
    metadata: 'clone_id', 'clone_id_by_size', 'sample_id', 'locus_VDJ', 'locus_VJ', 'productive_VDJ', 'productive_VJ', ..., 'junction_aa_length_VDJ', 'junction_aa_length_VJ', 'np1_length_VDJ', 'np1_length_VJ', 'np2_length_VDJ'
    layout: layout for 2585 vertices, layout for 1067 vertices
    graph: networkx graph of 2585 vertices, networkx graph of 1067 vertices
[7]:
vdj[vdj.metadata_names == 'vdj_v1_hs_pbmc3_TTTCCTCAGCGCTTAT']
[7]:
Dandelion class object with n_obs = 1 and n_contigs = 2
    data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', ..., 'mu_count', 'ambiguous', 'rearrangement_status', 'clone_id', 'changeo_clone_id'
    metadata: 'clone_id', 'clone_id_by_size', 'sample_id', 'locus_VDJ', 'locus_VJ', 'productive_VDJ', 'productive_VJ', ..., 'junction_aa_length_VDJ', 'junction_aa_length_VJ', 'np1_length_VDJ', 'np1_length_VJ', 'np2_length_VDJ'
    layout: layout for 1 vertices, layout for 1 vertices
    graph: networkx graph of 1 vertices, networkx graph of 1 vertices
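
The two slicing routes can be pictured with a small plain-Python sketch. It assumes that a contig-level mask keeps the matching contigs plus the cells they belong to, and that a cell-level mask keeps the chosen cells plus all of their contigs. The counts in the outputs above are consistent with that reading, but the helper functions and toy records here are hypothetical and are not dandelion's code.

```python
# Toy stand-in for the contig table (made-up values).
data = [
    {"sequence_id": "cellA_contig_1", "cell_id": "cellA", "clone_id": "B_1"},
    {"sequence_id": "cellA_contig_2", "cell_id": "cellA", "clone_id": "B_1"},
    {"sequence_id": "cellB_contig_1", "cell_id": "cellB", "clone_id": "B_2"},
    {"sequence_id": "cellC_contig_1", "cell_id": "cellC", "clone_id": "B_2"},
]


def slice_by_contigs(data, keep):
    """Keep contigs where keep(row) is true; cells follow from those contigs."""
    contigs = [row for row in data if keep(row)]
    cells = sorted({row["cell_id"] for row in contigs})
    return contigs, cells


def slice_by_cells(data, keep_cells):
    """Keep the named cells, then every contig that belongs to one of them."""
    contigs = [row for row in data if row["cell_id"] in keep_cells]
    cells = sorted({row["cell_id"] for row in contigs})
    return contigs, cells


# Contig-level selection, analogous to masking on a .data column.
contigs, cells = slice_by_contigs(data, lambda row: row["clone_id"] == "B_1")
print("n_obs =", len(cells), "n_contigs =", len(contigs))

# Cell-level selection, analogous to masking on a .metadata column.
contigs2, cells2 = slice_by_cells(data, {"cellB", "cellC"})
print("n_obs =", len(cells2), "n_contigs =", len(contigs2))
```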

copy

You can deep copy the Dandelion object to another variable, which will inherit all slots:

[8]:
vdj2 = vdj.copy()
vdj2.metadata
[8]:
| | clone_id | clone_id_by_size | sample_id | locus_VDJ | locus_VJ | productive_VDJ | productive_VJ | v_call_genotyped_VDJ | d_call_VDJ | j_call_VDJ | ... | mu_count_VDJ | mu_count_VJ | mu_count | junction_length_VDJ | junction_length_VJ | junction_aa_length_VDJ | junction_aa_length_VJ | np1_length_VDJ | np1_length_VJ | np2_length_VDJ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| sc5p_v2_hs_PBMC_10k_AAACCTGTCATATCGG | None | None | sc5p_v2_hs_PBMC_10k | None | IGK | None | T | None | None | None | ... | NaN | 27.0 | 27.000000 | NaN | 33.0 | NaN | 11.0 | NaN | 2.0 | NaN |
| sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC | B_21_3_2_90_2_2 | 2191 | sc5p_v2_hs_PBMC_10k | IGH | IGK | T | T | IGHV1-69 | IGHD3-22 | IGHJ3 | ... | 0.0 | 0.0 | 0.000000 | 63.0 | 33.0 | 21.0 | 11.0 | 4.0 | 0.0 | 5.0 |
| sc5p_v2_hs_PBMC_10k_AAACCTGTCGAGAACG | B_54_1_2_59_1_1 | 1172 | sc5p_v2_hs_PBMC_10k | IGH | IGL | T | T | IGHV1-2 | None | IGHJ3 | ... | 22.0 | 8.0 | 15.000000 | 42.0 | 33.0 | 14.0 | 11.0 | 18.0 | 0.0 | NaN |
| sc5p_v2_hs_PBMC_10k_AAACCTGTCTTGAGAC | B_168_4_4_18_1_1 | 1086 | sc5p_v2_hs_PBMC_10k | IGH | IGK | T | T | IGHV5-51 | None | IGHJ3 | ... | 0.0 | 0.0 | 0.000000 | 54.0 | 33.0 | 18.0 | 11.0 | 24.0 | 0.0 | NaN |
| sc5p_v2_hs_PBMC_10k_AAACGGGAGCGACGTA | B_73_2_1_14_2_7 | 1398 | sc5p_v2_hs_PBMC_10k | IGH | IGL | T | T | IGHV4-4 | IGHD6-13 | IGHJ3 | ... | 0.0 | 0.5 | 0.333333 | 54.0 | 36.5 | 18.0 | 12.0 | 10.0 | 0.0 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| vdj_v1_hs_pbmc3_TTTCCCTCAGCAATATG | B_94_2_1_71_2_8 | 384 | vdj_v1_hs_pbmc3 | IGH | IGK | T | T | IGHV2-5 | IGHD5/OR15-5b,IGHD5/OR15-5a | IGHJ4,IGHJ5 | ... | 13.0 | 10.0 | 11.000000 | 30.0 | 34.0 | 10.0 | 11.5 | 0.0 | 0.0 | 14.0 |
| vdj_v1_hs_pbmc3_TTTCCTCAGCGCTTAT | B_70_6_1_76_1_3 | 400 | vdj_v1_hs_pbmc3 | IGH | IGK | T | T | IGHV3-30 | IGHD4-17 | IGHJ6 | ... | 0.0 | 0.0 | 0.000000 | 60.0 | 33.0 | 20.0 | 11.0 | 4.0 | 0.0 | 10.0 |
| vdj_v1_hs_pbmc3_TTTCCTCAGGGAAACA | B_27_1_1_129_4_13 | 381 | vdj_v1_hs_pbmc3 | IGH | IGK | T | T | IGHV4-59 | IGHD6-13 | IGHJ2 | ... | 16.0 | 6.0 | 11.000000 | 48.0 | 33.0 | 16.0 | 11.0 | 3.0 | 0.0 | 1.0 |
| vdj_v1_hs_pbmc3_TTTGGCGCCATACCATG | B_77_7_2_157_2_5 | 380 | vdj_v1_hs_pbmc3 | IGH | IGL | T | T | IGHV1-69 | IGHD2-15 | IGHJ6 | ... | 0.0 | 0.0 | 0.000000 | 66.0 | 39.0 | 22.0 | 13.0 | 5.0 | 0.0 | 3.0 |
| vdj_v1_hs_pbmc3_TTTGGTTGTAGGCATG | B_169_5_7_3_3_4 | 379 | vdj_v1_hs_pbmc3 | IGH | IGL | T | T | IGHV3-23 | None | IGHJ4 | ... | 8.0 | 4.0 | 6.000000 | 39.0 | 36.0 | 13.0 | 12.0 | 21.0 | 6.0 | NaN |

2773 rows × 48 columns
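
What a deep copy buys you can be shown with Python's standard `copy.deepcopy`. The `ToyContainer` class below is a hypothetical stand-in with two slots, not the Dandelion class itself; it only illustrates that edits to a deep copy leave the original untouched.

```python
import copy


class ToyContainer:
    """Hypothetical stand-in for an object with .data and .metadata slots."""

    def __init__(self, data, metadata):
        self.data = data
        self.metadata = metadata

    def copy(self):
        # Deep copy: nested containers are duplicated rather than shared.
        return copy.deepcopy(self)


original = ToyContainer(
    data=[{"sequence_id": "cell1_contig_1", "cell_id": "cell1"}],
    metadata={"cell1": {"isotype": "IgM"}},
)
duplicate = original.copy()

# Edit the copy only.
duplicate.metadata["cell1"]["isotype"] = "IgG"

print(original.metadata["cell1"]["isotype"])
print(duplicate.metadata["cell1"]["isotype"])
```

With a plain assignment (`duplicate = original`) both names would point at the same object, so the edit would show up in both.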

retrieving entries with update_metadata

The .metadata slot in the Dandelion class is automatically initialized whenever the .data slot is filled. However, it only returns a default, pre-specified set of columns. To retrieve other columns from the .data slot, we can update the metadata with ddl.update_metadata and specify the options retrieve and retrieve_mode.

The following modes determine how the retrieval is completed:

split and unique only - splits the retrieval into VDJ and VJ chains. A | will separate unique elements.

split and merge - splits the retrieval into VDJ and VJ chains. A | will separate every element.

merge and unique only - similar to the above, but merged into a single column.

split - splits the retrieval into individual columns for each contig.

merge - merges the retrieval into a single column where a | will separate every element.

For numerical columns, there are additional options:

split and sum - splits the retrieval into VDJ and VJ chains and sums them separately.

split and average - similar to the above, but averages instead of sums.

sum - sums the retrievals into a single column.

average - averages the retrievals into a single column.

If retrieve_mode is not specified, it will default to split and merge.
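
The differences between the modes are easier to see on a single toy cell. The sketch below is a plain-Python approximation written for this page, assuming the modes are named split and unique only, split and merge, merge and unique only, split, merge, split and sum, split and average, sum, and average. The helper functions, the IGH-only rule for VDJ chains, and the `column_0`, `column_1` naming used for the split mode are all hypothetical and do not come from dandelion.

```python
def dedupe(values):
    """Drop repeats while keeping first-seen order."""
    return list(dict.fromkeys(values))


def split_chains(contigs, column):
    """Return (VDJ values, VJ values) for one cell; IGH is treated as VDJ."""
    vdj = [c[column] for c in contigs if c["locus"] == "IGH"]
    vj = [c[column] for c in contigs if c["locus"] != "IGH"]
    return vdj, vj


def average(values):
    """Arithmetic mean, or NaN when there is nothing to average."""
    return sum(values) / len(values) if values else float("nan")


def retrieve_text(contigs, column, mode="split and merge"):
    vdj, vj = split_chains(contigs, column)
    if mode == "split and unique only":
        return {f"{column}_VDJ": "|".join(dedupe(vdj)), f"{column}_VJ": "|".join(dedupe(vj))}
    if mode == "split and merge":
        return {f"{column}_VDJ": "|".join(vdj), f"{column}_VJ": "|".join(vj)}
    if mode == "merge and unique only":
        return {column: "|".join(dedupe(vdj + vj))}
    if mode == "merge":
        return {column: "|".join(vdj + vj)}
    if mode == "split":
        # Hypothetical naming: one numbered column per contig.
        return {f"{column}_{i}": value for i, value in enumerate(vdj + vj)}
    raise ValueError(f"unknown mode: {mode}")


def retrieve_numeric(contigs, column, mode):
    vdj, vj = split_chains(contigs, column)
    if mode == "split and sum":
        return {f"{column}_VDJ": sum(vdj), f"{column}_VJ": sum(vj)}
    if mode == "split and average":
        return {f"{column}_VDJ": average(vdj), f"{column}_VJ": average(vj)}
    if mode == "sum":
        return {column: sum(vdj + vj)}
    if mode == "average":
        return {column: average(vdj + vj)}
    raise ValueError(f"unknown mode: {mode}")


# One cell with a heavy chain and two light chains (made-up values).
cell = [
    {"locus": "IGH", "j_call": "IGHJ3", "mu_count": 2},
    {"locus": "IGK", "j_call": "IGKJ1", "mu_count": 4},
    {"locus": "IGK", "j_call": "IGKJ1", "mu_count": 6},
]

for mode in ("split and unique only", "split and merge", "merge and unique only", "merge", "split"):
    print(mode, "->", retrieve_text(cell, "j_call", mode))
for mode in ("split and sum", "split and average", "sum", "average"):
    print(mode, "->", retrieve_numeric(cell, "mu_count", mode))
```

The two identical light-chain calls are what separate the unique-only modes from the merge modes: the former report `IGKJ1` once, the latter report it once per contig.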

Example: Retrieve fwr1 sequences

[9]:
ddl.update_metadata(vdj, retrieve='fwr1')
vdj
[9]:
Dandelion class object with n_obs = 2773 and n_contigs = 9005
    data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', ..., 'mu_count', 'ambiguous', 'rearrangement_status', 'clone_id', 'changeo_clone_id'
    metadata: 'clone_id', 'clone_id_by_size', 'sample_id', 'locus_VDJ', 'locus_VJ', 'productive_VDJ', 'productive_VJ', 'v_call_genotyped_VDJ', 'd_call_VDJ', 'j_call_VDJ', 'v_call_genotyped_VJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'v_call_genotyped_B_VDJ', 'd_call_B_VDJ', 'j_call_B_VDJ', 'v_call_genotyped_B_VJ', 'j_call_B_VJ', 'c_call_B_VDJ', 'c_call_B_VJ', 'productive_B_VDJ', 'productive_B_VJ', 'duplicate_count_B_VDJ', 'duplicate_count_B_VJ', 'isotype', 'isotype_status', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ', 'changeo_clone_id', 'fwr1_VJ', 'fwr1_VDJ', 'mu_count_VDJ', 'mu_count_VJ', 'mu_count', 'junction_length_VDJ', 'junction_length_VJ', 'junction_aa_length_VDJ', 'junction_aa_length_VJ', 'np1_length_VDJ', 'np1_length_VJ', 'np2_length_VDJ'
    layout: layout for 2773 vertices, layout for 1067 vertices
    graph: networkx graph of 2773 vertices, networkx graph of 1067 vertices

Note the additional fwr1 VDJ and VJ columns in the metadata slot.

By default, dandelion will not try to merge numerical columns, as doing so can create mixed-dtype columns.
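
One way to picture the mixed-type problem is the plain-Python sketch below. It assumes, for illustration only, that merged numbers would be joined with | in the same way as text entries; the `merge_numeric` helper is hypothetical and is not part of dandelion.

```python
def merge_numeric(values):
    """Join several numbers with '|', but leave a lone number untouched."""
    if len(values) == 1:
        return values[0]
    return "|".join(str(v) for v in values)


# Three toy cells with one, two and three contigs respectively (made-up counts).
column = [merge_numeric(v) for v in ([27.0], [0.0, 8.0], [22.0, 8.0, 4.0])]

print(column)
print(sorted({type(v).__name__ for v in column}))

# Arithmetic on such a column fails because floats and strings are mixed.
try:
    sum(column)
except TypeError as err:
    print("cannot sum:", err)
```

Summing or averaging per chain instead keeps every entry numeric, which is what the numerical retrieval modes are for.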

There is a new sub-function that will try to retrieve frequently used columns such as np1_length and np2_length:

[10]:
vdj.update_plus()
vdj
/Users/kt16/miniconda3/envs/dandelion/lib/python3.9/site-packages/numpy/core/fromnumeric.py:3440: RuntimeWarning: Mean of empty slice.
/Users/kt16/miniconda3/envs/dandelion/lib/python3.9/site-packages/numpy/core/_methods.py:189: RuntimeWarning: invalid value encountered in double_scalars
[10]:
Dandelion class object with n_obs = 2773 and n_contigs = 9005
    data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', ..., 'mu_count', 'ambiguous', 'rearrangement_status', 'clone_id', 'changeo_clone_id'
    metadata: 'clone_id', 'clone_id_by_size', 'sample_id', 'locus_VDJ', 'locus_VJ', 'productive_VDJ', 'productive_VJ', 'v_call_genotyped_VDJ', 'd_call_VDJ', 'j_call_VDJ', 'v_call_genotyped_VJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'v_call_genotyped_B_VDJ', 'd_call_B_VDJ', 'j_call_B_VDJ', 'v_call_genotyped_B_VJ', 'j_call_B_VJ', 'c_call_B_VDJ', 'c_call_B_VJ', 'productive_B_VDJ', 'productive_B_VJ', 'duplicate_count_B_VDJ', 'duplicate_count_B_VJ', 'isotype', 'isotype_status', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ', 'changeo_clone_id', 'fwr1_VJ', 'fwr1_VDJ', 'mu_count_VDJ', 'mu_count_VJ', 'mu_count', 'junction_length_VDJ', 'junction_length_VJ', 'junction_aa_length_VDJ', 'junction_aa_length_VJ', 'np1_length_VDJ', 'np1_length_VJ', 'np2_length_VDJ'
    layout: layout for 2773 vertices, layout for 1067 vertices
    graph: networkx graph of 2773 vertices, networkx graph of 1067 vertices

concatenate multiple objects

This is a simple function to concatenate (append) two or more Dandelion classes, or pandas DataFrames. Note that this operates on the .data slot and not the .metadata slot.
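
As a rough mental model, concatenation appends contig rows while the set of cell barcodes stays whatever the rows contain. The plain-Python sketch below uses a hypothetical `toy_concat` helper on lists of dicts; it is not `ddl.concat` and says nothing about how dandelion handles duplicated sequence identifiers.

```python
def toy_concat(tables):
    """Append contig tables row-wise (toy version on lists of dicts)."""
    combined = []
    for table in tables:
        # Copy each row so the input tables stay untouched.
        combined.extend(dict(row) for row in table)
    return combined


# Made-up contig table: two cells, three contigs.
table = [
    {"sequence_id": "cell1_contig_1", "cell_id": "cell1"},
    {"sequence_id": "cell1_contig_2", "cell_id": "cell1"},
    {"sequence_id": "cell2_contig_1", "cell_id": "cell2"},
]

combined = toy_concat([table, table, table])
n_contigs = len(combined)
n_obs = len({row["cell_id"] for row in combined})
print("n_obs =", n_obs, "n_contigs =", n_contigs)
```

Appending the same table three times triples the contig count but leaves the number of distinct cell barcodes unchanged.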

[11]:
# For example, the original Dandelion class has 2773 unique cell barcodes and 9005 contigs
vdj
[11]:
Dandelion class object with n_obs = 2773 and n_contigs = 9005
    data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', ..., 'mu_count', 'ambiguous', 'rearrangement_status', 'clone_id', 'changeo_clone_id'
    metadata: 'clone_id', 'clone_id_by_size', 'sample_id', 'locus_VDJ', 'locus_VJ', 'productive_VDJ', 'productive_VJ', ..., 'junction_aa_length_VDJ', 'junction_aa_length_VJ', 'np1_length_VDJ', 'np1_length_VJ', 'np2_length_VDJ'
    layout: layout for 2773 vertices, layout for 1067 vertices
    graph: networkx graph of 2773 vertices, networkx graph of 1067 vertices
[12]:
# Now there are 27015 (9005 * 3) contigs, and the metadata should also be populated correctly
vdj_concat = ddl.concat([vdj, vdj, vdj])
vdj_concat
[12]:
Dandelion class object with n_obs = 2773 and n_contigs = 27015
    data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'v_cigar', 'd_cigar', 'j_cigar', 'stop_codon', 'vj_in_frame', 'locus', 'junction_length', 'np1_length', 'np2_length', 'v_sequence_start', 'v_sequence_end', 'v_germline_start', 'v_germline_end', 'd_sequence_start', 'd_sequence_end', 'd_germline_start', 'd_germline_end', 'j_sequence_start', 'j_sequence_end', 'j_germline_start', 'j_germline_end', 'v_score', 'v_identity', 'v_support', 'd_score', 'd_identity', 'd_support', 'j_score', 'j_identity', 'j_support', 'fwr1', 'fwr2', 'fwr3', 'fwr4', 'cdr1', 'cdr2', 'cdr3', 'cell_id', 'c_call', 'consensus_count', 'duplicate_count', 'v_call_10x', 'd_call_10x', 'j_call_10x', 'junction_10x', 'junction_10x_aa', 'v_call_genotyped', 'germline_alignment_d_mask', 'sample_id', 'j_support_igblastn', 'j_score_igblastn', 'j_call_igblastn', 'j_call_blastn', 'j_identity_blastn', 'j_alignment_length_blastn', 'j_number_of_mismatches_blastn', 'j_number_of_gap_openings_blastn', 'j_sequence_start_blastn', 'j_sequence_end_blastn', 'j_germline_start_blastn', 'j_germline_end_blastn', 'j_support_blastn', 'j_score_blastn', 'j_sequence_alignment_blastn', 'j_germline_alignment_blastn', 'cell_id_blastn', 'j_source', 'd_support_igblastn', 'd_score_igblastn', 'd_call_igblastn', 'd_call_blastn', 'd_identity_blastn', 'd_alignment_length_blastn', 'd_number_of_mismatches_blastn', 'd_number_of_gap_openings_blastn', 'd_sequence_start_blastn', 'd_sequence_end_blastn', 'd_germline_start_blastn', 'd_germline_end_blastn', 'd_support_blastn', 'd_score_blastn', 'd_sequence_alignment_blastn', 'd_germline_alignment_blastn', 'd_source', 'c_sequence_alignment', 'c_germline_alignment', 'c_sequence_start', [...], 'junction_aa_length', 'fwr1_aa', 'fwr2_aa', 'fwr3_aa', 'fwr4_aa', 'cdr1_aa', 'cdr2_aa', 'cdr3_aa', 'sequence_alignment_aa', 'v_sequence_alignment_aa', 'd_sequence_alignment_aa', 'j_sequence_alignment_aa', 'mu_count', 'ambiguous', 'rearrangement_status', 'clone_id', 'changeo_clone_id'
    metadata: [...], 'productive_VDJ', 'productive_VJ', 'v_call_genotyped_VDJ', 'd_call_VDJ', 'j_call_VDJ', 'v_call_genotyped_VJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'v_call_genotyped_B_VDJ', 'd_call_B_VDJ', 'j_call_B_VDJ', 'v_call_genotyped_B_VJ', 'j_call_B_VJ', 'c_call_B_VDJ', 'c_call_B_VJ', 'productive_B_VDJ', 'productive_B_VJ', 'duplicate_count_B_VDJ', 'duplicate_count_B_VJ', 'isotype', 'isotype_status', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ'
[13]:
vdj_concat.data[['sequence_id', 'cell_id']].head()
[13]:
                                                 sequence_id                                      cell_id
sequence_id
sc5p_v2_hs_PBMC_10k_AAACCTGTCATATCGG_contig_1-0  sc5p_v2_hs_PBMC_10k_AAACCTGTCATATCGG_contig_1-0  sc5p_v2_hs_PBMC_10k_AAACCTGTCATATCGG
sc5p_v2_hs_PBMC_10k_AAACCTGTCATATCGG_contig_1-1  sc5p_v2_hs_PBMC_10k_AAACCTGTCATATCGG_contig_1-1  sc5p_v2_hs_PBMC_10k_AAACCTGTCATATCGG
sc5p_v2_hs_PBMC_10k_AAACCTGTCATATCGG_contig_1-2  sc5p_v2_hs_PBMC_10k_AAACCTGTCATATCGG_contig_1-2  sc5p_v2_hs_PBMC_10k_AAACCTGTCATATCGG
sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC_contig_2-0  sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC_contig_2-0  sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC
sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC_contig_2-1  sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC_contig_2-1  sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC

ddl.concat also lets you supply custom prefixes/suffixes to append to the sequence IDs. If none are specified, it adds -0, -1, etc. as a suffix when it detects that the sequence IDs are not unique, as shown above.
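The default suffixing behaviour can be illustrated with a small, self-contained sketch. This is not dandelion's implementation: the helper name `suffix_if_not_unique` and the toy IDs are hypothetical, and the logic only mirrors what is visible in the output above, namely that colliding IDs receive `-0`, `-1`, ... according to the position of the object they came from.

```python
def suffix_if_not_unique(id_lists):
    """Append -0, -1, ... per input list when IDs collide across inputs.

    Hypothetical helper for illustration only; not part of dandelion.
    """
    all_ids = [seq_id for ids in id_lists for seq_id in ids]
    if len(set(all_ids)) == len(all_ids):
        # Every ID is already unique, so nothing needs to change.
        return [list(ids) for ids in id_lists]
    # Otherwise tag each ID with the position of the list it came from.
    return [[f"{seq_id}-{n}" for seq_id in ids] for n, ids in enumerate(id_lists)]


# Toy IDs standing in for contig names; concatenating three copies collides.
ids = ["cellA_contig_1", "cellA_contig_2"]
renamed = suffix_if_not_unique([ids, ids, ids])
print(renamed[0])
print(renamed[2])
```

With three identical inputs, every ID collides, so the first copy ends in `-0` and the third in `-2`, matching the pattern in the table above.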

Read/write

The Dandelion class can be saved with the .write_h5ddl and .write_pkl functions, each with accompanying compression methods. write_h5ddl mainly uses pandas' to_hdf, and write_pkl just uses pickle. The read_h5ddl and read_pkl functions read the respective file formats accordingly.

[14]:
%time vdj.write_h5ddl('dandelion_results.h5ddl', complib='bzip2')
CPU times: user 11.4 s, sys: 437 ms, total: 11.9 s
Wall time: 18 s

If you see any warnings above, they are caused by mixed dtypes somewhere in the object. Check the affected columns if you think this will interfere with downstream usage.
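As a rough illustration of what "mixed dtypes" means, the sketch below scans a table held as a plain dictionary of columns and reports the columns whose values have more than one Python type. The values are made up for the example, and `find_mixed_columns` is a hypothetical helper, not a dandelion or pandas function; on a real object you would inspect the columns of the underlying data frame instead.

```python
def find_mixed_columns(table):
    """Return {column: sorted type names} for columns holding more than one type.

    Hypothetical helper for illustration only.
    """
    mixed = {}
    for name, values in table.items():
        # None is treated as a missing value rather than as a separate type.
        type_names = {type(value).__name__ for value in values if value is not None}
        if len(type_names) > 1:
            mixed[name] = sorted(type_names)
    return mixed


# Toy table with made-up values; one column mixes integers and strings.
table = {
    "junction_length": [45, 39, "", 51],
    "productive": ["T", "T", "F", "T"],
    "mu_count": [0, 3, None, 1],
}
mixed = find_mixed_columns(table)
print(mixed)
```

Here only `junction_length` is flagged, because an empty string sits among integers; columns like this are the usual source of type warnings when writing to a typed on-disk format.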

[15]:
%time vdj_1 = ddl.read_h5ddl('dandelion_results.h5ddl')
vdj_1
CPU times: user 1.99 s, sys: 258 ms, total: 2.25 s
Wall time: 2.84 s
[15]:
Dandelion class object with n_obs = 2773 and n_contigs = 9005
    data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'v_cigar', 'd_cigar', 'j_cigar', 'stop_codon', 'vj_in_frame', 'locus', 'junction_length', 'np1_length', 'np2_length', 'v_sequence_start', 'v_sequence_end', 'v_germline_start', 'v_germline_end', 'd_sequence_start', 'd_sequence_end', 'd_germline_start', 'd_germline_end', 'j_sequence_start', 'j_sequence_end', 'j_germline_start', 'j_germline_end', 'v_score', 'v_identity', 'v_support', 'd_score', 'd_identity', 'd_support', 'j_score', 'j_identity', 'j_support', 'fwr1', 'fwr2', 'fwr3', 'fwr4', 'cdr1', 'cdr2', 'cdr3', 'cell_id', 'c_call', 'consensus_count', 'duplicate_count', 'v_call_10x', 'd_call_10x', 'j_call_10x', 'junction_10x', 'junction_10x_aa', 'v_call_genotyped', 'germline_alignment_d_mask', 'sample_id', 'j_support_igblastn', 'j_score_igblastn', 'j_call_igblastn', 'j_call_blastn', 'j_identity_blastn', 'j_alignment_length_blastn', 'j_number_of_mismatches_blastn', 'j_number_of_gap_openings_blastn', 'j_sequence_start_blastn', 'j_sequence_end_blastn', 'j_germline_start_blastn', 'j_germline_end_blastn', 'j_support_blastn', 'j_score_blastn', 'j_sequence_alignment_blastn', 'j_germline_alignment_blastn', 'cell_id_blastn', 'j_source', 'd_support_igblastn', 'd_score_igblastn', 'd_call_igblastn', 'd_call_blastn', 'd_identity_blastn', 'd_alignment_length_blastn', 'd_number_of_mismatches_blastn', 'd_number_of_gap_openings_blastn', 'd_sequence_start_blastn', 'd_sequence_end_blastn', 'd_germline_start_blastn', 'd_germline_end_blastn', 'd_support_blastn', 'd_score_blastn', 'd_sequence_alignment_blastn', 'd_germline_alignment_blastn', 'd_source', 'c_sequence_alignment', 'c_germline_alignment', 'c_sequence_start', [...], 'junction_aa_length', 'fwr1_aa', 'fwr2_aa', 'fwr3_aa', 'fwr4_aa', 'cdr1_aa', 'cdr2_aa', 'cdr3_aa', 'sequence_alignment_aa', 'v_sequence_alignment_aa', 'd_sequence_alignment_aa', 'j_sequence_alignment_aa', 'mu_count', 'ambiguous', 'rearrangement_status', 'clone_id', 'changeo_clone_id'
    metadata: 'clone_id', [...], 'productive_VDJ', 'productive_VJ', 'v_call_genotyped_VDJ', 'd_call_VDJ', 'j_call_VDJ', 'v_call_genotyped_VJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'v_call_genotyped_B_VDJ', 'd_call_B_VDJ', 'j_call_B_VDJ', 'v_call_genotyped_B_VJ', 'j_call_B_VJ', 'c_call_B_VDJ', 'c_call_B_VJ', 'productive_B_VDJ', 'productive_B_VJ', 'duplicate_count_B_VDJ', 'duplicate_count_B_VJ', 'isotype', 'isotype_status', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ', 'changeo_clone_id', 'fwr1_VJ', 'fwr1_VDJ', [...], 'junction_aa_length_VDJ', 'junction_aa_length_VJ', 'np1_length_VDJ', 'np1_length_VJ', 'np2_length_VDJ'
    layout: layout for 2773 vertices, layout for 1067 vertices
    graph: networkx graph of 2773 vertices, networkx graph of 1067 vertices

Read/write times with pickle can be faster or slower, and file sizes can be smaller or larger, depending on the compression used.

[16]:
%time vdj.write_pkl('dandelion_results.pkl.gz')
CPU times: user 17.3 s, sys: 175 ms, total: 17.5 s
Wall time: 24.3 s
[17]:
%time vdj_2 = ddl.read_pkl('dandelion_results.pkl.gz')
vdj_2
CPU times: user 379 ms, sys: 44.4 ms, total: 424 ms
Wall time: 490 ms
[17]:
Dandelion class object with n_obs = 2773 and n_contigs = 9005
    data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'v_cigar', 'd_cigar', 'j_cigar', 'stop_codon', 'vj_in_frame', 'locus', 'junction_length', 'np1_length', 'np2_length', 'v_sequence_start', 'v_sequence_end', 'v_germline_start', 'v_germline_end', 'd_sequence_start', 'd_sequence_end', 'd_germline_start', 'd_germline_end', 'j_sequence_start', 'j_sequence_end', 'j_germline_start', 'j_germline_end', 'v_score', 'v_identity', 'v_support', 'd_score', 'd_identity', 'd_support', 'j_score', 'j_identity', 'j_support', 'fwr1', 'fwr2', 'fwr3', 'fwr4', 'cdr1', 'cdr2', 'cdr3', 'cell_id', 'c_call', 'consensus_count', 'duplicate_count', 'v_call_10x', 'd_call_10x', 'j_call_10x', 'junction_10x', 'junction_10x_aa', 'v_call_genotyped', 'germline_alignment_d_mask', 'sample_id', 'j_support_igblastn', 'j_score_igblastn', 'j_call_igblastn', 'j_call_blastn', 'j_identity_blastn', 'j_alignment_length_blastn', 'j_number_of_mismatches_blastn', 'j_number_of_gap_openings_blastn', 'j_sequence_start_blastn', 'j_sequence_end_blastn', 'j_germline_start_blastn', 'j_germline_end_blastn', 'j_support_blastn', 'j_score_blastn', 'j_sequence_alignment_blastn', 'j_germline_alignment_blastn', 'cell_id_blastn', 'j_source', 'd_support_igblastn', 'd_score_igblastn', 'd_call_igblastn', 'd_call_blastn', 'd_identity_blastn', 'd_alignment_length_blastn', 'd_number_of_mismatches_blastn', 'd_number_of_gap_openings_blastn', 'd_sequence_start_blastn', 'd_sequence_end_blastn', 'd_germline_start_blastn', 'd_germline_end_blastn', 'd_support_blastn', 'd_score_blastn', 'd_sequence_alignment_blastn', 'd_germline_alignment_blastn', 'd_source', 'c_sequence_alignment', 'c_germline_alignment', 'c_sequence_start', [...], 'junction_aa_length', 'fwr1_aa', 'fwr2_aa', 'fwr3_aa', 'fwr4_aa', 'cdr1_aa', 'cdr2_aa', 'cdr3_aa', 'sequence_alignment_aa', 'v_sequence_alignment_aa', 'd_sequence_alignment_aa', 'j_sequence_alignment_aa', 'mu_count', 'ambiguous', 'rearrangement_status', 'clone_id', 'changeo_clone_id'
    metadata: 'clone_id', [...], 'productive_VDJ', 'productive_VJ', 'v_call_genotyped_VDJ', 'd_call_VDJ', 'j_call_VDJ', 'v_call_genotyped_VJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'v_call_genotyped_B_VDJ', 'd_call_B_VDJ', 'j_call_B_VDJ', 'v_call_genotyped_B_VJ', 'j_call_B_VJ', 'c_call_B_VDJ', 'c_call_B_VJ', 'productive_B_VDJ', 'productive_B_VJ', 'duplicate_count_B_VDJ', 'duplicate_count_B_VJ', 'isotype', 'isotype_status', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ', 'changeo_clone_id', 'fwr1_VJ', 'fwr1_VDJ', [...], 'junction_aa_length_VDJ', 'junction_aa_length_VJ', 'np1_length_VDJ', 'np1_length_VJ', 'np2_length_VDJ'
    layout: layout for 2773 vertices, layout for 1067 vertices
    graph: networkx graph of 2773 vertices, networkx graph of 1067 vertices
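The trade-off between compression method, file size, and speed mentioned for pickles can be explored with the Python standard library alone. The sketch below does not call dandelion's write_pkl or read_pkl; the `roundtrip` helper and the toy records are assumptions for illustration. It pickles a small object, compresses it with three standard codecs, restores it, and reports the size and elapsed time.

```python
import bz2
import gzip
import lzma
import pickle
import time


def roundtrip(obj, compress, decompress):
    """Pickle and compress obj, then reverse both steps.

    Returns (restored object, compressed size in bytes, elapsed seconds).
    """
    start = time.perf_counter()
    blob = compress(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))
    restored = pickle.loads(decompress(blob))
    return restored, len(blob), time.perf_counter() - start


# Toy stand-in for a table of contigs; not real dandelion data.
records = [{"sequence_id": f"contig_{i}", "duplicate_count": i % 7} for i in range(5000)]

codecs = {
    "gzip": (gzip.compress, gzip.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}

results = {}
for name, (compress, decompress) in codecs.items():
    restored, n_bytes, seconds = roundtrip(records, compress, decompress)
    # Keep whether the object survived intact, plus its compressed size.
    results[name] = (restored == records, n_bytes)
    print(f"{name}: {n_bytes} bytes, {seconds:.3f} s")
```

Which codec wins depends on the data and the machine, so it is worth timing a representative object before settling on one.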

Article information

Author: Duncan Muller

Last Updated: 08/12/2023
