Transcription Factor Binding Prediction from ATAC-seq and scATAC-seq with Deep Neural Networks
The variants function uses a maxATAC model to predict TF binding in non-overlapping LD blocks. It can also perform whole-genome prediction, but it has not yet been optimized for that. The function takes as input a bed file of variants, with the nucleotide to use at each position. You must also provide the regions in which you want to make predictions. The function then merges nearby ROI intervals (+/- 512 bp) and creates sliding windows (1,024 bp wide x 256 bp step) along each merged ROI. Regions at the end of an interval are trimmed off if they are shorter than 1,024 bp.
maxatac variants -m ELF1_99.h5 -signal GM12878__slop20bp_RP20M_minmax01.bw -name GM12878_ELF1 -s hg38.2bit --chromosome chr20 -variants_bed AD_risk_loci.bed
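The ROI handling described above (merge nearby intervals, then tile them with sliding windows) can be sketched in plain Python. This is only an illustration, not the maxATAC implementation: the helper names `merge_intervals` and `sliding_windows` are hypothetical, and the sketch assumes that "+/- 512 bp" means each interval is extended by 512 bp on both sides before overlapping intervals are merged.

```python
from typing import List, Tuple

# (chrom, start, end), 0-based half-open coordinates as in a bed file
Interval = Tuple[str, int, int]


def merge_intervals(rois: List[Interval], slop: int = 512) -> List[Interval]:
    """Extend each interval by `slop` bp on both sides, then merge overlaps.

    Assumption: this is one reading of "merge nearby ROI intervals (+/- 512 bp)".
    """
    extended = sorted((c, max(0, s - slop), e + slop) for c, s, e in rois)
    merged: List[Interval] = []
    for chrom, start, end in extended:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def sliding_windows(interval: Interval, width: int = 1024, step: int = 256) -> List[Interval]:
    """Tile an interval with fixed-width windows; a trailing piece shorter
    than `width` is dropped rather than padded."""
    chrom, start, end = interval
    return [(chrom, s, s + width) for s in range(start, end - width + 1, step)]


rois = [("chr20", 1000, 1500), ("chr20", 2000, 2600)]
merged = merge_intervals(rois)
windows = sliding_windows(merged[0])
```

In this example the two ROIs merge into a single 2,624 bp interval, which yields seven windows; the final 64 bp do not fill a complete 1,024 bp window and are trimmed.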
--genome: Specify which genome build this task is for (e.g. hg38).
-m, --model: The trained maxATAC model that will be used to predict TF binding. This is an h5 file produced by maxatac train.
-i, -s, --signal: The ATAC-seq signal bigwig track that will be used to make predictions of TF binding.
--variants_bed: The bed file of nucleotides to change. The first 3 columns should be the coordinates and the fourth column should be the nucleotide to use.
-n, --name, --prefix: Output filename prefix to use.
-s, --sequence: The path to the 2bit DNA sequence for the genome of interest. Default: hg38.2bit
-roi: The bed file of intervals to use for prediction windows. Predictions will be limited to these specific regions. Only the first 3 columns of the file are considered when making the prediction windows. Default: Whole genome prediction.
-o, --output_dir: Output directory path. Default: ./variants
--loglevel: Sets the logging level. Currently, the only working logging level is ERROR.
--blacklist: The path to a bigwig file that has regions to exclude. Default: maxATAC defined blacklist.
-step_size: The number of base pairs to overlap the 1,024 bp regions during prediction. This should be in multiples of 256. Default: 256
-c, -chroms, --chromosomes: The chromosomes to make predictions on. Default: all chromosomes (chr1-22, X, Y).
-cs, --chrom_sizes, --chromosome_sizes: The path to the chromosome sizes file. This is used to generate the bigwig signal tracks. Default: hg38.chrom.sizes
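To make the --variants_bed format concrete, here is a minimal sketch of reading a four-column variants file and substituting the requested nucleotides into a window of reference sequence. It is illustrative only and does not reproduce maxATAC's internal code: the function names `parse_variants_bed` and `apply_variants` are hypothetical, and the sketch assumes standard 0-based, half-open bed coordinates with each variant spanning exactly one base.

```python
from typing import Dict, Tuple

VariantMap = Dict[Tuple[str, int], str]  # (chrom, 0-based position) -> nucleotide


def parse_variants_bed(text: str) -> VariantMap:
    """Parse bed lines of the form: chrom, start, end, nucleotide."""
    variants: VariantMap = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        chrom, start, end, nucleotide = line.split()[:4]
        # Assumption: every variant is a single-base substitution.
        if int(end) - int(start) != 1:
            raise ValueError(f"expected a 1 bp interval, got: {line}")
        variants[(chrom, int(start))] = nucleotide.upper()
    return variants


def apply_variants(sequence: str, chrom: str, window_start: int, variants: VariantMap) -> str:
    """Return `sequence` with any variants inside the window substituted in."""
    bases = list(sequence)
    for (v_chrom, pos), nucleotide in variants.items():
        offset = pos - window_start
        if v_chrom == chrom and 0 <= offset < len(bases):
            bases[offset] = nucleotide
    return "".join(bases)


bed_text = "chr20\t10\t11\tT\nchr20\t13\t14\tg\n"
variants = parse_variants_bed(bed_text)
# A toy 8 bp "window" starting at position 8 on chr20.
mutated = apply_variants("ACGTACGT", "chr20", 8, variants)
```

Here the variants at positions 10 and 13 fall at offsets 2 and 5 of the window, so the toy sequence ACGTACGT becomes ACTTAGGT. Variants on other chromosomes or outside the window are left untouched.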