Transcription Factor Binding Prediction from ATAC-seq and scATAC-seq with Deep Neural Networks
The variants
function is intended for using a maxATAC model to predict TF binding in non-overlapping LD blocks. The function can perform whole genome prediction, but has not been optimized for that yet. The variants
function takes as input a bed file of variants and the nucleotide to use at that position. You must also provide the regions that you want to make predictions in. This function will then merge nearby ROI intervals (+/- 512 bp) and create sliding windows (1,024 bp wide x 256 bp step) along the ROI. Regions at the end of the interval will be trimmmed off if they are less than 1,024 bp.
maxatac variants -m ELF1_99.h5 -signal GM12878__slop20bp_RP20M_minmax01.bw -name GM12878_ELF1 -s hg38.2bit --chromosome chr20 -variants_bed AD_risk_loci.bed
--genome
Specify which genome build this task is specified for (i.e. hg38).
-m, --model
The trained maxATAC model that will be used to predict TF binding. This is a h5 file produced from maxatac train
.
-i
, -s
, --signal
The ATAC-seq signal bigwig track that will be used to make predictions of TF binding.
--variants_bed
The bed file of nucleotides to change. The first 3 columns should be the coordinates and the fourth column should be the nucleotide to use.
-n
, --name
, --prefix
Output filename prefix to use.
-s
, --sequence
This argument specifies the path to the 2bit DNA sequence for the genome of interest. Default: hg38.2bit
-roi
The bed file of intervals to use for prediction windows. Predictions will be limited to these specific regions. Only the first 3 columns of the file will be considered when making the prediction windows. Default: Whole genome prediction.
-o
, --output_dir
Output directory path. Default: ./variantss
--loglevel
This argument is used to set the logging level. Currently, the only working logging level is ERROR
.
--blacklist
The path to a bigwig file that has regions to exclude. Default: maxATAC defined blacklist.
-step_size
The number of base pairs to overlap the 1,024 bp regions during prediction. This should be in multiples of 256. Default: 256
-c
, -chroms
, --chromosomes
The chromosomes to make predictions on. Default: All chromosomes. chr1-22, X, Y
-cs
, --chrom_sizes
, --chromosome_sizes
The path to the chromosome sizes file. This is used to generate the bigwig signal tracks. Default: hg38.chrom.sizes