maxATAC

Transcription Factor Binding Prediction from ATAC-seq and scATAC-seq with Deep Neural Networks

Predict

The predict function uses a trained maxATAC model to predict TF binding in a new condition. The user must provide a model (or a TF name) and a bigwig file containing the ATAC-seq signal track.

Example

maxatac predict --model CTCF.h5 --signal GM12878.bigwig

or

maxatac predict --tf_name CTCF --signal GM12878.bigwig

Required Arguments

-tf, --tf_name or -m, --model

The user must provide either the name of the TF to make predictions for or the path to an h5 model file. If a TF name is provided, the best model for that TF is used and the matching threshold file is supplied for peak calling.

-s, --signal

The ATAC-seq signal bigwig track that will be used to make predictions of TF binding.

Optional Arguments

--sequence

This argument specifies the path to the 2bit DNA sequence file for the genome of interest. maxATAC models are trained on hg38, so the .2bit file must contain the hg38 sequence.

-cutoff_type, --cutoff_type

The cutoff type: Precision, Recall, F1, or log2FC, where F1 is the F1-score and log2FC is log2(Precision / Random Precision). Default: F1.

-cutoff_value, --cutoff_value

The cutoff value for the chosen cutoff type. Precision, recall, and F1-scores range from 0 to 1, while better-than-random log2FC scores range from 0 to infinity. Example: 0.7
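To make the log2FC scale concrete, here is a minimal sketch of the ratio described above. The function name `log2_fold_change` and the example numbers are assumptions chosen for illustration; they are not part of the maxATAC API.

```python
import math

def log2_fold_change(precision: float, random_precision: float) -> float:
    """Return log2(precision / random_precision).

    Hypothetical helper for illustration only; not a maxATAC function.
    """
    if precision <= 0 or random_precision <= 0:
        raise ValueError("both precision values must be positive")
    return math.log2(precision / random_precision)

# A precision four times the random baseline gives a log2FC of 2.
print(log2_fold_change(0.5, 0.125))    # 2.0
# A precision equal to the random baseline gives a log2FC of 0.
print(log2_fold_change(0.125, 0.125))  # 0.0
```

A log2FC of 0 therefore means "no better than random", which is why better-than-random values start at 0 and have no upper bound.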

-cutoff_file, --cutoff_file

The cutoff file provided in /data/models that corresponds to the average validation performance metrics for the TF model.

--output

Output directory path. Default: ./prediction_results

--blacklist

The path to a bigwig file that has regions to exclude. Default: maxATAC-defined blacklist.

--roi

The path to a bed file that contains the genomic regions to predict TF binding in. Each region should be at least 1,024 bp long, which is the input length of the maxATAC model.
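Before passing a bed file to --roi, it may help to check that every region meets the 1,024 bp minimum. The sketch below is one way to do that in plain Python. The helper `filter_roi` is hypothetical, and how maxATAC itself treats shorter regions is not stated here, so treat this as a pre-check only.

```python
INPUT_LENGTH = 1024  # maxATAC model input size in bp, as stated above

def filter_roi(bed_lines, min_length=INPUT_LENGTH):
    """Split BED records into (kept, dropped) by region length.

    Hypothetical pre-check helper; not part of maxATAC.
    """
    kept, dropped = [], []
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and common BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        region = (chrom, int(start), int(end))
        # BED intervals are 0-based and half-open, so length is end - start.
        if region[2] - region[1] >= min_length:
            kept.append(region)
        else:
            dropped.append(region)
    return kept, dropped

example = [
    "chr1\t10000\t11024",  # exactly 1,024 bp: kept
    "chr1\t20000\t20500",  # 500 bp: too short
    "chr2\t5000\t9000",    # 4,000 bp: kept
]
kept, dropped = filter_roi(example)
print(kept)
print(dropped)
```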

--batch_size

The number of regions to predict on per batch. Default: 10000. Decrease this value if you run into memory issues.
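The effect of the batch size can be pictured as slicing the list of prediction regions into chunks, so that only one chunk is held in memory at a time. This is a generic sketch of that idea, not maxATAC's internal code; the generator `batches` is a hypothetical name.

```python
def batches(regions, batch_size=10000):
    """Yield consecutive slices of at most batch_size regions."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for i in range(0, len(regions), batch_size):
        yield regions[i:i + batch_size]

regions = list(range(25000))  # stand-in for 25,000 prediction regions
sizes = [len(b) for b in batches(regions)]
print(sizes)  # [10000, 10000, 5000]
```

Halving the batch size roughly halves the number of regions processed at once, at the cost of more batches.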

--step_size

The step size to use for building the prediction intervals. Overlapping prediction bins will be averaged together. Default: INPUT_LENGTH/4 (256 bp), where INPUT_LENGTH is the maxATAC model input size of 1,024 bp.
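The interaction between step size and averaging can be sketched as follows. This is a simplified illustration under stated assumptions: each window gets a single score, whereas real model output may be finer grained, and the helpers `window_starts` and `average_overlaps` are hypothetical names rather than maxATAC functions.

```python
INPUT_LENGTH = 1024            # model input size in bp, as stated above
STEP_SIZE = INPUT_LENGTH // 4  # default step: 256 bp

def window_starts(region_start, region_end, length=INPUT_LENGTH, step=STEP_SIZE):
    """Start coordinates of full-length windows tiled across a region."""
    return list(range(region_start, region_end - length + 1, step))

def average_overlaps(starts, scores, region_start, region_end, length=INPUT_LENGTH):
    """Average the window scores at every position covered by a window.

    Assumes one score per window (a simplification for this sketch).
    """
    size = region_end - region_start
    total = [0.0] * size
    count = [0] * size
    for start, score in zip(starts, scores):
        offset = start - region_start
        for pos in range(offset, offset + length):
            total[pos] += score
            count[pos] += 1
    # Positions covered by no window are reported as None.
    return [t / c if c else None for t, c in zip(total, count)]

starts = window_starts(0, 2048)
print(starts)  # [0, 256, 512, 768, 1024]

# Made-up scores, one per window, to show the averaging.
averaged = average_overlaps(starts, [1.0, 0.0, 1.0, 0.0, 1.0], 0, 2048)
print(averaged[0])    # 1.0 (only the first window covers position 0)
print(averaged[256])  # 0.5 (the first two windows overlap here)
```

A smaller step size produces more overlapping windows per position, which smooths the averaged signal but increases the number of regions to predict on.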

--prefix

Output filename prefix to use. Default: maxatac_predict.

--chromosome_sizes

The path to the chromosome sizes file. This is used to generate the bigwig signal tracks.

--chromosomes

The chromosomes to make predictions on. Our models do not currently consider chromosomes X or Y, so most of the files will not contain data for them. Do not predict on chrX or chrY unless you know your bigwig contains these chromosomes. Default: autosomal chromosomes 1-22.
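One way to guard against requesting chromosomes that the signal track does not contain is to compare the requested names against the names found in the track before running predictions. The sketch below assumes you already have that set of names; `check_chromosomes` is a hypothetical helper, not a maxATAC function.

```python
# Default described above: autosomes chr1 through chr22.
DEFAULT_CHROMOSOMES = [f"chr{i}" for i in range(1, 23)]

def check_chromosomes(requested, available):
    """Split requested chromosome names into (present, missing).

    `available` is the set of chromosome names found in the signal track;
    how that set is obtained is left to the reader.
    """
    available = set(available)
    present = [c for c in requested if c in available]
    missing = [c for c in requested if c not in available]
    return present, missing

# Hypothetical track that covers the autosomes but not chrX or chrY.
track_chroms = set(DEFAULT_CHROMOSOMES)
present, missing = check_chromosomes(DEFAULT_CHROMOSOMES + ["chrX"], track_chroms)
print(len(present))  # 22
print(missing)       # ['chrX']
```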

--loglevel

This argument is used to set the logging level. Currently, the only working logging level is ERROR.

-bin, --bin_size

The bin size to use for calling peaks. Default: 200 bp, the same size used for benchmarking predictions.
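As a rough picture of what binning does during peak calling, the sketch below groups per-base scores into 200 bp bins and keeps the bins that reach a score threshold. Several details are assumptions made for illustration: summarizing each bin by its maximum score, the `score_threshold` value, and the function name `call_peaks`. In maxATAC the threshold comes from the cutoff type, value, and file described above.

```python
def call_peaks(scores, chrom, bin_size=200, score_threshold=0.5):
    """Return (chrom, start, end, score) for bins scoring >= score_threshold.

    scores: per-base scores for positions 0..len(scores)-1 on chrom.
    Each bin is summarized by its maximum score (an assumption of this sketch).
    """
    peaks = []
    for start in range(0, len(scores), bin_size):
        end = min(start + bin_size, len(scores))
        bin_score = max(scores[start:end])
        if bin_score >= score_threshold:
            peaks.append((chrom, start, end, bin_score))
    return peaks

# Made-up scores: a low bin, a high bin, and a short low bin at the end.
scores = [0.1] * 200 + [0.9] * 200 + [0.3] * 100
print(call_peaks(scores, "chr1"))  # [('chr1', 200, 400, 0.9)]
```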