Tutorial¶
Installation¶
Like many other Python packages and bioinformatics softwares, MAnorm can be obtained easily from PyPI or Bioconda. The command below shows how to install the latest release of MAnorm in a convenient way, but you can also install it from source code alternatively.
Prerequisites¶
Tip
MAnorm is implemented under Python 2.7 and will support Python 3.X in the following updates.
- Python 2.7
- setuptools
- numpy
- matplotlib
- statsmodels
- scipy
Install with pip¶
The latest release of MAnorm is available at PyPI, you can install via pip
:
$ pip install manorm
Install with conda¶
You can also install MAnorm with conda through Bioconda channel:
$ conda install -c bioconda manorm
Install from source code¶
It’s highly recommended to install MAnorm with pip
or conda
. If you prefer to install it from source code,
please read the following steps:
The source code of MAnorm is hosted on GitHub, and setuptools is required for installation.
First, clone the repository of MAnorm:
$ git clone https://github.com/shao-lab/MAnorm.git
Then, install MAnorm in the source directory:
$ cd MAnorm
$ python setup.py install
Note
- You may need to install all dependencies listed in
requirements.txt
. - You may need to modify
$PATH
and$PYTHONPATH
manually to make it work.
Galaxy Installation¶
MAnorm is available on Galaxy, you can incorporate MAnorm into your own Galaxy instance.
Please search and install MAnorm via the Galaxy Tool Shed.
Usage of MAnorm¶
To check whether MAnorm is properly installed, you can inspect the version of MAnorm by -v/--version
option:
$ manorm -v
$ manorm --version
Command-Line Usage¶
MAnorm provide a console script manorm
for running the program, the basic usage should as follows:
$ manorm –p1 peaks_file1.xls –p2 peaks_file2.xls –r1 reads_file1.bed –r2 reads_file2.bed -o output_name
Tip
Please use -h/--help
for the details of all options.
Options¶
-h, --help | Show help message and exit. |
-v, --version | Show version number and exit. |
--p1 | [Required] Peaks file of sample1. |
--p2 | [Required] Peaks file of sample2. |
--r1 | [Required] Reads file of sample1. |
--r2 | [Required] Reads file of sample2. |
--s1 | Reads shiftsize of sample1. Default: 100 |
--s2 | Reads shiftsize of sample2. Default: 100 |
-w | Width of window to calculate read density. Default: 1000 |
-d | Summit-to-summit distance cutoff for common peaks. Default: -w /2 |
-n | Number of simulations to test the enrichment of peaks overlap between two samples. |
-m | M-value cutoff to distinguish biased (sample-specific) peaks from unbiased peaks. |
-p | P-value cutoff to define biased peaks. |
-s | Output additional files which contains the results of original peaks. |
--name1 | Name of sample1. (experiment condition, cell-type etc.) |
--name2 | Name of sample2. |
-o | [Required] Output directory. |
Further explanation:
--s1/--s2
: These values are used to shift reads towards 3’ direction to determine the precise binding site. Set as half of the fragment length.-w
: Half of the window size when counting reads of the peak regions. MAnorm uses windows with unified length of 2 *-w
centered at peak summits/midpoints to calculate the read density. This value should match the typical length of peaks, a value of 1000 is recommended for sharp histone marks like H3K4me3 and H3K9/27ac, and 500 for transcription factors or DNase-Seq.-d
: Summit-to-summit distance cutoff for common peaks. Default=-w
/ 2. Only overlapped peaks with summit-to-summit distance less than than this value are considered as real common peaks of two samples when fitting M-A normalization model.-m
: M-value (log2 fold change) cutoff to distinguish biased peaks from unbiased peaks. Peaks with M-value >=-m
and P-value <=-p
are defined as sample1-biased(specific) peaks, while peaks with M-value <= -1 *-m
and P-value <=-p
are defined as sample2-biased peaks.-s
: By default, MAnorm will write the comparison results of unique and merged common peaks in a single output file. With this option on, MAnorm will output two extra files which contains the results of the original(unmerged) peaks.--name1/--name2
: If specified, it will be used to replace the peaks/reads input file name as the sample name in output files.-o
: Output directory. When--name1
and--name2
are not specified, MAnorm will use it as the prefix of comparison output file.
Input Format¶
Format of Peaks file¶
Standard BED format and MACS xls format are supported, other supported format are listed below:
* 3-columns tab split format
# chr start end
chr1 2345 4345
chr1 3456 5456
chr2 6543 8543
* 4-columns tab split format
# chr start end summit
chr1 2345 4345 254
chr1 3456 5456 127
chr2 6543 8543 302
Note
The fourth column summit is the relative position to start.
Format of Reads file¶
Only BED format are supported for now. More format will be embedded in the following updates.
MAnorm Output¶
- output_name_all_MAvalues.xls
This is the main output result of MAnorm which contains the M-A values and normalized read density of each peak, common peaks from two samples are merged together.
- chr: chromosome name
- start: start position of the peak
- end: end position of the peak
- summit: summit position of the peak (relative to start)
- m_value: M value (log2 Fold change) of normalized read densities under comparison
- a_value: A value (average signal strength) of normalized read densities under comparison
- p_value
- peak_group: indicates where the peak is come from
- normalized_read_density_in _sample1
- normalized_read_density_in_sample2
Note
Coordinates in .xls file is under 1-based coordinate-system.
- output_filters/
- sample1_biased_peaks.bed
- sample2_biased_peaks.bed
- output_name_unbiased_peaks.bed
- output_tracks/
- output_name_M_values.wig
- output_name_A_values.wig
- output_name_P_values.wig
- output_figures/
- output_name_MA_plot_before_normalization.png
- output_name_MA_plot_after_normalization.png
- output_name_MA_plot_with_P-value.png
- output_name_read_density_on_common_peaks.png