Log-transformed gene expression distribution

Please note:


-Some asymmetry in the distribution is expected when the sample size is small (e.g., <30), even if the data come from a normal distribution

-When the sample size is large, further assessment of normality should rely on the normality test in the next tab
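The small-sample caveat above can be checked with a quick simulation. The sketch below uses only the Python standard library; the sample sizes, replicate count, and seed are arbitrary choices for illustration, not C-REx settings. It draws repeated samples from a standard normal distribution and reports the average absolute sample skewness, which is typically several times larger for n = 10 than for n = 1000.

```python
import random
import statistics

def sample_skewness(values):
    """Adjusted Fisher-Pearson skewness G1; 0 for a perfectly symmetric sample."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    g1 = m3 / m2 ** 1.5
    return g1 * (n * (n - 1)) ** 0.5 / (n - 2)

def mean_abs_skewness(n, replicates=200, seed=1):
    """Average |G1| over `replicates` samples of size n drawn from a standard normal."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(replicates):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        total += abs(sample_skewness(sample))
    return total / replicates

small = mean_abs_skewness(10)
large = mean_abs_skewness(1000)
print(f"n = 10:   mean |skewness| = {small:.2f}")
print(f"n = 1000: mean |skewness| = {large:.2f}")
```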

Normality test for log-transformed data


Pay attention if there are many outliers in either tail, as this indicates a violation of the normality assumption.

The dots should follow the straight line if the data come from a normal distribution, in which case the t-test is recommended. If not, use the Wilcoxon rank-sum test.



Key summary statistics

The observed sample statistics were:

Hypothesis of the t-test

We are testing the null hypothesis that the two population means are equal.
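A minimal sketch of the t statistic, assuming the classic Student's two-sample t-test with pooled variance (the page does not say whether an equal-variance or Welch version is used). The inputs would be the log-transformed expression values of the two groups. The Python standard library has no t-distribution, so the sketch stops at the statistic and degrees of freedom; with SciPy installed, scipy.stats.ttest_ind(a, b) returns both the statistic and the two-sided P-value.

```python
import math
import statistics

def student_t(group_a, group_b):
    """Two-sample Student's t statistic with pooled variance (equal-variance assumption).

    Inputs are the log-transformed expression values of each group.
    Returns (t, degrees_of_freedom).
    """
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance, n - 1 denominator
    var_b = statistics.variance(group_b)
    df = n_a + n_b - 2
    pooled = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    se = math.sqrt(pooled * (1 / n_a + 1 / n_b))
    t = (statistics.fmean(group_a) - statistics.fmean(group_b)) / se
    return t, df
```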

P-value from t-test:


                        

A low P-value (e.g., <0.05) suggests that your sample provides enough evidence to reject the null hypothesis.

About this test

This non-parametric test is less powerful than parametric tests (such as the t-test) when their assumptions hold. It is used when data do not follow a normal distribution; in particular, it is an alternative when the log-transformed data fail to meet the normality requirement. This test is also called the Mann–Whitney U test or the Mann–Whitney–Wilcoxon (MWW) test.

Hypothesis of the Wilcoxon rank-sum test

The null hypothesis is that a randomly selected value from one sample is equally likely to be less than or greater than a randomly selected value from the second sample. Note: a two-sided test is used by default.
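The rank-sum idea can be made concrete with a small sketch: pool both groups, rank all values (tied values get their average rank), and turn one group's rank sum into the U statistic. This version uses the large-sample normal approximation without a tie correction, so its P-values are only approximate for small or heavily tied groups. The function names are illustrative and this is not the C-REx implementation; scipy.stats.mannwhitneyu is a complete one.

```python
import math
from statistics import NormalDist

def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def rank_sum_test(group_a, group_b):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test, normal approximation.

    Returns (U for group_a, approximate two-sided P-value).
    """
    n_a, n_b = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    u_a = sum(ranks[:n_a]) - n_a * (n_a + 1) / 2
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u_a - mean_u) / sd_u
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return u_a, p_value
```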

P-value from Wilcoxon rank-sum test:


                        

A low P-value (e.g., <0.05) suggests that your sample provides enough evidence to reject the null hypothesis.

Log-transformed gene expression distribution


If a group follows a normal distribution, its distribution should have a bell shape


Please note:


-Some asymmetry in the distribution is expected when the sample size is small (e.g., <30), even if the data come from a normal distribution

-When the sample size is large, further assessment of normality should rely on the normality test in the next tab

Normality test for log-transformed data


Pay attention if there are many outliers in either tail, as this indicates a violation of the normality assumption.

The dots should follow the straight line if the data come from a normal distribution, in which case the t-test is recommended. If not, use the Wilcoxon signed-rank test.


File 1


File 2

Key summary statistics

The observed sample statistics were:

Hypothesis of the t-test

We are testing the null hypothesis that the two population means are equal.

P-value from t-test:


                        

A low P-value (e.g., <0.05) suggests that your sample provides enough evidence to reject the null hypothesis.

About this test

This non-parametric test is less powerful than parametric tests (such as the t-test) when their assumptions hold. It is used when data do not follow a normal distribution or the sample size is too small to tell; in particular, it is an alternative when the log-transformed data fail to meet the normality requirement.

Hypothesis of the Wilcoxon signed-rank test

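This test compares two related samples, matched samples, or repeated measurements on a single sample to assess whether their population mean ranks differ. The null hypothesis is that the differences between paired values are distributed symmetrically around zero.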
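A matching sketch for the signed-rank test, assuming the two files are paired by gene ID (genes present in only one file are ignored) and using the normal approximation without a tie correction. Zero differences are dropped, the absolute differences are ranked, and W+ is the sum of the ranks of positive differences. The function name and the pairing rule are illustrative assumptions; scipy.stats.wilcoxon provides a complete implementation.

```python
import math
from bisect import bisect_left, bisect_right
from statistics import NormalDist

def signed_rank_test(file1, file2):
    """Two-sided Wilcoxon signed-rank test on genes shared by two conditions.

    file1 / file2 map gene ID -> log-transformed expression. Pairing by gene ID
    and the normal approximation are assumptions of this sketch.
    Returns (W+, approximate two-sided P-value).
    """
    shared = sorted(file1.keys() & file2.keys())
    diffs = [file1[g] - file2[g] for g in shared]
    diffs = [d for d in diffs if d != 0]  # zero differences carry no sign
    n = len(diffs)
    if n == 0:
        raise ValueError("no non-zero paired differences")
    abs_sorted = sorted(abs(d) for d in diffs)

    def avg_rank(value):
        # 1-based ranks occupied by `value` run from bisect_left + 1 to bisect_right
        return (bisect_left(abs_sorted, value) + 1 + bisect_right(abs_sorted, value)) / 2

    w_plus = sum(avg_rank(abs(d)) for d in diffs if d > 0)
    mean_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean_w) / sd_w
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return w_plus, p_value
```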

P-value from Wilcoxon signed-rank test:


                        

A low P-value (e.g., <0.05) suggests that your sample provides enough evidence to reject the null hypothesis.

About this test

This non-parametric test is less powerful than parametric tests (such as the t-test) when their assumptions hold. It is used when data do not follow a normal distribution; in particular, it is an alternative when the log-transformed data fail to meet the normality requirement. This test is also called the Mann–Whitney U test or the Mann–Whitney–Wilcoxon (MWW) test.

Hypothesis of the Wilcoxon rank-sum test

The null hypothesis is that a randomly selected value from one sample is equally likely to be less than or greater than a randomly selected value from the second sample. Note: a two-sided test is used by default.

P-value from Wilcoxon rank-sum test:


                        

A low P-value (e.g., <0.05) suggests that your sample provides enough evidence to reject the null hypothesis.


Purpose


C-REx enables gene group expression comparison within or across samples.


A quick example of where our method can help:

Suppose you have a group of genes of unknown function, such as a novel family of transcription factors (TF).

You want to test the hypothesis that this novel TF gene group is activated under a specific stress condition, such as heat stress.

C-REx provides a data-processing pipeline that enables statistical testing (Student's t-test or Wilcoxon signed-rank test) to determine whether RNA expression patterns differ significantly between stress and non-stress samples.

The difference between the typical discovery-based differential expression (DE) approach and our group-to-group RNA expression comparison is illustrated below. Rather than identifying DE gene sets, in which each gene's expression is analyzed individually, we compare the distributions of expression values of whole gene groups.



Features


1. A web tool that log-transforms your FPKM/TPM RNA-seq dataset so that gene group expression approximately follows a normal distribution, enabling Student's t-test in downstream analysis

2. Visualization of log-transformed gene group expression distributions, with Q-Q plots to assess normality

3. Normalization of gene group expression levels across samples using housekeeping genes

4. T-test on gene groups within a sample or across samples

5. An alternative non-parametric test (Wilcoxon signed-rank or rank-sum) if the log-transformed distribution does not follow a normal distribution
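Features 1 and 3 can be sketched as follows. The log base (2), the pseudocount of 1, and the normalization rule (subtracting a sample's mean log2 housekeeping expression) are all assumptions for illustration; the page does not specify how C-REx performs either step.

```python
import math
import statistics

HOUSEKEEPING = "housekeeping genes"  # the label required in the input file

def log_transform(values, pseudocount=1.0):
    """log2(x + pseudocount) for each TPM/FPKM value (base and pseudocount are assumptions)."""
    return [math.log2(v + pseudocount) for v in values]

def normalize_to_housekeeping(groups, group_name):
    """Hypothetical normalization: subtract the mean log2 housekeeping expression.

    `groups` maps gene group -> list of raw TPM/FPKM values from ONE sample.
    On the raw scale this corresponds to dividing by the geometric mean of
    (housekeeping value + pseudocount).
    """
    offset = statistics.fmean(log_transform(groups[HOUSEKEEPING]))
    return [v - offset for v in log_transform(groups[group_name])]
```

Working on the log scale turns the housekeeping ratio into a simple subtraction, so normalized values from different samples stay on the same additive scale that the t-test assumes.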

Upload input file format


-Must be a comma-separated (CSV) file, NOT zipped!

-Must have 3 columns: Gene ID, gene expression value (TPM/FPKM), and gene group
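A minimal parser for this format, assuming no header row (as in the examples) and returning {gene group: {gene ID: value}}. Because a gene annotated with several groups is listed once per group, it simply ends up in each group. The function name load_groups is hypothetical.

```python
import csv
from collections import defaultdict

def load_groups(lines):
    """Parse 'gene ID, expression value, gene group' rows into {group: {gene_id: value}}.

    `lines` is any iterable of text lines, e.g. an open file. Blank lines are skipped;
    rows without exactly 3 fields raise ValueError.
    """
    groups = defaultdict(dict)
    for line_no, row in enumerate(csv.reader(lines), start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != 3:
            raise ValueError(f"line {line_no}: expected 3 columns, got {len(row)}")
        gene_id, value, group = (field.strip() for field in row)
        groups[group][gene_id] = float(value)
    return dict(groups)
```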


Two examples are shown below:


data from Makarevitch et al. 2015

maize B73 RNA-seq under heat stress conditions

AC147602.5_FG004,188.42990159969,non TF genes
AC148152.3_FG005,8.73844335765158,non TF genes
AC148152.3_FG008,93.7227957598519,non TF genes
AC148167.6_FG001,96.1663916765039,non TF genes
AC149475.2_FG002,17.846234593301,non TF genes
AC149475.2_FG003,37.1563876166064,non TF genes
AC149475.2_FG005,39.795811239933,non TF genes
AC149475.2_FG007,1.69852605933062,non TF genes
AC149810.2_FG008,3.9723493339935,non TF genes
AC149818.2_FG001,7.13727197998732,non TF genes

Download example 1 data

maize B73 RNA-seq under non-stress conditions

AC147602.5_FG004,201.411825599906,non TF genes
AC148152.3_FG005,4.97887906298883,non TF genes
AC148152.3_FG008,78.8313272531063,non TF genes
AC148167.6_FG001,69.2050966298259,non TF genes
AC149475.2_FG002,6.16576535299732,non TF genes
AC149475.2_FG003,12.7370309176619,non TF genes
AC149475.2_FG005,20.9201998332308,non TF genes
AC149475.2_FG007,1.51412422166159,non TF genes
AC149818.2_FG001,2.77512580265959,non TF genes
AC149818.2_FG006,10.113616006654,non TF genes

Download example 2 data

Notes:


-Do not include commas in gene group names; for example, 'Human,embryo genes' is a bad group name

-If a gene has more than one annotation, DUPLICATE its entry once for each gene group, for example:

AC149818.2_FG001,188.42990159969,non TF genes
AC149818.2_FG001,188.42990159969,housekeeping genes

-If you are comparing the same group of genes under two conditions, remove genes whose TPM or FPKM values are below 1 in both conditions, because such low measurements are usually unreliable.

-Use exactly the gene group label 'housekeeping genes' to annotate housekeeping genes; labels such as 'Housekeeping Genes', 'housekeeping', or 'HOUSEKEEPING' should not be used.
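The last two notes can be checked before uploading. The sketch below is hypothetical helper code, not part of C-REx: it flags group labels that resemble but do not exactly match 'housekeeping genes', and drops genes whose values are below 1 in both conditions. Keeping only genes present in both conditions is an extra assumption of this sketch.

```python
HOUSEKEEPING_LABEL = "housekeeping genes"

def find_label_typos(group_names):
    """Return group names that look like the housekeeping label but are not spelled exactly."""
    return sorted(name for name in group_names
                  if name != HOUSEKEEPING_LABEL and "housekeeping" in name.lower())

def drop_low_expression(cond_a, cond_b, threshold=1.0):
    """Remove genes whose TPM/FPKM is below `threshold` in BOTH conditions.

    cond_a / cond_b map gene ID -> raw expression value for the same gene group.
    Only genes present in both conditions are kept (an assumption of this sketch).
    """
    shared = cond_a.keys() & cond_b.keys()
    keep = {g for g in shared if not (cond_a[g] < threshold and cond_b[g] < threshold)}
    return {g: cond_a[g] for g in keep}, {g: cond_b[g] for g in keep}
```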

Additional test data:

These files were also used in the publication.

Download Gramene-UV-stress.csv
Download maize-GAMER-UV-stress.csv
Download Gramene-non-stress.csv
Download maize-GAMER-non-stress.csv
Download Gramene-non-stress-biological-replicate-1.csv
Download Gramene-non-stress-biological-replicate-2.csv
Download maize-GAMER-non-stress-biological-replicate-1.csv
Download maize-GAMER-non-stress-biological-replicate-2.csv

Contact information

Developer and maintainer: Mingze He



PI: Carolyn Lawrence-Dill




Lawrence-Dill lab