Introduction
LSPR is a Matlab package used to detect periodic expression profiles in DNA microarray time-series data.
Pre-installation
Before running this package, make sure you have installed the following software and related toolboxes:
- program environments:
Matlab version R2009 or newer
- related toolboxes in Matlab:
1. signal processing toolbox
2. statistics toolbox
3. bioinformatics toolbox
Usage
Command-line running:
- usage:
1. create a text file named start.command containing the following command (on a single line):
matlab -r "cd LSPRpackagePath; LSPR('inputFilename.txt','outputFilename.txt','inputPath','outputPath',defaultPeriod,lower,upper)"
- explanation:
inputFilename -> name of the input text file
outputFilename -> name of the output text file
inputPath -> directory to load the input file from
outputPath -> directory to save the output file to
defaultPeriod -> default period (e.g. 24 for circadian microarray data) used for harmonic analysis when no period is detected in [lower,upper]
lower/upper -> endpoints of the period range
- example:
matlab -r "cd /home/user/LSPR; LSPR('inputExample.txt','outputExample.txt', '/home/user/LSPR/input/', '/home/user/LSPR/output/',24,20,28)"
2. run command file:
$ at now -f start.command
Matlab environment:
- usage:
>>LSPR('inputFilename.txt','outputFilename.txt','inputPath','outputPath',defaultPeriod,lower,upper)
- example:
>>LSPR('inputExample.txt','outputExample.txt','input/', 'output/',24,20,28)
Input/Output File
<Input>
file type: tab delimited text file
file format:
1st row - sampled time points
1st column - probe set names
others - an N x M matrix representing N genes (probes) with M expression level measurements (samples) over time
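As a minimal sketch of this layout, the following Matlab snippet writes a small expression matrix to a tab-delimited text file. The data, the variable names, the file name inputSketch.txt and the "probe" label in the top-left cell are assumptions made for illustration; they are not part of LSPR.

```matlab
% Hypothetical example data: 2 probes measured at 6 time points (hours).
timePoints = [0 4 8 12 16 20];
probeNames = {'probeA', 'probeB'};
expr = [1.0 0.8 0.4 0.8 1.0 0.8;
        0.5 0.6 0.8 0.5 0.6 0.8];

inputFile = fullfile(tempdir, 'inputSketch.txt');   % assumed file name
fid = fopen(inputFile, 'w');

% 1st row: a label cell followed by the sampled time points.
fprintf(fid, 'probe');
fprintf(fid, '\t%g', timePoints);
fprintf(fid, '\n');

% Remaining rows: probe name in the 1st column, then its M measurements.
for i = 1:size(expr, 1)
    fprintf(fid, '%s', probeNames{i});
    fprintf(fid, '\t%g', expr(i, :));
    fprintf(fid, '\n');
end
fclose(fid);
```

The resulting file could then be passed to LSPR as the input file, with inputPath pointing at the directory that contains it.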
<Output>
file type: text file
file format:
1st column - probe names
2nd column - filter type
3rd column - method
4th column - number of oscillations
5th column - period
6th column - amplitude
7th column - phase
8th column - R square
9th column - pvalue
10th column - qvalue
11th column - FDR-BH
- explanation:
filter type - whether the microarray data were smoothed with a Savitzky-Golay filter during preprocessing
' 1' -> microarray data have been detrended and filtered
'-1' -> microarray data have been detrended only
method - method for harmonic analysis
'LSPR' -> do harmonic analysis with periods detected in [lower,upper]
'default' -> do harmonic analysis with a default period
number of oscillations - number of different oscillations detected by LSPR
period - detected periods in [lower,upper] or a given default period
amplitude - amplitude of harmonic models
phase - phase of harmonic models
R square - R square of regression curve
pvalue - p-value in harmonic analysis
qvalue - false discovery rate computed by q-value method
FDR-BH - false discovery rate computed by Benjamini-Hochberg method
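To make "amplitude", "phase" and "R square" concrete, here is a generic single-harmonic least-squares fit for a fixed period. This page does not spell out the harmonic model, so the model form, the convention of reporting phase as peak time in hours, the synthetic data and all variable names are assumptions; LSPR's own implementation may differ.

```matlab
% Synthetic series (assumed data): period 24 h, amplitude 0.5, peak at t = 6 h.
t = (0:4:44)';
T = 24;                                   % period used for the fit
y = 2 + 0.5 * cos(2*pi*(t - 6)/T);

% Least-squares fit of y = m + a*cos(w*t) + b*sin(w*t) with w = 2*pi/T.
w = 2*pi/T;
X = [ones(size(t)), cos(w*t), sin(w*t)];
beta = X \ y;

amplitude = hypot(beta(2), beta(3));            % sqrt(a^2 + b^2)
phase = mod(atan2(beta(3), beta(2)) / w, T);    % peak time in hours (assumed convention)
yFit = X * beta;
rSquare = 1 - sum((y - yFit).^2) / sum((y - mean(y)).^2);

fprintf('amplitude = %.2f, phase = %.2f h, R square = %.2f\n', amplitude, phase, rSquare);
```

Because the synthetic series is an exact cosine, the fit recovers the amplitude and peak time used to generate it; real expression profiles would give an R square below 1.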
Supplementary Material
Supplementary material is available here.
Download
Source code is available here.
Web Server
A web server for the LSPR algorithm can be found at: http://bioinfo.cau.edu.cn/BioClock
FAQ
1. How does LSPR deal with missing values?
LSPR ignores time series whose values are missing at more than 50% of the
sampling time points. The corresponding output parameters are set to "NaN".
All other time series are analyzed using the existing experimental values and
their corresponding time points.
- example:
contents of input file:
probe | 0 | 4 | 8 | 12 | 16 | 20 | 24 | 28 | 32 | 36 | 40 | 44 |
example01 | 1 | 0.8 | 0.4 | 0.8 | 0.8 | 0.4 | 0.8 | | | | | |
example02 | 0.5 | 0.6 | 0.8 | 0.5 | 0.6 | 0.8 | | | | | | |
example03 | | | | | | | | | | | | |
example04 | 1 | 0.6 | 0.6 | 0.5 | 0.7 | | | | | | | |
example05 | 0.8 | 1 | 0.8 | 0.6 | 0.4 | 0.6 | 0.8 | 1 | 0.8 | 0.6 | 0.4 | 0.6 |
contents of output file:
probe | filter type | method | num. | period | amplitude | phase | R square | pvalue | qvalue | FDR-BH |
example01 | -1 | LSPR | 1 | 24.55 | 0.26 | 23.37 | 0.92 | 0.006 | NaN | 0.009 |
example02 | -1 | LSPR | 1 | 27.75 | 0.16 | 22.13 | 0.95 | 0.009 | NaN | 0.009 |
example03 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
example04 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
example05 | -1 | LSPR | 1 | 23.1 | 0.23 | 4.67 | 0.81 | 0.0004 | NaN | 0.001 |
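The missing-value rule described above can be sketched in a few lines of Matlab. This is an illustration of the rule as stated, not LSPR's own code: missing cells are represented as NaN, their positions follow the example input as shown, and the variable names are assumptions. The strict "more than 50%" threshold is consistent with example02, which has exactly half of its values missing and is still analyzed.

```matlab
% Example input from above, rows example01..example05; NaN marks a missing value.
timePoints = 0:4:44;
data = [1   0.8 0.4 0.8 0.8 0.4 0.8 NaN NaN NaN NaN NaN;
        0.5 0.6 0.8 0.5 0.6 0.8 NaN NaN NaN NaN NaN NaN;
        NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN;
        1   0.6 0.6 0.5 0.7 NaN NaN NaN NaN NaN NaN NaN;
        0.8 1   0.8 0.6 0.4 0.6 0.8 1   0.8 0.6 0.4 0.6];

% Fraction of missing time points per probe.
missingFraction = mean(isnan(data), 2);

% Probes missing more than 50% of time points are skipped (reported as NaN).
skipped = missingFraction > 0.5;

% For an analyzed probe, keep only the existing values and their time points.
row = 1;                                  % example01
keep = ~isnan(data(row, :));
tUsed = timePoints(keep);
yUsed = data(row, keep);

disp(skipped');
```

In this sketch only example03 and example04 are flagged as skipped, which matches the rows filled with NaN in the example output.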
2. What data sets can LSPR analyze?
LSPR can detect oscillations in circadian and cell-cycle microarray data and in other
temporal expression profiles.
3. How are periodic genes determined?
For a single input gene expression profile, periodicity can be determined by the p-value.
Usually, a gene with p-value < 0.05 is considered periodic.
For large-scale microarray data, periodic genes can instead be determined by the false
discovery rate (q-value or FDR-BH value). Generally, the Benjamini-Hochberg
method (FDR-BH) is a more stringent estimate of the false discovery rate than the q-value method.
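For reference, the standard Benjamini-Hochberg adjustment can be sketched as follows. The snippet uses the three p-values of the analyzed probes from the example output in question 1 and assumes that probes reported as NaN are excluded from the correction; that assumption and the variable names are illustrative and are not taken from LSPR's source code.

```matlab
% p-values of example01, example02 and example05 from the example output.
p = [0.006 0.009 0.0004];

m = numel(p);
[pSorted, order] = sort(p);               % ascending p-values
adj = pSorted .* m ./ (1:m);              % p(i) * m / rank(i)

% Enforce monotonicity, starting from the largest p-value.
for i = m-1:-1:1
    adj(i) = min(adj(i), adj(i+1));
end
adj = min(adj, 1);                        % adjusted values cannot exceed 1

fdrBH = zeros(size(p));
fdrBH(order) = adj;                       % restore the original probe order

disp(fdrBH);
```

The adjusted values agree with the FDR-BH column of the example output once rounded to three decimals.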
4. How are genes whose output parameters are "NaN" values dealt with?
Genes with values missing at more than 50% of the sampling time points, or whose
expression is linear (e.g. y = x + c) or constant (e.g. y = c), are assigned
"NaN" in the output parameters.
For better results, we suggest removing genes of this kind and re-analyzing the
rest with the LSPR program.
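One way to find the genes to remove is to read the output file and collect the probes whose p-value is NaN. The sketch below first writes a small stand-in file so that it runs on its own. The tab delimiter, the single header line, the file name outputSketch.txt and the variable names are assumptions about the output layout, so adjust them to the file LSPR actually produces.

```matlab
% Stand-in output file with three rows from the example in question 1
% (assumed layout: tab-delimited, one header line).
outFile = fullfile(tempdir, 'outputSketch.txt');
fid = fopen(outFile, 'w');
fprintf(fid, 'probe\tfilter type\tmethod\tnum.\tperiod\tamplitude\tphase\tR square\tpvalue\tqvalue\tFDR-BH\n');
fprintf(fid, 'example01\t-1\tLSPR\t1\t24.55\t0.26\t23.37\t0.92\t0.006\tNaN\t0.009\n');
fprintf(fid, 'example03\tNaN\tNaN\tNaN\tNaN\tNaN\tNaN\tNaN\tNaN\tNaN\tNaN\n');
fprintf(fid, 'example05\t-1\tLSPR\t1\t23.1\t0.23\t4.67\t0.81\t0.0004\tNaN\t0.001\n');
fclose(fid);

% Read the 11 columns: probe and method as text, everything else as numbers.
fid = fopen(outFile, 'r');
cols = textscan(fid, '%s %f %s %f %f %f %f %f %f %f %f', ...
    'Delimiter', '\t', 'HeaderLines', 1);
fclose(fid);

probes = cols{1};
pvalue = cols{9};

noResult = isnan(pvalue);        % probes LSPR could not analyze
toRemove = probes(noResult);     % candidates to drop before re-running
toKeep = probes(~noResult);

disp(toRemove);
```

The names in toRemove can then be used to filter the corresponding rows out of the input file before running LSPR again.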
5. What is the minimum/maximum number of time points? How many genes can the
application handle at a time?
According to our analysis, the input time series should ideally have at least six
time points; there is no upper limit on its length.
LSPR analyzes one time series at a time. If the user's computer has enough
computing capacity, there is no limit on how many genes LSPR can handle at a time.
Contact
Please contact us if you have suggestions for improvement, or if you have any problems with the program or with the interpretation of the results.
Chen ZHANG | Rendong Yang |
College of Science | College of Biological Sciences |
China Agricultural University | China Agricultural University |
P.O.Box 0590 | P.O.Box B1061 |
100083, Beijing, China | 100193, Beijing, China |
tel: +86-13811497473 | tel: +86-10-62734385 |
email: zcreation@yahoo.cn | email: cauyrd@gmail.com |