Rendong Yang, Chen Zhang, Zhen Su. LSPR: an integrated periodicity detection algorithm for unevenly sampled temporal microarray data. Bioinformatics. 2011; doi:10.1093/bioinformatics/btr041.

Introduction

LSPR is a Matlab package used to detect periodic expression profiles in DNA microarray time-series data.

Pre-installation

Before running this package, make sure the following software and Matlab toolboxes are installed:
      - program environment:
        Matlab R2009 or newer
      - Matlab toolboxes:
        1. Signal Processing Toolbox
        2. Statistics Toolbox
        3. Bioinformatics Toolbox

Usage

Running from the command line:
      - usage:
      1. Create a text file named start.command containing the following command:
          matlab -r "cd LSPRpackagePath; LSPR('inputFilename.txt','outputFilename.txt','inputPath','outputPath',defaultPeriod,lower,upper)"

          - explanation:
           inputFilename  -> name of the input text file
           outputFilename -> name of the output text file
           outputPath     -> directory the output file is written to
           inputPath      -> directory the input file is loaded from
           defaultPeriod  -> default period (e.g. 24 for circadian microarray data) used for
                             harmonic analysis when no period is detected in [lower,upper]
           lower/upper    -> lower and upper bounds of the period search range, in the same
                             time units as the sampled time points

          - example:
           matlab -r "cd /home/user/LSPR; LSPR('inputExample.txt','outputExample.txt','/home/user/LSPR/input/','/home/user/LSPR/output/',24,20,28)"

      2. Run the command file (at now schedules it for immediate execution in the background):
           $ at now -f start.command
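The `matlab -r` argument has to be quoted for the shell and for Matlab at the same time, which is easy to get wrong by hand. Below is a minimal Python sketch that assembles the one-line command in the format shown above and writes it to start.command. It is not part of LSPR, and `matlab_str`, `build_lspr_command` and `write_command_file` are hypothetical names. The sketch assumes the paths contain none of the characters `"`, `$`, `` ` `` or `\`, and that the package path contains no whitespace, since `cd path` uses Matlab command syntax.

```python
from pathlib import Path

# Characters that would need escaping inside a double-quoted sh string (assumption:
# we simply reject them instead of escaping).
SHELL_SPECIAL = '"$`\\'


def matlab_str(s):
    """Quote a string as a Matlab char literal (embedded single quotes are doubled)."""
    return "'" + s.replace("'", "''") + "'"


def build_lspr_command(package_path, input_file, output_file, input_path,
                       output_path, default_period, lower, upper):
    """Return the one-line shell command that starts Matlab and calls LSPR."""
    texts = [package_path, input_file, output_file, input_path, output_path]
    if any(ch in text for text in texts for ch in SHELL_SPECIAL):
        raise ValueError("paths must not contain \", $, ` or \\")
    if any(ch.isspace() for ch in package_path):
        raise ValueError("package_path must not contain whitespace (cd command syntax)")
    if not lower < upper:
        raise ValueError("lower must be smaller than upper")
    strings = [matlab_str(s) for s in (input_file, output_file, input_path, output_path)]
    numbers = [f"{x:g}" for x in (default_period, lower, upper)]  # 24 -> "24"
    args = ",".join(strings + numbers)
    return f'matlab -r "cd {package_path}; LSPR({args})"'


def write_command_file(command, path="start.command"):
    """Save the command so it can be scheduled with: at now -f start.command"""
    Path(path).write_text(command + "\n")


if __name__ == "__main__":
    cmd = build_lspr_command("/home/user/LSPR", "inputExample.txt", "outputExample.txt",
                             "/home/user/LSPR/input/", "/home/user/LSPR/output/",
                             24, 20, 28)
    print(cmd)
    # write_command_file(cmd)  # then run: at now -f start.command
```

With the arguments from the example above, `build_lspr_command` returns exactly the example command line.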

Running in the Matlab environment:
      - usage:
       >>LSPR('inputFilename.txt','outputFilename.txt','inputPath','outputPath',defaultPeriod,lower,upper)
      - example:
       >>LSPR('inputExample.txt','outputExample.txt','input/', 'output/',24,20,28)

Input/Output File

<Input>
      file type: tab-delimited text file
      file format:
       1st row      - sampled time points (the first cell holds a label, e.g. "probe")
       1st column   - probe set names
       other cells  - an N x M matrix of expression levels for N genes (probe sets), each
                      measured at M time points
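To prepare such a file outside Matlab, here is a minimal Python sketch that writes the layout described above. It makes two assumptions that are not stated in the LSPR documentation: the top-left cell holds the label "probe" (as in the FAQ example), and missing measurements are written as empty cells. Check both against the inputExample.txt file used in the examples. `lspr_input_text` and the probe name `probeA` are hypothetical.

```python
import math


def cell(v):
    """Render one value; missing values (None or NaN) become empty cells (assumption)."""
    if v is None or (isinstance(v, float) and math.isnan(v)):
        return ""
    return str(v)  # str() keeps full precision, unlike a fixed-width format


def lspr_input_text(times, profiles, label="probe"):
    """Tab-delimited text: time points in row 1, probe set names in column 1."""
    lines = ["\t".join([label] + [str(t) for t in times])]
    for name, values in profiles.items():
        if len(values) != len(times):
            raise ValueError(f"{name}: {len(values)} values for {len(times)} time points")
        lines.append("\t".join([name] + [cell(v) for v in values]))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    times = list(range(0, 48, 4))  # 0, 4, ..., 44 as in the FAQ example
    profiles = {
        "example05": [0.8, 1, 0.8, 0.6, 0.4, 0.6, 0.8, 1, 0.8, 0.6, 0.4, 0.6],
        # hypothetical probe set with two missing measurements
        "probeA": [1.0, None, 0.5, 0.2, 0.4, 0.9, 1.1, 0.7, None, 0.3, 0.5, 0.8],
    }
    with open("myInput.txt", "w") as fh:
        fh.write(lspr_input_text(times, profiles))
```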

<Output>
      file type: text file
      file format:
       1st column   - probe names
       2nd column  - filter type
       3rd column   - method
       4th column   - number of oscillations
       5th column   - period
       6th column   - amplitude
       7th column   - phase
       8th column   - R square
       9th column   - pvalue
       10th column - qvalue
       11th column - FDR-BH
      
      - explanation:
       filter type            - whether the data were smoothed with a Savitzky-Golay filter
                                 ' 1' -> data were detrended and filtered
                                 '-1' -> data were detrended only
       method                 - how the harmonic analysis was done
                                 'LSPR'    -> with the period(s) detected in [lower,upper]
                                 'default' -> with the default period (defaultPeriod)
       number of oscillations - number of distinct oscillations detected by LSPR
       period                 - detected period(s) in [lower,upper], or the default period
       amplitude              - amplitude of the fitted harmonic model
       phase                  - phase of the fitted harmonic model
       R square               - coefficient of determination (R square) of the fitted regression curve
       pvalue                 - p-value from the harmonic analysis
       qvalue                 - false discovery rate estimated by the q-value method
       FDR-BH                 - false discovery rate computed by the Benjamini-Hochberg method
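The amplitude, phase and R square columns come from fitting a harmonic model at the detected (or default) period. LSPR's exact model, detrending step and phase convention are not specified here, so the following is only a generic sketch of single-period harmonic regression by least squares. It fits y = m + a*cos(2*pi*t/T) + b*sin(2*pi*t/T) and reports the amplitude sqrt(a^2 + b^2). As an assumption, it takes the phase to be the time of the fitted peak in [0, T). The fit works directly on unevenly spaced time points. `harmonic_fit` and `solve3` are hypothetical names, and no p-value is computed.

```python
import math


def solve3(A, y):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [list(row) + [v] for row, v in zip(A, y)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: too few distinct time points")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for c in range(col, 4):
                M[r][c] -= f * M[col][c]
    x = [0.0] * 3
    for r in range(2, -1, -1):  # back substitution
        x[r] = (M[r][3] - sum(M[r][c] * x[c] for c in range(r + 1, 3))) / M[r][r]
    return x


def harmonic_fit(times, values, period):
    """Least-squares fit of y = m + a*cos(w*t) + b*sin(w*t), w = 2*pi/period.

    Returns (amplitude, phase, r_square); phase = time of the fitted peak (assumed convention).
    """
    w = 2 * math.pi / period
    X = [[1.0, math.cos(w * t), math.sin(w * t)] for t in times]
    # Normal equations: (X^T X) beta = X^T y
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(3)] for i in range(3)]
    Xty = [sum(row[i] * v for row, v in zip(X, values)) for i in range(3)]
    m, a, b = solve3(XtX, Xty)
    amplitude = math.hypot(a, b)
    # a*cos(wt) + b*sin(wt) = A*cos(w*(t - phase)) with w*phase = atan2(b, a)
    phase = (math.atan2(b, a) / w) % period
    fitted = [m + a * row[1] + b * row[2] for row in X]
    mean = sum(values) / len(values)
    ss_res = sum((v - f) ** 2 for v, f in zip(values, fitted))
    ss_tot = sum((v - mean) ** 2 for v in values)
    if ss_tot == 0:
        raise ValueError("constant profile: R square is undefined")
    return amplitude, phase, 1 - ss_res / ss_tot
```

For a noise-free profile 0.5 + 0.2*cos(2*pi*(t - 4)/24), the sketch recovers amplitude 0.2 and phase 4 at period 24. This holds even when some time points are dropped.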

Supplementary Material

Supplementary material is available here.

Download

Source code is available here.

Web Server

A web server for the LSPR algorithm is available at: http://bioinfo.cau.edu.cn/BioClock

FAQ

1. How are missing values handled?
      LSPR ignores any time series with values missing at more than 50% of the sampling
      time points; all of its output parameters are set to "NaN". Time series with values
      missing at no more than 50% of the time points are analyzed using only the available
      measurements and their corresponding time points.

      - example:
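      contents of input file (the first row holds "probe" and the 12 time points 0, 4, 8, 12,
      16, 20, 24, 28, 32, 36, 40, 44; blank cells are missing values, so each row below lists
      only the values that are present, in time order):

      probe      values present (in time order)                present
      example01  1 0.8 0.4 0.8 0.8 0.4 0.8                     7 of 12
      example02  0.5 0.6 0.8 0.5 0.6 0.8                       6 of 12
      example03  (none)                                        0 of 12
      example04  1 0.6 0.6 0.5 0.7                             5 of 12
      example05  0.8 1 0.8 0.6 0.4 0.6 0.8 1 0.8 0.6 0.4 0.6   12 of 12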

      contents of output file:

      probe      filter type  method  num.  period  amplitude  phase  R square  pvalue  qvalue  FDR-BH
      example01  -1           LSPR    1     24.55   0.26       23.37  0.92      0.006   NaN     0.009
      example02  -1           LSPR    1     27.75   0.16       22.13  0.95      0.009   NaN     0.009
      example03  NaN          NaN     NaN   NaN     NaN        NaN    NaN       NaN     NaN     NaN
      example04  NaN          NaN     NaN   NaN     NaN        NaN    NaN       NaN     NaN     NaN
      example05  -1           LSPR    1     23.1    0.23       4.67   0.81      0.0004  NaN     0.001
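The 50% rule can be checked against the example above by counting only how many cells are present in each row; their positions do not matter for the rule. Under that count, example01 (7 of 12), example02 (6 of 12) and example05 (12 of 12) are analyzed. Example03 (0 of 12) and example04 (5 of 12) receive NaN, as the output shows. Below is a minimal Python sketch of the rule, assuming missing values are read as None or NaN. The function names are hypothetical, not LSPR functions.

```python
import math


def is_missing(v):
    """A value counts as missing if it is None or NaN."""
    return v is None or (isinstance(v, float) and math.isnan(v))


def is_analyzable(values):
    """FAQ 1 rule: skip a profile only if MORE than 50% of its time points are missing."""
    n_missing = sum(is_missing(v) for v in values)
    return n_missing <= len(values) / 2


def observed_points(times, values):
    """The (time, value) pairs that remain once missing cells are dropped."""
    return [(t, v) for t, v in zip(times, values) if not is_missing(v)]
```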


2. What data sets can LSPR analyze?
      LSPR can detect oscillations in circadian and cell-cycle microarray data, as well as in
      other temporal expression profiles.

3. How are periodic genes determined?
      For a single gene expression profile, periodicity can be judged by the p-value;
      typically, a gene with p-value < 0.05 is considered periodic.

      For large-scale microarray data, periodic genes should instead be selected by the false
      discovery rate (q-value or FDR-BH value), which accounts for testing many genes at once.
      In general, the Benjamini-Hochberg method (FDR-BH) is more stringent than the q-value
      method.
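Below is a minimal Python sketch of the Benjamini-Hochberg adjustment, applied to the p-values of the FAQ 1 example output. If the NaN genes are skipped (so m = 3), the p-values 0.006, 0.009 and 0.0004 become 0.009, 0.009 and 0.0012. After rounding, these match the FDR-BH column, which suggests but does not confirm that LSPR counts only genes with a valid p-value. The q-value method is not reproduced here, and `benjamini_hochberg` and `periodic_genes` are hypothetical names.

```python
import math


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values; NaN entries are skipped and stay NaN."""
    idx = [i for i, p in enumerate(pvalues) if not math.isnan(p)]
    m = len(idx)
    adjusted = [math.nan] * len(pvalues)
    order = sorted(idx, key=lambda i: pvalues[i])  # ascending p-values
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest p-value down
        i = order[rank - 1]
        # p * m / rank, made monotone by a running minimum (capped at 1)
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def periodic_genes(names, fdr, alpha=0.05):
    """Genes whose adjusted value is below the chosen FDR threshold."""
    return [n for n, q in zip(names, fdr) if not math.isnan(q) and q < alpha]


if __name__ == "__main__":
    names = ["example01", "example02", "example03", "example04", "example05"]
    pvals = [0.006, 0.009, math.nan, math.nan, 0.0004]  # pvalue column of the FAQ 1 output
    fdr = benjamini_hochberg(pvals)
    print([round(q, 3) if not math.isnan(q) else q for q in fdr])
    print(periodic_genes(names, fdr))
```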

4. How should genes whose output parameters are "NaN" be dealt with?
      Genes with values missing at more than 50% of the sampling time points, or whose
      expression values are linear (i.e. y = ax + c) or constant (i.e. y = c), are assigned
      "NaN" in all output parameters.

      For better results, we suggest removing such genes and re-analyzing the remaining
      genes with LSPR.
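Following FAQ 4, such genes can be dropped before LSPR is run again. The Python sketch below flags profiles that have more than 50% missing values or whose observed points lie on a straight line, with a constant profile counting as a line of slope 0. The tolerance `tol` and the function names are assumptions for illustration, not LSPR parameters. Time points are assumed to be distinct.

```python
import math


def is_missing(v):
    """A value counts as missing if it is None or NaN."""
    return v is None or (isinstance(v, float) and math.isnan(v))


def is_linear_or_constant(times, values, tol=1e-9):
    """True if the observed points lie on a line y = a*t + c (a = 0 means constant)."""
    pts = [(t, v) for t, v in zip(times, values) if not is_missing(v)]
    n = len(pts)
    if n < 3:
        return True  # one or two points always lie on a line
    mt = sum(t for t, _ in pts) / n
    mv = sum(v for _, v in pts) / n
    stt = sum((t - mt) ** 2 for t, _ in pts)
    if stt == 0:
        return True
    slope = sum((t - mt) * (v - mv) for t, v in pts) / stt
    # residual sum of squares of the least-squares line
    resid = sum((v - mv - slope * (t - mt)) ** 2 for t, v in pts)
    scale = sum((v - mv) ** 2 for _, v in pts)
    return resid <= tol * max(scale, 1.0)


def keep_for_reanalysis(times, profiles):
    """Drop the profiles that FAQ 4 says will come out as NaN."""
    kept = {}
    for name, values in profiles.items():
        n_missing = sum(is_missing(v) for v in values)
        if n_missing > len(values) / 2:
            continue  # too many missing values
        if is_linear_or_constant(times, values):
            continue  # linear or constant profile
        kept[name] = values
    return kept
```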

5. What is the minimum/maximum number of time points? How many genes can the
    application handle at a time?
      According to our analysis, an input time series should ideally have at least six time
      points; there is no upper limit on its length.

      LSPR analyzes one time series at a time, so the number of genes it can handle is
      limited only by the computing capacity of the user's computer.

Contact

Please contact us if you have suggestions for improvement, or if you have any problem with the program or with the interpretation of the results.

      Chen ZHANG
      College of Science
      China Agricultural University
      P.O.Box 0590
      100083, Beijing, China
      tel: +86-13811497473
      email: zcreation@yahoo.cn

      Rendong Yang
      College of Biological Sciences
      China Agricultural University
      P.O.Box B1061
      100193, Beijing, China
      tel: +86-10-62734385
      email: cauyrd@gmail.com