EasyGO: guidance

Introduction

The following material explains how to use each feature of EasyGO. New users are encouraged to read through this page for an overview of what EasyGO can do as a functional enrichment analysis tool.

For other matters, such as the construction of this server, the related technologies or bioinformatics in general, please contact me by e-mail: xzhou82@gmail.com.

Contents

Browse Gene Ontology structure
Perform enrichment analysis on user input

User controllable options

Choose data type
Evidence filter
Statistical tests
FDR-adjusted P value cutoff
Term mapping count cutoff
Gene ontology aspect
Display style

Analyze result in "Text mode"
View result in "Graphical mode"

Browse previously saved result using session ID

Browse Gene Ontology structure

This function lets you browse the vocabulary and hierarchical structure of the Gene Ontology. For complete information on Gene Ontology, please refer to www.geneontology.org.

First click "Browse GO" on the navigation bar; the three root terms of Gene Ontology are displayed. Then click "GO:0008150", the accession number of biological_process, and its child terms are displayed with automatic indentation:

You can obtain all the microarray elements annotated by each term. Click the "List" link at the end of a line and a form appears that lets you choose the data type you want. For example, for the term GO:0009414 response to water (found under "biological_process" -> "response to stimulus" -> "response to stress"), clicking its link returns the following page:

The page contains a drop-down menu listing all supported data types. Click the submit button and the items in the corresponding data set that are annotated by this term are returned as a table. If we choose Arabidopsis ATH1 probe sets, 91 probe sets are found to be annotated by this term. One of them looks like this:

For each probe set, the following information is displayed:

Probe set name.
BLAST top-hit information. The GO annotation of the top-hit entry (here, an Arabidopsis protein) is transferred to this probe set. The top hit's name, E-value, source and description are displayed.
A list of associated transcripts. For Affymetrix GeneChip probe sets, this information comes from their support data.

Next to each probe set is an icon that links to a graphical display of the entry's GO annotation. For example, clicking the icon of probe set 259426_at in the picture above displays the GO annotation for the "biological_process" aspect as the following graph. Each rectangle is a GO term. Terms directly involved in the annotation are colored yellow and their parent terms are green. (Note that GO annotation plots are produced for all three aspects, but in this case 259426_at only has annotation in "biological_process".)
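The yellow/green coloring described above can be sketched in a few lines of Python. This is only an illustration, not EasyGO's actual code: the PARENTS table is a hand-written fragment following the path quoted earlier on this page (the ID GO:0006950 for "response to stress" is filled in for the example), and the function names are hypothetical.

```python
# Hand-written fragment of the GO graph (child -> parents), for illustration only.
PARENTS = {
    "GO:0009414": ["GO:0006950"],  # term used in the example above
    "GO:0006950": ["GO:0050896"],  # response to stress
    "GO:0050896": ["GO:0008150"],  # response to stimulus
    "GO:0008150": [],              # biological_process (root)
}


def ancestors(term, parents):
    """Collect every term reachable by following parent links upwards."""
    found = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(parents.get(current, []))
    return found


def color_terms(direct_terms, parents):
    """Directly annotated terms become yellow, all their ancestors green."""
    colors = {}
    for term in direct_terms:
        for ancestor in ancestors(term, parents):
            colors.setdefault(ancestor, "green")
    # Direct annotations win over the ancestor color.
    for term in direct_terms:
        colors[term] = "yellow"
    return colors


if __name__ == "__main__":
    for term, color in sorted(color_terms({"GO:0009414"}, PARENTS).items()):
        print(term, color)
```

Because a GO term can have several parents, the traversal keeps a set of visited terms so that shared ancestors are only processed once.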

Perform enrichment analysis on list of probesets

In this section we first describe the user-controlled options and then, as an example, analyze a list of Arabidopsis ATH1 probe sets.

The analysis page looks like this:

User-controlled options:

Choose data type: the data types supported by EasyGO.
Evidence filter: filters ontology annotation by evidence code. This lets you concentrate on annotation generated from certain sources or protocols while excluding others. Use this option with caution, however. For information about evidence codes, please refer to the Evidence code page on the Gene Ontology Consortium website.
Statistical test method: used to identify terms that are significantly enriched in one list of microarray elements compared with another. Binomial, Chi-square and hypergeometric tests are currently provided.
FDR-adjusted p-value cutoff: a user-selected significance cutoff; the default is 0.05. Terms with a p-value below this cutoff are deemed statistically significant and are highlighted in the result display. A smaller cutoff restricts the result to more significant terms. The p-value is derived from one of the test methods above and is then adjusted by FDR to account for multiple testing.
Cutoff on minimum number of matching items for terms: restricts the result, in a direct and non-statistical way, to terms with at least this many associated probe sets. A higher value reduces the result size.
Aspect: performs the analysis on one of the Gene Ontology aspects.
Display style: the way the result is displayed.
1. Text: the second level of Gene Ontology is displayed. Child terms can be expanded or collapsed by mouse click (see below).
2. Graphical: the resulting terms are plotted as a tree-style directed graph using the free software Graphviz. Terms are represented by boxes filled with a color that corresponds to test significance.
Text field: there are two large text fields into which you paste your lists of microarray elements for analysis. The left one is named "list 1" and the right one "list 2". List 1 is required; list 2 is optional. If only list 1 is filled, the program analyzes the items in list 1 and compares them with the corresponding whole-microarray background (pre-computed) to find terms that are significantly over-represented in list 1. If both lists are filled, terms that are significantly enriched in either list are reported.
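To make the statistical options above more concrete, here is a minimal Python sketch of a one-list analysis: a hypergeometric test of list 1 against the background, followed by an FDR adjustment and the two cutoffs. It is not EasyGO's implementation. The Benjamini-Hochberg procedure is assumed here because it is a common FDR method (this page does not say which one EasyGO uses), applying the count cutoff before the adjustment is also an assumption, and all function names and numbers are made up for illustration.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """Upper-tail probability P(X >= k) of the hypergeometric distribution.

    N: items in the background, K: background items annotated by the term,
    n: items in list 1, k: list 1 items annotated by the term.
    """
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def benjamini_hochberg(pvalues):
    """Return FDR-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def enriched_terms(list1_counts, background_counts, n_list1, n_background,
                   fdr_cutoff=0.05, min_count=1):
    """Keep terms that pass the count cutoff and the FDR-adjusted cutoff."""
    terms = [t for t, k in list1_counts.items() if k >= min_count]
    raw = [hypergeom_pvalue(list1_counts[t], n_list1,
                            background_counts[t], n_background) for t in terms]
    adjusted = benjamini_hochberg(raw)
    return {t: p for t, p in zip(terms, adjusted) if p < fdr_cutoff}


if __name__ == "__main__":
    # Toy numbers: 50 probe sets in list 1, 1000 in the background.
    list1 = {"term_A": 10, "term_B": 25}
    background = {"term_A": 20, "term_B": 500}
    print(enriched_terms(list1, background, n_list1=50, n_background=1000))
```

In the toy data, term_A covers 2% of the background but 20% of list 1, so it passes the cutoff, while term_B covers 50% of both and does not.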

Analyze result in "Text" mode

We will analyze an example list of Arabidopsis ATH1 GeneChip probe sets to demonstrate the features of EasyGO.
The example input list is available through a link below input box 1 on the "Analysis and browse result" page. Note that you must prepare your own input list in the same format, one entry per row.
(About the example input list: the probe sets represent Arabidopsis genes that are up-regulated by low temperature in shoot tissue. In cold-treatment time-course experiments, their expression is up-regulated gradually. Among them, NCED3 (probe set ID: 257280_at), a key enzyme in abscisic acid synthesis, is a well-known example.)
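If your probe set IDs come from another program, the one-entry-per-row format mentioned above can be produced with a small helper like the following Python sketch. The function name is hypothetical, and removing blank lines and duplicates is an assumption about what makes a clean input list rather than a documented EasyGO requirement.

```python
def parse_probe_list(text):
    """Return probe set IDs, one per row, without blanks or duplicates."""
    seen = set()
    entries = []
    for line in text.splitlines():
        entry = line.strip()
        if entry and entry not in seen:
            seen.add(entry)
            entries.append(entry)
    return entries


if __name__ == "__main__":
    raw = "257280_at\n\n 259426_at \n257280_at\n"
    # Join with newlines to get text ready for pasting into list 1.
    print("\n".join(parse_probe_list(raw)))
```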
With all options left at their defaults, the initial analysis result looks like the following and is explained in detail below:

Session: a session ID is provided so that you can browse the current result again later. This is useful when EasyGO has run a large, time-consuming job for you. Currently the saved sessions are cleared manually, about once a month. If you need your session to stay longer, please send me an e-mail to let me know.
Input status: the status includes the total number of input probe sets and the number of probe sets with GO annotation. If both list 1 and list 2 are filled, both are reported.
Results displayed in text: first, the root term for the current aspect is printed, with its child terms on level 2 listed below it. If you are familiar with Gene Ontology, you will notice that not all children of the root term are listed. This is because some terms have too few associated probe sets in list 1 and are filtered out by the "Cutoff on minimum number of matching items for terms" option discussed above. The GO accession of each term is a link that expands or collapses its child terms, just as in Browse GO. The icon links to a tree graph for the current term (see the graphical functions below). The FDR-corrected p-value is displayed for each term; if it is below the significance cutoff, it is colored red. Next come the term's name and a link whose text gives the number of annotated items in list 1. Clicking this link returns a table with information on the probe sets annotated by this term, as described above for Browse GO.
Graphical functions - tree graph for one term: two graphical functions are provided here, both designed to let you focus on the terms that interest you in the result. Suppose the term GO:0050896 "response to stimulus" has been expanded and you are interested in its child term GO:0009628 "response to abiotic stimulus".

Click the burst-shaped icon of GO:0009628:

This graph shows the GO subtree induced by the term GO:0009628. Terms (nodes) are rectangles colored according to their FDR-corrected p-values. Clicking a term shows the probe sets annotated by it, as described before.
Graphical functions - horizontal bar plot for multiple terms: the other graphical function in text mode is a horizontal bar plot for a group of selected terms, which allows a visual comparison of functional enrichment between two sets. To draw it, select one or more terms using the checkbox in front of each row and click the button named "Produce graph". Term names are displayed on the vertical axis of the graph, and the horizontal axis represents the percentage of probe sets annotated by each term. Two bars are drawn for each term: the red one is list 1 and the green one is the background (if list 2 has no input).
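The quantity plotted on the horizontal axis can be reproduced by hand, as in the Python sketch below. It assumes that the percentage is the number of probe sets annotated by a term divided by the total size of the list (or background); this page does not state the exact denominator EasyGO uses. The function names and the counts are invented for the example, and a text rendering stands in for the real graph.

```python
def term_percentages(term_counts, total):
    """Percentage of probe sets annotated by each term."""
    return {term: 100.0 * count / total for term, count in term_counts.items()}


def text_bar_plot(list1_pct, background_pct, width=40):
    """Render two bars per term: list 1 first, then the background."""
    lines = []
    for term in list1_pct:
        pairs = (("list 1", list1_pct[term]),
                 ("background", background_pct.get(term, 0.0)))
        for label, pct in pairs:
            bar = "#" * round(pct / 100 * width)
            lines.append(f"{term:<12} {label:<10} {bar} {pct:.1f}%")
    return lines


if __name__ == "__main__":
    # Toy counts: 21 of 84 probe sets in list 1, 500 of 10000 in the background.
    list1 = term_percentages({"GO:0009628": 21}, 84)
    background = term_percentages({"GO:0009628": 500}, 10000)
    print("\n".join(text_bar_plot(list1, background)))
```

Comparing percentages rather than raw counts keeps the two bars comparable even though the background is much larger than list 1.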

View result in graphical mode

This time we use the example list of ATH1 probe sets again. Return to the interface page, change the option "Display style" to "Graphical" and redo the analysis; the following image is produced:

When the number of terms in the graph exceeds a limit of 40, the terms are drawn as tiny nodes to reduce the picture size and make it easier to view. Please see the example graph below (the query list used to generate this graph is example list 2, available from the interface page).

On mouse-over, the term's accession and name are displayed. The color of a node still represents the FDR-corrected p-value of the term. The number in each oval is the mapping count of the term in the query list. Two numbers are displayed when a two-list comparison is run.
The nodes are clickable and show the sub-tree related to the clicked term. For example, when the user clicks the node at the lower left with the number 21, the following tree is produced:

Terms are represented by boxes, as we have seen before. Each term can be clicked to see the list of query entries annotated by it.
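For readers curious how such a graph can be described to Graphviz, the following Python sketch writes a graph in Graphviz's DOT language and switches from labeled boxes to small count-only ovals above 40 terms, as described above. It is a guess at the general approach, not EasyGO's code: the function names, the two-color scheme (red below the cutoff, white otherwise) and the use of a tooltip for the mouse-over text are all assumptions.

```python
def pvalue_color(pvalue, cutoff=0.05):
    """Assumed two-color scheme: red for significant terms, white otherwise."""
    return "red" if pvalue < cutoff else "white"


def terms_to_dot(terms, edges, node_limit=40, cutoff=0.05):
    """Build DOT text for terms given as {id: (name, pvalue, count)}.

    edges is a list of (parent_id, child_id) pairs.
    """
    compact = len(terms) > node_limit
    lines = ["digraph go {"]
    for term_id, (name, pvalue, count) in terms.items():
        if compact:
            # Tiny node: show only the mapping count, keep the name as tooltip.
            attrs = f'shape=ellipse, label="{count}", tooltip="{term_id} {name}"'
        else:
            attrs = f'shape=box, label="{term_id}\\n{name}"'
        color = pvalue_color(pvalue, cutoff)
        lines.append(f'  "{term_id}" [{attrs}, style=filled, fillcolor={color}];')
    for parent, child in edges:
        lines.append(f'  "{parent}" -> "{child}";')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Toy p-values and counts for two terms mentioned on this page.
    terms = {
        "GO:0050896": ("response to stimulus", 0.01, 21),
        "GO:0009628": ("response to abiotic stimulus", 0.20, 9),
    }
    print(terms_to_dot(terms, [("GO:0050896", "GO:0009628")]))
```

The printed text can be saved to a file and rendered with the Graphviz `dot` program, for example `dot -Tpng graph.dot -o graph.png`.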

Browse previously saved result using session ID

As noted before, the session ID lets you browse a saved result again later. This is useful when the network connection is slow or the analysis job is so large that it takes a long time. The session ID is provided in both text and graphical mode. Go to the "Previous session" page and submit a valid session ID, and the corresponding analysis result is returned exactly as it was originally produced.

Usually all analysis results stay on the server for a long time as temporary files. They are cleared only when their number grows large (>2500), which increases the risk of the same session ID being produced for two different queries. Such a clean-up is announced on the main page 3 days in advance. If you need access to your result over a long time span, please send me an e-mail with the session ID and how long you want the result to stay on the server.

This document last modified: 2008-08-30