Data Analysis

What to do first?

The first step is to project the complete variance that is present in your dataset. To do this, select "Analysis" in the main menu and then "Measurement QC" in "CA" (Correspondence Analysis). In this method every measurement gets mass. This way you do not impose any ordering on the data by grouping measurements into conditions.
Computing all genes over a large number of conditions is restricted because it takes considerable computing power. If you would like to do that, it's no problem - just let us know.
Select "HMS" in "CA" to perform the analysis based on the genewise median. This will result in a stronger separation, since you already classify the data by grouping the hybridizations together and taking the median.
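To illustrate the difference between the two modes, here is a minimal Python sketch of what "grouping the hybridizations and taking the genewise median" amounts to. It is not M-CHiPS code; the function name, the data and the condition labels are hypothetical.

```python
from statistics import median

def genewise_condition_medians(values, conditions):
    """Collapse measurement columns into one median column per condition.

    values:     list of rows (one per gene), each a list of measurements
    conditions: condition label of each measurement column (same order)
    """
    labels = sorted(set(conditions))
    collapsed = []
    for row in values:
        collapsed.append([
            median(v for v, c in zip(row, conditions) if c == label)
            for label in labels
        ])
    return labels, collapsed

# Hypothetical example: 2 genes, 3 hybridizations for each of 2 conditions
conditions = ["control", "control", "control", "heat", "heat", "heat"]
values = [
    [1.0, 1.2, 5.0, 3.0, 3.4, 3.2],   # one outlying control measurement (5.0)
    [2.0, 2.1, 1.9, 0.5, 0.6, 0.4],
]
labels, medians = genewise_condition_medians(values, conditions)
print(labels)    # condition labels, one per remaining column
print(medians)   # one median per gene and condition
```

With measurements as columns, the outlying value 5.0 stays visible in the map. With medians as columns it no longer influences the result, which is why the median-based map separates conditions more strongly but is less suited for quality control.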

What to feed?

For multichannel data it is advisable to feed ratios, since this is the most descriptive value. With multichannel data, any given intensity of any gene is only valid with respect to the control condition used.
If, for instance, the mRNA level of gene A is high in the control as well as in condition1, there will be competition for the limited binding sites on the slide. With a different control condition in which the mRNA level of gene A is low, there will be almost no competition for binding sites between control and condition1 (the majority of binding sites will be occupied by mRNAs from condition1). This will result in a higher intensity value for gene A in condition1 even though the mRNA level of gene A has not changed!
For monochannel data, feeding the fitted intensities is the best choice.
Next you are asked whether to feed linear or logarithmic values. Use logarithmic values for ratios and linear values for intensities.
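One reason logarithms suit ratios is that they treat up- and down-regulation symmetrically. The short Python sketch below shows this. The function name and the intensities are hypothetical, and base 2 is an assumption made for the example (the text above only says "logarithmic").

```python
import math

def log_ratio(condition_intensity, control_intensity):
    """Base-2 logarithm of the ratio condition / control."""
    return math.log2(condition_intensity / control_intensity)

# Hypothetical intensities: twofold up and twofold down relative to the control
up = log_ratio(800.0, 400.0)
down = log_ratio(200.0, 400.0)
unchanged = log_ratio(400.0, 400.0)

print(up, down, unchanged)
```

On the linear scale the two changes are 2.0 and 0.5, i.e. at unequal distances from "no change" (1.0). On the log scale they lie at equal distances on either side of zero.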

Percentage of displayed Variance

In the next window you can see how much of the variance in the data is displayed in the first two and in all the other dimensions. To make the data analyzable for the user, a reduction of dimensionality is necessary. The variance that can be displayed in 2 (3) dimensions is in most cases less than the total variance. If you add up the first two columns, you get the percentage of variance that is displayed by a 2D map. If this number is small (e.g. smaller than 85%), you might want to look at the 3D plot of your data.
The number of dimensions needed to display 100% of the variance is at most the number of columns of the data table minus one. Columns represent either measurements (hybridizations or channels) or experimental conditions if HMS is used.
Why is the dimensionality of the problem only dependent on the few columns (not on the numerous rows) and why "minus one"?
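The percentages and the "columns minus one" bound can be reproduced with a small sketch of textbook correspondence analysis in Python. It assumes NumPy is available and does not claim to mirror the M-CHiPS implementation; the function name and the table are hypothetical. Correspondence analysis decomposes the deviations of the table from the product of its row and column masses. These deviations are centered, which removes one dimension, and a table can never need more dimensions than its smaller side. Since there are far more genes (rows) than columns, the columns set the limit.

```python
import numpy as np

def ca_inertia_percentages(table):
    """Percentage of total inertia (variance) carried by each CA axis."""
    N = np.asarray(table, dtype=float)
    P = N / N.sum()                      # correspondence matrix
    r = P.sum(axis=1)                    # row masses (genes)
    c = P.sum(axis=0)                    # column masses (measurements or conditions)
    expected = np.outer(r, c)            # values expected without any association
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    singular_values = np.linalg.svd(S, compute_uv=False)
    inertia = singular_values ** 2
    return 100.0 * inertia / inertia.sum()

# Hypothetical table: 6 genes (rows) x 4 measurements (columns)
table = [
    [120, 130,  40,  35],
    [ 15,  20,  90, 100],
    [ 60,  55,  65,  58],
    [200, 180,  30,  45],
    [ 25,  30,  28, 110],
    [ 80,  20,  75,  70],
]
pct = ca_inertia_percentages(table)
n_axes = int((pct > 1e-9).sum())   # axes that carry any variance at all
two_d = pct[:2].sum()              # share of variance shown by a 2D map

print(pct)
print(n_axes, "axes needed for", len(table[0]), "columns")
print("2D map displays %.1f%% of the variance" % two_d)
```

The last entry of pct is zero up to rounding error: with four columns, at most three axes carry variance, no matter how many genes the table contains.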

What do I see?

So now you have a map, but what is shown there?

	black dots:	genes
	colored squares:	experimental conditions (red always being the control condition). The p/s next to the number stands for primary/secondary spots. The closer these are, the more consistent your spots are. The colored square named M is the corresponding median of that condition.
	How to interpret their locations?
	colored lines:	Lines from the centroid to the condition medians in standard coordinates. These lines show the exact direction of association of genes with a particular condition: The standard coordinates of a condition are in the location of a (hypothetical) gene having highest possible association with this condition.

How to analyse?

Workflow

There is no use in starting bottom-up by producing lists for each pair of conditions. Instead, save time by going top-down, i.e. start with the complete set of experiments:
Analyze the whole database using "Measurement QC" in "CA". If you did not use the very same control condition for every hybridization, do "Measurement QC" separately for each subset having identical controls, or talk to us.
Filter genes, using '>= max of medians of fitted intensity' only. Increase the threshold until the conditions (colors) separate.
Identify outlying measurements and discard the corresponding hybridizations one by one. Redo "Measurement QC" after each discard to check the effect. If this does not work well (discarding an outlier does not have the desired effect):
Homogeneity of a condition is best checked by investigating this condition alone (in the case of multichannel data, together with the control condition). Load only one condition (or discard the rest), re-filter, and record (write down) outliers.
Filter thoroughly, using '>= max of medians of fitted intensity' to discard at least half of the genes, then minmax- or std-separation until the desired number of differential genes is reached. Keep in mind that in nearly all cases except embryonic development, approx. 80% of the genes are off. And even if not, you cannot PCR-verify or even look through lists comprising thousands of genes!
Use "HMS" to further enhance inter-condition variability (over intra-condition, i.e. technical variability).
Select genes (e.g. into color-coded lists) to have a look at their function.
This was the first step (overview). Now proceed into more detail: Either
- lower the filtering thresholds, e.g. if the majority of genes located in a particularly interesting direction are not annotated. Or
- further investigate a particular direction or, e.g., the difference between two similar but biologically interesting conditions by plotting those conditions alone without the other experiments.
- Systematically perform additional hybridizations where needed, i.e. where you want to increase resolution in particular. Reasons for additional hybridizations may be the wish to clearly separate similar clusters, bad quality (some conditions may turn out to be less homogeneously measured than others) or special (biologically motivated) interest.
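The intensity filter used in the workflow above can be sketched in Python as follows. This is only one plausible reading of '>= max of medians of fitted intensity' (median per condition, then the maximum over conditions, compared against the threshold) and not the M-CHiPS source code. All names and numbers are hypothetical.

```python
from statistics import median

def max_of_condition_medians(row, conditions):
    """Largest per-condition median of one gene's fitted intensities."""
    return max(
        median(v for v, c in zip(row, conditions) if c == label)
        for label in set(conditions)
    )

def filter_genes(intensities, conditions, threshold):
    """Keep genes whose max of condition medians reaches the threshold."""
    return [
        gene for gene, row in intensities.items()
        if max_of_condition_medians(row, conditions) >= threshold
    ]

# Hypothetical fitted intensities: two hybridizations per condition
conditions = ["control", "control", "treated", "treated"]
intensities = {
    "geneA": [50, 60, 900, 1100],    # off in control, on in treated
    "geneB": [40, 45, 55, 50],       # off everywhere
    "geneC": [700, 800, 650, 720],   # on everywhere
}

print(filter_genes(intensities, conditions, 500))
print(filter_genes(intensities, conditions, 800))
```

A gene passes as soon as it is clearly expressed in at least one condition, so genes that are off everywhere (like geneB here) are removed first. Raising the threshold step by step, as described above, thins out the map further.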

Get gene information

Once you see the map, you have several options for what to do next. These options are listed just above the map.
If, for instance, you wish to select a set of genes, it is a good idea to open a Netscape browser, either on your local machine or on the one you are logged in to, because additional gene information will be displayed in the browser.
Next, press the left mouse button; the newly available options are again displayed just above the plot (namely: select gene | zoom | select geneset). To select a set of genes, click the right mouse button, and again you get new menu options. Of these, choose fence. In the next menu, you place a fencepost in the map each time you click (left button) in the map. Surround the desired set of genes by placing fenceposts. Finally, click the middle button (close fence) to close the fence.
The genes surrounded by the fence are selected, and the available information on these genes is displayed in a Netscape browser. In addition, gene profiles of the selected genes over all conditions are shown in a separate MATLAB window. The information displayed when selecting genes can be changed in the 'show what' menu (View -> Settings -> Show what?).
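Conceptually, the fence is a closed polygon, and a gene is selected if its dot lies inside it. The Python sketch below uses the standard ray-casting test to show the idea. How M-CHiPS decides this internally is not stated here, so treat the function names, coordinates and gene names as hypothetical.

```python
def inside_fence(point, fence):
    """Ray-casting test: is the point inside the closed polygon `fence`?"""
    x, y = point
    inside = False
    n = len(fence)
    for i in range(n):
        x1, y1 = fence[i]
        x2, y2 = fence[(i + 1) % n]      # last post connects back to the first
        if (y1 > y) != (y2 > y):         # edge crosses the horizontal line at y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:              # crossing lies to the right of the point
                inside = not inside
    return inside

def select_genes(coords, fence):
    """Names of all genes whose map coordinates lie inside the fence."""
    return [gene for gene, xy in coords.items() if inside_fence(xy, fence)]

# Hypothetical fenceposts (in click order) and gene positions on the map
fence = [(0, 0), (4, 0), (4, 3), (0, 3)]
coords = {
    "geneA": (1, 1),
    "geneB": (5, 1),
    "geneC": (3.5, 2.5),
    "geneD": (-1, 2),
}
selected = select_genes(coords, fence)
print(selected)
```

The order in which the fenceposts are placed matters, since consecutive posts are connected by an edge and closing the fence joins the last post to the first.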

Get experiment information

To make full use of the features M-CHiPS offers, you should use the select hybset function, with which you can select a set of hybridizations. Because the annotations are statistically analyzable, the values that are over- or underrepresented in this set will be displayed.
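To give an idea of what "overrepresented" can mean, here is a Python sketch using a hypergeometric test, a common choice for this kind of question. Which statistic M-CHiPS actually applies is not stated on this page, so the test, the function name and the numbers are assumptions for illustration only.

```python
from math import comb

def overrepresentation_p(total, total_with_value, selected, selected_with_value):
    """Probability of seeing at least `selected_with_value` annotated
    hybridizations when `selected` hybridizations are drawn at random."""
    upper = min(selected, total_with_value)
    hits = sum(
        comb(total_with_value, k) * comb(total - total_with_value, selected - k)
        for k in range(selected_with_value, upper + 1)
    )
    return hits / comb(total, selected)

# Hypothetical case: 20 hybridizations in total, 5 of them carry a certain
# annotation value; 4 hybridizations are selected and all 4 carry that value.
p = overrepresentation_p(total=20, total_with_value=5,
                         selected=4, selected_with_value=4)
print(p)
```

A small probability means the annotation value occurs in the selected set more often than expected by chance. Underrepresentation can be checked the same way using the lower tail of the distribution.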

Details

Please find more information in the Frequently Asked Questions or in our publications.