|
The plot shows a correspondence analysis of a time course of yeast cells responding to 200 mM hydrogen peroxide in their medium.
However, the yellow 30' condition is split into two clusters
located far away from each other. To investigate this, let's look at an easy-to-explain example. Instead of using the chi-squared test to determine which annotation values are characteristic, let's consider a simple way to assess these associations. There is a yeast-specific annotation, 'growth phase', which can take one of three values: exponential, stationary or pseudo-hyphal. This is just an example; it could be any other annotation as well. So just think of the values as boxes, polygons and triangles, and let's focus on the triangles for a while. In the same way, an algorithm scans all
annotation values (of all the annotations) for being
characteristic (that is, over- or underrepresented) in the
cluster.
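The scan described above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual algorithm: it assumes that "characteristic" means the relative frequency of a value inside the cluster differs from its relative frequency in the whole data set by at least a given fold factor. The function name `characteristic_values`, the parameter `min_fold`, and the toy data are hypothetical.

```python
from collections import Counter


def characteristic_values(all_items, cluster_items, min_fold=2.0):
    """Return {(annotation, value): fold} for over- or underrepresented values.

    Assumption (not from the source): fold is the relative frequency of a
    value in the cluster divided by its relative frequency in the whole set.
    """
    if not all_items or not cluster_items:
        raise ValueError("both the data set and the cluster must be non-empty")

    # Count every (annotation, value) pair in the whole set and in the cluster.
    total = Counter()
    for annotations in all_items:
        total.update(annotations.items())
    in_cluster = Counter()
    for annotations in cluster_items:
        in_cluster.update(annotations.items())

    result = {}
    for key, count_all in total.items():
        freq_all = count_all / len(all_items)
        freq_cluster = in_cluster[key] / len(cluster_items)
        fold = freq_cluster / freq_all
        # Keep values at least min_fold over- or underrepresented.
        if fold >= min_fold or fold <= 1.0 / min_fold:
            result[key] = fold
    return result


# Toy data (invented): eight hybridizations annotated with a growth phase.
phases = ["exponential"] * 4 + ["stationary"] * 2 + ["pseudo-hyphal"] * 2
hybridizations = [{"growth phase": phase} for phase in phases]

# A hypothetical cluster: one exponential and both pseudo-hyphal samples.
cluster = [hybridizations[0], hybridizations[6], hybridizations[7]]

hits = characteristic_values(hybridizations, cluster)
for (annotation, value), fold in sorted(hits.items()):
    print(f"{annotation} = {value}: {fold:.2f}-fold")
```

In this toy data, 'pseudo-hyphal' comes out as overrepresented in the cluster and 'stationary' as underrepresented (it does not occur in the cluster at all), while 'exponential' stays below the two-fold threshold.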
In practice, for our outlying cluster, this means that selecting
the cluster by mouse clicks yields the following list of values that are
at least two-fold over- or underrepresented:
In this way, artifacts like this one, as well as biologically relevant parameters driving the
cells to the observed transcription patterns, can be extracted. Simple as it
may be, the described method already provides good analytical access to long lists of
annotations and huge sets of hybridizations, which could hardly be evaluated
by visual inspection. Although this approach is simple and easy to explain,
statistical tests would certainly suit this task better. However, while the
analysis may be trivial, it is the way of storage that
matters here: Any
statistical analysis will require that the variables (annotations) are
categorical and that occurrences of each annotated value can be
counted. Free-text representation impairs direct statistical
analysis! Misspellings, different textual representations of
semantically identical items, and, vice versa, ambiguous words whose
meaning depends on the context all interfere with counting such values. With
these limitations on access for computer-based, i.e. statistical, analysis,
global studies of large data sets would not be possible.
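To make the counting problem concrete, here is a small sketch in Python. The lookup table `CANONICAL`, the helper `normalize`, and the sample strings are all hypothetical and only illustrate the point; a real system would store categorical values from a controlled vocabulary in the first place rather than repair free text afterwards.

```python
from collections import Counter

# Hypothetical mapping from free-text variants to categorical values.
CANONICAL = {
    "exponential": "exponential",
    "log phase": "exponential",   # different wording, same meaning
    "stationary": "stationary",
    "stationery": "stationary",   # misspelling
}


def normalize(value):
    """Map a free-text entry to a categorical value, or None if unresolved."""
    key = " ".join(value.strip().lower().split())
    return CANONICAL.get(key)


# Invented free-text annotations as they might be typed by different people.
free_text = [
    "Exponential",
    "exponential ",
    "log phase",
    "stationary",
    "stationery",
    "late growth",
]

raw_counts = Counter(free_text)                         # counts raw strings
clean_counts = Counter(normalize(v) for v in free_text)  # counts categories

print(raw_counts)
print(clean_counts)
```

Counting the raw strings treats every variant as a value of its own, so no value occurs more than once. After mapping to categories the occurrences add up, and entries that cannot be resolved (here 'late growth') are collected under `None` instead of being guessed.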
|