To retrieve information, or signals, from a massive, redundant data stream, one needs data reduction. With data reduction, we try to extract the essential information and ignore the noise and redundant information. Methods for this come from common sense (``anything before 2005 is uninteresting'') and from statistical learning, data mining, and image processing. When statistics-based techniques are used, it is essential to be clear about what, in mathematical terms, is meant by the words ``redundant'' and ``noise''.
When interfacing with a large data stream that has many channels (themes) and a high temporal frequency, at some stage your resources (disk space) will limit what you can store. Three obvious solutions to this problem are
- limiting how far back you keep data,
- limiting the number of themes you store/offer, or
- limiting the frequency (e.g. from hourly images, store only one image every 12 hours).
However, at this moment in time you may not know which data streams you will ultimately need, or for how long you will need them. To keep storage limited, you need to select. The task is then to select "typical" rather than arbitrary images, but what is "typical"? For typical one may think of:
- the image closest to the mean in the batch of images (but how do you define this?)
- the two images furthest from the mean (range) (but how to define what makes an image extreme?)
- the two images that could be seen as the first and third quartile (but how to define this?)
- the images with the largest and smallest autocorrelation (most noisy and most smooth)
Write a filter that replaces a batch of images (e.g. 24, or 24*30 images) with a small number (1-5) of typical images selected from this batch.
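One possible starting point for such a selection filter is sketched below in Python with NumPy. It assumes that the batch fits in memory as an array of shape (time, rows, columns) and that "distance from the mean" is taken to be the Euclidean distance over all pixels. That choice is only one of several defensible answers to the "how do you define this?" questions above, and the function name `select_typical` is hypothetical.

```python
import numpy as np

def select_typical(batch, n_extreme=2):
    """Return indices of 'typical' images in a batch of shape (t, rows, cols).

    Assumption: the distance between two images is the Euclidean distance
    over all pixels. Other definitions are equally possible.
    """
    batch = np.asarray(batch, dtype=float)
    flat = batch.reshape(batch.shape[0], -1)          # one row per image
    mean_image = flat.mean(axis=0)                    # pixel-wise mean of the batch
    dist = np.linalg.norm(flat - mean_image, axis=1)  # distance of each image to the mean
    order = np.argsort(dist)                          # closest to the mean first
    return {
        "closest_to_mean": int(order[0]),
        "furthest_from_mean": [int(i) for i in order[::-1][:n_extreme]],
    }

# Usage: keep 3 images out of a batch of 24 synthetic hourly images.
rng = np.random.default_rng(0)
hourly = rng.normal(size=(24, 50, 50))
picked = select_typical(hourly)
keep = [picked["closest_to_mean"]] + picked["furthest_from_mean"]
reduced = hourly[keep]                                # shape (3, 50, 50)
```

The selected images are original members of the batch, so nothing is interpolated or invented; only the choice of which images to keep depends on the distance definition.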
A generalisation of the selection filter is the temporal filter. Temporal filters compute new images, which do not exist in the input, from a stack of existing images. Examples are
- the mean image (pixel-wise mean value)
- the variance image
- the quantile images (examples: minimum, first quartile, median, third quartile, maximum)
- the range image and the interquartile range image
Write an automated temporal filter that can deal with a large set of images.
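The temporal filters listed above can be sketched with NumPy as pixel-wise statistics along the time axis. This is a minimal sketch that assumes the whole stack fits in memory as an array of shape (time, rows, columns); a truly large set would have to be processed in spatial tiles or with streaming statistics, which is not shown here. The function name `temporal_summary` is hypothetical.

```python
import numpy as np

def temporal_summary(stack):
    """Pixel-wise summary images for a stack of shape (t, rows, cols)."""
    stack = np.asarray(stack, dtype=float)
    # Quantiles along the time axis: min, first quartile, median, third quartile, max.
    q = np.quantile(stack, [0.0, 0.25, 0.5, 0.75, 1.0], axis=0)
    return {
        "mean": stack.mean(axis=0),
        "variance": stack.var(axis=0, ddof=1),   # sample variance, needs t >= 2
        "min": q[0],
        "q1": q[1],
        "median": q[2],
        "q3": q[3],
        "max": q[4],
        "range": q[4] - q[0],
        "iqr": q[3] - q[1],
    }

# Usage: reduce 24 synthetic hourly images to a handful of summary images.
rng = np.random.default_rng(1)
summary = temporal_summary(rng.normal(size=(24, 50, 50)))
median_image = summary["median"]                 # shape (50, 50)
```

Each output has the shape of a single image, so a batch of any length is reduced to a fixed number of images.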
Spatial filters operate within a single image rather than across the stack. Examples are
- low-pass filters
- high-pass filters, edge detection
- object identifying filters, e.g. by spatial clustering/segmentation algorithms
- classification, e.g. identify clouds
Write software for automatic spatial filtering.
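As an illustration of the two simplest spatial filters in the list, the sketch below implements a moving-average (box) low-pass filter and derives a high-pass filter as the difference between the image and its smoothed version. The box kernel, the odd window size and the replicated border are assumptions made for this example; object identification and classification are not covered. The function names are hypothetical.

```python
import numpy as np

def low_pass(image, size=3):
    """Moving-average (box) filter; border pixels are replicated at the edges."""
    if size % 2 != 1:
        raise ValueError("size must be odd")
    image = np.asarray(image, dtype=float)
    rows, cols = image.shape
    pad = size // 2
    padded = np.pad(image, pad, mode="edge")
    out = np.zeros_like(image)
    # Sum the size*size shifted copies of the image, then divide by the window area.
    for dr in range(size):
        for dc in range(size):
            out += padded[dr:dr + rows, dc:dc + cols]
    return out / size**2

def high_pass(image, size=3):
    """High-pass filter: the image minus its low-pass (smoothed) version."""
    image = np.asarray(image, dtype=float)
    return image - low_pass(image, size)

# Usage: apply both filters to every image in a synthetic batch.
rng = np.random.default_rng(2)
batch = rng.normal(size=(24, 50, 50))
smooth = np.stack([low_pass(img) for img in batch])
detail = np.stack([high_pass(img) for img in batch])
```

By construction the low-pass and high-pass outputs add up to the original image, which makes it easy to check that nothing is lost in the split.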
- n images of principal component scores for the first n (2, 3, 4) principal components
- n images resulting from a min/max autocorrelation factor (MAF) or minimum noise fraction (MNF) transform (like principal components, but looking for spatially structured signals)
- k images resulting from a k-means cluster analysis
- classification of spatial time series
Write a filter that replaces a batch with PCA or MNF images; evaluate how well the original information can be reproduced from these images.
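For the PCA variant, a minimal sketch could look as follows. It treats every pixel as an observation and every image in the batch as a variable, computes the principal components with a singular value decomposition, and stores n score images plus the loadings and image means needed for reconstruction. The root-mean-square error used for the evaluation is an assumption, MAF/MNF is not implemented here, and all function names are hypothetical.

```python
import numpy as np

def pca_reduce(stack, n_components=3):
    """Replace a stack (t, rows, cols) by n score images plus loadings and means."""
    stack = np.asarray(stack, dtype=float)
    t, rows, cols = stack.shape
    X = stack.reshape(t, -1).T                    # (pixels, t): pixels are observations
    image_means = X.mean(axis=0)                  # mean value of each image
    U, s, Vt = np.linalg.svd(X - image_means, full_matrices=False)
    n = min(n_components, len(s))                 # cannot keep more components than exist
    scores = U[:, :n] * s[:n]                     # (pixels, n) principal component scores
    loadings = Vt[:n]                             # (n, t) weights of each image
    return scores.T.reshape(n, rows, cols), loadings, image_means

def pca_reconstruct(score_images, loadings, image_means):
    """Approximate the original stack from the reduced representation."""
    n, rows, cols = score_images.shape
    scores = score_images.reshape(n, -1).T        # (pixels, n)
    X_hat = scores @ loadings + image_means       # (pixels, t)
    return X_hat.T.reshape(-1, rows, cols)

def rmse(a, b):
    """Root-mean-square difference between two stacks of equal shape."""
    return float(np.sqrt(np.mean((np.asarray(a) - np.asarray(b)) ** 2)))

# Usage: compare the reconstruction error for 1 to 4 components.
rng = np.random.default_rng(3)
batch = rng.normal(size=(24, 20, 20))
errors = [rmse(batch, pca_reconstruct(*pca_reduce(batch, n))) for n in (1, 2, 3, 4)]
```

The error can only stay equal or decrease as more components are kept, so a curve of error against n gives a direct view of how much of the batch the reduced images reproduce.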
- 24 Sep 2009