A likelihood-based approach to defining statistical significance in proteomic analysis where missing data cannot be disregarded

Wood, John, White, Ian R. and Cutler, Paul (2004) A likelihood-based approach to defining statistical significance in proteomic analysis where missing data cannot be disregarded. Signal Processing, 84 (10). pp. 1777-1788.

Full text not available from this repository.


In several areas of science it is important to be able to discriminate statistically relevant changes from background noise in complex data sets. Often this must be done where a significant proportion of the data may be missing for technical or operational reasons. Proteomic data present exactly this challenge: the aim is to assess differences in protein expression between groups, over several thousand proteins and across large sample sets. Such comparisons first require the digitisation of two-dimensional gel electrophoresis (2-DE) images, followed by image analysis to create a matched table of thousands of quantitated spots, usually expressed as density volumes. This process is often imperfect, because the analysis software fails to detect, match or co-register spots accurately across the data set, so that some data points are not represented. This matters because the so-called “missing data” cannot be ignored. It is not possible to say whether a spot value is missing because it lies below the limit of detection, in which case a background value or similar nominal level can be assigned, or because of a failure to match, in which case such an assigned value would be in error. Because a proteomic assay determines values for thousands of proteins simultaneously, there is the extra complication of “multiplicity”: when so many elements are measured at once, some will appear to have altered expression purely by chance. Statistical significance tests therefore cannot be considered definitive unless this issue is addressed. We describe the development of a statistical approach to data analysis for direct group comparisons that deals with both missing data and multiplicity.
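To put multiplicity in numbers: if several thousand unchanged spots are each tested at the conventional 0.05 level, about 5% of them, for example roughly 100 out of 2,000, would be flagged by chance alone. The abstract does not say how the authors adjust for this. As one standard option, not necessarily theirs, the Benjamini-Hochberg procedure controls the false discovery rate. The minimal sketch below assumes one p-value per spot; the function name `benjamini_hochberg` and the default `q = 0.05` are illustrative choices.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float], q: float = 0.05) -> List[bool]:
    """Return one rejection flag per hypothesis, controlling the FDR at level q."""
    m = len(p_values)
    # Indices of the p-values in ascending order of p.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= k * q / m (step-up rule).
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank
    # Reject the k_max hypotheses with the smallest p-values.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.2]))  # [True, False, False, False]
```

Here p = 0.03 and p = 0.04 would both pass an unadjusted 0.05 threshold but are not rejected once the number of tests is taken into account.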
Although the example given here is proteomic data, this novel approach could be used to compare the means of any two distributions where missing data can neither be disregarded nor set to zero, but where the probability that data will be missing rises as ‘true’ values get smaller.
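As an illustration only, the following sketch shows one way a likelihood can use missing values whose probability of being missing rises as the true value falls, instead of discarding them or setting them to zero. The model is a hypothetical assumption and not the method of Wood, White and Cutler. Each log spot volume is taken to be Normal(mu, sigma) with sigma known. A spot is missing either through a match failure, with fixed probability `p_fail`, or because its value lies below a detection limit `limit`, so P(missing) = p_fail + (1 - p_fail) Φ((limit - mu)/sigma). The two group means are then compared with a likelihood-ratio test referred to a chi-square distribution on one degree of freedom. The names `group_loglik`, `maximize_1d` and `missing_aware_lrt`, and the parameters `sigma`, `limit` and `p_fail`, are all hypothetical.

```python
import math
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical model (an illustrative assumption, not the paper's method):
#   log spot volume y ~ Normal(mu_group, sigma), with sigma known;
#   a spot is observed only if it is matched (prob 1 - p_fail) AND y > limit.


def norm_cdf(z: float) -> float:
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def group_loglik(mu: float, values: Sequence[Optional[float]],
                 sigma: float, limit: float, p_fail: float) -> float:
    """Log-likelihood of one group's values; None marks a missing spot."""
    if not 0.0 < p_fail < 1.0:
        raise ValueError("p_fail must lie strictly between 0 and 1")
    ll = 0.0
    for y in values:
        if y is None:
            # Missing: a match failure, or matched but below the detection limit.
            p_miss = p_fail + (1.0 - p_fail) * norm_cdf((limit - mu) / sigma)
            ll += math.log(p_miss)
        else:
            if y <= limit:
                raise ValueError("observed values must exceed the detection limit")
            z = (y - mu) / sigma
            # Observed: matched, times the normal density of y.
            ll += (math.log(1.0 - p_fail) - 0.5 * z * z
                   - math.log(sigma * math.sqrt(2.0 * math.pi)))
    return ll


def maximize_1d(f: Callable[[float], float], lo: float, hi: float,
                tol: float = 1e-9) -> Tuple[float, float]:
    """Golden-section search for a maximum; assumes f is unimodal on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:  # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:  # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    x = 0.5 * (a + b)
    return x, f(x)


def missing_aware_lrt(group1: Sequence[Optional[float]],
                      group2: Sequence[Optional[float]],
                      sigma: float, limit: float, p_fail: float,
                      lo: float, hi: float) -> Tuple[float, float, float, float]:
    """Likelihood-ratio test of mu1 == mu2; returns (mu1, mu2, statistic, p_value)."""
    def l1(mu: float) -> float:
        return group_loglik(mu, group1, sigma, limit, p_fail)

    def l2(mu: float) -> float:
        return group_loglik(mu, group2, sigma, limit, p_fail)

    mu1, ll1 = maximize_1d(l1, lo, hi)  # separate means
    mu2, ll2 = maximize_1d(l2, lo, hi)
    _, ll0 = maximize_1d(lambda mu: l1(mu) + l2(mu), lo, hi)  # common mean
    stat = max(0.0, 2.0 * (ll1 + ll2 - ll0))
    # Chi-square(1) upper tail: P(X > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return mu1, mu2, stat, p_value
```

In this sketch a missing value pulls its group's estimated mean downwards, because under the model a low true value makes a miss more likely. Treating `sigma`, `limit` and `p_fail` as known keeps each search one-dimensional; a fuller treatment would estimate them as well. The resulting per-spot p-values would still need a multiplicity adjustment.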

Item Type: Article
Faculty \ School: Faculty of Science > School of Pharmacy
Depositing User: Rachel Smith
Date Deposited: 13 Jun 2011 15:17
Last Modified: 24 Oct 2022 02:49
URI: https://ueaeprints.uea.ac.uk/id/eprint/32413
DOI: 10.1016/j.sigpro.2004.06.019
