Functional data analysis
Functional data analysis (FDA) is a branch of statistics that analyses data providing information about curves, surfaces or anything else varying over a continuum. The data may be so accurate that error can be ignored, may be subject to substantial measurement error, or may even have a complex indirect relationship to the curve that they define. However these curves are estimated, it is the assumption that they are intrinsically smooth that often defines a functional data analysis. Plots of first and second derivatives as functions of t, or plots of second derivative values as functions of first derivative values, may reveal important aspects of the processes generating the data. As a consequence, curve estimation methods designed to yield good derivative estimates can play a critical role in functional data analysis.

Contrast with other methods

The extensive use of kernel smoothing and smoothing splines to enforce smoothness assumptions shows why functional data analysis is at its core a nonparametric statistical technique. Nevertheless, models for functional data and methods for their analysis may resemble those for conventional multivariate data, including linear and nonlinear regression models and principal components analysis, among others; this is because functional data can be thought of as multivariate data with an order on its dimensions.[1] But the possibility of using derivative information greatly extends the power of these methods, and also leads to purely functional models such as those defined by differential equations, often called dynamical systems.

See also: functional principal component analysis

Further reading

Ramsay and Silverman (2002) Applied Functional Data Analysis: Methods and Case Studies, Springer Series in Statistics, New York; London: Springer.
This page was last edited on 11 June 2017. Text is available under the Creative Commons Attribution-ShareAlike License.

Functional data analysis (FDA) is increasingly being used to better analyze, model and predict time series data. Key aspects of FDA include the choice of smoothing technique, data reduction, adjustment for clustering, functional linear modeling and forecasting methods. A systematic review using 11 electronic databases was conducted to identify FDA application studies published in the peer-review literature during 1995–2010. Only a quarter of the published studies used functional linear models to describe relationships between explanatory and outcome variables, and fewer than a tenth used FDA for forecasting time series data. Despite its clear benefits for analyzing time series data, full appreciation of the key features and value of FDA has been limited to date, though the applications show its relevance to many public health and biomedical problems. Wider application of FDA to all studies involving correlated measurements should allow better modeling of, and predictions from, such data in the future, especially as FDA makes no a priori assumptions about age and time effects.

Keywords: functional data analysis, smoothing, functional principal component analysis, clustering, functional linear model, forecasting, time series

Smoothing and interpolation procedures can yield functional representations of a finite set of observations;
it is more natural to think through modeling problems in a functional form; and the objectives of an analysis can be functional in nature, as would be the case if finite data were used to estimate an entire function, its derivatives, or the values of other functionals. The FDA approach is highly flexible in the sense that the timing intervals for data observations do not have to be equally spaced for all cases and can vary across cases. Although functional data themselves are not new, a new conceptualization of them has become necessary because of the increasing sophistication of available data collections [4]. Data collection technology has evolved over recent decades, allowing denser sampling of observations over time, space, and other continuum measures. Such data are usually interpreted as reflecting the influence of certain smooth functions that are assumed to underlie and to generate the observations. Although classical multivariate statistical techniques are often applied to such data, they do not take advantage of the additional information implied by the smoothness of the underlying functions. This represents a change in philosophy towards the handling of time series and correlated data [8]. There are a number of good illustrations of applications of FDA; for example, Ramsay and Silverman [9, 10] using curves as data, Locantore et al.
[11] with images as data, and Yushkevich and Pizer [12], where the data points are shape representations of body parts. Applications of FDA have also been published across various scientific fields, including analysis of child size evolution [9], climatic variation [4, 13], handwriting in Chinese [14], acidification processes [15], land usage prediction based on satellite images [16], medical research [17–19], behavioral sciences [20], term-structured yield curves [21], and spectrometry data [22]. Most recently, Ullah and Finch [23] found FDA to be an effective exploratory and modeling technique for highlighting trends and variations in the shape of the age–falls injury incidence relationship over time.

In contrast to most other methods commonly used to model trends in time series data, a key strength of the FDA approach is that it makes no parametric assumptions about age or time effects. The FDA methods for modeling and forecasting data across a range of health and demographic issues also have significant advantages for better understanding trends, risk factor relationships, and the effectiveness of preventive measures [24, 25]. In the book Functional Data Analysis, Ramsay and Silverman [9] give an accessible overview of the foundations and applications of FDA. In an earlier book entitled Applied Functional Data Analysis, the same authors [10] provide many examples that share the property of being functional forms of a continuous variable, most often age or time. In 2004, Statistica Sinica published a special issue that included two relevant review articles that dealt exclusively with the close connection between longitudinal data and functional data [26, 27]. In his PhD thesis, Ullah [28] described the significance and application of FDA in demographic data settings.
Because the application of FDA is still relatively novel, especially to public health and biomedical data, this paper reviews applications of the approach to date with the aim of encouraging researchers to adopt FDA in future studies. In doing so, it provides a summary of the extent to which FDA has been applied in different fields, including an overview of the nature of the time series variables/data used. The review focuses on five features: smoothing, which plays a key role in defining smoothness and continuity conditions of the resulting data; reduction of data via principal components analysis; clustering of data, which produces different functional groups (or clusters) for gaining more sophisticated knowledge of different pathways and/or functions for large scale data; functional linear models used for testing the effects on outcomes in functional form; and forecasting via stochastic methods, to measure the forecast uncertainty through the estimation of a prediction interval.

Methods

This review was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [29]. We conducted a systematic search of 11 electronic databases to identify peer-reviewed FDA application studies published between January 1995 and December 2010. The databases used were Academic Search Premier, ScienceDirect, SpringerLink, Cambridge Journals, Medge (Informit), Oxford Journals, PubMed, Sage Journals Online, Web of Science, Wiley InterScience Journals, and MEDLINE. We used the phrase "functional data analysis" to identify relevant articles, and considered only English language articles published in peer-reviewed journals. In addition to the electronic database search, the search strategy included secondary searching of the reference lists of identified articles.

Inclusion and exclusion criteria

Articles were eligible for inclusion if they were original research articles in peer-reviewed journals reporting an application of FDA.
These included reports of functional magnetic resonance imaging (fMRI) to assess patterns of brain activation in patients suffering from chronic traumatic brain injury [30], functional performance in participants with functional ankle instability [31], and the relationship between neurocognitive function and noncontact anterior cruciate ligament injuries [32]. Figure 1 uses the PRISMA [29] flowchart to summarize all stages of the paper selection process.

Figure 1. Search strategy used to identify 84 peer-review studies with published application of functional data analysis (FDA).

Overview of the published FDA applications

Table 1 summarizes the final set of reviewed papers, and shows fields of application, outcome of interest, and use of important FDA features, including functional principal component analysis (FPCA).

Table 1. Fields of application and the functional data analysis (FDA) features used in the 84 peer-review papers reporting application of FDA.

Year | Field | Outcome | Smoothing | FPCA | Clustering | FLM | Forecasting | Ref.
 |  | Velocity on force platform | - | - | - | FRM | - | [36]
 | Biomechanics | Kinematic gait data | Polynomial spline | FPCA | - | - | - | [37]
 | Biomedicine | Diffusion tensor imaging fiber images | Kernel | - | - | FRM | - | [38]
 | Biomedicine | Gene expression microarray data | Local polynomial | FPCA | FEM | - | - | [39]
 | Biomedicine | Spinal cord dorsal horn neurons | Locally weighted regression (LOESS) | - | k-means | - | - | [40]
 | Demography | Age-specific mortality rates | Kernel | FPCA | - | - | - | [41]
 | Environment | Gas emissions | Kernel | - | - | - | - | [42]
 | Geophysics | Magnetometer | Kernel | FPCA | - | FRM | - | [43]
 | Medicine | Human growth | - | FPCA | - | - | - | [44]
 | Medicine | Age-specific breast cancer mortality rates | Penalized regression spline | FPCA | - | - | State space model | [45]
 | Medicine | Age-specific fall injury incidence rates | Penalized regression spline | FPCA | - | - | State space model | [23]
 | Medicine | Human vision | Wavelet | - | - | FANOVA | - | [46]
2009 | Biology | Temporal fertility trajectories of medfly | - | FPCA | - | FMANOVA | - | [47]
2009 | Biomechanics | Kinematic gait data | - | - | - | FRM | - | [48]
2009 | Biomedicine | 3-tesla magnetic resonance imaging data | - | FPCA | - | - | - | [49]
2009 | Biomedicine | Denaturing gradient gel electrophoresis data | B-spline | FPCA | HCA | - | - | [50]
2009 | Biomedicine | MicroRNA transfection time-series microarray expression images | B-spline | FPCA | - | - | - | [51]
2009 | Biomedicine | Paediatric diffusion tensor imaging images | B-spline | FPCA | - | - | - | [52]
2009 | Biomedicine | Positron emission tomography time course data | Local polynomial | FPCA | - | - | - | [53]
2009 | Meteorology | Clickstream web data (Hurricane Katrina) | B-spline | - | - | FANOVA | - | [54]
2008 | Biomechanics | Ankle dorsiflexion, knee flexion, Achilles tendon, calcaneal and leg abduction angles | Roughness penalty | FPCA | - | - | - | [55]
2008 | Biomedicine | Colon carcinogenesis experiments | Regression splines | FPCA | - | - | - | [56]
2008 | Biomedicine | Diffusion tensor imaging fiber images | B-spline | FPCA | - | - | - | [52]
2008 | Biomedicine | Temporal gene expression profiles for the Drosophila life cycle | Smoothing spline | FPCA | - | FRM | - | [57]
2008 | Biomedicine | Time-course gene expression data | B-spline | - | SVM | - | - | [58]
2008 | Biomedicine | Time-course gene expression data | B-spline | FPCA | LDA, QDA, kNN, SVM | - | - | [59]
2008 | Demography | Mortality, fertility and migration rates | Weighted penalized regression spline | FPCA | - | - | State space model | [60]
2008 | Ecology | Plankton monitoring data | Roughness penalty | FPCA | - | - | - | [61]
2008 | Environment | Diurnal ozone and NOx cycles for transportation emission control | Fourier | FPCA | HCA | - | - | [62]
2008 | Finance | Cash flow and transactions | Wavelet | FPCA | - | - | FAR | [63]
2008 | Finance | Price formation and online auctions | Polynomial spline | - | - | FRM | - | [64]
2008 | Linguistics | Speech production variability in fricatives of children and adults | B-spline | FPCA | - | - | - | [65]
2008 | Meteorology | Plasma biomarkers | Kernel | - | - | FRM | - | [66]
2008 | Psychology | Emotional responses of musical listeners | Cubic B-spline | - | - | FANOVA | - | [67]
2007 | Biology | Time-course gene expression, yeast cell cycle | B-spline | FPCA | MBC | - | - | [68]
2007 | Biomedicine | MRI images | B-spline | - | - | - | - | [69]
2007 | Demography | Mortality and fertility rate | Penalized regression spline | FPCA | - | - | State space model | [25]
2007 | Engineering | Radar waveforms | Kernel | - | HCA | - | - | [70]
2007 | Environment | Diurnal ozone/NOx cycles and transportation emissions | Fourier | - | - | FANOVA | - | [71]
2007 | Environment | Stratospheric ozone levels | Cubic spline | FPCA | - | - | - | [72]
2007 | Medicine | Age-specific breast cancer mortality rates | Weighted local quadratic | FPCA | - | - | State space model | [24]
2007 | Medicine | Women's urinary hormone profiles at midlife | Cubic spline | FPCA | - | - | - | [73]
2007 | Medicine | Haemoglobin levels in renal anaemia | B-spline | - | - | - | - | [74]
2007 | Neurology | Joint coordination data in motor development | B-spline | FPCA | - | - | - | [75]
2006 | Biology | Time-course gene expression, yeast cell cycle | B-spline | FPCA | FLR | - | - | [76]
2006 | Behavioural | Male medfly calling behaviour | - | FPCA | - | - | - | [77]
2006 | Biomechanics | Kinematic gait data (knee flexion angle) | Cubic B-spline | FPCA | LDA | - | - | [78]
2006 | Biomechanics | Knee joint kinematics in the vertical jump | B-spline | FPCA | - | - | - | [79]
2006 | Biomechanics | Kinematic gait data (sit to stand movements) | B-spline | - | - | - | - | [80]
2006 | Ecology | Water quality trend data (nutrient and sediment) | Fourier | FPCA | HCA | FRM | - | [81]
2006 | IT | Software complexity measure | Smoothing spline | - | - | - | - | [82]
2006 | Linguistics | Tongue tip velocity | B-spline | - | - | - | - | [83]
2006 | Physiology | Blood lactate for running speed on a treadmill | Polynomial spline | FPCA | - | - | - | [84]
2006 | Psychology | Tension judgement in music | B-spline | - | - | - | - | [85]
2005 | Biology | Protein expression profiles | P-spline | FPCA | HCA | - | - | [86]
2005 | Biomechanics | Joint angles describing the limb motion | Regression spline | FPCA | - | - | - | [87]
2005 | Biomedicine | Functional magnetic resonance imaging data from 1. T Allegra system | B-spline | FPCA | - | - | - | [89]
2005 | Ecology | Smith-McIntyre grab species | - | FPCA | HCA | - | - | [90]
2005 | Education | Trends in mathematics and science achievement (TIMSS) score | Nonparametric spline | FPCA | CART, kNN | - | - | [91]
2005 | Finance | Cash flows in point of sale and ATM networks | Fourier | - | - | FANOVA | - | [92]
2005 | Linguistics | Speech movement records | Wavelets | - | - | - | - | [93]
2005 | Psychology | Tension judgement in music | B-spline | - | - | - | - | [94]
2004 | Chemistry | Molecular weight distributions | B-spline | - | - | FRM | - | [95]
2004 | Medicine | Esophageal bolus flow | - | - | - | - | - | [96]
2004 | Meteorology | Biomarkers | Cubic spline | FPCA | - | - | - | [97]
2004 | Neurology | Automated atlas-based head size normalization | - | - | - | - | - | [98]
2004 | Psychology | Musical emotions and tension | B-spline | - | - | - | - | [99]
2003 | Biomechanics | Digitized images of hand drawing curves generated by subjects treated with various facial preparation | Fourier | FPCA | - | FANOVA | - | [100]
2003 | Biomedicine | Longitudinal plasma folate data | Weighted local polynomial spline | FPCA | - | - | - | [101]
2002 | Agriculture | Lodging score for rice fields based on a digital overhead image | Fourier | - | - | FRM | - | [102]
2002 | Biomedicine | Myocardial contractile function images | Cubic B-spline | FPCA | - | - | - | [103]
2002 | Economics | Monthly nondurable goods production index | B-spline | - | - | - | - | [104]
2002 | Medicine | Foetal heart rate data | Fourier | - | - | FRM | - | [18]
2002 | Medicine | Foetal heart rate data | Fourier | - | - | FLRM | - | [19]
2001 | Satellite | Radar electromagnetic signals | Kernel | FPCA | - | - | - | [105]
2000 | Biomechanics | Handwriting in Chinese | B-spline | FPCA | EDO | - | - | [14]
2000 | Linguistics | Harmonics-to-noise ratio of voice signals | B-spline | - | - | - | - | [106]
2000 | Meteorology | Annual cycle of sea surface temperatures | Polynomial spline | - | - | - | FAR | [107]
1999 | Linguistics | Harmonics-to-noise ratio of voice signals | - | - | - | - | - | [108]
1998 | Ecology | Abundance of the gray-sided vole Clethrionomys rufocanus | Log-spline | FPCA | - | - | - | [109]
1996 | Linguistics | Vocal tract lip motion during speech | Smoothing spline | FPCA | - | FANOVA | - | [110]
1995 | Biomechanics | Records of the force exerted by pinching a force meter with the tips of the thumb and forefinger on opposite sides | - | FPCA | - | - | - | [111]
1995 | Economics | Income distribution | - | FPCA | - | - | - | [112]

FPCA - functional principal component analysis; FEM - functional embedding; HCA - hierarchical cluster analysis; SVM - support vector machine; LDA - linear discriminant analysis; QDA - quadratic discriminant analysis; kNN - k-nearest neighbours; MBC - model based clustering; CART - classification and regression tree; EDO - estimated differential operators; FLM - functional linear model; FRM - functional linear regression model; FANOVA - functional ANOVA; FMANOVA - functional MANOVA; FFT - functional F test; FLRM - functional logistic regression model; FAR - functional autoregressive.

The earliest identified application of FDA was in 1995 and 75% of the reviewed articles were published since 2005.
This reflects increasing recognition of the important features of functional data and awareness of the development of new statistical approaches and software for handling such data. Diverse fields were covered in the published studies; almost 21% of the studies related specifically to biomedical science (18 identified papers), followed by biomechanics applications (11 papers). In relation to specific health issues, the most common topics were analyses of kinematic gait data (9 papers), magnetic resonance imaging (6 papers), and yeast cell cycle temporal gene expression profiles (6 papers). The importance of each of these features is explained below and an overview given of how they were handled in the published studies.

Smoothing is the first step in any FDA, and its purpose is to convert raw discrete data points into a smoothly varying function. This emphasizes patterns in the data by minimizing short-term deviations due to observational errors, such as measurement errors or inherent system noise. Although some authors believe that FDA can be considered as a smoothed version of multivariate data analysis, smoothing techniques should still be used to reduce some of the inherent randomness in the observed data [1, 25, 113]. Ramsay and Silverman [9] emphasize that the choice of smoothing technique depends on the underlying behavior of the data being analyzed. Splines (regression splines, polynomial splines, B-splines) are typically chosen to represent non-cyclical, non-periodic data [25, 51, 84], and wavelet bases are chosen to represent data displaying discontinuities and/or rapid changes in behavior [117, 118]. Most recently, Ullah and Finch [23] used constrained penalized regression splines with a monotonic constraint to represent their smooth curves of falls incidence.

FPCA is one of the most popular multivariate analysis techniques for extracting information from functional data [119, 120].
This approach reduces the dimensions of a data set in which there are a large number of interrelated variables, while still retaining as much of the total variation as possible. This reduction is obtained by transforming the data to a new set of variables, or principal components, that are uncorrelated and ordered so that the first few retain most of the variation present in all of the original variables. The use of FPCA was reported in 51 of the 84 studies (about 61%). Many different applications of principal component analysis to functional data have been developed, including a useful extension of FPCA that allows the estimation of harmonics from fragments of curves [122]. Although FPCA is an important feature of FDA, not all studies reported it because they did not undertake data reduction. For example, one study [48] used a functional regression model to estimate the effects of gender, age, and walking speed on kinematic gait data; Park et al. [58] classified gene functions using a support vector machine (clustering) for time-course gene expression data; and Lucero [93] used only a B-spline to smooth the harmonics-to-noise ratio of voice signals.

Clustering is one of the most frequently used techniques for partitioning a dataset into subgroups that contain instances that are similar to each other while being clearly dissimilar to those of other groups. In a functional context, clustering helps to identify representative curve patterns and individuals who are very likely to be involved in the same or similar processes. For example, in time-course microarray experiments, thousands of gene expression measures are taken over time [123] and an important problem is to discover functionally related genes that could then be the target for new gene regulatory networks or functional pathways. Clustering of data allows identification of groups of genes with similar expression patterns to identify such networks and pathways.
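The three steps described so far (smoothing, data reduction by FPCA, and clustering) can be sketched end to end on synthetic data. This is a minimal illustration, not the method of any reviewed study: the simulated curves, the noise level `noise_sd`, the spline smoothing parameter, the choice of two components, and the helper names `simulate_curve`, `smooth` and `two_means` are all assumptions made for the example. FPCA is approximated here by a singular value decomposition of the smoothed curves evaluated on a common grid.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 101)        # common grid for the smoothed curves
noise_sd = 0.2                           # assumed measurement-error SD


def simulate_curve(group):
    """Synthetic noisy observations of one curve at its own irregular times."""
    inner = rng.uniform(0.0, 1.0, size=38)
    t = np.sort(np.concatenate(([0.0, 1.0], inner)))
    phase = 0.0 if group == 0 else np.pi / 2
    y = np.sin(2 * np.pi * t + phase) + rng.normal(scale=noise_sd, size=t.size)
    return t, y


def smooth(t, y):
    """Step 1: cubic smoothing spline, evaluated on the common grid."""
    spline = UnivariateSpline(t, y, k=3, s=t.size * noise_sd**2)
    return spline(grid)


def two_means(points, n_iter=20):
    """Step 3: Lloyd's k-means for two clusters, started at the extreme first scores."""
    centers = points[[points[:, 0].argmin(), points[:, 0].argmax()]].copy()
    for _ in range(n_iter):
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        assign = dist.argmin(axis=1)
        centers = np.vstack([points[assign == j].mean(axis=0) for j in range(2)])
    return assign


labels_true = np.repeat([0, 1], 15)      # two groups of 15 curves
X = np.vstack([smooth(*simulate_curve(g)) for g in labels_true])

# Step 2: FPCA on the grid via SVD of the centered curves.
X_centered = X - X.mean(axis=0)
_, sing, Vt = np.linalg.svd(X_centered, full_matrices=False)
explained = sing**2 / np.sum(sing**2)    # share of variation per component
scores = X_centered @ Vt[:2].T           # two scores per curve

labels = two_means(scores)
# Cluster labels are arbitrary, so compare up to a swap of 0 and 1.
agreement = max(np.mean(labels == labels_true), np.mean(labels != labels_true))
print(f"first component explains {explained[0]:.0%}; agreement {agreement:.0%}")
```

Because each curve is smoothed separately before being evaluated on the grid, the observation times do not need to be shared across curves, which is the flexibility the text attributes to FDA. Clustering the low-dimensional scores rather than the raw observations is one common design choice; other distance measures between curves could be substituted.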
A number of clustering methods were reported in the reviewed literature (Table 1), and most of these were exploratory techniques for gene expression data. Other reported clustering methods were linear discriminant analysis (LDA) (2 papers), k-nearest neighbors (kNN) (3), support vector machine (SVM) approaches (2), model-based clustering (MBC) (1), quadratic discriminant analysis (QDA) (1) and estimated differential operators (EDO) (1). One study [59] proposed kNN to classify time-course gene expression profiles based on information from the data patterns. Recently, this method has received much attention in classification problems that arise with the analysis of microarray data [58, 59]. The MBC method assumes that the data are generated by a multivariate normal mixture distribution with appropriate means and covariance matrix [129]. One study [68] applied this method to the clustering of time-course gene expression data.

Functional linear models

An interesting application of FDA involves the construction of functional models that describe the relation between an outcome variable and an explanatory variable.
In functional linear models, the functions could be the outcome or the predictors or both. Of the reviewed studies in Table 1, 21 (25%) used functional linear models. When the outcome variable is in its functional form and the relationship is almost linear, the methodology is called a functional linear regression model, or FRM. One study [85] developed a functional F test (FFT) for linear models with functional outcomes in a psychological study measuring tension judgment in music. The authors illustrated how to apply the FFT to longitudinal data where intrasubject repeated measures are viewed as discrete samples from an underlying curve with a continuous functional form. One study applied a functional logistic regression model (FLRM) to fetal heart rates [18] and another applied functional multivariate analysis of variance (FMANOVA) to temporal fertility trajectories of medfly populations [47].

The recent introduction of stochastic methods for forecasting functional data has significant advantages over the standard approaches for better understanding trends, risk factor relationships, and the effectiveness of preventive measures. A major advantage of these methods is that they can measure forecast uncertainty through the estimation of prediction intervals for future data. A state space model was the most common approach for forecasting functional data in these studies (5 papers).

Data analysis has greatly benefited from the development of FDA methods and their application to time series data. This paper has summarized papers describing FDA applications with a main emphasis on five popular features: smoothing, FPCA, clustering, FLM, and forecasting. Overall, the published FDA application studies demonstrate the value of this approach for exploring complex multivariate functional relationships and its major strength of being able to model the functional form of time series data.
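The forecasting idea mentioned above (forecast the curves and attach a prediction interval) can be sketched as follows. This is a simplified stand-in, not the state space models used in the reviewed studies: it assumes synthetic yearly curves, keeps only the first principal component, and forecasts its score with a random walk with drift. The interval reflects only the random-walk innovation variance and ignores the error in estimating the drift and the component curve. All variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
ages = np.linspace(0.0, 1.0, 50)                  # rescaled continuum, e.g. age
n_years = 30

# Synthetic yearly curves: a fixed age pattern scaled by a drifting level.
pattern = np.exp(-3.0 * ages)
level = 0.1 * np.arange(n_years) + np.cumsum(rng.normal(scale=0.05, size=n_years))
noise = rng.normal(scale=0.01, size=(n_years, ages.size))
curves = 1.0 + np.outer(level, pattern) + noise

# FPCA on the grid: mean curve, first component curve, one score per year.
mean_curve = curves.mean(axis=0)
centered = curves - mean_curve
_, _, Vt = np.linalg.svd(centered, full_matrices=False)
phi = Vt[0]
score = centered @ phi

# Random walk with drift fitted to the score series.
steps = np.diff(score)
drift = steps.mean()
sigma = steps.std(ddof=1)

horizon = np.arange(1, 6)                         # 1 to 5 years ahead
point = score[-1] + drift * horizon
half_width = 1.96 * sigma * np.sqrt(horizon)      # approximate 95% interval

# Map score forecasts and interval limits back to curves.
forecast = mean_curve + np.outer(point, phi)
edge_a = mean_curve + np.outer(point - half_width, phi)
edge_b = mean_curve + np.outer(point + half_width, phi)
band_low = np.minimum(edge_a, edge_b)             # phi may change sign along the grid
band_high = np.maximum(edge_a, edge_b)
```

The half-width grows with the square root of the horizon, so the band around the forecast curve widens for years further ahead, which is the forecast uncertainty the text refers to.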
Different approaches allow for FDA representations as smooth functions, and the published studies used a range of smoothing techniques for the estimation of discretely observed data. The FDA approach of initially smoothing the data and then using the smoothed observations for modeling and forecasting is a major methodological improvement over methods that simply fit linear/non-linear trends to observed data. Although some authors believe that FDA can be considered as a smoothed version of multivariate data analysis, recent work has shown the advantage of direct application of smoothing techniques to reduce some of the inherent randomness in the observed data [1, 25, 113].

Theoretical and practical developments that have occurred over recent years mean that researchers can now successfully apply FPCA to many practical problems, with main attention given to the reduction of data dimensions to a finite level and identification of the most significant components of the data. High dimensional data significantly slow down conventional statistical algorithms and in some cases it is not feasible to use them in practice. Some studies need to compress their data to facilitate exploration of the most important features. FPCA can also be used to investigate the variability of data with respect to individual curve shapes [131].

One of the major application areas highlighted in this review is clustering and classification, in which interest appears to be increasing, especially for time-course gene expression data. Unlike conventional clustering, which requires multivariate data measured at the same time points in order to calculate Euclidean "distances" between observations, functional clustering can derive a broader class of distance measures even if the original measurements are not time-aligned among sampling units, as is common in public health applications.
The reason for the popularity of functional clustering is that it can classify time series data into different classes without requiring a priori knowledge of the data.

A very interesting application of FDA involves the construction of linear models that describe the relation between an outcome variable and explanatory variables of a functional nature. FLMs have recently gained popularity and the related literature has been steadily growing, with several studies using covariates to explain functional variables. Reasons for not using FLM techniques are unclear but might include a lack of knowledge about the value of building functional models for public health and biomedical data. However, the use of FLM is not always necessary and depends on the specific research question.

Health researchers now recognize the importance of understanding trends in high dimensional time series data.
Somewhat surprisingly, the use of FDA forecasting in public health and biomedical applications has been limited to date.

Conclusion

In summary, this paper describes FDA and its important features as applied to time series data from various fields. Functional data analysis provides a relatively novel modeling and prediction approach, with the potential for many significant applications across a range of public health and biomedical applications. Importantly, not all FDA features need to be used in a single study, and the selection of specific analysis features will depend on the underlying behavior of the data, the nature of the study and the specific research questions being posed. Consideration should be given to wider application of FDA and its important modeling features so that more accurate estimates for public health and biomedical applications can be obtained.

Abbreviations: FDA, functional data analysis; FPCA, functional principal component analysis; LDA, linear discriminant analysis; SVM, support vector machine; QDA, quadratic discriminant analysis; EDO, estimated differential operators; FRM, functional linear regression model; FLRM, functional logistic regression model; FMANOVA, functional multivariate analysis of variance.

Acknowledgements

The study was funded (at least in part) through the Early Career Researcher Development funding program at the University of Ballarat.

Key words: autocovariance operator, clustering, covariance surface, eigenfunction, infinite-dimensional data, Karhunen-Loève representation, longitudinal data, nonparametrics, panel data, principal component, registration, regression, smoothing, square integrable function, stochastic process, time course, tracking

Functional data analysis (FDA) refers to the statistical analysis of samples consisting of random functions or surfaces, where each function is viewed as one sample element.
Typically, the functions contained in the sample are considered to correspond to smooth realizations of an underlying stochastic process. FDA methodology then provides a statistical approach to the analysis of repeatedly observed stochastic processes or data generated by such processes. FDA differs from time series approaches, as the sampling design is very flexible, stationarity of the underlying process is not needed, and autoregressive-moving average models or similar time series models play no role, except where the elements of such models are functions. FDA also differs from multivariate analysis, the area of statistics that deals with finite-dimensional random vectors, as functional data are infinite-dimensional and smoothness often is a central assumption. It is also a key methodology for the analysis of time course and image data. The approaches and models of FDA are essentially nonparametric, allowing for flexible modeling.

A distinction between smoothing methods and FDA is that smoothing is typically used in situations where one wishes to obtain an estimate for one object (where objects here are functions or surfaces) from noisy observations, while FDA aims at the analysis of a sample of random objects, which may be assumed to be completely observed without noise or to be sparsely observed with noise; many scenarios of interest fall in between these extremes. An important special situation arises when the processes generating the data are Gaussian processes, an assumption that is often invoked to justify linear procedures and to simplify methodology and theory.

Functional data are ubiquitous and may for example involve samples of density functions \cp{knei:01}, hazard functions, or behavioral tracking data. Application areas that have been emphasized in the statistical literature include growth curves \cp{rao:58,mull:84:2}, econometrics and e-commerce \cp{rams:02:2,jank:06:2}, biology \cp{kirk:89,izem:05}, and genetics and genomics \cp{opge:06,mull:08:3}. FDA also applies to panel data as considered in economics and other social sciences.

FDA methods include functional principal component analysis.
Basic methodologies of ANOVA, regression, correlation, classification and clustering that are available for scalar and vector data have spurred analogous developments for functional data. An additional aspect is that the time axis itself may be subject to distortions, and adequate functional models sometimes need to reflect such time-warping (also referred to as alignment or registration). These require careful modeling of the relationship between the observations and the assumed underlying functional trajectories. A "spaghetti plot" can be used to obtain an initial idea of functional shapes, to check for outliers and to identify potential "landmarks".

The foundation for functional principal component analysis is the Karhunen-Loève representation of random functions \cp{karh:46,gren:50,gikh:69}. Methods employing smoothing (local least squares or splines) have been developed for various sampling schemes (sparse, dense, with errors) to obtain a data-based version of this representation, where one regularizes by truncating at a finite number $K$ of included components. The functional data are then represented by subject-specific vectors of score estimates $\hat{\xi}_k,\, k=1,\ldots,K$, which can be used to represent individual trajectories and for subsequent statistical analysis \cp{mull:05:4}. A general relation between mixed linear models and functional models with basis expansion can be used to advantage for modeling and implementation of these models.

In the theoretical analysis, one may distinguish between an essentially multivariate analysis, which results from assuming that the number of series terms is actually finite, leading to parametric rates of convergence, and an essentially functional approach. In the latter, the number of components is assumed to increase with sample size, and this leads to "functional" rates of convergence that depend on the properties of the underlying processes, such as the decay and spacing of the eigenvalues of the autocovariance operator.

Functional regression and related models

Functional regression models may include one or several functions among the predictors, responses, or both.
The class of useful functional regression models is large, due to the infinite-dimensional nature of the functional predictors. The case where the response is functional \cp{rams:91} also is of interest. Flexible extensions of the functional linear model include, for example, nonparametric approaches \cp{ferr:06}, where unfavorable small ball probabilities and the non-existence of a density in function space impose limits on convergence \cp{mull:09:5}, and multiple index models \cp{jame:05}. Another extension of the functional linear model, which is also applicable for classification purposes, is the generalized functional linear model $E(Y|X)=g\{\mu + \int_{\mathcal{T}} X(s)\beta(s)\,ds\}$ with link function $g$ \cp{jame:02,esca:04,card:05:1,mull:05:1}. The link function is adapted to the (often discrete) distribution of $Y$; the components of the model can be estimated from the data. Besides discriminant analysis via the binomial functional generalized linear model, various other methods have been studied for functional clustering and discriminant analysis \cp{jame:03,chio:07,chio:08}. Of practical relevance are extensions towards polynomial functional regression models \cp{mull:10:1}, hierarchical functional models \cp{crai:09:1}, models with varying domains, models with more than one predictor function, and functional (autoregressive) time series models, among others.
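The functional linear model above, in the special case of an identity link $g$, can be sketched numerically as follows. This is a minimal illustration under stated assumptions: the predictor curves are synthetic, the integral is approximated by a Riemann sum on a regular grid, and the coefficient function is estimated by regressing the response on the first `K` principal component scores, in the spirit of regularizing by truncation at a finite number of components. Names such as `beta_true`, `mu_true` and `K` are hypothetical choices for the example.

```python
import numpy as np

rng = np.random.default_rng(2)
s = np.linspace(0.0, 1.0, 101)          # grid on the domain of X(s)
ds = s[1] - s[0]
n = 200

# Synthetic predictor curves spanned by three smooth functions.
basis = np.vstack([np.sin(np.pi * s), np.sin(2 * np.pi * s), np.sin(3 * np.pi * s)])
coef = rng.normal(size=(n, 3)) * np.array([2.0, 1.0, 0.5])
X = coef @ basis

# Scalar responses from E(Y|X) = mu + integral of X(s) * beta(s) ds.
beta_true = 1.5 * np.sin(np.pi * s) - 1.0 * np.sin(2 * np.pi * s)
mu_true = 0.5
y = mu_true + (X * beta_true).sum(axis=1) * ds + rng.normal(scale=0.05, size=n)

# Eigenfunctions of the sample covariance, scaled to unit L2 norm on the grid.
X_mean = X.mean(axis=0)
X_centered = X - X_mean
_, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
K = 3                                    # truncation level (assumed known here)
phi = Vt[:K] / np.sqrt(ds)
scores = X_centered @ phi.T * ds         # integral of centered X times phi_k

# Ordinary least squares of y on the scores, then map back to a function.
design = np.column_stack([np.ones(n), scores])
theta, *_ = np.linalg.lstsq(design, y, rcond=None)
beta_hat = theta[1:] @ phi               # estimated coefficient function
mu_hat = theta[0] - (X_mean * beta_hat).sum() * ds

print(f"max |beta_hat - beta_true| = {np.max(np.abs(beta_hat - beta_true)):.3f}")
```

Only the part of the coefficient function that lies in the span of the retained components can be recovered this way; in the simulation the true function lies in that span by construction. A non-identity link would replace the least squares step with a generalized linear model fit on the same scores.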
In addition to the functional trajectories themselves, derivatives are of interest to study the dynamics of the underlying processes \cp{rams:05}. Available software includes the fda package (R and MATLAB), at \:///misc/fda/ , and the PACE package (MATLAB), at http:///~mueller/data/

Research supported in part by an NSF grant. Based on an article from Lovric, Miodrag (2011), International Encyclopedia of Statistical Science.

References

Asymptotic theory for the principal component analysis of a vector random function: some applications to statistical inference.