Factor Analysis
Factor Analysis is a procedure that seeks to determine a reduced number of variables, called factors, that explain much of the variation present in a larger number of measured variables. statistiXL provides a comprehensive module for Factor Analysis, with a variety of analytical options. Either the correlation or covariance matrix can be used in calculating the factors. The number of factors to be extracted can be established by several different criteria: 1) the number of factors can be chosen to encompass a specified percentage of the total variance in the original data, 2) you can choose to extract factors with eigenvalues greater than a set value, or 3) you can extract a specific number of factors. A variety of methods are included for determining the factors, including the principal component method (not to be confused with principal component analysis), the principal factor method and the maximum likelihood method. The axes of the resulting factors can then be rotated to improve the factor structure using the Varimax, Quartimax or Equamax procedures.
Results are presented in tabular form. Descriptive statistics and the correlation or covariance matrix are provided, if these options were selected.
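The three extraction criteria described above can be sketched in a few lines of Python. This is only an illustration, not statistiXL's implementation: the function name count_factors, its parameters and the eigenvalues are all made up for the example.

```python
def count_factors(eigenvalues, percent=None, min_eigenvalue=None, fixed=None):
    """Return how many factors to keep under one of three criteria.

    Hypothetical helper: names and argument layout are assumptions
    made for this sketch only.
    """
    ordered = sorted(eigenvalues, reverse=True)
    if fixed is not None:
        # Criterion 3: a specific number of factors.
        return min(fixed, len(ordered))
    if min_eigenvalue is not None:
        # Criterion 2: eigenvalues greater than a set value.
        return sum(1 for ev in ordered if ev > min_eigenvalue)
    if percent is not None:
        # Criterion 1: enough factors to reach a percentage of total variance.
        total = sum(ordered)
        running = 0.0
        for k, ev in enumerate(ordered, start=1):
            running += ev
            if 100.0 * running / total >= percent:
                return k
        return len(ordered)
    raise ValueError("specify exactly one criterion")


eigenvalues = [2.5, 1.2, 0.8, 0.5]  # made-up eigenvalues for illustration
print(count_factors(eigenvalues, percent=70))
print(count_factors(eigenvalues, min_eigenvalue=1.0))
print(count_factors(eigenvalues, fixed=3))
```

With these made-up eigenvalues the first two factors account for 74% of the total, so the percentage and eigenvalue criteria happen to agree.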
Linear Regression
Linear regression attempts to explain the variation present in one variable (for example, Height) in terms of a linear relationship to variation in one or more predictor variables (for example, Age). statistiXL provides comprehensive Model I regression analysis. The module allows the selection of one or more predictor variables for each single dependent variable. Options include forward or backward stepwise regression (with P level to enter or remove), forcing of the relationship through the origin, and graphical output (normal probability plot, residuals plot, scatterplots).
Results from regression analysis are presented in tabular and graphical form. Summary statistics are provided, if this option is selected. Statistics include the R², the correlation coefficient, the adjusted R², and the standard error of the estimate. An ANOVA table is given to summarise the significance of the regression relationship.
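As a rough illustration of the summary statistics listed above, the following Python sketch fits a single-predictor least-squares line and derives R², adjusted R², the standard error of the estimate and the ANOVA F ratio. The function name simple_regression and the Age/Height numbers are assumptions for the example; it does not reproduce statistiXL's output layout.

```python
import math


def simple_regression(x, y):
    """Least-squares fit of y = intercept + slope * x with summary statistics.

    Hypothetical helper for illustration; one predictor only.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_error = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_regression = ss_total - ss_error
    df_error = n - 2  # one predictor plus the intercept

    r_squared = ss_regression / ss_total
    return {
        "slope": slope,
        "intercept": intercept,
        "r_squared": r_squared,
        "r": math.copysign(math.sqrt(r_squared), slope),
        "adj_r_squared": 1 - (1 - r_squared) * (n - 1) / df_error,
        "std_error_estimate": math.sqrt(ss_error / df_error),
        "f_ratio": ss_regression / (ss_error / df_error),
    }


age = [1, 2, 3, 4, 5]      # made-up predictor values
height = [2, 4, 5, 4, 5]   # made-up response values
for name, value in simple_regression(age, height).items():
    print(f"{name}: {value:.4f}")
```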
Principal Component Analysis
As with Factor Analysis, Principal Component Analysis is a technique that attempts to reduce complex data sets consisting of many different variables to a smaller set of new variables that still manage to describe much of the variation in the original data. These new variables, called Principal Components, are chosen to be uncorrelated with one another and to capture, in turn, as much of the variance in the original data set as possible. statistiXL provides a number of options for Principal Component Analysis. Either the correlation or covariance matrix between variables can be selected as the basis for analysis. All Principal Components can be extracted, or a subset of these based on limits such as the number to extract, the percent of variance to explain or the value of an eigenvalue. Scree plots can be produced to help in the visual determination of the appropriate number of Principal Components to extract.
Results are presented in tabular and graphical form. Descriptive statistics and the correlation or covariance matrix are displayed, if these options were selected. The eigenvalues are then tabulated along with the percent of variance and cumulative percent of total variance evident in the original dataset that each of the extracted Principal Components explains.
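To make the eigenvalue table concrete, here is a small Python sketch for the simplest possible case of two variables analysed through their correlation matrix. For a 2x2 correlation matrix with off-diagonal value r, the eigenvalues are 1 + r and 1 - r, so no matrix library is needed. The function names and the data are assumptions for the example, and real analyses with more variables need a general eigenvalue routine.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def two_variable_pca_table(x, y):
    """Rows of (component, eigenvalue, percent, cumulative percent).

    Hypothetical helper: uses the closed-form eigenvalues 1 + |r| and
    1 - |r| of a 2x2 correlation matrix, largest first.
    """
    r = abs(pearson_r(x, y))
    eigenvalues = [1 + r, 1 - r]
    total = sum(eigenvalues)  # equals the number of variables (2)
    rows = []
    cumulative = 0.0
    for component, ev in enumerate(eigenvalues, start=1):
        percent = 100.0 * ev / total
        cumulative += percent
        rows.append((component, ev, percent, cumulative))
    return rows


x = [1, 2, 3, 4, 5]  # made-up data
y = [2, 4, 5, 4, 5]
for component, ev, percent, cumulative in two_variable_pca_table(x, y):
    print(f"PC{component}: eigenvalue={ev:.4f} "
          f"percent={percent:.2f} cumulative={cumulative:.2f}")
```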
Analysis of Variance and Covariance
Analysis of Variance (ANOVA) is used to determine whether there is a difference between three or more categorical sets of values. Analysis of Covariance (ANCOVA), on the other hand, is also used to determine whether there is a difference between categorical sets of values, but additionally takes into account the effect of one or more numerical variables called covariates. statistiXL provides both univariate and multivariate ANOVA and ANCOVA. Factors can be specified as fixed or random, and the nesting of factors is also supported. Simplified dialog boxes aid the rapid analysis of full factorial and repeated measures models, while for more advanced analyses a comprehensive dialog box is available that allows custom models to be specified, precisely detailing the factors and interactions to be included in the analysis. Post hoc tests are provided so that you can drill down into your dataset and see what, if any, the major differences between groups are.
Results are presented in tabulated form, starting optionally with a table of simple descriptive statistics for each group.
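The simplest case, a single-factor (one-way) fixed-effects ANOVA, can be sketched as follows. This illustrates only the underlying arithmetic, with a hypothetical function name and made-up group data. It does not cover the random, nested, repeated measures or multivariate models mentioned above.

```python
def one_way_anova(groups):
    """Sums of squares, degrees of freedom and F ratio for one factor.

    Hypothetical helper; `groups` is a list of lists of observations.
    """
    all_values = [v for group in groups for v in group]
    grand_mean = sum(all_values) / len(all_values)

    ss_between = 0.0
    ss_within = 0.0
    for group in groups:
        group_mean = sum(group) / len(group)
        ss_between += len(group) * (group_mean - grand_mean) ** 2
        ss_within += sum((v - group_mean) ** 2 for v in group)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "f_ratio": (ss_between / df_between) / (ss_within / df_within),
    }


groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]  # made-up treatment groups
print(one_way_anova(groups))
```

The F ratio would then be compared with the F distribution on (df_between, df_within) degrees of freedom to obtain a P value, a step this sketch leaves out.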
Contingency Tables
A contingency table is a table of counts or frequencies. It lists the number of times that each of two or more variables falls into a variety of different categories. A contingency test simply examines the null hypothesis that the frequencies of observations found for one variable are independent of the frequencies of observations in the other. statistiXL provides a flexible module for analysis of contingency tables. Two-way and multi-way contingency tables can be analysed. The statistics available for the frequency test are Chi² and log-likelihood, the latter being a good alternative approach to Chi² if the expected frequencies are small. Yates’ and Cochran’s corrections for continuity are provided for 2x2 contingency tables, where such an adjustment is recommended because of the low degrees of freedom. statistiXL explains how to subdivide a contingency table and warns of the limited statistical value of this approach. A Heterogeneity Chi² can be used in a contingency table analysis to determine if a number of sets of observed frequency data can be combined into a single set. statistiXL explains how to analyse contingency tables for heterogeneity.
Results are presented in tabulated form, starting with an optional summary table of the observed and expected frequencies. The test results are then presented, with the Chi² and log-likelihood values, along with their degrees of freedom and P values.
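For a two-way table, the expected frequencies and the two test statistics can be computed as in the Python sketch below. It is a minimal illustration under stated assumptions: the function name and the counts are made up, only Yates' correction is shown (not Cochran's), and the P value lookup is omitted.

```python
import math


def contingency_statistics(table):
    """Chi-square and log-likelihood (G) statistics for a two-way table.

    Hypothetical helper; `table` is a list of rows of observed counts.
    The Yates-corrected chi-square is reported for 2x2 tables only.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    chi2 = 0.0
    g = 0.0
    chi2_yates = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
            if observed > 0:  # a zero cell contributes nothing to G
                g += 2.0 * observed * math.log(observed / expected)
            corrected = max(abs(observed - expected) - 0.5, 0.0)
            chi2_yates += corrected ** 2 / expected

    is_2x2 = len(table) == 2 and len(table[0]) == 2
    return {
        "chi2": chi2,
        "g": g,
        "df": (len(table) - 1) * (len(table[0]) - 1),
        "chi2_yates": chi2_yates if is_2x2 else None,
    }


observed = [[10, 20], [20, 10]]  # made-up counts
print(contingency_statistics(observed))
```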
Descriptive Statistics
The descriptive statistics feature of statistiXL provides a quick and easy summary of the basic parametric and nonparametric statistics that describe a sample of values. Unlike many packages, statistiXL provides modules for both linear and circular descriptive statistics. The descriptive statistics which can be provided (by user selection) with the linear descriptive statistics module are: mean, median, mode, standard error, standard deviation, variance, coefficient of variation, lower and upper confidence limits, 25th and 75th percentiles, sum, minimum and maximum values, nth smallest and largest values (with user input of n), range, count, skewness (with probability if count >9) and kurtosis (with probability if count >19). If these data are sampled at random from a normal distribution, then the data are best summarised by the parametric statistics such as mean, variance and standard deviation. If the data are sampled from a non-normal distribution, then the nonparametric statistics such as median, mode, percentiles and range may be more appropriate statistics for summarising these data. Options for graphical output include a “box and whisker” plot and an error bar plot.
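Several of the linear descriptive statistics listed above can be reproduced with Python's standard statistics module, as in the sketch below. The function name describe and the sample values are assumptions for the example. Confidence limits, percentiles, skewness and kurtosis are left out because their definitions vary between packages.

```python
import math
import statistics


def describe(values):
    """A subset of linear descriptive statistics for one sample.

    Hypothetical helper; uses the sample (n - 1) variance.
    """
    n = len(values)
    mean = statistics.mean(values)
    std_dev = statistics.stdev(values)
    return {
        "count": n,
        "sum": sum(values),
        "mean": mean,
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "variance": statistics.variance(values),
        "std_dev": std_dev,
        "std_error": std_dev / math.sqrt(n),
        "cv_percent": 100.0 * std_dev / mean,
        "minimum": min(values),
        "maximum": max(values),
        "range": max(values) - min(values),
    }


sample = [2, 4, 4, 5, 7, 8]  # made-up sample
for name, value in describe(sample).items():
    print(f"{name}: {value}")
```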
A sample of values collected using a circular scale (e.g. time of day or compass bearing) can be described by a variety of descriptive statistics, the more common being the mean angle, angular variance and angular standard deviation. Optional graphical output includes a circular plot.
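Circular data need different arithmetic from linear data: the ordinary mean of the bearings 350° and 10° is 180°, which points in exactly the wrong direction. The sketch below averages the sines and cosines instead. The function name is hypothetical, and the angular variance is taken as 2(1 - r), where r is the mean resultant length. That is one common definition and is an assumption here, since other definitions exist and the source does not say which statistiXL uses.

```python
import math


def circular_summary(angles_deg):
    """Mean angle, mean resultant length and angular dispersion.

    Hypothetical helper; angles are in degrees on a 0-360 scale.
    Assumes angular variance = 2 * (1 - r) in squared radians.
    """
    n = len(angles_deg)
    mean_sin = sum(math.sin(math.radians(a)) for a in angles_deg) / n
    mean_cos = sum(math.cos(math.radians(a)) for a in angles_deg) / n

    r = math.hypot(mean_sin, mean_cos)  # mean resultant length, 0 to 1
    mean_angle = math.degrees(math.atan2(mean_sin, mean_cos)) % 360.0

    # Guard against tiny negative values caused by rounding when r is ~1.
    angular_variance = 2.0 * max(0.0, 1.0 - r)
    angular_sd_deg = math.degrees(math.sqrt(angular_variance))
    return {
        "mean_angle": mean_angle,
        "r": r,
        "angular_variance": angular_variance,
        "angular_sd_deg": angular_sd_deg,
    }


bearings = [350, 10]  # made-up compass bearings either side of north
print(circular_summary(bearings))
```

For these two bearings the mean angle comes out at north (0°, or equivalently 360°, up to floating-point rounding) rather than 180°.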
Nonparametric Statistics
Nonparametric statistical tests are distribution-independent tests that are used to analyse data for which an underlying distribution (such as the normal distribution) is not assumed. Nonparametric statistics have a number of advantages over parametric statistics: for example, they make fewer assumptions about the data and can be applied to ranked (ordinal) data.
statistiXL provides a diverse array of nonparametric tests. The sign test can be used to examine whether two populations have the same median, for observations in pairs with one of each pair coming from each population. Various modifications of the sign test can be used for specific tests. The Friedman test for blocked data is equivalent to a sign test, but for more than 2 groups. The Mann-Whitney test (U statistic) is a nonparametric test that uses the ranks of two independent samples, from the highest to lowest (or lowest to highest), to calculate the U statistic; it is the nonparametric analog to the parametric two-sample t-test. The Wilcoxon signed-rank test ranks the differences between pairs of data (or a single data set of a sample) and compares the sums of positive and negative ranks with a critical value; it is the nonparametric analog to the parametric paired t-test. The Kruskal-Wallis test is a nonparametric test for the comparison of 3 or more independent treatment groups; it is the nonparametric equivalent to analysis of variance (ANOVA). A common nonparametric correlation test is Spearman’s rank correlation coefficient (rho); this is analogous to the parametric Pearson’s correlation coefficient. Mood's Median Test examines whether two or more samples come from a population having the same median. The Wald-Wolfowitz Runs test analyses a sequence of observations, or compares random samples which are mutually independent, for two or more outcomes, e.g. two species of antelope, or three brands of automobile.
Results are presented in tabular form.
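As an example of how a rank-based test works, the following Python sketch computes the Mann-Whitney U statistics for two independent samples, ranking from lowest to highest and giving tied values the average of their ranks. The function names and data are assumptions for the example. Conventions for which U is reported differ between textbooks and packages, so both are returned, and the comparison with a critical value is left out.

```python
def midranks(values):
    """Ranks from 1 (lowest), with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1
    return ranks


def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b) for two independent samples.

    Hypothetical helper: U_a = R_a - n_a(n_a + 1)/2, where R_a is the
    rank sum of sample_a in the pooled data, and U_a + U_b = n_a * n_b.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    ranks = midranks(list(sample_a) + list(sample_b))
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return u_a, u_b


a = [1, 4, 5]  # made-up samples
b = [2, 3, 6]
print(mann_whitney_u(a, b))
```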