All Terrain Thinking

A Compendium of things I think are Important

"If you teach a man to think he is thinking, he will love you. If you teach a man to think, he will hate you. - Ed McArthur"
 
 

Statistics: It's not just what's in your wallet

STATISTICS: GLOSSARY

 

  • Data: The facts and figures that are collected, analyzed, presented, and interpreted. Data may be numeric or nonnumeric.
  • Data Set: All the data collected in a particular study.
  • Elements: The entities on which data are collected.
  • Variable: A characteristic of interest for the elements.
  • Observation: The set of measurements or data obtained for a single element.
  • Nominal Scale: A scale of measurement that uses a label or category to define an attribute of an element. Nominal data may be recorded with a nonnumeric description or with a numeric code.
  • Ordinal Scale: A scale of measurement that has the properties of a nominal scale and can be used to rank or order the observations. Ordinal data may be recorded with a nonnumeric description or with a numeric code.
  • Interval Scale: A scale of measurement that has the properties of an ordinal scale and the interval between observations is expressed in terms of a fixed unit of measure. Interval data are always numeric.
  • Ratio Scale: A scale of measurement that has the properties of an interval scale and the ratio of observations is meaningful. Ratio data are always numeric.
  • Qualitative Data: Data obtained with a nominal or ordinal scale of measurement. Qualitative data may be recorded with a nonnumeric description or with a numeric code.
  • Quantitative Data: Data obtained with an interval or ratio scale of measurement. Quantitative data are always numeric and indicate how much or how many for the variable of interest.
  • Descriptive Statistics: Tabular, graphical, and numerical methods used to summarize data.
  • Population: The collection of all elements of interest in a particular study.
  • Sample: A subset of the population.
  • Statistical Inference: The process of using data obtained from a sample to make estimates or test claims about the characteristics of a population.
  • Qualitative Data: Data that provide labels or names for categories of like items. These data are provided by either a nominal or ordinal scale of measurement.
  • Quantitative Data: Data that indicate how much or how many. These data are provided by either an interval or ratio scale of measurement.
  • Frequency Distribution: A tabular summary of a set of data showing the frequency (or number) of items in each of several nonoverlapping classes.
  • Relative Frequency Distribution: A tabular summary of a set of data showing the relative frequency (that is, the fraction or proportion) of the total number of items in each of several nonoverlapping classes.
  • Bar Graph: A graphical device for depicting the information presented in a frequency distribution or relative frequency distribution of qualitative data.
  • Pie Chart: A pictorial device for presenting qualitative data summaries based upon subdividing a circle into sectors that correspond to the relative frequency for each class.
  • Histogram: A graphical presentation of a frequency distribution or relative frequency distribution of quantitative data constructed by placing the class intervals on the horizontal axis and the frequencies or relative frequencies on the vertical axis.
  • Cumulative Frequency Distribution: A tabular summary of a set of quantitative data showing the number of items having values less than or equal to the upper class limit of each class.
  • Cumulative Relative Frequency Distribution: A tabular summary of a set of quantitative data showing the fraction or proportion of the items having values less than or equal to the upper class limit of each class.
  • Class Midpoint: The point in each class that is halfway between the lower and upper class limits.
  • Exploratory Data Analysis: The use of simple arithmetic and easy-to-draw pictures to present data more effectively.
  • Population Parameter: A numerical value used as a summary measure for a population of data (e.g., the population mean, μ, the population variance, σ², and the population standard deviation, σ).
  • Sample Statistic: A numerical value used as a summary measure for a sample (e.g., the sample mean, x̄, the sample variance, s², and the sample standard deviation, s).
  • Mean: A measure of central location for a data set. It is computed by summing all the data values and dividing by the number of items.
  • Trimmed Mean: The mean of the data remaining after a percent of the smallest and a percent of the largest items have been removed. The purpose of a trimmed mean is to provide a measure of central location that has eliminated the effect of extremely large and extremely small data values.
  • Median: A measure of central location. It is the value which splits the data into two equal groups - one with values greater than or equal to the median, and one with values less than or equal to the median.
  • Mode: A measure of location, defined as the most frequently occurring data value.
  • Percentile: A value such that at least p percent of the items are less than or equal to this value and at least (100 - p) percent of the items are greater than or equal to this value. The 50th percentile is the median.
  • Quartiles: The 25th, 50th, and 75th percentiles of the data referred to as the first quartile, the second quartile (median), and third quartile, respectively. The quartiles can be used to divide the data into four parts, with each part containing approximately 25% of the data.
  • Hinges: The value of the lower hinge is approximately the first quartile, or 25th percentile. The value of the upper hinge is approximately the third quartile, or 75th percentile. The values of the hinges and quartiles may differ slightly due to differing computational conventions.
  • Range: A measure of dispersion, defined to be the difference between the largest and smallest data values.
  • Interquartile Range: A measure of dispersion, defined to be the difference between the third and first quartiles.
  • Variance: A measure of dispersion for a data set, found by summing the squared deviations of the data values about the mean and then dividing the total by N if the data is from a population or by n - 1 if the data is from a sample.
  • Standard Deviation: A measure of dispersion for a data set, found by taking the positive square root of the variance.
  • Coefficient of Variation: A measure of relative dispersion for a data set, found by dividing the standard deviation by the mean and multiplying by 100.
  • z-score: For each data item, a value found by dividing the deviation about the mean (Xj - μ) by the standard deviation σ. A z-score is referred to as a standardized value and denotes the number of standard deviations a data value Xj is from the mean.
  • Chebyshev's Theorem: A theorem applying to any data set that can be used to make statements about the percentage of items that must be within a specified number of standard deviations of the mean.
  • Empirical Rule: A rule that states the percentages of items that are within one, two, and three standard deviations from the mean for mound-shaped, or bell-shaped, distributions.
  • Outlier: An unusually small or unusually large data value.
  • 5-Number Summary: An exploratory data analysis technique that uses the following 5 numbers to summarize the data set: smallest value, first quartile, median, third quartile, and largest value.
  • Grouped Data: Data available in class intervals as summarized by a frequency distribution. Individual values of the original data are not recorded.
  • Probability: A numerical measure of the likelihood that an event will occur.
  • Experiment: Any process which generates well defined outcomes.
  • Sample Space: The set of all possible sample points (experimental outcomes).
  • Sample Points: The individual outcomes of an experiment.
  • Tree Diagram: A graphical device helpful in defining sample points of an experiment involving multiple steps.
  • Classical Method: A method of assigning probabilities which assumes that the experimental outcomes are equally likely.
  • Relative Frequency Method: A method of assigning probabilities based upon experimentation or historical data.
  • Subjective Method: A method of assigning probabilities based upon judgment.
  • Event: A collection of sample points.
  • Complement of Event A: The event containing all sample points that are not in A.
  • Venn Diagram: A graphical device for representing symbolically the sample space and operations involving events.
  • Union of Events A and B: The event containing all sample points that are in A, in B, or in both. The union is denoted A ∪ B.
  • Intersection of A and B: The event containing all sample points that are in both A and B. The intersection is denoted A ∩ B.
  • Addition Law: A probability law used to compute the probability of a union, P(A ∪ B). It is P(A ∪ B) = P(A) + P(B) - P(A ∩ B). For mutually exclusive events, since P(A ∩ B) = 0, it reduces to P(A ∪ B) = P(A) + P(B).
  • Mutually Exclusive Events: Events that have no sample points in common; that is, A ∩ B is empty and P(A ∩ B) = 0.
  • Conditional Probability: The probability of an event given that another event has occurred. The conditional probability of A given B is P(A | B) = P(A ∩ B)/P(B).
  • Independent Events: Two events A and B where P(A | B) = P(A) or P(B | A) = P(B); that is, the events have no influence on each other.
  • Multiplication Law: A probability law used to compute the probability of an intersection, P(A ∩ B). It is P(A ∩ B) = P(A)P(B | A) or P(A ∩ B) = P(B)P(A | B). For independent events it reduces to P(A ∩ B) = P(A)P(B).
  • Prior Probabilities: Initial estimates of the probabilities of events.
  • Posterior Probabilities: Revised probabilities of events based on additional information.
  • Bayes' Theorem: A method used to compute posterior probabilities.
  • Random Variable: A numerical description of the outcome of an experiment.
  • Discrete Random Variable: A random variable that can assume only a finite or infinite sequence of values.
  • Continuous Random Variable: A random variable that may assume all values in an interval or collection of intervals.
  • Probability Distribution: A description of how the probabilities are distributed over the values the random variable can take on.
  • Probability Function: A function, denoted by f(x), that for a discrete random variable, provides the probability that x takes on a particular value.
  • Expected Value: A measure of the mean, or central location, of a random variable.
  • Variance: A measure of the dispersion, or variability, of a random variable.
  • Standard Deviation: The positive square root of the variance.
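  • Binomial Experiment: A probability experiment possessing four properties: it consists of n identical trials; each trial has two possible outcomes (success or failure); the probability of success is the same on every trial; and the trials are independent.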
  • Binomial Probability Distribution: A probability distribution showing the probability of x successes in n trials of a binomial experiment.
  • Binomial Probability Function: The function used to compute probabilities in a binomial experiment.
  • Poisson Probability Distribution: A probability distribution showing the probability of x occurrences of an event over a specified interval of time or space.
  • Poisson Probability Function: The function used to compute Poisson probabilities.
  • Hypergeometric Probability Function: The function used to compute the probability of x successes in n trials when the trials are dependent.
  • Uniform Probability Distribution: A continuous probability distribution where the probability that the random variable will assume a value in any interval of equal length is the same for each interval.
  • Probability Density Function: The function that defines the probability distribution of a continuous random variable.
  • Normal Probability Distribution: A continuous probability distribution. Its probability density function is bell shaped and determined by the mean and standard deviation.
  • Standard Normal Distribution: A normal distribution with a mean of 0 and a standard deviation of 1.
  • Continuity Correction Factor: A value of .5 that is added and/or subtracted from a value of x when the continuous normal probability distribution is used to approximate the discrete binomial probability distribution.
  • Exponential Probability Distribution: A continuous probability distribution that is useful in computing probabilities for the time, or space, between occurrences of an event.
  • Parameter: A population characteristic, such as a population mean, a population standard deviation, a population proportion, and so on.
  • Simple Random Sampling (Finite Population): A sample selected such that each possible sample of size n has the same probability of being selected.
  • Simple Random Sampling (Infinite Population): A sample selected such that each element comes from the same population and the successive elements are selected independently.
  • Sampling Without Replacement: Once an item from the population has been included in the sample it is removed from further consideration and thus cannot be selected a second time.
  • Sampling With Replacement: As each item is selected for the sample, it is returned to the population. It is possible that a previously selected item may be selected again and therefore appear in the sample more than once.
  • Sample Statistic: A sample characteristic, such as a sample mean, x̄, a sample standard deviation, s, a sample proportion, p̄, and so on. The value of the sample statistic is used to estimate the value of the population parameter.
  • Sampling Distribution: A probability distribution consisting of all possible values of a sample statistic.
  • Point Estimate: A single numerical value used as an estimate of a population parameter.
  • Point Estimator: The sample statistic, such as x̄, s, p̄, etc., that provides the point estimate of the population parameter.
  • Finite Population Correction Factor: The term √[(N - n)/(N - 1)] that is used in the formulas for standard deviations whenever a finite population, rather than an infinite population, is being sampled. The generally accepted rule of thumb is to ignore the finite population correction factor whenever n/N < .05.
  • Standard Error: The standard deviation of a point estimator.
  • Central Limit Theorem: A theorem that allows us to use the normal probability distribution to approximate the sampling distribution of x̄ and p̄ whenever the sample size is large.
  • Unbiasedness: A property of a point estimator that occurs whenever the expected value of the point estimator is equal to the population parameter it estimates.
  • Relative Efficiency: Given two unbiased point estimators of the same population parameter, the point estimator with the smaller variance is said to have greater relative efficiency than the other.
  • Consistency: A property of a point estimator that occurs whenever larger sample sizes tend to provide point estimates closer to the population parameter.
  • Probability Sample: A sample selected such that each element in the population has a known probability of being included in the sample. Simple random sampling, stratified simple random sampling, cluster sampling, and systematic sampling are probability samples.
  • Nonprobability Sample: A sample selected such that the probability of each element being included in the sample is unknown. Convenience and judgment samples are nonprobability samples.
  • Interval Estimate: An estimate of a population parameter that provides an interval of values believed to contain the value of the parameter.
  • Sampling Error: The magnitude of the difference between the value of an unbiased point estimator and the true population parameter. In the case of the mean, the sampling error is |x̄ - μ|.
  • Precision: A probability statement about the sampling error.
  • Confidence Level: The confidence associated with an interval estimate. For example, if an interval estimation procedure provides intervals such that 95% of the intervals developed will include the population parameter, an interval estimate is said to be constructed at the 95% confidence level. Note that .95 is referred to as the confidence coefficient.
  • t distribution: A family of probability distributions which can be used to develop interval estimates of a population mean whenever the population standard deviation is unknown and the population has a normal or near-normal probability distribution.
  • Degrees of Freedom: A parameter of the t distribution. When the t distribution is used in the computation of an interval estimate of a population mean, the appropriate t distribution has n - 1 degrees of freedom, where n is the size of the simple random sample.
  • Null Hypothesis: The hypothesis tentatively assumed true in the hypothesis-testing procedure.
  • Alternative Hypothesis: The hypothesis concluded to be true if the null hypothesis is rejected.
  • Type I error: The error of rejecting H0 when it is true.
  • Type II error: The error of accepting H0 when it is false.
  • Critical Value: A value that is compared with the test statistic to determine whether or not H0 should be rejected.
  • Level of Significance: The maximum allowable probability of a Type I error.
  • One-tailed Test: A hypothesis test in which rejection of the null hypothesis occurs for values of the test statistic in one tail of the sampling distribution.
  • Two-tailed Test: A hypothesis test in which rejection of the null hypothesis occurs for values of the test statistic in either tail of the sampling distribution.
  • p-value: The probability, when the null hypothesis is true, of obtaining a sample result that is at least as unlikely as what is observed. It is often called the observed level of significance.
  • Power Curve: A graph of the probability of rejecting H0 for all possible values of the population parameter not satisfying the null hypothesis. The power curve provides the probability of correctly rejecting the null hypothesis.
  • Pooled Variance: An estimate of the variance of a population based on the combination of two (or more) sample results. The pooled variance estimate is appropriate whenever the variances of two (or more) populations are assumed equal.
  • Independent Samples: Samples selected from two (or more) populations where the elements making up one sample are chosen independently of the elements making up the other sample(s).
  • Matched Samples: Samples where each data value in one sample is matched with a corresponding data value in the other sample.
  • Analysis of Variance (ANOVA) Procedure: A statistical approach for determining whether or not the means of several different populations are equal.
  • Factor: Another word for the variable of interest in an ANOVA procedure.
  • Treatments: Different levels of a factor.
  • Single-factor Experiment: An experiment involving only one factor with k populations or treatments.
  • Experimental Units: The objects of interest in the experiment.
  • Completely Randomized Design: An experimental design where the treatments are randomly assigned to the experimental units.
  • Mean Square: The sum of squares divided by its corresponding degrees of freedom. This quantity is used in the F ratio to determine if significant differences among means exist or not.
  • ANOVA Table: A table used to summarize the analysis of variance computations and results. It contains columns showing the source of variation, the degrees of freedom, the sum of squares, the mean squares, and the F values.
  • Partitioning: The process of allocating the total sum of squares and degrees of freedom into the various components.
  • Replication: The number of times each experimental condition is repeated in an experiment. It is the sample size associated with each treatment combination.
  • Interaction: The response produced when the levels of one factor interact with the levels of another factor in influencing the response variable.
  • Note: The definitions here are all stated with the understanding that simple linear regression and correlation are being considered.
  • Dependent Variable: The variable that is being predicted or explained. It is denoted by y in the regression equation.
  • Independent Variable: The variable that is doing the predicting or explaining. It is denoted by x in the regression equation.
  • Simple Linear Regression: The simplest kind of regression, involving only two variables that are related approximately by a straight line.
  • Regression Equation: The mathematical equation relating the independent variable to the expected value of the dependent variable; that is, E(y) = β0 + β1x.
  • Estimated Regression Equation: The estimate of the regression equation obtained by the least squares method; i.e., ŷ = b0 + b1x.
  • Scatter Diagram: A graph of the available data in which the independent variable appears on the horizontal axis and the dependent variable appears on the vertical axis.
  • Least Squares Method: The approach used to develop the estimated regression equation which minimizes the sum of squared residuals.
  • Coefficient of Determination (r²): A measure of the variation explained by the estimated regression equation. It is a measure of how well the estimated regression equation fits the data.
  • Deterministic Model: A relationship between an independent variable and a dependent variable whereby specifying the value of the independent variable allows one to compute exactly the value of the dependent variable.
  • Probabilistic Model: A relationship between an independent variable and a dependent variable in which specifying the value of the independent variable is not sufficient to allow determination of the value of the dependent variable.
  • Residual: The difference between the observed value of the dependent variable and the value predicted using the estimated regression equation; i.e., yi - ŷi.
  • Standardized Residual: The value obtained by dividing the residual by its standard deviation.
  • Sample Correlation Coefficient (rxy): A statistical measure of the linear association between two variables.
  • Multiple Regression Model: A regression model in which more than one independent variable is used to predict the dependent variable.
  • Multiple Coefficient of Determination (R²): A measure of the goodness of fit for the estimated regression equation.
  • Adjusted Multiple Coefficient of Determination (adjusted R²): A measure of the goodness of fit for the estimated regression equation which accounts for the number of independent variables.
  • Multicollinearity: A term used to describe the case when the independent variables in a multiple regression model are correlated.
  • General Linear Model: A model of the form y = β0 + β1x1 + β2x2 + β3x3 + ε, where each of the independent variables xj is a function of x, the variables for which data have been collected.
  • Interaction: The joint effect of two variables acting together.
  • Qualitative Variable: A variable that is not measured in terms of how much or how many but instead is assigned values to represent categories.
  • Dummy Variable: A variable that takes on the values 0 or 1 and is used to incorporate the effects of qualitative variables in a regression model.
  • Variable-selection Procedures: Computer-based methods for selecting a subset of the potential independent variables for a regression model.
  • Outlier: An observation with a residual that is far greater in magnitude than the rest of the residual values.
  • Influential Observation: An observation that has a great deal of influence in determining the estimated regression equation.
  • Leverage: A measure designed to indicate how far an observation is from the others in terms of the values of the independent variables.
  • Autocorrelation: Correlation in the errors that arises when the error terms at successive points in time are related. First-order autocorrelation is when e(t) and e(t-1) are related, second-order is when e(t) and e(t-2) are related, and so on.
  • Serial correlation: Same as autocorrelation.
  • Durbin-Watson Test: A test to determine whether or not first-order autocorrelation is present.
  • Time Series: A set of observations measured at successive points in time or over successive periods of time.
  • Forecast: A projection or prediction of future values of a time series.
  • Trend: The long-run shift or movement in the time series observable over several periods of data.
  • Cyclical Component: The component of the time series model that results in periodic above-trend and below-trend behavior of the time series lasting more than 1 year.
  • Seasonal Component: The component of the time series model that shows a periodic pattern over 1 year or less.
  • Irregular Component: The component of the time series model that reflects the random variation of the actual time series values beyond what can be explained by the trend, cyclical, and seasonal components.
  • Moving Averages: A method of forecasting or smoothing a time series by averaging each successive group of data points. The moving averages method can be used to isolate the seasonal component of the time series.
  • Mean Squared Error (MSE): One approach to measuring the accuracy of a forecasting model. This measure is the average of the squared differences between the forecast values and the actual time series values.
  • Weighted Moving Averages: A method of forecasting or smoothing a time series by computing a weighted average of past data values. The sum of the weights must equal one.
  • Exponential Smoothing: A forecasting technique that uses a weighted average of past time series values in order to arrive at smoothed time series values which can be used as forecasts.
  • Smoothing Constant: A parameter of the exponential-smoothing model which provides the weight given to the most recent time series value in the calculation of the forecast value.
  • Multiplicative Time-series Model: A model that assumes that the separate components of the time series can be multiplied together to identify the actual time series value. When the 4 components of trend, cyclical, seasonal, and irregular are assumed present, we obtain: Y = T × C × S × I. When cyclical is not modeled, we obtain: Y = T × S × I.
  • Deseasonalized Time-series: A time series that has had the effect of season removed by dividing each original time series observation by the corresponding seasonal index.
  • Causal Forecasting Method: Forecasting methods that relate a time series to other variables that are believed to explain or cause its behavior.
  • Autoregressive Model: A time series model that uses a regression relationship based on past time series values to predict the future time series values.
  • Delphi Approach: A qualitative forecasting method that obtains forecasts through a group consensus.
  • Scenario Writing: A qualitative forecasting method which consists of developing a conceptual scenario of the future based upon a well-defined set of assumptions.
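
As a rough illustration of the descriptive measures defined in the glossary (mean, median, mode, range, variance, standard deviation, coefficient of variation, and z-score), here is a minimal Python sketch. The function name describe and the five data values are assumptions chosen only for illustration; they are not part of the glossary. The variance follows the glossary's rule of dividing by n - 1 for a sample and by N for a population.

```python
import math
import statistics


def describe(sample):
    """Summarize a list of numbers with the glossary's basic measures.

    Hypothetical helper for illustration; treats the data as a sample.
    """
    n = len(sample)
    mean = sum(sample) / n
    # Sum of the squared deviations of the data values about the mean
    squared_devs = sum((x - mean) ** 2 for x in sample)
    sample_variance = squared_devs / (n - 1)   # divide by n - 1 for a sample
    population_variance = squared_devs / n     # divide by N for a population
    sample_sd = math.sqrt(sample_variance)     # positive square root of the variance
    return {
        "mean": mean,
        "median": statistics.median(sample),
        "mode": statistics.mode(sample),
        "range": max(sample) - min(sample),
        "sample_variance": sample_variance,
        "population_variance": population_variance,
        "sample_sd": sample_sd,
        # Coefficient of variation: standard deviation relative to the mean, times 100
        "cv_percent": sample_sd / mean * 100,
        # z-score: number of standard deviations each value lies from the mean
        "z_scores": [(x - mean) / sample_sd for x in sample],
    }


data = [46, 54, 42, 46, 32]   # illustrative values only
summary = describe(data)
print(summary["mean"], summary["sample_variance"], summary["sample_sd"])
print(summary["z_scores"])
```

For these values the mean is 44, the sample variance is 64, and the sample standard deviation is 8, so the value 32 has a z-score of -1.5: it lies one and a half standard deviations below the mean.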

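The probability laws in the glossary (the addition law, conditional probability, the multiplication law, independence, and Bayes' theorem) can be checked on a small example. The sketch below assumes one roll of a fair die with the classical method of assigning probabilities; the events A and B, the helper names prob and bayes_posterior, and the prior and likelihood figures are all made up for illustration.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}   # sample points for one roll of a fair die


def prob(event):
    """Classical method: every sample point is assumed equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))


A = {2, 4, 6}   # event: the roll is even
B = {4, 5, 6}   # event: the roll is greater than 3

# Addition law: P(A or B) = P(A) + P(B) - P(A and B)
p_union_direct = prob(A | B)
p_union_by_law = prob(A) + prob(B) - prob(A & B)

# Conditional probability: P(A given B) = P(A and B) / P(B)
p_a_given_b = prob(A & B) / prob(B)

# Multiplication law: P(A and B) = P(B) * P(A given B)
p_intersection_by_law = prob(B) * p_a_given_b

# Independence would require P(A given B) to equal P(A)
independent = p_a_given_b == prob(A)


def bayes_posterior(priors, likelihoods):
    """Posterior P(A_i given B) from priors P(A_i) and likelihoods P(B given A_i)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)                 # P(B), summed over all the A_i
    return [j / total for j in joint]


# Hypothetical priors and likelihoods for two mutually exclusive events A1 and A2
posteriors = bayes_posterior(
    [Fraction(65, 100), Fraction(35, 100)],
    [Fraction(2, 100), Fraction(5, 100)],
)
print(p_union_direct, p_a_given_b, independent, posteriors)
```

Exact fractions are used so that both sides of each law can be compared with == rather than with a floating-point tolerance. Here A and B are not independent, because knowing the roll is greater than 3 raises the probability that it is even from 1/2 to 2/3.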
 

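The regression entries in the glossary (least squares method, estimated regression equation, residual, and coefficient of determination) fit together as in the following minimal sketch for simple linear regression. The function names and the five (x, y) pairs are assumptions for illustration; the slope and intercept use the standard least squares formulas, and the coefficient of determination is computed as 1 - SSE/SST.

```python
def least_squares(x, y):
    """Return (b0, b1) minimizing the sum of squared residuals for y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx              # slope
    b0 = y_bar - b1 * x_bar     # intercept
    return b0, b1


def fit_summary(x, y):
    """Residuals and coefficient of determination for the fitted line."""
    b0, b1 = least_squares(x, y)
    y_hat = [b0 + b1 * xi for xi in x]                    # predicted values
    residuals = [yi - yh for yi, yh in zip(y, y_hat)]     # observed minus predicted
    y_bar = sum(y) / len(y)
    sse = sum(e ** 2 for e in residuals)                  # sum of squares due to error
    sst = sum((yi - y_bar) ** 2 for yi in y)              # total sum of squares
    return {"b0": b0, "b1": b1, "residuals": residuals, "r_squared": 1 - sse / sst}


# Illustrative data only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
fit = fit_summary(x, y)
print(fit["b0"], fit["b1"], fit["r_squared"])
```

For these values the estimated line is approximately 2.2 + 0.6x and the coefficient of determination is about 0.6, meaning the line explains roughly 60% of the variation in y. The residuals of a least squares fit with an intercept sum to zero, up to rounding.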
 

Copyright © 1998-2012 eMcArthur unless otherwise indicated
Unauthorized duplication or publication of any materials from this Site is expressly prohibited.