Appendix A  Descriptive Statistics
    BASIC DATA ANALYSIS
        Cumulative Frequency Distributions
    MEASURES OF LOCATION AND SPREAD
        Parameters versus Statistics
        Center and Location
        Variation
    MULTIVARIATE VARIABLES AND DISTRIBUTIONS
        Frequencies
        Marginal Distributions
        Graphical Representation
        Conditional Distribution
        Independence
        Covariance
        Correlation
        Contingency Coefficient

Appendix B  Continuous Probability Distributions Commonly Used in Financial Econometrics
    NORMAL DISTRIBUTION
        Properties of the Normal Distribution
    CHI-SQUARE DISTRIBUTION
    STUDENT'S t-DISTRIBUTION
    F-DISTRIBUTION
    α-STABLE DISTRIBUTION

Appendix C  Inferential Statistics
    POINT ESTIMATORS
        Sample, Statistic, and Estimator
        Quality Criteria of Estimators
        Large-Sample Criteria
    CONFIDENCE INTERVALS
        Confidence Level and Confidence Interval
    HYPOTHESIS TESTING
        Hypotheses
        Error Types
        Test Size

Moreover, whether the class bounds are elements of the classes or not must be specified. The class bounds of a class have to be bounds of the respective adjacent classes as well, such that the classes seamlessly cover the entire data. The width should be the same for all classes. However, if there are areas where the data are very densely packed in contrast to areas of lesser density, then the class width can vary according to significant changes in value density. In certain cases, most of the data are relatively evenly scattered within some range while there are extreme values that are located in isolated areas on either end of the data array. Then, it is sometimes advisable to specify no lower bound to the lowest class and no upper bound to the uppermost class. Classes of this sort are called open classes. Moreover, one should consider the precision of the data as they are given. If values are rounded to the first decimal place but there is the chance that the exact value might vary within half a decimal about the value given, class bounds have to account for this lack of certainty by admitting half a decimal on either end of the class.

Cumulative Frequency Distributions

In contrast to the empirical cumulative frequency distributions, in this section we will introduce functions that convey basically the same information, that is, the frequency distribution, but rely on a few more assumptions. The cumulative frequency distributions introduced here, however, should not be confused with the theoretical definitions given in probability theory in the next appendix, even though one will clearly notice that the notion is akin to both.

The absolute cumulative frequency at each class bound states how many observations have been counted up to this particular class bound. However, we do not know exactly how the data are distributed within the classes. On the other hand, when relative frequencies are used, the cumulative relative frequency distribution states the overall proportion of all values up to a certain lower or upper bound of some class.

So far, things are not much different from the definition of the empirical cumulative frequency distribution and empirical cumulative relative frequency distribution. At each bound, the empirical cumulative frequency distribution and the cumulative frequency distribution coincide. However, an additional assumption is made regarding the distribution of the values between the bounds of each class when computing the cumulative frequency distribution. The data are thought of as being continuously distributed and equally spread between the particular bounds. Hence, both forms of the cumulative frequency distributions increase in a linear fashion between the two class bounds. So for both forms of cumulative distribution functions, one can compute the accumulated frequencies at values inside of classes.

For a more thorough analysis, let's use a more formal presentation. Let $I$ denote the set of all class indexes $i$, with $i$ being some integer value between 1 and $n_I$ (i.e., the number of classes). Moreover, let $a_j$ and $f_j$ denote the (absolute) frequency and relative frequency of some class $j$, respectively. The cumulative frequency distribution at some upper bound, $x_u^i$, of a given class $i$ is computed as

$$F(x_u^i) = \sum_{j:\, x_u^j \le x_u^i} a_j = \sum_{j:\, x_u^j \le x_l^i} a_j + a_i \tag{A.1}$$

In words, this means that we sum up the frequencies of all classes whose upper bound is no greater than $x_u^i$; equivalently, we sum the frequencies of all classes below class $i$ plus the frequency of class $i$ itself. The corresponding cumulative relative frequency distribution at the same value is then

$$F_f(x_u^i) = \sum_{j:\, x_u^j \le x_u^i} f_j = \sum_{j:\, x_u^j \le x_l^i} f_j + f_i \tag{A.2}$$

This describes the same procedure as in equation (A.1) using relative frequencies instead of absolute frequencies. For any value $x$ in between the boundaries of, say, class $i$, $x_l^i$ and $x_u^i$, the cumulative relative frequency distribution is defined by

$$F_f(x) = F_f(x_l^i) + \frac{x - x_l^i}{x_u^i - x_l^i}\, f_i \tag{A.3}$$

In words, this means that we compute the cumulative relative frequency distribution at value $x$ as the sum of two things. First, we take the cumulative relative frequency distribution at the lower bound of class $i$. Second, we add that share of the relative frequency of class $i$ that is determined by the part of the whole interval of class $i$ that is covered by $x$.
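The interpolation rule in equation (A.3) translates directly into code. The following is a minimal Python sketch with hypothetical class bounds and frequencies (all names and values are illustrative, not from the text):

```python
# Cumulative relative frequency with linear interpolation inside classes,
# as in equations (A.1)-(A.3). Class bounds and frequencies are hypothetical.
bounds = [0.0, 2.0, 4.0, 6.0, 8.0]   # class bounds x_l, x_u for 4 classes
freqs = [5, 12, 8, 3]                # absolute class frequencies a_j

n = sum(freqs)
rel = [a / n for a in freqs]         # relative frequencies f_j

def cum_rel_freq(x):
    """F_f(x): cumulative relative frequency at any value x, eq. (A.3)."""
    if x <= bounds[0]:
        return 0.0
    total = 0.0
    for i, f in enumerate(rel):
        lo, hi = bounds[i], bounds[i + 1]
        if x >= hi:                  # class i lies entirely below x
            total += f
        elif x > lo:                 # x falls inside class i: interpolate
            total += (x - lo) / (hi - lo) * f
    return total

print(cum_rel_freq(5.0))             # F_f at a value inside the third class
```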
MEASURES OF LOCATION AND SPREAD

Once we have the data at our disposal, we want to retrieve key numbers conveying specific information about the data. As key numbers we will introduce measures for the center and location of the data as well as measures for the spread of the data.

Parameters versus Statistics

Before we go further, we have to introduce a distinction that is valid for any type of data. We have to be aware of whether we are analyzing the entire population or just a sample from that population. The key numbers are called parameters when dealing with populations, while we refer to statistics when we observe only a sample. Parameters are commonly denoted by Greek letters while statistics are usually assigned Roman letters.

The difference between these two measures is that parameters are valid values for the entire population or universe of data and, hence, remain constant throughout, whereas statistics may vary with every different sample even though each sample is selected from the very same population. This is easily understood using the following example. Consider the average return of all stocks listed in the S&P 500 index during a particular year. This quantity is a parameter, μ, for example, since it represents all these stocks. If one randomly selects 10 stocks included in the S&P 500, however, one may end up with an average return for this sample that deviates from the population average, μ. The reason would be that by chance one has picked stocks that do not represent the population very well. For example, one might by chance select the top 10 performing stocks included in the S&P 500. Their returns will yield an average (statistic) that is above the average of all 500 stocks (parameter). The opposite arises if one had picked the 10 worst performers. In general, deviations of the statistics from the parameters are the result of sample selection.
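A small simulation makes the distinction concrete. The population below is hypothetical (randomly generated returns), not actual S&P 500 data:

```python
import random

# Parameters versus statistics: a hypothetical "population" of 500 annual
# stock returns. The population mean is a parameter (mu); the mean of a
# random sample of 10 stocks is a statistic that varies from sample to sample.
random.seed(1)
population = [random.gauss(0.08, 0.20) for _ in range(500)]  # hypothetical returns
mu = sum(population) / len(population)                       # parameter

for _ in range(3):
    sample = random.sample(population, 10)
    x_bar = sum(sample) / len(sample)                        # statistic
    print(f"sample mean = {x_bar:+.4f}   population mean = {mu:+.4f}")
```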
Center and Location

The measures we present first are those revealing the center and the location of the data. The center and location are expressed by three different measures: mean, mode, and median.

The mean is the quantity given by the sum of all values divided by the size of the data set, where the size is the number of values or observations. The mode is the value that occurs most often in a data set. If the distribution of some population or the empirical distribution of some sample is known, the mode can be determined to be the value corresponding to the highest frequency. Roughly speaking, the median divides data by value into a lower half and an upper half. A more rigorous definition for the median is that we require that at least half of the data are no greater and at least half of the data are no smaller than the median itself.

The interpretation of the mean is as follows: the mean gives an indication as to which value the data are scattered about. Moreover, on average, one has to expect a data value equal to the mean when selecting an observation at random. However, one incurs a loss of information that is not insignificant. Given a certain data size, a particular mean can be obtained from different sets of values. One extreme would be that all values are equal to the mean. The other extreme could be that half of the observations lie extremely to the left and half of the observations extremely to the right of the mean, thus leveling out, on average.

Of the three measures of central tendency, the mode is the measure with the greatest loss of information. It simply states which value occurs most often and reveals no further insight. This is the reason why the mean and median enjoy greater use in descriptive statistics. While the mean is sensitive to changes in the data set, the mode is absolutely invariant as long as the maximum frequency is obtained by the same value. The mode, however, is of importance, as will be seen, in the context of the shape of the distribution of data. A positive feature of the mode is that it is applicable to all data levels.
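In Python's standard library, the three measures are available directly; a minimal sketch with a hypothetical data set:

```python
from statistics import mean, median, mode

# The three measures of center and location for a small hypothetical data set.
data = [2, 3, 3, 5, 7, 9, 13]

print(mean(data))    # sum of values divided by the number of observations
print(median(data))  # at least half the data are no greater, half no smaller
print(mode(data))    # the value occurring most often
```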
Variation

Rather than measures of the center or one single location, we now discuss measures that capture the way the data are spread, either in absolute terms or relative to some reference value such as, for example, a measure of location. Hence, the measures introduced here are measures of variation. We may be given the average return, for example, of a selection of stocks during some period. However, the average value alone is incapable of providing us with information about the variation in returns. Hence, it is insufficient for a more profound insight into the data. Like almost everything in real life, the individual returns will most likely deviate from this reference value, at least to some extent, since the driving force behind each individual object will cause it to assume a value for some respective attribute that is inclined more or less in some direction away from the standard.

While a great number of measures of variation have been proposed in the finance literature, we limit our coverage to those that are more commonly used in financial econometrics: absolute deviation, standard deviation (variance), and skewness.

Absolute Deviation  The mean absolute deviation (MAD) is the average deviation of all data from some reference value (which is usually a measure of the center). The deviation is usually measured from the mean. The MAD measure takes into consideration every data value.

Variance and Standard Deviation  The variance is the measure of variation used most often. It is an extension of the MAD in that it averages not the absolute but the squared deviations. The deviations are measured from the mean. The square has the effect that larger deviations contribute even more to the measure than smaller deviations, as would be the case with the MAD. This is of particular interest if deviations from the mean are more harmful the larger they are. In the context of the variance, one often speaks of the averaged squared deviations as a risk measure. The sample variance is defined by

$$s^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2 \tag{A.4}$$

using the sample mean. If, in equation (A.4), we use the divisor n − 1 rather than just n, we obtain the corrected sample variance.

Related to the variance is the even more commonly stated measure of variation, the standard deviation. The reason is that the units of the standard deviation correspond to the original units of the data whereas the units are squared in the case of the variance. The standard deviation is defined to be the positive square root of the variance. Formally, the sample standard deviation is

$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2} \tag{A.5}$$

Skewness  The last measure of variation we describe is skewness. There exist several definitions for this measure. The Pearson skewness is defined as three times the difference between the mean and the median, divided by the standard deviation.[4] Formally, the Pearson skewness for a sample is

$$s_P = \frac{3(\bar{x} - m_d)}{s}$$

where $m_d$ denotes the median. As can be easily seen, for symmetrically distributed data, the skewness is zero. For data with the mean being different from the median and, hence, located in either the left or the right half of the data, the data are skewed. If the mean is in the left half, the data are skewed to the left (or left skewed) since there are more extreme values on the left side compared to the right side. The opposite (i.e., skewed to the right, or right skewed) is true for data whose mean is further to the right than the median. In contrast to the MAD and variance, the skewness can obtain positive as well as negative values. This is because not only is some absolute deviation of interest, but the direction is as well.

[4] To be more precise, this is only one of Pearson's skewness coefficients. Another one, not presented here, employs the mode instead of the median.
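The following sketch computes the MAD, equations (A.4) and (A.5), and the Pearson skewness for a short hypothetical return series (all values illustrative):

```python
from statistics import mean, median

# Sample variance (A.4), corrected standard deviation (A.5), and the Pearson
# skewness s_P = 3(mean - median)/s for a small hypothetical return series.
x = [0.02, -0.01, 0.05, 0.03, -0.04, 0.01, 0.08]
n = len(x)
x_bar = mean(x)

mad = sum(abs(xi - x_bar) for xi in x) / n                 # mean absolute deviation
var_n = sum((xi - x_bar) ** 2 for xi in x) / n             # equation (A.4)
s = (sum((xi - x_bar) ** 2 for xi in x) / (n - 1)) ** 0.5  # equation (A.5)
skew_p = 3 * (x_bar - median(x)) / s                       # Pearson skewness

print(mad, var_n, s, skew_p)
```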
MULTIVARIATE VARIABLES AND DISTRIBUTIONS

Thus far in this appendix, we examined one variable only. However, for applications of financial econometrics, there is typically less of a need to analyze one variable in isolation. Instead, a typical problem is to investigate the common behavior of several variables and joint occurrences of events. In other words, there is the need to establish joint frequency distributions and introduce measures determining the extent of dependence between variables.

Frequencies

As in the single variable case, we first gather all joint observations of our variables of interest. For a better overview of occurrences of the variables, it might be helpful to set up a table with rows indicating observations and columns representing the different variables. This table is called the table of observations. Thus, the cell of, say, row i and column j contains the value that observation i has with respect to variable j. Let us express this relationship between observations and variables a little more formally by some functional representation. In the following, we will restrict ourselves to observations of pairs, that is, k = 2. In this case, the observations are bivariate variables of the form x = (x₁, x₂). The first component x₁ assumes values in the set V of possible values while the second component x₂ takes values in W, that is, the set of possible values for the second component.

Consider the Dow Jones Industrial Average (DJIA) over some period, say one month (roughly 22 trading days). The index includes the stock of 30 companies. The corresponding table of observations could then, for example, list the roughly 22 observation dates in the columns and the individual company names row-wise. So, in each column, we have the stock prices of all constituent stocks at a specific date. If we single out a particular row, we have narrowed the observation down to one component of the joint observation at that specific day.

Since we are not so much interested in each particular observation's value with respect to the different variables, we condense the information to the degree where we can just tell how often certain values have occurred.[5] In other words, we are interested in the frequencies of all possible pairs with all possible combinations of first and second components. The task is to set up the so-called joint frequency distribution. The absolute joint frequency at the pair of values (v, w) is the number of occurrences counted of that pair. The relative joint frequency distribution is obtained by dividing the absolute frequency by the number of observations.

[5] This is reasonable whenever the components assume certain values more than once.

While joint frequency distributions exist for all data levels, one distinguishes between qualitative data, on the one hand, and rank and quantitative data, on the other hand, when referring to the table displaying the joint frequency distribution. For qualitative (nominal scale) data, the corresponding table is called a contingency table, whereas the table for rank (ordinal) scale and quantitative data is called a correlation table.

Marginal Distributions

Observing bivariate data, one might be interested in only one particular component. In this case, the joint frequencies in the contingency or correlation table can be aggregated to produce the univariate distribution of the one variable of interest. In other words, the joint frequencies are projected into the frequency dimension of that particular component. The distribution so obtained is called the marginal distribution. The marginal distribution treats the data as if only the one component was observed, while a detailed joint distribution in connection with the other component is of no interest.

The frequency of certain values of the component of interest is measured by the marginal frequency. For example, to obtain the marginal frequency of the first component whose values v are represented by the rows of the contingency or correlation table, we add up all joint frequencies in that particular row, say i. Thus, we obtain the row sum as the marginal frequency of the component value $v_i$. That is, for each value $v_i$, we sum the joint frequencies over all pairs $(v_i, w_j)$ where $v_i$ is held fixed. To obtain the marginal frequency of the second component whose values w are represented by the columns, for each value $w_j$, we add up the joint frequencies of that particular column j to obtain the column sum. This time we sum over all pairs $(v_i, w_j)$ keeping $w_j$ fixed.
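A minimal sketch of how a joint relative frequency table and its marginals might be computed, using hypothetical categorical observations and numpy:

```python
import numpy as np

# A joint relative frequency table and its marginal distributions for
# hypothetical categorical observations of a bivariate variable (x1, x2).
pairs = [("up", "up"), ("up", "down"), ("up", "up"),
         ("down", "down"), ("down", "up"), ("down", "down")]

V = ["up", "down"]                  # possible values of the first component
W = ["up", "down"]                  # possible values of the second component

counts = np.zeros((len(V), len(W)))
for v, w in pairs:
    counts[V.index(v), W.index(w)] += 1

f_joint = counts / counts.sum()     # relative joint frequencies
f_x = f_joint.sum(axis=1)           # marginal of the first component (row sums)
f_y = f_joint.sum(axis=0)           # marginal of the second component (column sums)

print(f_joint)
print(f_x, f_y)
```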
Graphical Representation

A common graphical tool used with bivariate data arrays is the so-called scatter diagram or scatter plot. In this diagram, the values of each pair are displayed. Along the horizontal axis, usually the values of the first component are displayed, while along the vertical axis, the values of the second component are displayed. The scatter plot is helpful in visualizing whether the variation of one component variable somehow affects the variation of the other. If, for example, the points in the scatter plot are dispersed all over in no discernible pattern, the variability of each component may be unaffected by the other. This is visualized in Figure A.1. The other extreme is given if there is a functional relationship between the two variables. Here, two cases are depicted: in Figure A.2, the relationship is linear, whereas in Figure A.3, the relationship is of some higher order.[6]

FIGURE A.1 Scatter Plot: Extreme 1—No Relationship of Component Variables x and y
FIGURE A.2 Scatter Plot: Extreme 2—Perfect Linear Relationship between Component Variables x and y
FIGURE A.3 Scatter Plot: Extreme 3—Perfect Cubic Functional Relationship between Component Variables x and y

[6] As a matter of fact, in Figure A.2, we have y = 0.3 + 1.2x. In Figure A.3, we have y = 0.2 + x³.

When two (or more) variables are observed at a certain point in time, one speaks of cross-sectional analysis. In contrast, when analyzing one and the same variable at different points in time, one refers to time series analysis. We will come back to the analysis of various aspects of joint behavior in more detail later.

Figure A.4 shows bivariate monthly return data of the S&P 500 stock index and GE stock for the period January 1996 to December 2003 (96 observation pairs). We plot the pairs of returns such that the GE returns are the horizontal components while the index returns are the vertical components. By observing the plot, we can roughly assess, at first, that there appears to be no distinct structure in the joint behavior of the data. However, by looking a little more thoroughly, one might detect a slight linear relationship underlying the two return series. That is, the observations appear to move around some invisible line starting from the bottom left corner and advancing to the top right corner. This would appear quite reasonable since one might expect some link between GE stock and the overall index.

FIGURE A.4 Scatter Plot of Monthly S&P 500 Stock Index Returns versus Monthly GE Stock Returns
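To reproduce a plot like Figure A.4, a few lines of Python suffice. The sketch below uses randomly generated stand-in returns (not the actual GE and S&P 500 data) and assumes numpy and matplotlib are available:

```python
import matplotlib.pyplot as plt
import numpy as np

# A scatter plot of hypothetical bivariate return data, mimicking Figure A.4:
# first component on the horizontal axis, second on the vertical axis.
rng = np.random.default_rng(0)
ge = rng.normal(0.01, 0.08, 96)              # stand-in for monthly GE returns
sp500 = 0.5 * ge + rng.normal(0, 0.03, 96)   # index returns, loosely linked to GE

plt.scatter(ge, sp500)
plt.xlabel("GE")
plt.ylabel("S&P 500")
plt.show()
```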
Conditional Distribution

With the marginal distribution as previously defined, we obtain the frequency of component x at a certain value v, for example. We treat variable x as if variable y did not exist and we only observed x. Hence, the sum of the marginal frequencies of x has to be equal to one. The same is true in the converse case for variable y. Looking at the contingency or correlation table, the joint frequency at the fixed value v of the component x may vary across the values w of component y. Then, there appears to be some kind of influence of component y on the occurrence of value v of component x. The influence, as will be shown later, is mutual. Hence, one is interested in the distribution of one component given a certain value for the other component. This distribution is called the conditional frequency distribution. The conditional relative frequency of x conditional on w is defined by

$$f_{x|w}(v) = f(v \mid w) = \frac{f_{x,y}(v, w)}{f_y(w)} \tag{A.6}$$

The conditional relative frequency of y on v is defined analogously. In equation (A.6), both commonly used versions of the notation for the conditional frequency are given on the left side. The right side, that is, the definition of the conditional relative frequency, uses the joint frequency at v and w divided by the marginal frequency of y at w. The use of conditional distributions reduces the original space to a subset determined by the value of the conditioning variable.

If in equation (A.6) we sum over all possible values v, we obtain the marginal distribution of y at the value w, $f_y(w)$, in the numerator of the expression on the right side. This is equal to the denominator. Thus, the sum over all conditional relative frequencies of x conditional on w is one. Hence, the cumulative relative frequency of x at the largest value x can obtain, conditional on some value w of y, has to be equal to one. The equivalent statement for values of y conditional on some value of x is true as well. Analogous to univariate distributions, it is possible to compute measures of center and location for conditional distributions.

Independence

The previous discussion raised the issue that a component may have influence on the occurrence of values of the other component. This can be analyzed by comparing the joint frequencies of x and y with the value in one component fixed, say x = v. If these frequencies vary for different values of y, then the occurrence of values of x is not independent of the value of y. It is equivalent to check whether a certain value of x occurs more frequently given a certain value of y, that is, to check the conditional frequency of x conditional on y, and compare this conditional frequency with the marginal frequency at this particular value of x.

The formal definition of independence is that for all v, w,

$$f_{x,y}(v, w) = f_x(v)\cdot f_y(w) \tag{A.7}$$

That is, for any pair (v, w), the joint frequency is the product of the respective marginals. By the definition of the conditional frequencies, we can state an equivalent definition as follows:

$$f_x(v) = f_x(v \mid w) = \frac{f_{x,y}(v, w)}{f_y(w)} \tag{A.8}$$

which, in the case of independence of x and y, has to hold for all values v and w. Conversely, an equation equivalent to (A.8) has to be true for the marginal frequency of y, $f_y(w)$, at any value w. In general, if one can find one pair (v, w) for which either equation (A.7) or (A.8) and, hence, both do not hold, then x and y are dependent. So, it is fairly easy to show that x and y are dependent by simply finding a pair violating equations (A.7) and (A.8).

Now we show that the concept of influence of x on values of y is analogous, so that the feature of statistical dependence of two variables is mutual. Suppose that the frequency of the values of x depends on the values of y, in particular,[7]

$$f_x(v) \ne \frac{f_{x,y}(v, w)}{f_y(w)} = f_x(v \mid w) \tag{A.9}$$

Multiplying each side of equation (A.9) by $f_y(w)$ yields

$$f_{x,y}(v, w) \ne f_x(v)\cdot f_y(w) \tag{A.10}$$

which is just the definition of dependence. Dividing each side of equation (A.10) by $f_x(v) > 0$ gives

$$\frac{f_{x,y}(v, w)}{f_x(v)} = f_y(w \mid v) \ne f_y(w)$$

showing that the values of y depend on x. In this way, one can demonstrate the mutuality of the dependence of the components.

[7] This holds provided that $f_y(w) > 0$.
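Conditions (A.6) and (A.7) translate directly into array operations. A minimal sketch with hypothetical joint frequencies, assuming numpy:

```python
import numpy as np

# Checking the independence condition (A.7): compare joint relative
# frequencies with the product of the marginals. Frequencies are hypothetical.
f_joint = np.array([[0.30, 0.20],
                    [0.30, 0.20]])          # f_{x,y}(v_i, w_j)

f_x = f_joint.sum(axis=1)                   # marginals f_x(v_i)
f_y = f_joint.sum(axis=0)                   # marginals f_y(w_j)

product = np.outer(f_x, f_y)                # f_x(v_i) * f_y(w_j)
print(np.allclose(f_joint, product))        # True: x and y are independent

# Conditional relative frequencies of x given w, equation (A.6)
f_x_given_w = f_joint / f_y                 # divide each column by f_y(w_j)
print(f_x_given_w)
```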
Covariance

In this bivariate context, there is a measure of joint variation for quantitative data. It is the (sample) covariance defined by

$$s_{x,y} = \mathrm{cov}(x, y) = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}) \tag{A.11}$$

In equation (A.11), for each observation, the deviation of the first component from its mean is multiplied by the deviation of the second component from its mean. The sample covariance is then the average of all joint deviations. Some tedious calculations lead to an equivalent representation of equation (A.11),

$$s_{x,y} = \mathrm{cov}(x, y) = \frac{1}{n}\sum_{i=1}^{n} x_i y_i - \bar{x}\cdot\bar{y}$$

which is a transformation analogous to the one already presented for variances.

The covariance of independent variables is equal to zero. The converse, however, is not generally true; that is, one cannot automatically conclude independence from zero covariance. This statement is one of the most important results in statistics and probability theory. Technically, if the covariance of x and y is zero, the two variables are said to be uncorrelated. For any value of cov(x, y) different from zero, the variables are correlated. Since two variables with zero covariance are uncorrelated but not automatically independent, it is obvious that independence is a stricter criterion than no correlation.[8]

This concept is exhibited in Figure A.5. In the plot, the two sets representing correlated and uncorrelated variables are separated by the dashed line. Inside of the dashed line, we have uncorrelated variables while the correlated variables are outside. Now, as we can see by the dotted line, the set of independent variables is completely contained within the dashed oval of uncorrelated variables. The complementary set outside the dotted circle (i.e., the dependent variables) contains all of the correlated as well as part of the uncorrelated variables. Since the dotted circle is completely inside of the dashed oval, we see that independence is a stricter requirement than uncorrelatedness.

FIGURE A.5 Relationship between Correlation and Dependence of Bivariate Variables

[8] The reason is founded in the fact that the terms in the sum of the covariance can cancel each other out even though the variables are not independent.

The concept behind Figure A.5 of zero covariance with dependence can be demonstrated by a simple example. Consider two hypothetical securities, x and y, with the payoff pattern given in Table A.2. In the left column below y, we have the payoff values of security y, while in the top row we have the payoff values of security x. Inside of the table are the joint frequencies of the pairs (x, y).

TABLE A.2 Payoff Table of the Hypothetical Variables x and y with Joint Frequencies

          x = 7/6   x = 13/6   x = −5/6   x = −11/6
y = 1       1/3        0          0           0
y = −2       0        1/6         0           0
y = 2        0         0         1/6          0
y = −1       0         0          0          1/3

As we can see, each particular value of x occurs in combination with only one particular value of y. Thus, the two variables (i.e., the payoffs of x and y) are dependent. We compute the means of the two variables to be $\bar{x} = 0$ and $\bar{y} = 0$, respectively. The resulting sample covariance according to equation (A.11) is then

$$s_{x,y} = \frac{1}{3}\left(\frac{7}{6} - 0\right)(1 - 0) + \cdots + \frac{1}{3}\left(-\frac{11}{6} - 0\right)(-1 - 0) = 0$$

which indicates zero correlation. Note that despite the fact that the two variables are obviously dependent, the joint occurrence of the individual values is such that, according to the covariance, there is no relationship apparent.
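The computation for Table A.2 can be verified with exact rational arithmetic; the following sketch uses Python's fractions module and the (x, y, frequency) triples from the table:

```python
from fractions import Fraction as F

# Verifying the zero covariance of the dependent payoffs in Table A.2:
# each pair (x, y) occurs with the joint frequency given in the table.
pairs = [(F(7, 6),   F(1),  F(1, 3)),
         (F(13, 6),  F(-2), F(1, 6)),
         (F(-5, 6),  F(2),  F(1, 6)),
         (F(-11, 6), F(-1), F(1, 3))]   # (x, y, joint frequency)

x_bar = sum(f * x for x, y, f in pairs)          # frequency-weighted mean of x
y_bar = sum(f * y for x, y, f in pairs)          # frequency-weighted mean of y
cov = sum(f * (x - x_bar) * (y - y_bar) for x, y, f in pairs)

print(x_bar, y_bar, cov)                         # all three are exactly 0
```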
Correlation

If the covariance of two variables is non-zero, we know that, formally, the variables are dependent. However, the degree of correlation is not uniquely determined. This problem is apparent from the following illustration. Suppose we have two variables, x and y, with a cov(x, y) of a certain value. A linear transformation of at least one variable, say ax + b, will generally lead to a change in the value of the covariance due to the following property of the covariance:

$$\mathrm{cov}(ax + b,\, y) = a\cdot\mathrm{cov}(x, y)$$

This does not mean, however, that the transformed variable is more or less correlated with y than x was. Since the covariance is obviously sensitive to transformations, it is not a reasonable measure to express the degree of correlation.

This shortcoming of the covariance can be circumvented by dividing the joint variation as defined by equation (A.11) by the product of the respective variations of the component variables. The resulting measure is the Pearson correlation coefficient, or simply the correlation coefficient, defined by

$$r_{x,y} = \frac{\mathrm{cov}(x, y)}{s_x \cdot s_y} \tag{A.12}$$

where the covariance is divided by the product of the standard deviations of x and y. By definition, $r_{x,y}$ can take on any value from −1 to 1 for any bivariate quantitative data. Hence, we can compare different data with respect to the correlation coefficient in equation (A.12). Generally, we make the distinction $r_{x,y} < 0$, negative correlation; $r_{x,y} = 0$, no correlation; and $r_{x,y} > 0$, positive correlation, to indicate the possible direction of joint behavior.

In contrast to the covariance, the correlation coefficient is invariant with respect to linear transformations; that is, it is said to be scaling invariant. For example, if we translate x to ax + b (with a > 0), we still have

$$r_{ax+b,\,y} = \frac{\mathrm{cov}(ax + b,\, y)}{s_{ax+b}\cdot s_y} = \frac{a\cdot\mathrm{cov}(x, y)}{a\cdot s_x\cdot s_y} = r_{x,y}$$
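As a quick illustration of equation (A.12) and its scaling invariance, consider the following sketch with hypothetical simulated data (numpy assumed available):

```python
import numpy as np

# The correlation coefficient (A.12) and its scaling invariance under a
# positive linear transformation ax + b. Data are hypothetical.
rng = np.random.default_rng(42)
x = rng.normal(0, 1, 1000)
y = 0.5 * x + rng.normal(0, 1, 1000)

def corr(u, v):
    cov = ((u - u.mean()) * (v - v.mean())).mean()
    return cov / (u.std() * v.std())

print(corr(x, y))             # some value between -1 and 1
print(corr(3 * x + 7, y))     # the same up to floating-point error
```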
Contingency Coefficient

So far, we could only determine the correlation of quantitative data. To extend this analysis to any type of data, we introduce another measure, the so-called chi-square test statistic, denoted by χ². Using relative frequencies, the chi-square test statistic is defined by

$$\chi^2 = n\sum_{i=1}^{r}\sum_{j=1}^{s}\frac{\left(f_{x,y}(v_i, w_j) - f_x(v_i)\, f_y(w_j)\right)^2}{f_x(v_i)\, f_y(w_j)} \tag{A.13}$$

An analogous formula can be used for absolute frequencies. The intuition behind equation (A.13) is to measure the average squared deviations of the joint frequencies from what they would be in the case of independence. When the components are, in fact, independent, then the chi-square test statistic is zero. However, in any other case, we have the problem that, again, we cannot make an unambiguous statement to compare different data sets. The values of the chi-square test statistic depend on the data size n. For increasing n, the statistic can grow beyond any bound such that there is no theoretical maximum.

The solution to this problem is given by the Pearson contingency coefficient, or simply contingency coefficient, defined by

$$C = \sqrt{\frac{\chi^2}{n + \chi^2}} \tag{A.14}$$

The contingency coefficient by the definition given in equation (A.14) is such that 0 ≤ C < 1. Consequently, it assumes values that are strictly less than one but may become arbitrarily close to one. This is still not satisfactory for our purpose of designing a measure that can uniquely determine the respective degrees of dependence of different data sets.

There is another coefficient that can be used, based on the following. Suppose we have bivariate data in which the value set of the first component variable contains r different values and the value set of the second component variable contains s different values. In the extreme case of total dependence of x and y, each variable will assume a certain value if and only if the other variable assumes a particular corresponding value. Hence, we have k = min{r, s} unique pairs that occur with positive frequency whereas any other combination does not occur at all (i.e., has zero frequency). Then one can show that

$$C = \sqrt{\frac{k - 1}{k}}$$

such that, generally, $0 \le C \le \sqrt{(k-1)/k} < 1$. Now, the standardized coefficient can be given by

$$C_{\mathrm{corr}} = \sqrt{\frac{k}{k - 1}}\, C \tag{A.15}$$

which is called the corrected contingency coefficient, with 0 ≤ C_corr ≤ 1. With the measures given in equations (A.13), (A.14), and (A.15), we can determine the degree of dependence for any type of data.
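A small numerical sketch of equations (A.13) through (A.15), using a hypothetical 2 × 3 table of absolute frequencies and numpy:

```python
import numpy as np

# Chi-square statistic (A.13), contingency coefficient (A.14), and corrected
# contingency coefficient (A.15) for a hypothetical 2 x 3 frequency table.
counts = np.array([[20, 30, 10],
                   [25, 10, 5]])              # absolute joint frequencies

n = counts.sum()
f_joint = counts / n
f_x = f_joint.sum(axis=1, keepdims=True)      # row marginals f_x(v_i)
f_y = f_joint.sum(axis=0, keepdims=True)      # column marginals f_y(w_j)

indep = f_x * f_y                             # joint frequencies under independence
chi2 = n * ((f_joint - indep) ** 2 / indep).sum()   # equation (A.13)
C = np.sqrt(chi2 / (n + chi2))                      # equation (A.14)
k = min(counts.shape)                               # k = min{r, s}
C_corr = np.sqrt(k / (k - 1)) * C                   # equation (A.15)

print(chi2, C, C_corr)
```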
APPENDIX B
Continuous Probability Distributions Commonly Used in Financial Econometrics

In this appendix, we discuss the continuous probability distributions most commonly used in financial econometrics. The four distributions discussed are the normal distribution, the chi-square distribution, the Student's t-distribution, and the F-distribution. It should be emphasized that although many of these distributions enjoy widespread attention in financial econometrics as well as financial theory (e.g., the normal distribution) due to their well-known characteristics or mathematical simplicity, the use of some of them might be ill-suited to replicate the real-world behavior of financial returns. In particular, while the four distributions just mentioned are appealing because of their mathematical simplicity, the observed behavior of many quantities in finance calls for more flexible distributions, even at the cost of less mathematically simple models. For example, although the Student's t-distribution that will be discussed in this appendix is able to mimic some behavior inherent in financial data such as so-called fat tails or heavy tails (which means that a lot of the probability mass is attributed to extreme values),[1] it fails to capture other observed behavior such as skewness. For this reason, there has been increased interest in finance and financial econometrics in a continuous probability distribution known as the α-stable distribution. We will describe this distribution at the end of this appendix.

[1] There are various characterizations of fat tails in the literature. In finance, tails that are heavier than those of the exponential distribution are typically considered "heavy."

NORMAL DISTRIBUTION

The first distribution we discuss is the normal distribution. It is the distribution most commonly used in finance despite its many limitations. This distribution, also referred to as the Gaussian distribution, is characterized by two parameters: the mean (μ) and the standard deviation (σ). The distribution is denoted by N(μ, σ²). When μ = 0 and σ² = 1, we obtain the standard normal distribution. The density function for the normal distribution is given by

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}} \tag{B.1}$$

The density function is symmetric about μ. A plot of the density function for several parameter values is given in Figure B.1. As can be seen, the value of μ results in a horizontal shift from 0 while σ inflates or deflates the graph. A characteristic of the normal distribution is that the densities are bell shaped.

FIGURE B.1 Normal Density Function for Various Parameter Values

A problem is that the distribution function cannot be solved for analytically and therefore has to be approximated numerically. In the particular case of the standard normal distribution, the values are tabulated. Standard statistical software provides the values for the standard normal distribution as well as for most of the distributions presented in this appendix. The standard normal distribution function is commonly denoted by the Greek letter Φ, such that we have

$$\Phi(x) = F(x) = P(X \le x)$$

for some standard normal random variable X. In Figure B.2, graphs of the distribution function are given for three different sets of parameters.

FIGURE B.2 Normal Distribution Function for Various Parameter Values

Properties of the Normal Distribution

The normal distribution provides one of the most important classes of probability distributions due to two appealing properties:

Property 1. The distribution is location-scale invariant. That is, if X has a normal distribution, then for every constant a and b, aX + b is again a normal random variable.

Property 2. The distribution is stable under summation. That is, if X has a normal distribution F, and X₁, . . . , X_n are n independent random variables with distribution F, then X₁ + . . . + X_n is again a normally distributed random variable.

In fact, if a random variable X has a distribution satisfying Properties 1 and 2 and X has a finite variance, then X has a normal distribution.

Property 1, the location-scale invariance property, guarantees that we may multiply X by b and add a, where a and b are any real numbers. The resulting a + b ⋅ X is, again, normally distributed, more precisely N(a + bμ, b²σ²). Consequently, a normal random variable will still be normally distributed if we change the units of measurement. The change into a + b ⋅ X can be interpreted as observing the same X, however, measured in a different scale. In particular, if a and b are such that the mean and variance of the resulting a + b ⋅ X are 0 and 1, respectively, then a + b ⋅ X is called the standardization of X.

Property 2, stability under summation, ensures that the sum of an arbitrary number n of normal random variables, X₁, X₂, . . . , X_n, is, again, normally distributed provided that the random variables behave independently of each other. This is important for aggregating quantities.
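Since Φ has no closed form, values are obtained numerically in practice. A small sketch, assuming scipy is available:

```python
from scipy.stats import norm

# Numerical evaluation of the normal density (B.1) and the distribution
# function Phi, which has no closed-form expression.
print(norm.pdf(0.0))              # standard normal density at 0: ~0.3989
print(norm.cdf(1.96))             # Phi(1.96): ~0.9750

# Property 1 (location-scale invariance): if X ~ N(0,1), then 3 + 2X ~ N(3, 4),
# so P(3 + 2X <= 5) equals Phi((5 - 3)/2) = Phi(1).
print(norm(loc=3, scale=2).cdf(5), norm.cdf(1.0))   # both ~0.8413
```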
Furthermore, the normal distribution is often mentioned in the context of the central limit theorem. It states that a sum of n random variables with finite variance and identical distributions and being independent of each other converges in distribution to a normal random variable.[2] We restate this formally as follows: Let X₁, X₂, . . . , X_n be identically distributed random variables with mean E(X_i) = μ and var(X_i) = σ², which do not influence the outcome of each other (i.e., are independent). Then we have

$$\frac{\sum_{i=1}^{n} X_i - n\mu}{\sqrt{n}\cdot\sigma} \xrightarrow{D} N(0, 1) \tag{B.2}$$

as the number n approaches infinity. The D above the convergence arrow in equation (B.2) indicates that the distribution function of the left expression converges to the standard normal distribution.

[2] There exist generalizations such that the distributions need no longer be identical. However, this is beyond the scope of this appendix.

Generally, for n ≥ 30 in equation (B.2), we consider the distributions equal; that is, we treat the left-hand side as N(0,1) distributed. In certain cases, depending on the distribution of the X_i and the corresponding parameter values, n < 30 justifies the use of the standard normal distribution for the left-hand side of equation (B.2).

These properties make the normal distribution the most popular distribution in finance. This popularity is somewhat contentious, however, for reasons that will be given when we describe the α-stable distribution.

The last property of the normal distribution we will discuss, which is shared with some other distributions, is the bell shape of the density function. This particular shape helps in roughly assessing the dispersion of the distribution due to a rule of thumb commonly referred to as the empirical rule. Due to this rule, we have

$$P(X \in [\mu \pm \sigma]) = F(\mu + \sigma) - F(\mu - \sigma) \approx 68\%$$
$$P(X \in [\mu \pm 2\sigma]) = F(\mu + 2\sigma) - F(\mu - 2\sigma) \approx 95\%$$
$$P(X \in [\mu \pm 3\sigma]) = F(\mu + 3\sigma) - F(\mu - 3\sigma) \approx 100\%$$

The above states that approximately 68% of the probability is given to values that lie in an interval of one standard deviation σ about the mean μ. About 95% probability is given to values within 2σ of the mean, while nearly all probability is assigned to values within 3σ of the mean.
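Both the empirical rule and the convergence in (B.2) are easy to check by simulation. A minimal sketch with numpy, using uniform random variables as the (non-normal) summands:

```python
import numpy as np

# Central limit theorem (B.2) and the empirical rule by simulation: sums of
# n = 30 uniform random variables, standardized, behave like N(0,1).
rng = np.random.default_rng(0)
n, trials = 30, 100_000
mu, sigma = 0.5, np.sqrt(1 / 12)              # mean and std of one U(0,1) draw

sums = rng.random((trials, n)).sum(axis=1)
z = (sums - n * mu) / (np.sqrt(n) * sigma)    # left-hand side of (B.2)

for k in (1, 2, 3):
    share = np.mean(np.abs(z) <= k)
    print(f"within {k} sigma: {share:.3f}")   # ~0.68, ~0.95, ~1.00
```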
CHI-SQUARE DISTRIBUTION

Our next distribution is the chi-square distribution. Let Z be a standard normal random variable, in brief Z ~ N(0,1), and let X = Z². Then X is distributed chi-square with one degree of freedom. We denote this as X ~ χ²(1). The degrees of freedom indicate how many independently behaving standard normal random variables the resulting variable is composed of. Here X is composed of just one, namely Z, and therefore has one degree of freedom. Because Z is squared, the chi-square distributed random variable assumes only nonnegative values; that is, the support is on the nonnegative real numbers. It has mean E(X) = 1 and variance var(X) = 2.

In general, the chi-square distribution is characterized by the degrees of freedom n, which assume the values 1, 2, . . . , and so on. Let X₁, X₂, . . . , X_n be n χ²(1) distributed random variables that are all independent of each other. Then their sum, S, is

$$S = \sum_{i=1}^{n} X_i \sim \chi^2(n) \tag{B.3}$$

In words, the sum is again distributed chi-square, but this time with n degrees of freedom. The corresponding mean is E(S) = n, and the variance equals var(S) = 2 · n. So, the mean and variance are directly related to the degrees of freedom.

From the relationship in equation (B.3), we see that the degrees of freedom equal the number of independent χ²(1) distributed X_i in the sum. If we have two independent random variables X₁ ~ χ²(n₁) and X₂ ~ χ²(n₂), it follows that

$$X_1 + X_2 \sim \chi^2(n_1 + n_2) \tag{B.4}$$

From equation (B.4), we have that chi-square distributions have Property 2; that is, they are stable under summation in the sense that the sum of any two independent chi-square distributed random variables is itself chi-square distributed.

We won't present the chi-square distribution's density function here. However, Figure B.3 shows a few examples of the plot of the chi-square density function with varying degrees of freedom. As can be observed, the chi-square distribution is skewed to the right.

FIGURE B.3 Density Functions of Chi-Square Distributions for Various Degrees of Freedom n

STUDENT'S t-DISTRIBUTION

An important continuous probability distribution when the population variance of a distribution is unknown is the Student's t-distribution (also referred to as the t-distribution and Student's distribution). To derive the distribution, let X be distributed standard normal, that is, X ~ N(0,1), and S be chi-square distributed with n degrees of freedom, that is, S ~ χ²(n). Furthermore, if X and S are independent of each other, then

$$Z = \frac{X}{\sqrt{S/n}} \sim t(n) \tag{B.5}$$

In words, equation (B.5) states that the resulting random variable Z is Student's t-distributed with n degrees of freedom. The degrees of freedom are inherited from the chi-square distribution of S.

Here is how we can interpret equation (B.5). Suppose we have a population of normally distributed values with zero mean. The corresponding normal random variable may be denoted as X. If one also knows the standard deviation of X, $\sigma = \sqrt{\mathrm{var}(X)}$, then with X/σ we obtain a standard normal random variable. However, if σ is not known, we instead have to use, for example,

$$S/n = \frac{1}{n}\left(X_1^2 + X_2^2 + \cdots + X_n^2\right)$$

where X₁, . . . , X_n are n random variables identically distributed as X and assuming values independently of each other. Then, the distribution of $X/\sqrt{S/n}$ is the t-distribution with n degrees of freedom, that is,

$$X\Big/\sqrt{S/n} \sim t(n)$$

By dividing by σ or $\sqrt{S/n}$, we generate rescaled random variables that follow a standardized distribution. Quantities similar to $X/\sqrt{S/n}$ play an important role in parameter estimation.
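A quick numerical illustration of the heavier tails, assuming scipy is available:

```python
from scipy.stats import norm, t

# Tail probabilities P(Z > 3) for the t-distribution with n degrees of
# freedom versus the standard normal: heavier tails for small n.
for n in (1, 5, 50):
    print(n, t.sf(3, df=n))       # ~0.102 for n = 1, shrinking as n grows
print("N(0,1):", norm.sf(3))      # ~0.00135
```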
It is unnecessary to provide the complicated formula for the Student's t-distribution's density function here. Basically, the density function of the Student's t-distribution has a similar shape to the normal distribution, but with thicker tails. For large degrees of freedom n, the Student's t-distribution does not differ significantly from the standard normal distribution. As a matter of fact, for n ≥ 50, it is practically indistinguishable from N(0,1).

Figure B.4 shows the Student's t density function for various degrees of freedom plotted against the standard normal density function. The same is done for the distribution function in Figure B.5. In general, the lower the degrees of freedom, the heavier the tails of the distribution, making extreme outcomes much more likely than for greater degrees of freedom or, in the limit, the normal distribution. This can be seen from the distribution functions depicted in Figure B.5 for n = 1 and n = 5 against the standard normal cumulative distribution function (cdf). For lower degrees of freedom such as n = 1, the curve starts to rise earlier and approaches 1 later than for higher degrees of freedom such as n = 5 or the N(0,1) case.

FIGURE B.4 Density Function of the t-Distribution for Various Degrees of Freedom n Compared to the Standard Normal Density Function N(0,1)
FIGURE B.5 Distribution Function of the t-Distribution for Various Degrees of Freedom n Compared to the Standard Normal Distribution Function N(0,1)

This can be understood as follows. When we rescale X by dividing by $\sqrt{S/n}$ as in equation (B.5), the resulting $X/\sqrt{S/n}$ obviously inherits randomness from both X and S. Now, when S is composed of only a few X_i, say n = 3, such that $X/\sqrt{S/n}$ has three degrees of freedom, there is a lot of dispersion from S relative to the standard normal distribution. By including more independent N(0,1) random variables X_i such that the degrees of freedom increase, S becomes less dispersed. Thus, much uncertainty relative to the standard normal distribution stemming from the denominator in $X/\sqrt{S/n}$ vanishes. The share of randomness in $X/\sqrt{S/n}$ originating from X alone prevails, such that the normal characteristics preponderate. Finally, as n goes to infinity, we have something that is nearly standard normally distributed.

The mean of the Student's t random variable is zero, that is, E(X) = 0, while the variance is a function of the degrees of freedom n as follows:

$$\sigma^2 = \mathrm{var}(X) = \frac{n}{n - 2}$$

For n = 1 and 2, there is no finite variance. Distributions with such small degrees of freedom generate extreme movements quite frequently relative to higher degrees of freedom. Precisely for this reason, stock price returns are often found to be modeled quite well using distributions with small degrees of freedom or, alternatively, distributions with heavy tails with power decay, with power parameter less than 6.

F-DISTRIBUTION

Our next distribution is the F-distribution. It is defined as follows. Let X ~ χ²(n₁) and Y ~ χ²(n₂). Furthermore, assuming X and Y to be independent, the ratio

$$F(n_1, n_2) = \frac{X/n_1}{Y/n_2} \tag{B.6}$$

has an F-distribution with n₁ and n₂ degrees of freedom inherited from the underlying chi-square distributions of X and Y, respectively. We see that the random variable in equation (B.6) assumes only nonnegative values because neither X nor Y is ever negative; hence, the support is on the nonnegative real numbers. Also like the chi-square distribution, the F-distribution is skewed to the right. Once again, it is unnecessary to present the formula for the density function. Figure B.6 displays the density function for various degrees of freedom. As the degrees of freedom n₁ and n₂ increase, the graph of the density function becomes more peaked and less asymmetric while the tails lose mass.
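Equation (B.6) can be checked by simulation; a sketch with numpy, using hypothetical degrees of freedom n₁ = 4 and n₂ = 10:

```python
import numpy as np

# Building F(n1, n2) draws from independent chi-square variables as in (B.6).
rng = np.random.default_rng(1)
n1, n2, size = 4, 10, 100_000

X = rng.chisquare(n1, size)
Y = rng.chisquare(n2, size)
F_ratio = (X / n1) / (Y / n2)             # equation (B.6)

print(F_ratio.min() >= 0)                 # True: support is nonnegative
print(F_ratio.mean())                     # close to n2/(n2 - 2) = 1.25, see (B.7)
```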
FIGURE B.6 Density Function of the F-Distribution for Various Degrees of Freedom n₁ and n₂

The mean is given by

$$E(X) = \frac{n_2}{n_2 - 2} \quad \text{for } n_2 > 2 \tag{B.7}$$

while the variance equals

$$\sigma^2 = \mathrm{var}(X) = \frac{2\, n_2^2\, (n_1 + n_2 - 2)}{n_1 (n_2 - 2)^2 (n_2 - 4)} \quad \text{for } n_2 > 4 \tag{B.8}$$

Note that according to equation (B.7), the mean is not affected by the degrees of freedom n₁ of the first chi-square random variable, while the variance in equation (B.8) is influenced by the degrees of freedom of both random variables.

α-STABLE DISTRIBUTION

While many models in finance have historically been built on the normal distribution because of its pleasant tractability, concerns have been raised that this distribution underestimates the danger of downturns of extreme magnitude that have been observed in financial markets. Many distributional alternatives providing more realistic chances of severe price movements have been presented earlier, such as the Student's t. In the early 1960s, Benoit Mandelbrot suggested as a distribution for commodity price changes the class of Lévy stable distributions (simply referred to as the stable distributions).[3] The reason is that, through their particular parameterization, they are capable of modeling moderate scenarios, as supported by the normal distribution, as well as extreme ones.

[3] Benoit B. Mandelbrot, "The Variation of Certain Speculative Prices," Journal of Business 36 (1963): 394–419.

The stable distribution is characterized by the four parameters α, β, σ, and μ. In brief, we denote the stable distribution by S(α, β, σ, μ). The parameter α is the so-called tail index or characteristic exponent. It determines how much probability is assigned around the center and the tails of the distribution. The lower the value of α, the more pointed about the center is the density and the heavier are the tails. These two features are referred to as excess kurtosis relative to the normal distribution. This can be visualized graphically as we have done in Figure B.7, where we compare the normal density to an α-stable density with a low α = 1.5.[4] The density graphs are obtained by fitting the distributions to the same sample data of arbitrarily generated numbers. The parameter α is related to the parameter ξ of the Pareto distribution, resulting in the tails of the density functions of α-stable random variables vanishing at a rate proportional to the Pareto tail. The tails of the Pareto as well as the α-stable distribution decay at a rate with fixed power α, namely $C x^{-\alpha}$ (i.e., a power law), where C is a positive constant. This is in contrast to the normal distribution, whose tails decay at an exponential rate (roughly $x^{-1} e^{-x^2/2}$).

[4] In the figure, the parameters for the normal distribution are μ = 0.14 and σ = 4.23. The parameters for the stable distribution are α = 1.5, β = 0, σ = 1, and μ = 0. Note that symbols common to both distributions have different meanings.
FIGURE B.7 Comparison of the Normal (Dash-Dotted) and α-Stable (Solid) Density Functions

The parameter β indicates skewness, where negative values represent left skewness while positive values indicate right skewness. The scale parameter σ has an interpretation similar to the standard deviation. Finally, the parameter μ indicates the location of the distribution. Its interpretability depends on the parameter α: if the latter is between 1 and 2, then μ is equal to the mean. Possible values of the parameters are listed below:

α   (0, 2]
β   [−1, 1]
σ   (0, ∞)
μ   any real number

Depending on the parameters α and β, the distribution has support either on the entire real line or only on the part extending to the right of some location. In general, the density function is not explicitly presentable. Instead, the distribution of the α-stable random variable is given by its characteristic function, which we do not present here.[5]

[5] There are three possible ways to uniquely define a probability distribution: the cumulative distribution function, the probability density function, and the characteristic function. The precise definition of a characteristic function requires some advanced mathematical concepts and is not of major interest for this book. At this point, we just state the fact that knowing the characteristic function is mathematically equivalent to knowing the probability density or the cumulative distribution function. In only three cases does the density of a stable distribution have a closed-form expression.

Figure B.8 shows the effect of α on the tail thickness of the density as well as the peakedness at the origin relative to the normal distribution (collectively, the "kurtosis" of the density) for the case of β = 0, μ = 0, and σ = 1. As the values of α decrease, the distribution exhibits fatter tails and more peakedness at the origin. Figure B.9 illustrates the influence of β on the skewness of the density function for α = 1.5, μ = 0, and σ = 1. Increasing (decreasing) values of β result in skewness to the right (left).

FIGURE B.8 Influence of α on the Resulting Stable Distribution
FIGURE B.9 Influence of β on the Resulting Stable Distribution

Only in the case of an α of 0.5, 1, or 2 can the functional form of the density be stated. For our purpose here, only the case α = 2 is of interest because, for this special case, the stable distribution represents the normal distribution. Then the parameter β ceases to have any meaning since the normal distribution is not asymmetric. A feature of the stable distributions is that moments such as the mean, for example, exist only up to the power α. So, except for the normal case (where α = 2), there exists no finite variance. It becomes even more extreme when α is equal to 1 or less, such that not even the mean exists any more. The nonexistence of the variance is a major drawback when applying stable distributions to financial data. This is one reason that the use of this family of distributions in finance is still disputed.

This class of distributions owes its name to the stability property that we described earlier for the normal distribution (Property 2): The weighted sum of an arbitrary number of independent α-stable random variables with the same parameters is, again, α-stable distributed.
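The stability property can be checked by simulation. The sketch below assumes scipy's levy_stable (whose parameterization conventions may differ from the text's S(α, β, σ, μ) notation) and uses the fact that, for a symmetric standard stable law, the sum of n independent draws equals n^(1/α) times a single draw in distribution:

```python
import numpy as np
from scipy.stats import levy_stable

# Stability under summation: for beta = 0 and mu = 0, the sum of n independent
# alpha-stable draws has the same distribution as n^(1/alpha) times one draw.
alpha, n = 1.5, 5
rng = np.random.default_rng(7)

x = levy_stable.rvs(alpha, 0.0, size=(100_000, n), random_state=rng)
rescaled_sums = x.sum(axis=1) / n ** (1 / alpha)   # back to the original scale

qs = [0.1, 0.25, 0.5, 0.75, 0.9]
print(np.quantile(rescaled_sums, qs))
print(np.quantile(x[:, 0], qs))                    # nearly identical quantiles
```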
More formally, let X₁, . . . , X_n be identically distributed and independent of each other, and assume that for large n ∈ ℕ there exist a positive constant a_n and a real constant b_n such that the normalized sum

$$Y(n) = a_n\left(X_1 + X_2 + \cdots + X_n\right) + b_n \xrightarrow{D} S(\alpha, \beta, \sigma, \mu) \tag{B.9}$$

converges in distribution to a random variable X; then this random variable X must be stable with some parameters α, β, σ, and μ. The convergence in distribution means that the distribution function of Y(n) in equation (B.9) converges to the distribution function of the right-hand side of equation (B.9).

In the context of financial returns, this means that α-stable monthly returns can be treated as the sum of weekly independent returns and, again, α-stable weekly returns themselves can be understood as the sum of daily independent returns. According to equation (B.9), they are equally distributed up to rescaling by the parameters a_n and b_n.

From the presentation of the normal distribution, we know that it serves as a limit distribution of a sum of identically distributed random variables that are independent and have finite variance. In particular, the sum converges in distribution to the standard normal distribution once the random variables have been summed and transformed appropriately. The prerequisite, however, was that the variance exists. Now, we can drop the requirement for finite variance and only ask for independent and identical distributions to arrive at the generalized central limit theorem expressed by equation (B.9). Data transformed in a fashion similar to the left-hand side of equation (B.2) will have a distribution that follows a stable distribution law as the number n becomes very large. Thus, the class of α-stable distributions provides a greater set of limit distributions than the normal distribution, containing the latter as a special case. Theoretically, this justifies the use of α-stable distributions as the choice for modeling asset returns when we consider the returns to be the resulting sum of many independent shocks with identical distributions.

APPENDIX C
Inferential Statistics

In Appendix A, we provided the basics of descriptive statistics. Our focus in this appendix is on inferential statistics, covering the three major topics of point estimators, confidence intervals, and hypothesis testing.

POINT ESTIMATORS

Since it is generally infeasible, or simply too involved, to analyze an entire population in order to obtain full certainty as to the true environment, we need to rely on a small sample to retrieve information about the population parameters. To obtain insight about the true but unknown parameter value, we draw a sample from which we compute statistics or estimates for the parameter.

In this section, we will learn about samples, statistics, and estimators. In particular, we present the linear estimator and explain quality criteria (such as the bias, mean squared error, and standard error) as well as the large-sample criteria. In the context of large-sample criteria, we present the idea behind consistency, for which we need the definition of convergence in probability and the law of large numbers.
As another large-sample criterion, we introduce unbiased efficiency, explaining the best linear unbiased estimator or, alternatively, the minimum-variance linear unbiased estimator.

Sample, Statistic, and Estimator

The probability distributions typically used in financial econometrics depend on one or more parameters. Here we will refer to simply the parameter θ, which can have one or several components, such as the parameters for the mean and variance. The set of parameters is given by Θ, which is called the parameter space.

The general problem that we address is the process of gaining information on the true population parameter, such as, for example, the mean of some portfolio returns. Since we do not actually know the true value of θ, we merely are aware of the fact that it has to be in Θ. For example, the normal distribution has the parameter θ = (μ, σ²), where the first component, the mean, denoted by μ, can technically be any real number between minus and plus infinity. The second component, the variance, denoted by σ², is any positive real number.

Sample  Let Y be some random variable with a probability distribution that is characterized by parameter θ. To obtain information about this population parameter, we draw a sample from the population of Y. A sample is the total of n drawings X₁, X₂, . . . , X_n from the entire population. Note that until the drawings from the population have been made, the X_i are still random. The actually observed values (i.e., realizations) of the n drawings are denoted by x₁, x₂, . . . , x_n. Whenever no ambiguity arises, we denote the vectors (X₁, X₂, . . . , X_n) and (x₁, x₂, . . . , x_n) by the shorthand notation X and x, respectively.

To facilitate the reasoning behind this, let us consider the value of the Dow Jones Industrial Average (DJIA) as some random variable. To obtain a sample of the DJIA, we will "draw" two values. More specifically, we plan to observe its closing value on two days in the future, say June 12, 2009, and January 8, 2010. Prior to these two dates, say on January 2, 2009, we are still uncertain as to the value of the DJIA on June 12, 2009, and January 8, 2010. So, the value on each of these two future dates is random. Then, on June 12, 2009, we observe that the DJIA'