Algorithms behind the Correlation Setting Window

 

1         Introduction

This report describes the “Correlation Setting” popup window in detail (see Figure 1). The window opens when the user clicks the radio button labelled “Known dep” in the main screen of Statool. We list the approaches used to calculate the theoretical correlation and the expectation of XY. We also show how the different setting values are turned into constraints for linear programming.

2         Obtaining Expectation of XY Based on the Joint Distribution: Et

This section illustrates how to figure out the possible range of the expectation of XY if the marginal distributions of X and Y are known.

 

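Assume that the marginal distributions of X and Y are known, as listed in the following table. X takes the interval values $X_1, \ldots, X_n$ with probabilities $p_{x_1}, \ldots, p_{x_n}$, and Y takes the interval values $Y_1, \ldots, Y_m$ with probabilities $p_{y_1}, \ldots, p_{y_m}$. The entry $p_{ij}$ in column $X_i$ and row $Y_j$ is the unknown joint probability of $X_i$ and $Y_j$.

Y ↓  X →   X_1     X_2     …   X_n     p_y
Y_1        p_11    p_21    …   p_n1    p_y1
Y_2        p_12    p_22    …   p_n2    p_y2
…          …       …       …   …       …
Y_m        p_1m    p_2m    …   p_nm    p_ym
p_x        p_x1    p_x2    …   p_xn    1

According to the definition of E(XY), we have $E(XY) = \sum_{i=1}^{n}\sum_{j=1}^{m} p_{ij} X_i Y_j$. Here $X_i = [\underline{x}_i, \overline{x}_i]$ and $Y_j = [\underline{y}_j, \overline{y}_j]$ are interval values. Based on interval multiplication,

$$X_i Y_j = \left[\min\left(\underline{x}_i\underline{y}_j,\ \underline{x}_i\overline{y}_j,\ \overline{x}_i\underline{y}_j,\ \overline{x}_i\overline{y}_j\right),\ \max\left(\underline{x}_i\underline{y}_j,\ \underline{x}_i\overline{y}_j,\ \overline{x}_i\underline{y}_j,\ \overline{x}_i\overline{y}_j\right)\right].$$

Let $\underline{z}_{ij}$ and $\overline{z}_{ij}$ denote the lower and upper bounds of $X_i Y_j$, that is, the minimum and maximum above. Then

$$\sum_{i=1}^{n}\sum_{j=1}^{m} p_{ij}\,\underline{z}_{ij} \le E(XY) \le \sum_{i=1}^{n}\sum_{j=1}^{m} p_{ij}\,\overline{z}_{ij}.$$

There are also constraints on the $p_{ij}$'s from the marginal distributions. These are the row and column constraints of the table:

$\sum_{i=1}^{n} p_{ij} = p_{y_j}$ for j = 1 to m

$\sum_{j=1}^{m} p_{ij} = p_{x_i}$ for i = 1 to n

Here, only the $p_{ij}$, i = 1 to n, j = 1 to m, are unknown. Our objective is to find the minimum and maximum possible values of E(XY). Since each $p_{ij}$ is non-negative, the minimum value of E(XY) is obtained by minimizing $\sum_{i}\sum_{j} p_{ij}\,\underline{z}_{ij}$ and the maximum value of E(XY) is obtained by maximizing $\sum_{i}\sum_{j} p_{ij}\,\overline{z}_{ij}$. Therefore two linear programs are constructed to obtain the minimum and maximum values of E(XY).

Minimum value:

Minimize $\sum_{i=1}^{n}\sum_{j=1}^{m} p_{ij}\,\underline{z}_{ij}$

Subject to:

$\sum_{i=1}^{n} p_{ij} = p_{y_j}$ for j = 1 to m

$\sum_{j=1}^{m} p_{ij} = p_{x_i}$ for i = 1 to n

$p_{ij} \ge 0$ for i = 1 to n, j = 1 to m

Maximum value:

Maximize $\sum_{i=1}^{n}\sum_{j=1}^{m} p_{ij}\,\overline{z}_{ij}$

Subject to:

$\sum_{i=1}^{n} p_{ij} = p_{y_j}$ for j = 1 to m

$\sum_{j=1}^{m} p_{ij} = p_{x_i}$ for i = 1 to n

$p_{ij} \ge 0$ for i = 1 to n, j = 1 to m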
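The coefficients $\underline{z}_{ij}$ and $\overline{z}_{ij}$ in both linear programs come from interval multiplication. The following is a minimal Python sketch of that step. The function names and the example intervals are hypothetical illustrations, not Statool's code.

```python
from itertools import product


def interval_mul(a, b):
    """Product of intervals a = (lo, hi) and b = (lo, hi), returned as (lo, hi)."""
    corners = [u * v for u, v in product(a, b)]
    return min(corners), max(corners)


def z_bounds(X, Y):
    """Matrices zlo[i][j] and zhi[i][j] bounding X_i * Y_j (i over X, j over Y)."""
    products = [[interval_mul(xi, yj) for yj in Y] for xi in X]
    zlo = [[lo for lo, _ in row] for row in products]
    zhi = [[hi for _, hi in row] for row in products]
    return zlo, zhi


# Hypothetical example: X has n = 2 intervals, Y has m = 2 intervals.
X = [(-1.0, 2.0), (3.0, 4.0)]
Y = [(0.5, 1.0), (-2.0, -1.0)]
zlo, zhi = z_bounds(X, Y)
print(zlo)  # [[-1.0, -4.0], [1.5, -8.0]]
print(zhi)  # [[2.0, 2.0], [4.0, -3.0]]
```

All four endpoint products are needed because the signs of the endpoints decide which product is extreme.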

 

Solving these two linear programs gives the minimum and maximum values of E(XY), which are recorded as Emin and Emax. These values are presented in the “Expectation of XY Subwindow” of the “Correlation Setting” popup window.
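In practice these linear programs would be handed to a general LP solver. The sketch below is a stand-in that needs only the Python standard library. It enumerates the basic feasible solutions (vertices) of the constraint set using exact rational arithmetic. A linear objective over a bounded polytope attains its optimum at a vertex, so the smallest $\sum p_{ij}\underline{z}_{ij}$ and the largest $\sum p_{ij}\overline{z}_{ij}$ over the vertices give Emin and Emax. One column constraint is dropped because the others imply it. This brute-force technique is an assumption made for illustration and is only practical for very small tables. The function names and example values are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def solve_exact(A, b):
    """Solve the square system A x = b with Fractions; return None if singular."""
    r = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(r):
        pivot = next((k for k in range(col, r) if M[k][col] != 0), None)
        if pivot is None:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        for k in range(r):  # Gauss-Jordan: clear this column in every other row
            if k != col and M[k][col] != 0:
                factor = M[k][col] / M[col][col]
                M[k] = [a - factor * c for a, c in zip(M[k], M[col])]
    return [M[k][r] / M[k][k] for k in range(r)]


def e_bounds_by_vertices(px, py, zlo, zhi):
    """(Emin, Emax) over joint distributions p_ij with the given marginals."""
    px = [Fraction(v) for v in px]
    py = [Fraction(v) for v in py]
    if sum(px) != sum(py):
        raise ValueError("marginals must have the same total probability")
    n, m = len(px), len(py)
    cells = [(i, j) for i in range(n) for j in range(m)]
    rows, rhs = [], []
    for i in range(n):  # sum over j of p_ij = p_xi
        rows.append([1 if ci == i else 0 for ci, _ in cells])
        rhs.append(px[i])
    for j in range(m - 1):  # sum over i of p_ij = p_yj (last one is implied)
        rows.append([1 if cj == j else 0 for _, cj in cells])
        rhs.append(py[j])
    e_min = e_max = None
    for basis in combinations(range(len(cells)), len(rows)):
        A = [[Fraction(row[k]) for k in basis] for row in rows]
        sol = solve_exact(A, rhs)
        if sol is None or any(v < 0 for v in sol):
            continue  # singular basis, or basic solution with a negative p_ij
        vertex = {cells[k]: v for k, v in zip(basis, sol)}
        lo = sum(float(v) * zlo[i][j] for (i, j), v in vertex.items())
        hi = sum(float(v) * zhi[i][j] for (i, j), v in vertex.items())
        e_min = lo if e_min is None else min(e_min, lo)
        e_max = hi if e_max is None else max(e_max, hi)
    return e_min, e_max


# Hypothetical 2 x 2 example with uniform marginals.
px = [Fraction(1, 2), Fraction(1, 2)]
py = [Fraction(1, 2), Fraction(1, 2)]
zlo = [[-1.0, -4.0], [1.5, -8.0]]
zhi = [[2.0, 2.0], [4.0, -3.0]]
print(e_bounds_by_vertices(px, py, zlo, zhi))  # (-4.5, 3.0)
```

Both programs share the same feasible set, so one pass over the vertices serves for the minimum and the maximum.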

3         Theoretical Correlation

Although the marginal distributions don’t determine the exact correlation between two random variables, they often constrain it to some extent. In the following, we will show how to compute the possible correlation range from the marginal distributions.

 

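From the definition of correlation,

$$\rho = \frac{E(XY) - E(X)E(Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}},$$

where Var(X) and Var(Y) are the variances of X and Y. Rearranging,

$$E(XY) = \rho\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)} + E(X)E(Y).$$

The previous section used the definition of E(XY) to obtain its theoretical range, [Emin, Emax]. The formula above is a second expression for E(XY), and we now compute the possible range of E(XY) from it. Writing the expectations and variances in terms of the marginal distributions, E(XY) can be written as

$$E(XY) = \rho\sqrt{\sum_{i=1}^{n} p_{x_i}\Big(X_i - \sum_{k=1}^{n} p_{x_k}X_k\Big)^2 \cdot \sum_{j=1}^{m} p_{y_j}\Big(Y_j - \sum_{l=1}^{m} p_{y_l}Y_l\Big)^2} + \Big(\sum_{i=1}^{n} p_{x_i}X_i\Big)\Big(\sum_{j=1}^{m} p_{y_j}Y_j\Big).$$

We define the function F(X, Y, ρ) as the right-hand side of this equation. This is an interval-valued function. We write the corresponding real function as

$$F(x, y, \rho) = \rho\sqrt{\sum_{i=1}^{n} p_{x_i}\Big(x_i - \sum_{k=1}^{n} p_{x_k}x_k\Big)^2 \cdot \sum_{j=1}^{m} p_{y_j}\Big(y_j - \sum_{l=1}^{m} p_{y_l}y_l\Big)^2} + \Big(\sum_{i=1}^{n} p_{x_i}x_i\Big)\Big(\sum_{j=1}^{m} p_{y_j}y_j\Big),$$

where $x_i \in X_i$ for i = 1 to n, $y_j \in Y_j$ for j = 1 to m, and $\rho \in [-1, 1]$. This function has n + m + 1 variables, and each variable is restricted to its specified interval. An optimization method can find the minimum and maximum values of F(x, y, ρ), which we record as Fmin and Fmax. (This is a nonlinear optimization problem.)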
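The text does not specify which nonlinear optimization method is used to find Fmin and Fmax. The sketch below uses a crude grid search only to show the structure of F(x, y, ρ). For fixed x and y, F is linear in ρ, so only the two endpoints of the ρ range are evaluated. A grid search only samples the box, so the range it returns lies inside the true [Fmin, Fmax] and is not a safe outer bound. The names and the grid size `k` are hypothetical.

```python
from itertools import product
from math import sqrt


def mean(p, v):
    return sum(pi * vi for pi, vi in zip(p, v))


def variance(p, v):
    mu = mean(p, v)
    return sum(pi * (vi - mu) ** 2 for pi, vi in zip(p, v))


def F(px, py, x, y, rho):
    """The real function F(x, y, rho) for point values x_i and y_j."""
    return rho * sqrt(variance(px, x) * variance(py, y)) + mean(px, x) * mean(py, y)


def grid(interval, k):
    """k evenly spaced points from the lower to the upper endpoint."""
    lo, hi = interval
    return [lo + (hi - lo) * t / (k - 1) for t in range(k)]


def f_bounds_grid(px, X, py, Y, rho_range=(-1.0, 1.0), k=5):
    """Sampled (inner) approximation of [Fmin, Fmax]."""
    xs = [grid(iv, k) for iv in X]
    ys = [grid(iv, k) for iv in Y]
    values = [F(px, py, x, y, rho)
              for x in product(*xs)
              for y in product(*ys)
              for rho in rho_range]  # F is linear in rho: endpoints suffice
    return min(values), max(values)


# Hypothetical marginals with two intervals each.
px = py = [0.5, 0.5]
X = [(0.0, 1.0), (2.0, 3.0)]
Y = [(1.0, 2.0), (3.0, 4.0)]
print(f_bounds_grid(px, X, py, Y))
```

The number of grid points grows as k^(n+m), so a practical implementation would use a proper global or interval optimizer.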

 

We now have two ranges for E(XY) from the two formulas. Since both are valid, we exploit both by intersecting them. Call the lower and upper bounds of the intersection Gmin and Gmax. Then

$$G_{min} = \max(E_{min}, F_{min}) \quad\text{and}\quad G_{max} = \min(E_{max}, F_{max}).$$
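The intersection reduces to a max and a min. The following is a short sketch with hypothetical values. It raises an error if the ranges do not overlap, which would indicate inconsistent inputs.

```python
def intersect(e_range, f_range):
    """Intersect [Emin, Emax] with [Fmin, Fmax] to get [Gmin, Gmax]."""
    g_min = max(e_range[0], f_range[0])
    g_max = min(e_range[1], f_range[1])
    if g_min > g_max:
        raise ValueError("the two ranges for E(XY) do not overlap")
    return g_min, g_max


print(intersect((-4.5, 3.0), (-2.0, 5.0)))  # (-2.0, 3.0)
```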

 

The values of $x_i$ and $y_j$ used to compute Fmin and Fmax are used again here to compute the bounds on the correlation ρ.

 

Since we only want a safe range for the correlation, not necessarily the narrowest possible range, this is sufficient.

 

A more accurate range for the correlation can be obtained by directly computing the minimum and maximum of $\rho = \big(E(XY) - E(X)E(Y)\big)/\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}$. This is a complex nonlinear optimization problem. The correlation range is presented in the “Correlation Coefficient Subwindow” of the “Correlation Setting” popup window.

4         Mean and Variance

The program calculates the theoretical ranges of the mean and variance of each operand. These values are obtained directly from the definitions.

 

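By definition, the expectation of the random variable X is $E(X) = \sum_{i=1}^{n} p_{x_i} X_i$. Since each $X_i = [\underline{x}_i, \overline{x}_i]$ is an interval value and each $p_{x_i}$ is non-negative, $E(X) \in \big[\sum_{i=1}^{n} p_{x_i}\underline{x}_i,\ \sum_{i=1}^{n} p_{x_i}\overline{x}_i\big]$, which gives the bounds on E(X). Operand Y is handled the same way: the bounds on E(Y) are $\sum_{j=1}^{m} p_{y_j}\underline{y}_j$ and $\sum_{j=1}^{m} p_{y_j}\overline{y}_j$.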
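The bounds on E(X) are weighted sums of the interval endpoints. The following is a minimal sketch. The function name and the example values are hypothetical.

```python
def mean_bounds(p, intervals):
    """Bounds on E(X) = sum_i p_i X_i, valid because every p_i is non-negative."""
    lo = sum(pi * a for pi, (a, _) in zip(p, intervals))
    hi = sum(pi * b for pi, (_, b) in zip(p, intervals))
    return lo, hi


print(mean_bounds([0.25, 0.75], [(0.0, 2.0), (4.0, 8.0)]))  # (3.0, 6.5)
```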

 

The variances of X and Y are somewhat harder to obtain. By definition, the variance of X is $\mathrm{Var}(X) = \sum_{i=1}^{n} p_{x_i}\big(X_i - E(X)\big)^2$. Here each $X_i$ is an interval value, so this is a problem of evaluating an interval function. We define the real function $V(x) = \sum_{i=1}^{n} p_{x_i}\big(x_i - \sum_{k=1}^{n} p_{x_k}x_k\big)^2$ with each $x_i \in X_i$, i = 1 to n. Since all the $p_{x_i}$ are known, an optimization method can be applied to compute the minimum and maximum values of V(x), recorded as VXmin and VXmax. The variance of Y is handled the same way. Let $V(y) = \sum_{j=1}^{m} p_{y_j}\big(y_j - \sum_{l=1}^{m} p_{y_l}y_l\big)^2$ with $y_j \in Y_j$, j = 1 to m. Its minimum and maximum are the bounds on the variance of Y, recorded as VYmin and VYmax. These ranges are presented in the “Mean and Variance Subwindow” of the “Correlation Setting” popup window.
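The sketch below shows one possible exact approach for the variance bounds. It is a swapped-in technique and not necessarily the optimization method the program uses. It relies on two facts. First, V(x) is a convex quadratic in x, so its maximum over the box of intervals is attained at a corner. Second, when the probabilities sum to 1, $V(x) = \min_c \sum_i p_{x_i}(x_i - c)^2$. The minimum of V over the box therefore equals the minimum over c of $\sum_i p_{x_i}\,\mathrm{dist}(c, X_i)^2$, which is a convex piecewise quadratic in one variable. The sketch assumes a small n, because corner enumeration is exponential in n. The names and example values are hypothetical.

```python
from itertools import product


def variance(p, x):
    """Variance of a discrete distribution with probabilities p at points x."""
    mu = sum(pi * xi for pi, xi in zip(p, x))
    return sum(pi * (xi - mu) ** 2 for pi, xi in zip(p, x))


def var_max(p, X):
    """Exact max of V(x) over x_i in X_i: a convex function peaks at a corner."""
    return max(variance(p, corner) for corner in product(*X))


def var_min(p, X):
    """Exact min of V(x) over x_i in X_i, assuming the p_i sum to 1."""
    def h(c):
        # h(c) = sum_i p_i * dist(c, X_i)^2
        return sum(pi * max(a - c, 0.0, c - b) ** 2 for pi, (a, b) in zip(p, X))

    points = sorted({e for interval in X for e in interval})
    best = min(h(t) for t in points)
    # Between consecutive endpoints h is one quadratic; try its clamped minimizer.
    for lo, hi in zip(points, points[1:]):
        mid = (lo + hi) / 2
        left = [(pi, a) for pi, (a, b) in zip(p, X) if a > mid]   # c lies below X_i
        right = [(pi, b) for pi, (a, b) in zip(p, X) if b < mid]  # c lies above X_i
        w = sum(pi for pi, _ in left + right)
        if w > 0:
            c = sum(pi * e for pi, e in left + right) / w
            best = min(best, h(min(max(c, lo), hi)))
    return best


p = [0.5, 0.5]
X = [(0.0, 1.0), (2.0, 3.0)]
print(var_min(p, X), var_max(p, X))  # 0.25 2.25
```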

5         Constraints from setting the range of correlation

In this section, we demonstrate how to get extra constraints if the user sets the range of correlation in the “Correlation Coefficient Subwindow” of the “Correlation Setting” popup window.

 

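From section 2, $\sum_{i=1}^{n}\sum_{j=1}^{m} p_{ij}\,\underline{z}_{ij} \le E(XY) \le \sum_{i=1}^{n}\sum_{j=1}^{m} p_{ij}\,\overline{z}_{ij}$, since each $p_{ij}$ is non-negative.

From section 3, $E(XY) = \rho\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)} + E(X)E(Y)$.

Take the real function F(x, y, ρ) from section 3, with $x_i \in X_i$ for i = 1 to n, $y_j \in Y_j$ for j = 1 to m, and ρ restricted to the range given by the user. Its minimum and maximum can be calculated by nonlinear optimization as in section 3. Call them Fmin and Fmax.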

 

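Based on Berleant & Zhang [1], two inequalities are defined:

$$\sum_{i=1}^{n}\sum_{j=1}^{m} p_{ij}\,\overline{z}_{ij} \ge F_{min} \quad\text{and}\quad \sum_{i=1}^{n}\sum_{j=1}^{m} p_{ij}\,\underline{z}_{ij} \le F_{max}.$$

Both ranges contain the true E(XY), so they must overlap, which is what these inequalities require. Since only the $p_{ij}$'s are unknown, the two inequalities are linear and form two extra constraints for the linear programs.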
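The two extra constraints are linear in the $p_{ij}$'s, with the $\overline{z}_{ij}$ and $\underline{z}_{ij}$ as coefficients. The sketch below stores them in a solver-neutral form, as (coefficients, sense, right-hand side) tuples, and checks whether a candidate joint distribution satisfies them. This representation and the function names are hypothetical and do not describe Statool's internal LP interface.

```python
def extra_constraints(zlo, zhi, f_min, f_max):
    """The two extra LP rows; coefficients follow row-major order of p_ij."""
    upper = [z for row in zhi for z in row]
    lower = [z for row in zlo for z in row]
    return [(upper, ">=", f_min), (lower, "<=", f_max)]


def satisfies(p, constraints, tol=1e-12):
    """True if the joint distribution p[i][j] meets every constraint."""
    flat = [v for row in p for v in row]
    for coeffs, sense, rhs in constraints:
        lhs = sum(c * v for c, v in zip(coeffs, flat))
        if sense == ">=" and lhs < rhs - tol:
            return False
        if sense == "<=" and lhs > rhs + tol:
            return False
    return True


# Hypothetical z bounds and a hypothetical range [Fmin, Fmax] = [0.0, 1.0].
zlo = [[-1.0, -4.0], [1.5, -8.0]]
zhi = [[2.0, 2.0], [4.0, -3.0]]
cons = extra_constraints(zlo, zhi, 0.0, 1.0)
print(satisfies([[0.0, 0.5], [0.5, 0.0]], cons))  # True
print(satisfies([[0.5, 0.0], [0.0, 0.5]], cons))  # False
```

In the second candidate, $\sum p_{ij}\overline{z}_{ij} = -0.5$ falls below Fmin, so the constraint set cuts it off.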

6         Constraints from setting the range of EXY

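If the user sets the “EXY range” in the “Expectation of XY” subwindow of the “Correlation Setting” popup window, the values that the user provides, Fmin and Fmax, are used directly to define two constraints:

$$\sum_{i=1}^{n}\sum_{j=1}^{m} p_{ij}\,\overline{z}_{ij} \ge F_{min} \quad\text{and}\quad \sum_{i=1}^{n}\sum_{j=1}^{m} p_{ij}\,\underline{z}_{ij} \le F_{max}.$$

These constraints are justified in section 5 and in Berleant & Zhang [1].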

7         Constraints from setting Mean and Variance of X and Y

The user can set the mean and/or variance in the “Mean and Variance Subwindow” of the “Correlation Setting” popup window. Consider the formula $E(XY) = \rho\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)} + E(X)E(Y)$. If the means and variances of X and Y are known, E(XY) can be calculated once the correlation is also known. From section 5, the range for correlation is computable, and we can use this range of correlation to calculate the range of E(XY). Because the correlation is known only as a range, the computed E(XY) is an interval, not a real number. Let the lower bound of E(XY) be called Fmin and the upper bound Fmax. Then $\sum_{i=1}^{n}\sum_{j=1}^{m} p_{ij}\,\overline{z}_{ij} \ge F_{min}$ and $\sum_{i=1}^{n}\sum_{j=1}^{m} p_{ij}\,\underline{z}_{ij} \le F_{max}$. Statool then uses these constraints.
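A safe enclosure of E(XY) can be obtained from interval-valued means, variances, and correlation by applying ordinary interval arithmetic to the formula above. The sketch below treats each quantity as an independent interval, so the result may be wider than the true range. The function names and example intervals are hypothetical.

```python
from itertools import product
from math import sqrt


def imul(a, b):
    """Interval product [a] * [b]."""
    corners = [u * v for u, v in product(a, b)]
    return min(corners), max(corners)


def iadd(a, b):
    """Interval sum [a] + [b]."""
    return a[0] + b[0], a[1] + b[1]


def isqrt(a):
    """Interval square root; sqrt is increasing, so endpoints map to endpoints."""
    if a[0] < 0:
        raise ValueError("square root of an interval with a negative lower bound")
    return sqrt(a[0]), sqrt(a[1])


def exy_range(ex, ey, vx, vy, rho):
    """Enclosure of rho * sqrt(Var X * Var Y) + E(X) * E(Y) for interval inputs."""
    return iadd(imul(rho, isqrt(imul(vx, vy))), imul(ex, ey))


# Hypothetical intervals for E(X), E(Y), Var(X), Var(Y), and the correlation.
f_min, f_max = exy_range((1.0, 2.0), (3.0, 3.0), (1.0, 4.0), (1.0, 1.0), (-0.5, 0.5))
print(f_min, f_max)  # 2.0 7.0
```

The resulting f_min and f_max play the roles of Fmin and Fmax in the two constraints.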

8         Constraints from Setting Correlation, Mean and Variance of X and Y

In some situations, the user may have partial information about both the correlation and the mean, the variance, or both. The following describes how the user can enter values for the correlation together with the mean and/or variance.

 

First, the user should click on the checkbox button labelled “Input data in both the correlation subwindow and the mean and variance subwindow” in the “Correlation, Mean, and Variance Subwindow” of the “Correlation Setting” popup window. Then the user can set values in both the “Correlation” and “Mean, Variance” subwindows of the “Correlation Setting” popup window.

 

Section 7 describes the situation where the mean and/or variance are known. If the correlation is also input, all three can be used directly in the formula $E(XY) = \rho\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)} + E(X)E(Y)$ to obtain E(XY). If either the mean or the variance is missing, a default range for it can be obtained as described in section 4. Let the lower bound of E(XY) be called Fmin and the upper bound Fmax. Then $\sum_{i=1}^{n}\sum_{j=1}^{m} p_{ij}\,\overline{z}_{ij} \ge F_{min}$ and $\sum_{i=1}^{n}\sum_{j=1}^{m} p_{ij}\,\underline{z}_{ij} \le F_{max}$. These extra constraints are then added to the LP calls.

9         References

[1] D. Berleant and J. Zhang, “Using correlation to improve envelopes around derived distributions,” Reliable Computing, in press as of 3/27/03

 

Figure 1. “Correlation Setting” popup window.