Density and Cumulative Distribution Functions
In a mathematical sense, a histogram is simply a mapping m_i that counts the number of observations (frequencies) that fall into each of a set of disjoint categories (known as bins or intervals). Histograms are also called frequency distributions. If we let n be the total number of observations and k the total number of bins, the histogram satisfies the following condition:
$$n = \sum_{i=1}^{k} m_i$$
A cumulative histogram is a mapping that counts the cumulative number of observations in all of the bins up to and including the specified bin. That is, the cumulative histogram M_i of a histogram m_i is defined as
$$M_i = \sum_{j=1}^{i} m_j$$
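The two definitions above can be sketched in a few lines of Python. This is only an illustration: the function names, the sample values and the bin edges are assumptions made for the example, and the convention that each bin is closed on the left and open on the right (with the last bin closed on both sides) is a choice the text does not specify.

```python
from bisect import bisect_right
from itertools import accumulate


def histogram(values, edges):
    """Return the counts m_i for the bins defined by consecutive edges.

    Assumed convention: bin i covers [edges[i], edges[i+1]), except the
    last bin, which also includes its right edge. Values outside the
    overall range are ignored.
    """
    k = len(edges) - 1
    counts = [0] * k
    for v in values:
        if v < edges[0] or v > edges[-1]:
            continue
        i = min(bisect_right(edges, v) - 1, k - 1)
        counts[i] += 1
    return counts


def cumulative_histogram(counts):
    """Return M_i, the running sum of m_j for j = 1..i."""
    return list(accumulate(counts))


# Hypothetical sample of eight observations in the range 0 to 1.
sample = [0.05, 0.12, 0.31, 0.33, 0.58, 0.61, 0.64, 0.97]
edges = [0.0, 0.25, 0.5, 0.75, 1.0]

m = histogram(sample, edges)
M = cumulative_histogram(m)
print(m)  # [2, 2, 3, 1]
print(M)  # [2, 4, 7, 8]
```

The last cumulative value equals n, the total number of observations, which is the condition stated for the histogram.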
Informally, the probability density function (PDF) is the curve fitted to the histogram, and the cumulative distribution function (CDF) is the curve fitted to the cumulative histogram; the CDF completely describes the probability distribution of a real random variable.
Usually, the statistical behaviour of random variables such as kt, kb and kd is described using the cumulative distribution function, which represents the probability that the event x(t), at the time instant t, is less than a given value x:
F(x, t) = P(x(t) < x). (3.8)
For stochastic variables, this quantity also represents the fraction of time that the stochastic variable is below a given value (fractional time). This second interpretation is more appropriate in certain cases.
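The fractional-time interpretation suggests a direct way to estimate F(x) from a recorded series: count the fraction of samples that lie below x. The sketch below assumes equally spaced samples, so that the fraction of samples approximates the fraction of time; the function name and the series values are hypothetical.

```python
def empirical_cdf(series, x):
    """Estimate F(x) = P(x(t) < x) as the fraction of samples strictly below x."""
    if not series:
        raise ValueError("series must not be empty")
    return sum(1 for v in series if v < x) / len(series)


# Hypothetical, equally spaced samples of an index between 0 and 1.
kt_series = [0.2, 0.4, 0.4, 0.7, 0.9]

print(empirical_cdf(kt_series, 0.5))  # 0.6
print(empirical_cdf(kt_series, 0.4))  # 0.2 (strict inequality)
```

The strict inequality follows Eq. (3.8), so samples exactly equal to x are not counted.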
The minimum number of intervals needed to draw the frequency histogram correctly depends on the amount of available data. As we will see later, different authors have used different numbers of intervals within the range of variation of these indices (from 0 to 1). We will use (x0|Δx|xf) to denote the first value of the interval (x0), its width (Δx) and the last value (xf). For example, (0|0.02|1) represents a distribution with 0 as the first value, 0.02 as the interval width and 1 as the last value, which implies a total of 50 intervals.
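As a quick check of this notation, the number of intervals is (xf − x0)/Δx, which gives (1 − 0)/0.02 = 50 for the example. A minimal sketch that expands the three values into bin edges might look as follows; the helper name `bin_edges` and the tolerance are assumptions made for the example.

```python
def bin_edges(x0, dx, xf):
    """Expand the (x0 | dx | xf) notation into a list of bin edges.

    Raises ValueError if the range is not a whole number of intervals.
    """
    n_bins = round((xf - x0) / dx)
    if n_bins < 1 or abs(x0 + n_bins * dx - xf) > 1e-9:
        raise ValueError("range must contain a whole number of intervals")
    # Set the last edge to xf explicitly to avoid floating-point drift.
    return [x0 + i * dx for i in range(n_bins)] + [xf]


edges = bin_edges(0.0, 0.02, 1.0)
n_intervals = len(edges) - 1
print(n_intervals)  # 50
```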
The statistical behaviour can also be characterised by the probability density function f(x, t) defined as:
$$f(x, t) = \frac{dF(x, t)}{dx}. \qquad (3.9)$$
The functions are normalised so that the area under the f(x, t) curve is equal to unity. That is:
$$\int_{-\infty}^{\infty} f(x, t)\,dx = 1. \qquad (3.10)$$
For a finite range of variation, the integration limits in Eq. (3.10) need only extend over this range, since f(x, t) = 0 outside it. In particular, in the study of kt, kd and kb, the normalised functions satisfy:
$$\int_{-\infty}^{\infty} f(x, t)\,dx = \int_{0}^{1} f(x, t)\,dx = 1. \qquad (3.11)$$
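For binned data, the normalisation can be illustrated by dividing each count by n·Δx, so that the bar areas sum to one over the range 0 to 1. The sketch below assumes equal-width bins; the function name and the counts are hypothetical.

```python
def density_from_counts(counts, width):
    """Turn bin counts into a piecewise-constant density estimate.

    Each count m_i is divided by n * width, so the total area under
    the resulting step function is 1.
    """
    n = sum(counts)
    if n == 0:
        raise ValueError("at least one observation is required")
    return [m / (n * width) for m in counts]


# Hypothetical counts for four bins of width 0.25 covering 0 to 1.
counts = [2, 2, 3, 1]
width = 0.25

f = density_from_counts(counts, width)
area = sum(fi * width for fi in f)
print(f)     # [1.0, 1.0, 1.5, 0.5]
print(area)  # 1.0
```

The density values themselves can exceed one; only the area is constrained.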
Hereinafter, the parameter “t” will be omitted for the sake of clarity in the expressions.
The distributions of kt, kb and kd provide statistical information about the absolute frequency of these values. However, it is often more interesting to analyse the probability distribution of these indices under certain conditions. This is known as “conditional probability”. The conditional density function is written as f(x|y) and is the density of “x” when “y” fulfils a particular condition. It provides more accurate information on the behaviour of the index under the given conditions. In particular, because these distributions are useful for estimating the performance of solar conversion systems, the conditional probability distributions of kt, kb and kd are expressed in terms of the optical air mass, f(kt|ma), or in terms of the mean value over a given period, for example f(kt|kH). We will refer to the cumulative conditional probability distributions as F(x|y).
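One simple way to estimate a conditional density such as f(kt|ma) from data is to keep only the observations whose conditioning variable falls in a chosen range and then build a normalised histogram of the remaining values. The sketch below is an illustration under that assumption, not the procedure used by any particular author; the function name, the air mass range and the data pairs are all hypothetical.

```python
from bisect import bisect_right


def conditional_density(pairs, y_low, y_high, edges):
    """Estimate f(x | y_low <= y < y_high) as a normalised histogram.

    pairs is a sequence of (x, y) observations; edges are the bin edges
    for x. All selected x values are assumed to lie within the edges.
    """
    selected = [x for x, y in pairs if y_low <= y < y_high]
    n = len(selected)
    if n == 0:
        raise ValueError("no observations satisfy the condition")
    k = len(edges) - 1
    counts = [0] * k
    for x in selected:
        i = min(bisect_right(edges, x) - 1, k - 1)
        counts[i] += 1
    return [c / (n * (edges[i + 1] - edges[i])) for i, c in enumerate(counts)]


# Hypothetical (kt, ma) pairs, for illustration only.
pairs = [(0.30, 1.2), (0.65, 1.1), (0.70, 1.4),
         (0.20, 2.5), (0.45, 2.8), (0.72, 1.3)]

density = conditional_density(pairs, 1.0, 1.5, edges=[0.0, 0.5, 1.0])
print(density)  # [0.5, 1.5]
```

Only the four pairs with air mass between 1.0 and 1.5 contribute, and the resulting density still integrates to one over the range 0 to 1.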