The Quantile Regression Approximation Property*
The CQF of log wages given schooling is unlikely to be exactly linear, so the assumptions of the original quantile regression model fail to hold in this example. Luckily, quantile regression can also be understood as giving an MMSE linear approximation to the CQF, though in this case the MMSE problem is a little more complicated and harder to derive than for the regression-CEF theorem. For any quantile index τ ∈ (0, 1), define the quantile regression specification error as:
\[
\Delta_\tau(X_i, \beta_\tau) \equiv X_i'\beta_\tau - Q_\tau(Y_i \mid X_i).
\]
The population quantile regression vector can be shown to minimize an expected weighted average of the squared specification error, Δ²_τ(X_i, β), as shown in the following theorem from Angrist, Chernozhukov, and Fernandez-Val (2006):
Theorem 7.1.1 (Quantile Regression Approximation) Suppose that (i) the conditional density f_Y(y|X_i) exists almost surely, (ii) E[Y_i], E[Q_τ(Y_i|X_i)], and E‖X_i‖ are finite, and (iii) β_τ uniquely solves (7.1.2). Then
\[
\beta_\tau = \arg\min_{b \in \mathbb{R}^d} E\left[ w_\tau(X_i, b) \cdot \Delta_\tau^2(X_i, b) \right], \tag{7.1.7}
\]
where
\[
w_\tau(X_i, b) = \int_0^1 (1 - u)\, f_{\varepsilon(\tau)}\big(u \cdot \Delta_\tau(X_i, b) \mid X_i\big)\, du,
\]
and ε_i(τ) is a quantile-specific residual,
\[
\varepsilon_i(\tau) \equiv Y_i - Q_\tau(Y_i \mid X_i),
\]
with conditional density f_{ε(τ)}(e|X_i) at ε_i(τ) = e. Moreover, when Y_i has a smooth conditional density, we have for β in the neighborhood of β_τ:
\[
w_\tau(X_i, \beta) \simeq \tfrac{1}{2}\, f_Y\big(Q_\tau(Y_i \mid X_i) \mid X_i\big).
\]
The quantile regression approximation theorem looks complicated, but the big picture is simple. We can think of quantile regression as approximating Q_τ(Y_i|X_i), just as OLS approximates E[Y_i|X_i]. The OLS weighting function is the histogram of X_i, which we denote μ(X_i). The quantile regression weighting function, implicitly given by w_τ(X_i, β_τ) · μ(X_i), is more elaborate than μ(X_i) alone (the histogram is implicitly part of the quantile regression weighting function because the expectation in (7.1.7) is over the distribution of X_i). The term w_τ(X_i, β_τ) involves the quantile regression vector, β_τ, but can be rewritten with β_τ partialled out so that it is a function of X_i only (see Angrist, Chernozhukov, and Fernandez-Val, 2006, for details). In any case, the quantile regression weights are approximately proportional to the density of Y_i in the neighborhood of the CQF.
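To make this concrete, here is a minimal simulation sketch, assuming NumPy, SciPy, and statsmodels are available; the data-generating process and all names are illustrative, not from the text. Because Y given X is Gaussian in this design, the CQF and the conditional density are known in closed form, so we can check that the quantile regression coefficients are close to a density-weighted least-squares projection of the true CQF, as the theorem and the (1/2)·f_Y approximation suggest.

```python
# A minimal sketch under assumed libraries and a hypothetical DGP.
import numpy as np
from scipy.stats import norm
import statsmodels.api as sm

rng = np.random.default_rng(0)
tau = 0.90
n = 50_000

x = rng.integers(0, 5, size=n).astype(float)            # discrete regressor
scale = 1.0 + 0.2 * x                                    # heteroskedastic scale
y = 1.0 + 0.1 * x**2 + scale * rng.standard_normal(n)    # CQF is nonlinear in x

X = sm.add_constant(x)                                   # design matrix [1, x]

# (a) Linear quantile regression fit.
beta_qr = sm.QuantReg(y, X).fit(q=tau).params

# (b) Least-squares projection of the *true* CQF onto [1, x], weighted by
#     (1/2) f_Y(Q_tau(Y|X)|X); the constant 1/2 drops out of the argmin.
z = norm.ppf(tau)
cqf = 1.0 + 0.1 * x**2 + scale * z                       # true conditional quantile
dens = norm.pdf(z) / scale                               # f_Y evaluated at the CQF
beta_w = sm.WLS(cqf, X, weights=dens).fit().params

print("quantile regression:      ", beta_qr)
print("density-weighted CQF fit: ", beta_w)              # close, but not identical
```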
The quantile regression approximation property is illustrated in Figure 7.1.1, which plots the conditional quantile function of log wages given highest grade completed using 1980 Census data. Here we take advantage of the discreteness of schooling and large census samples to estimate the CQF non-parametrically by computing the quantile of wages for each schooling level. Panels A-C plot a nonparametric estimate of Q_τ(Y_i|X_i) along with the linear quantile regression fit for the 0.10, 0.50, and 0.90 quantiles, where X_i includes only the schooling variable and a constant. The nonparametric cell-by-cell estimate of the CQF is plotted with circles in the figure, while the quantile regression line is solid. The figure shows how linear quantile regression approximates the CQF.
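The Census extract is not reproduced here, so the following schematic sketch uses simulated "school" and "logwage" variables as stand-ins (pandas and statsmodels assumed). It constructs the two objects plotted in panels A-C: the cell-by-cell sample quantile of log wages at each schooling level and the linear quantile regression fit.

```python
# A schematic sketch; the simulated data and variable names are illustrative.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
tau = 0.10
n = 50_000

school = rng.integers(0, 21, size=n).astype(float)
logwage = (5.0 + 0.06 * school + 0.002 * school**2
           + (0.4 + 0.01 * school) * rng.standard_normal(n))
df = pd.DataFrame({"school": school, "logwage": logwage})

# Nonparametric CQF: the tau-quantile of log wages within each schooling cell.
cqf_cells = df.groupby("school")["logwage"].quantile(tau)

# Linear quantile regression of log wages on schooling and a constant.
qr_fit = sm.QuantReg(df["logwage"], sm.add_constant(df["school"])).fit(q=tau)

levels = cqf_cells.index.to_numpy()
qr_line = qr_fit.params["const"] + qr_fit.params["school"] * levels
print(pd.DataFrame({"cell quantile": cqf_cells.to_numpy(),
                    "QR fit": qr_line}, index=levels))
```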
It’s also interesting to compare quantile regression to a histogram-weighted fit to the CQF, similar to that provided by OLS for the CEF. The histogram-weighted approach to quantile regression was proposed by Chamberlain (1994). The Chamberlain minimum distance (MD) estimator is the sample analog of the vector β̃_τ obtained by solving
\[
\tilde{\beta}_\tau = \arg\min_{b \in \mathbb{R}^d} E\left[ \big( Q_\tau(Y_i \mid X_i) - X_i'b \big)^2 \right] = \arg\min_{b \in \mathbb{R}^d} E\left[ \Delta_\tau^2(X_i, b) \right].
\]
In other words, β̃_τ is the slope of the linear regression of Q_τ(Y_i|X_i) on X_i, weighted by the histogram of X_i. In contrast with quantile regression, which requires only one pass through the data, MD relies on the ability to estimate Q_τ(Y_i|X_i) consistently in a nonparametric first step.
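The two-step logic can be sketched in a few lines; as before, the libraries and the simulated data are illustrative assumptions, not part of the original analysis.

```python
# A minimal sketch of the histogram-weighted MD idea on simulated stand-in data.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(2)
tau = 0.50
n = 50_000

school = rng.integers(0, 21, size=n).astype(float)
logwage = (5.0 + 0.06 * school + 0.002 * school**2
           + (0.4 + 0.01 * school) * rng.standard_normal(n))
df = pd.DataFrame({"school": school, "logwage": logwage})

# Nonparametric first step: estimate Q_tau(Y|X) cell by cell.
cqf = df.groupby("school")["logwage"].quantile(tau)
counts = df.groupby("school")["logwage"].count()

# Second step: least squares of the cell quantiles on [1, X], weighted by the
# histogram of X (the cell counts).
X_cells = sm.add_constant(cqf.index.to_numpy())
md_fit = sm.WLS(cqf.to_numpy(), X_cells, weights=counts.to_numpy()).fit()

# One-pass quantile regression on the microdata, for comparison.
qr_fit = sm.QuantReg(df["logwage"], sm.add_constant(df["school"])).fit(q=tau)

print("MD (histogram-weighted):", md_fit.params)
print("quantile regression:    ", qr_fit.params.values)
```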
Figure 7.1.1 plots MD fitted values with a dashed line. The quantile regression and MD lines are close, but they are not identical because of the weighting by w_τ(X_i, β_τ) in the quantile regression fit. This weighting accentuates the quality of the fit at values of X_i where Y_i is more densely distributed near the CQF. Panels D-F in Figure 7.1.1 plot the overall quantile weights, w_τ(X_i, β_τ) · μ(X_i), against X_i. The panels also show estimates of w_τ(X_i, β_τ), labeled "importance weights," and their density approximations, (1/2) · f_Y(Q_τ(Y_i|X_i)|X_i). The importance weights and the density weights are similar and fairly flat. The overall weighting function looks a lot like the schooling histogram, and therefore places the highest weight on 12 and 16 years of schooling.
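One simple way to build weights of this kind, sketched below on the same simulated stand-in data (NumPy, pandas, and SciPy assumed), is to estimate the density of log wages at the cell quantile within each schooling cell, take (1/2) times that density as the approximation to the importance weights, and multiply by the cell's sample share to get the overall weighting function. This is an illustrative construction, not necessarily the procedure behind panels D-F.

```python
# A rough, illustrative construction of density and overall weights by cell.
import numpy as np
import pandas as pd
from scipy.stats import gaussian_kde

rng = np.random.default_rng(3)
tau = 0.90
n = 50_000

school = rng.integers(0, 21, size=n).astype(float)
logwage = (5.0 + 0.06 * school + 0.002 * school**2
           + (0.4 + 0.01 * school) * rng.standard_normal(n))
df = pd.DataFrame({"school": school, "logwage": logwage})

rows = []
for s, grp in df.groupby("school"):
    q = grp["logwage"].quantile(tau)                 # cell quantile of log wages
    dens = gaussian_kde(grp["logwage"])(q)[0]        # estimated f_Y(q | school = s)
    hist = len(grp) / len(df)                        # histogram weight for this cell
    rows.append({"school": s,
                 "density weight (1/2 f_Y)": 0.5 * dens,
                 "overall weight": 0.5 * dens * hist})
print(pd.DataFrame(rows).set_index("school"))
```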