In the first two parts of this series, I respectively addressed some misperceptions about the Gaussian copula and described its common use in CDO pricing. Part III focuses more on the model components and the intuition driving them.
I am a staunch supporter of a "models are just the tool" viewpoint, an opinion more elaborately and memorably stated by George Box as, "All models are wrong, but some are useful." With that in mind, what you will find here is not a campaign against the Gaussian copula itself, merely against its blind application to certain problems in finance. I find it as difficult to blame this model alone for the 2008 recession as I find it to blame the sinking of the Titanic on its hull design (new research actually suggests the rivets were more at fault). While the model certainly contributed to a general sense of invincibility and well-more-than-advisable risk taking, it is naive to think that in the absence of this notorious model, 2008 would have turned out just fine.
As I recently (and strangely, given my campaigns against it) stated about VaR, the Gaussian copula does exactly what it is supposed to do - the error lies in its interpretation and its application in the first place. I join Paul Wilmott in his crusade for fewer equations and more common sense among quantitative financiers: getting the number is good but explaining it is better.
Copulas are nothing more than descriptions of how two or more random variables relate to each other. To be more specific, copulas refer to the co-behaviors of uniform random variables only; but any continuous distribution may be transformed to the uniform case via its CDF, and that is the appeal of copula models: they describe dependence without regard to the marginal distributions. The Gaussian copula, we may conclude, doesn't necessarily have anything to do with normal distributions as we typically think of them (i.e. in the "normal distributions are useless in finance" sense)! Rather, it describes the sort of dependence that arises when a bunch of normally-distributed variables are correlated with each other.
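To make the "transform to uniform via the CDF" step concrete, here is a minimal sketch using only the Python standard library. The two marginals (a normal and an exponential) and their parameters are arbitrary choices for illustration, not anything from the series.

```python
import math
import random
from statistics import NormalDist

rng = random.Random(0)
n = 50_000

# Two deliberately different marginals (illustrative parameters).
normal = NormalDist(mu=5.0, sigma=2.0)
normal_draws = [rng.gauss(5.0, 2.0) for _ in range(n)]
expo_rate = 1.5
expo_draws = [rng.expovariate(expo_rate) for _ in range(n)]

# Push each sample through its own CDF (probability integral transform).
u_normal = [normal.cdf(x) for x in normal_draws]
u_expo = [1.0 - math.exp(-expo_rate * x) for x in expo_draws]

def decile_shares(u):
    """Fraction of values landing in each tenth of [0, 1]."""
    counts = [0] * 10
    for v in u:
        counts[min(int(v * 10), 9)] += 1
    return [c / len(u) for c in counts]

shares_normal = decile_shares(u_normal)
shares_expo = decile_shares(u_expo)
print([round(s, 3) for s in shares_normal])
print([round(s, 3) for s in shares_expo])
```

Both lists should sit close to 0.1 in every bucket: once each variable has been passed through its own CDF, the original shape of the marginal is gone and only the dependence between variables is left to model.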
Gaussian dependence isn't as easy to describe as a Gaussian distribution is. For the latter, just think of a bell curve. The former is more difficult to identify, so here's a picture of two uniform random variables with a Gaussian dependence structure:
A first observation is that the dependence is regular (meaning even) and smooth. It lacks any significant clustering. More importantly, it lacks a property called tail dependence. Tail dependence is the tendency for extreme observations to show up in all random variables at once. Strictly speaking, it measures the probability that one variable lands in its tail given that another already has. As you move further out in the tail, that conditional probability converges to a strictly positive limit for structures exhibiting tail dependence; for the Gaussian copula with any correlation below 1, it converges to zero. It is extremely surprising and counter-intuitive to learn that the Gaussian copula lacks tail dependence. In plain English, this means that tail events in the Gaussian copula are asymptotically independent of each other - and that is the chief problem with using Gaussian dependence in finance.
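A rough Monte Carlo sketch of that fading tail link, assuming a bivariate Gaussian copula with correlation 0.7. The function names, sample size and thresholds are illustrative choices. It estimates P(U2 > q | U1 > q) at increasingly extreme q.

```python
import math
import random
from statistics import NormalDist

def gaussian_copula_sample(rho, n, seed=0):
    """Draw n pairs (u1, u2) from a bivariate Gaussian copula."""
    rng = random.Random(seed)
    phi = NormalDist().cdf
    scale = math.sqrt(1.0 - rho * rho)
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + scale * rng.gauss(0.0, 1.0)  # correlated normal
        pairs.append((phi(z1), phi(z2)))             # map to uniforms
    return pairs

def upper_tail_cond_prob(pairs, q):
    """Estimate P(U2 > q | U1 > q) from the sample."""
    in_tail = [u2 for u1, u2 in pairs if u1 > q]
    if not in_tail:
        return float("nan")
    return sum(1 for u2 in in_tail if u2 > q) / len(in_tail)

pairs = gaussian_copula_sample(rho=0.7, n=200_000)
p90 = upper_tail_cond_prob(pairs, 0.90)
p99 = upper_tail_cond_prob(pairs, 0.99)
p999 = upper_tail_cond_prob(pairs, 0.999)
print(round(p90, 3), round(p99, 3), round(p999, 3))
```

The estimates should shrink as q moves toward 1 (the last one is noisy, since only a few hundred points land that far out). That shrinkage is asymptotic independence in action: the further into the tail you look, the less one extreme tells you about the other.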
In finance, extreme events co-occur all the time, as recent memory bears witness. If risk management is the process of ascertaining, measuring, and avoiding those situations, then doesn't it seem a little odd to use a model which is explicitly unable to account for them? Tail dependence is a necessary condition for a dependence model in finance. The Student t copula exhibits it and is only marginally more difficult to implement than a Gaussian copula; but simplicity is king and there was obviously a decision made at some point that tail events didn't require consideration, anyway. It brings to mind my favorite metaphor for VaR: an airbag that always works, except in a crash.
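To give a feel for "only marginally more difficult", here is a hedged sketch that turns the Gaussian sampler into a Student t one by dividing both normals by a shared chi-square mixing variable. The 4 degrees of freedom, the 0.7 correlation and the function names are assumptions made for illustration. Because a copula depends only on ranks, the tail probability is measured against empirical quantiles, which avoids needing a t CDF.

```python
import math
import random

def sample_pairs(rho, n, dof=None, seed=0):
    """Correlated pairs: bivariate normal if dof is None, else bivariate Student t."""
    rng = random.Random(seed)
    scale = math.sqrt(1.0 - rho * rho)
    out = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + scale * rng.gauss(0.0, 1.0)
        if dof is not None:
            # One chi-square draw shared by both coordinates: this common
            # shock is what produces joint extremes.
            w = math.sqrt(dof / rng.gammavariate(dof / 2.0, 2.0))
            z1, z2 = z1 * w, z2 * w
        out.append((z1, z2))
    return out

def upper_tail_cond_prob(pairs, q):
    """P(second above its q-quantile | first above its q-quantile), rank-based."""
    k = int(q * len(pairs))
    t1 = sorted(x for x, _ in pairs)[k]
    t2 = sorted(y for _, y in pairs)[k]
    in_tail = [y for x, y in pairs if x > t1]
    return sum(1 for y in in_tail if y > t2) / len(in_tail)

n = 200_000
gauss_p99 = upper_tail_cond_prob(sample_pairs(0.7, n), 0.99)
t_p99 = upper_tail_cond_prob(sample_pairs(0.7, n, dof=4), 0.99)
print(round(gauss_p99, 3), round(t_p99, 3))
```

The only extra line of modelling is the shared mixing variable, yet the t sample should show a visibly higher chance of joint extremes at the same correlation.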
Another element of the Gaussian model which does not carry over well to finance is the idea that linear correlation is a sufficient statistic for the dependence structure. Consider these two plots, each of which shows two variables that, by construction, have a correlation of 0.7 to each other. First, a Gaussian dependence structure (this looks different from the earlier plot because that one was the copula itself, as indicated by the uniform marginals, whereas this is a full copula-derived multivariate distribution):
Next, a dependence structure exhibiting lower tail dependence (this is from a Clayton copula and is a stylized depiction of behaviors more characteristic of finance). You can plainly see the impact of the tail dependence, in contrast to the Gaussian plot above:
The two distributions are very obviously different, and yet if you merely measured their correlation you'd describe them in exactly the same way. Correlation alone is insufficient to describe more complex dependence structures such as those observed in finance. And yet it is the only dependence parameter a multivariate Gaussian distribution has.
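A numerical version of the same comparison, as a sketch under stated assumptions. Rather than matching linear correlation at 0.7 as the plots do, it matches the two copulas on Kendall's tau, which has closed forms for both families (tau = (2/pi)·arcsin(rho) for the Gaussian, tau = theta/(theta + 2) for Clayton), and then checks that their rank correlations come out close. All names and sample sizes are illustrative.

```python
import math
import random
from statistics import NormalDist

def gaussian_pairs(rho, n, rng):
    phi = NormalDist().cdf
    scale = math.sqrt(1.0 - rho * rho)
    out = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + scale * rng.gauss(0.0, 1.0)
        out.append((phi(z1), phi(z2)))
    return out

def clayton_pairs(theta, n, rng):
    """Clayton copula via conditional inversion."""
    out = []
    for _ in range(n):
        u1 = 1.0 - rng.random()  # in (0, 1], avoids 0 ** negative
        w = 1.0 - rng.random()
        u2 = ((u1 ** (-theta)) * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)
        out.append((u1, u2))
    return out

def pearson(pairs):
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)

def lower_tail_cond_prob(pairs, q):
    """Estimate P(U2 < q | U1 < q)."""
    in_tail = [u2 for u1, u2 in pairs if u1 < q]
    return sum(1 for u2 in in_tail if u2 < q) / len(in_tail)

rho = 0.7
tau = 2.0 / math.pi * math.asin(rho)   # Kendall's tau of the Gaussian copula
theta = 2.0 * tau / (1.0 - tau)        # Clayton parameter with the same tau

rng = random.Random(1)
g = gaussian_pairs(rho, 200_000, rng)
c = clayton_pairs(theta, 200_000, rng)

# Pearson correlation of the uniforms is a rank (Spearman-type) correlation.
rank_corr_g, rank_corr_c = pearson(g), pearson(c)
tail_g, tail_c = lower_tail_cond_prob(g, 0.01), lower_tail_cond_prob(c, 0.01)
print(round(rank_corr_g, 3), round(rank_corr_c, 3))
print(round(tail_g, 3), round(tail_c, 3))
```

The first line of output should show two nearly identical correlation figures; the second should show the Clayton sample co-crashing far more often in the bottom 1%. Same summary statistic, very different downside.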
Financial covariates tend to resemble the second plot - when a large negative event occurs in one, it more than likely will occur in the other. This, by the way, accounts for some of the skewness in financial distributions - it is possible to have two perfectly normal distributions whose combination is nonetheless skewed if the dependence structure exhibits tail effects like this.
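The skewness claim is easy to check by simulation. The sketch below is my construction, with an arbitrary Clayton parameter of 2. It glues two standard normal marginals together with a Clayton copula and measures the skewness of each marginal and of their sum.

```python
import random
from statistics import NormalDist

def clayton_pair(theta, rng):
    """One draw from a Clayton copula via conditional inversion."""
    u1 = 1.0 - rng.random()
    w = 1.0 - rng.random()
    u2 = ((u1 ** (-theta)) * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)
    return u1, u2

def skewness(xs):
    """Sample skewness: third central moment over variance ** 1.5."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

rng = random.Random(2)
inv = NormalDist().inv_cdf
eps = 1e-12  # keep inputs strictly inside (0, 1) for inv_cdf

x1, x2 = [], []
for _ in range(100_000):
    u1, u2 = clayton_pair(2.0, rng)
    x1.append(inv(min(max(u1, eps), 1.0 - eps)))
    x2.append(inv(min(max(u2, eps), 1.0 - eps)))

skew_x1 = skewness(x1)
skew_x2 = skewness(x2)
skew_sum = skewness([a + b for a, b in zip(x1, x2)])
print(round(skew_x1, 3), round(skew_x2, 3), round(skew_sum, 3))
```

Each marginal should show skewness near zero, as a normal must, while the sum comes out negatively skewed: the joint crashes fatten the left tail of the combination even though neither component is skewed on its own.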
Again, we have a call for clarity: it is imperative for the underlying dynamic of any model to resemble the behaviors of the system in question.
The Single Correlation Factor
In a CDO pricing framework based on the Gaussian copula, not only is correlation the sole determinant of the dependence structure, it is assumed to be the same for every pair of names in the basket. This has caused much alarm. Certainly, using more factors would provide a more realistic model - allowing different industries to have different correlations, for example. Unfortunately, this comes at the cost of model integrity.
It is very important, where possible, for a model to have no more than one unobserved input for every output. Think of a Black-Scholes option: future volatility cannot be known, so we plug in whatever value gets the model to spit out the current market price of the option (a "the market is always right" approach). If there were two volatilities (say, a short term value and a long term value), we would be unable to create a consistent model, for there would likely be an infinite number of volatility pairs that would satisfy the market price. For every additional parameter, we need one more output metric to match. If we could match an option's price and also its delta, just for argument's sake, then there is probably a unique combination of two volatilities for that output space.
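The one-input, one-output idea can be sketched with a textbook Black-Scholes call and a bisection search for the implied volatility. The contract terms below are made up for illustration, and the "market price" is generated from a known volatility so the round trip can be checked.

```python
import math
from statistics import NormalDist

N = NormalDist().cdf

def bs_call(spot, strike, rate, vol, t):
    """Black-Scholes price of a European call (no dividends)."""
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol * vol) * t) / (vol * math.sqrt(t))
    d2 = d1 - vol * math.sqrt(t)
    return spot * N(d1) - strike * math.exp(-rate * t) * N(d2)

def implied_vol(price, spot, strike, rate, t, lo=1e-4, hi=5.0, tol=1e-10):
    """Bisection: the call price is increasing in vol, so one price pins one vol."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if bs_call(spot, strike, rate, mid, t) > price:
            hi = mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# Hypothetical quote: price produced with vol = 25%, then recovered.
market_price = bs_call(spot=100.0, strike=105.0, rate=0.02, vol=0.25, t=1.0)
recovered_vol = implied_vol(market_price, spot=100.0, strike=105.0, rate=0.02, t=1.0)
print(round(market_price, 4), round(recovered_vol, 6))
```

The search works because a single price and a single monotone input give exactly one solution. Add a second free volatility and the same price is matched by a whole curve of pairs, which is the identification problem described above.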
This is why using multiple correlations is problematic not just from a fitting standpoint, but from a model integrity standpoint - if you take the thousands of necessary pairwise correlations and estimate just a handful of them incorrectly, the model could deliver completely spurious results.
(For a very concrete example of this, consider pricing a mezzanine or senior CDO tranche, which requires two correlation inputs. Without knowledge of the corresponding equity tranche price - and consequently the attachment point correlation - this becomes a very difficult puzzle indeed).
However, in my mind this is one of the more minor problems. That's not to say it isn't an issue, but I'd much rather have a single-parameter tail-dependent model than a multi factor Gaussian one. Why? Because it's more important to me that a model captures downside risk in some regard than that it captures the distribution's central dynamics more faithfully.
We've discussed why correlation is insufficient to describe the CDO dynamics, and also why a single-factor model may lack fidelity. But in some ways, the entire discussion is slightly off base. Correlation (as I've alluded to before) is an implied measure - it is whatever plug gets the model to output the "right" price.
There is a raging debate about how similar correlation is to Black-Scholes volatility, but I think for the purposes of this exercise we can highlight their similarities (though I will not necessarily agree with that under more rigorous terms). Both are plug values; both have intuitively "correct" ranges but cannot be directly measured or observed; both are the single unobserved input in the simplest pricing models of their respective derivatives.
Because of this, a lot of our reasoning on the problems with correlation goes backwards, since we begin with the premise that correlation is arbitrary and/or unmeasurable, and therefore conclude that a correlation-based model must fail. However, in practice we actually start with a tranche price, and work out the implied correlation value from that price. So I don't really care if my correlation comes out to 60% or 70% because I'm not going to read too much into that figure - it's just a parameter that will keep my model ticking consistently with the market, all else equal.
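Here is a stripped-down sketch of "start with the tranche price and back out the correlation". It assumes a large homogeneous pool under a one-factor Gaussian copula and uses the expected loss of a 0-3% equity tranche as a stand-in for price. Every parameter (4% default probability, 40% recovery, the tranche bounds) and every function name is hypothetical.

```python
import math
from statistics import NormalDist

STD = NormalDist()

def expected_tranche_loss(rho, p_default=0.04, recovery=0.4,
                          attach=0.0, detach=0.03, steps=4000):
    """Expected tranche loss (fraction of tranche notional), integrating over the factor."""
    threshold = STD.inv_cdf(p_default)
    lo, hi = -8.0, 8.0
    dm = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        m = lo + (i + 0.5) * dm  # midpoint rule over the market factor
        cond_pd = STD.cdf((threshold - math.sqrt(rho) * m) / math.sqrt(1.0 - rho))
        pool_loss = (1.0 - recovery) * cond_pd
        tranche = (min(pool_loss, detach) - min(pool_loss, attach)) / (detach - attach)
        total += tranche * STD.pdf(m) * dm
    return total

def implied_correlation(target_loss, lo=1e-4, hi=0.999, tol=1e-8):
    """Bisection for an equity tranche, whose expected loss falls as rho rises."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if expected_tranche_loss(mid) > target_loss:
            lo = mid  # model loss too high, so correlation must be higher
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# Hypothetical "market" quote generated at rho = 30%, then backed out.
quoted_loss = expected_tranche_loss(0.30)
implied_rho = implied_correlation(quoted_loss)
print(round(quoted_loss, 4), round(implied_rho, 4))
```

Nothing in the search asks whether 30% is a sensible description of how the names co-move; it is simply the number that makes the model agree with the quote. The bisection direction relies on the equity tranche being monotone in correlation, which is not true of mezzanine tranches.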
"But wait," you say, "that's the dumbest thing I've ever heard!" What if the market price is arbitrarily high and implies a correlation greater than 1 (or just 1, since the input is bounded)? Then that's great, you get the price right in that instant, but the second you try to measure any sort of risk or even price it the next day, you'll fail because a correlation of 1 doesn't reflect reality at all. Moreover, take this to its logical conclusion: why not have a model whose sole input and output is just the price? In this scenario, you would see a tranche trading at 20, and set your "model" to 20 (the implied price). Tomorrow, your model still says 20 - so when the actual tranche trades for 19, you need to adjust your "model parameter" (i.e. price) down. This is obviously a ridiculous situation, and it speaks to the critical need for any model to balance a reasonable representation (even if a simple one) of reality with an acceptable range of input parameters.
To reiterate, this is why I would prefer a simplistic one-factor tail dependent model to a multifactor Gaussian one.
Other Copula Models
All of this must raise the question, why are we stuck using the Gaussian copula?
And like so much else, the answer is: because it's easy.
As mentioned, the Student t copula exhibits tail dependence and is only slightly harder to build than the Gaussian variety. So why not use it? The dark secret (unless you read part II) is that single factor Gaussian copula models are really just massive simplifications of copula-derived mathematics. The engine itself relies on arithmetic and an integral - nothing that would suggest a copula model on the surface. It is the mathematically friendly properties of the Gaussian distribution that make this possible (though frankly, it seems to me a t implementation shouldn't be much farther off). More obscure copulas, like those in the Archimedean family, don't necessarily follow "real world" behaviors in high dimensions, at least as far as finance is concerned.
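For readers who skipped part II, the "arithmetic and an integral" can be written out in the usual textbook form of the one-factor model. The notation is assumed here for illustration: M is the common factor, ε_i the idiosyncratic shocks, ρ the single correlation, p_i each name's default probability, L the portfolio loss and f any payoff such as a tranche loss.

```latex
% Latent variable for name i, driven by one common factor M
X_i = \sqrt{\rho}\, M + \sqrt{1-\rho}\, \varepsilon_i,
\qquad M,\ \varepsilon_i \sim N(0,1) \ \text{independent}

% Name i defaults when its latent variable falls below a threshold
X_i < \Phi^{-1}(p_i)

% Conditional on M = m, defaults are independent with probability
p_i(m) = \Phi\!\left( \frac{\Phi^{-1}(p_i) - \sqrt{\rho}\, m}{\sqrt{1-\rho}} \right)

% Any expected payoff is then a one-dimensional integral over the factor
E[f(L)] = \int_{-\infty}^{\infty} E\left[ f(L) \mid M = m \right] \varphi(m)\, dm
```

Conditioning on M is what removes the copula from view: inside the integral the names are independent, so the high-dimensional dependence problem collapses to a single integral over one normal variable.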
Moreover, like all problems of this ilk, CDOs suffer from a massive curse of dimensionality. In such situations, familiarity is key - in fact, it is sometimes the only hope of finding answers in the massive cosmos of sparse data.
Gaussian copulas also have a nice property - they are easy to explain (keep in mind, lately such explanations aren't much at all). In particular, the error rates are easy to quantify - we can be 99.975% sure of an outcome. Knowing a concrete chance of failure, even if that probability is completely bogus, makes the model easy to accept. More complicated copula structures, by contrast, are harder to work with (read: make it harder for risk managers to promise certain error rates within certain error bounds).
Finally, more complicated does not necessarily mean better. Even after all I've written, a pinch of common sense applied to a single factor Gaussian model might work more wonders than a more advanced model in the hands of a naive user.
Here endeth the lesson.