A number of posts about:

Math

In the last few weeks, I’ve been asked more questions about risk and risk management than I recall hearing in the last year, and at no time has that been more clear than on a day that saw global indices fall 4%. For something we refer to so often, “risk” has proved an elusive concept. Still, it appears every day in the media, not to mention our own conversations. But what is “risk”, exactly?

What is “risk”?

We can’t even begin to discuss risk management without a clear understanding of the underlying concept itself. (To be clear, I’m going to talk about financial risk: that which is associated with a specific investment or portfolio. This includes risk due to market forces as opposed to operational or liquidity constraints.) Many possible definitions of “risk” may spring to mind:

  • The most you can lose on an investment
  • The most you can lose on an investment, with some confidence level alpha
  • The average return of the investment
  • The market value of an investment
  • The notional value of an investment
  • A one-standard deviation loss
  • A six-standard deviation loss
  • The chance that a company goes bankrupt
  • The chance that a counterparty goes bankrupt
  • The chance that you go bankrupt

These are all very useful ideas — we’ll talk about why in a second — but they dance around the issue. They are merely shadows or projections of financial risk. I list them here because ultimately “risk” must be defined in a way which is consistent with all of these projections; in fact it must actually encompass them all. In order to complete that definition, we’ll need to borrow some statistical thinking — but no math, don’t worry.

I propose that “risk” is a distribution of probable outcomes. Specifying “probable outcomes” is somewhat redundant because, in a statistical sense, a distribution is a catalogue of every possible outcome as well as its associated probability. Nonetheless I state it explicitly here because it’s important to realize that we must consider all outcomes, even those which are extremely unlikely.

Risk as a distribution

What does it mean to say risk is a distribution? Put another way, this suggests that if I truly know the risk of an investment, I know the probability of any given outcome. I think that’s a fairly broad characterization which satisfies both the requirement of encompassing the examples I listed earlier and an intuitive understanding of the concept. Volatility is frequently substituted for risk because investors interpret volatility as uncertainty, and risk, when viewed as a distribution, represents uncertainty about future outcomes.

We can now discuss the nature of distributions and their study. In some cases, it’s actually possible to know the true distribution. Flipping a fair coin is the canonical example, but we can also consider rolling a die or drawing a card. In fact, it should come as no surprise that the entire gambling industry is premised on the idea that the public will only be comfortable putting their money at risk if they feel fully informed about possible outcomes. With a coin, there are two outcomes (for argument’s sake, let’s say 0 and 1), and each has a 50% probability of being realized. That’s it: we just fully characterized the risk in this investment with a simple Bernoulli distribution. How about the die? There are six outcomes (for simplicity, let’s say {1, 2, 3, 4, 5, 6}), and each one has a 1-in-6 (about 16.67%) chance of realization. Thus, the risk of the investment is fully captured by a discrete uniform distribution over six outcomes.
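To make the coin and die concrete, here is a minimal Python sketch that stores each distribution as a plain catalogue of outcomes and probabilities. The helper names `probability` and `mean` are my own choices for illustration, not anything standard.

```python
from fractions import Fraction

# A distribution is a catalogue of outcomes and their probabilities.
coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}         # Bernoulli with p = 1/2
die = {face: Fraction(1, 6) for face in range(1, 7)}  # discrete uniform on 1..6


def probability(dist, event):
    """Total probability of the outcomes for which event(outcome) is true."""
    return sum(p for outcome, p in dist.items() if event(outcome))


def mean(dist):
    """Probability-weighted average outcome."""
    return sum(outcome * p for outcome, p in dist.items())


# A complete catalogue of outcomes must have probabilities summing to one.
assert sum(coin.values()) == 1
assert sum(die.values()) == 1

print(probability(die, lambda face: face >= 5))  # prints 1/3
print(mean(die))                                 # prints 7/2
```

Once the catalogue is known, any question about the "investment" reduces to a sum over outcomes, which is exactly what makes these toy cases so comfortable.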

Coins and dice are nice illustrations, but they are only toy examples. In the real world, the full list of outcomes may be difficult to ascertain and their respective probabilities even harder. This is where statistics enters the picture. At its core, statistics is the study of distributions. All I’ve received in years of studying is a bunch of tools for analyzing and describing these lists of potential outcomes. If an investment lacks an easily described set of outcomes, we search for clues as to what the underlying distribution could look like. This could include the type of security, its sensitivities to various external shocks, its historical movements, our expectations of the future, etc. From these indications, we can put together an arbitrarily complex picture of an investment’s underlying distribution.

Or at least, we think we can. Creating that picture is a little like trying to draw an object based only on its shadow. In statistics, we refer to this as a hidden or latent factor: one which cannot be observed directly. By sifting the data (the clues) in the right way, we can gain insight into what characteristics the distribution must have and, subsequently, its general form.

Choosing the distribution

Many distributions have properties called sufficient statistics. These quantities fully characterize the distribution, allowing it to be perfectly (or sometimes approximately) reconstructed without needing to carry around all the data which originally led to its discovery. Some of these summary statistics lurk in plain sight: mean and standard deviation are two of the most obvious. A dataset which follows a normal distribution, or standard bell curve, can be perfectly summed up with these two quantities. For example, if you made a list of the heights of everyone in your office, it would likely lie on a normal distribution (and for example’s sake, let’s say that it does). If you want to work with that distribution or build any sort of measurement of it, you need to keep a list of all (say) 200 people and their heights. But if you know it’s a normal distribution, all you need is the mean (average) and standard deviation (dispersion around the mean). Those two numbers give you enough information to know the probability of observing any height in your original dataset, without the need to consult the data itself. They are sufficient statistics for the distribution.
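As a rough illustration of the office example, the sketch below simulates 200 heights (the figures of 68 inches and 3 inches are invented for the example), keeps only the mean and standard deviation, and then answers questions from those two numbers alone. It uses `statistics.NormalDist` from the Python standard library and assumes normality rather than testing for it.

```python
import random
from statistics import NormalDist, mean, stdev

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical office: 200 heights in inches, simulated rather than measured.
heights = [random.gauss(68.0, 3.0) for _ in range(200)]

# Under a normal assumption, these two numbers are all we need to keep.
mu = mean(heights)
sigma = stdev(heights)
office = NormalDist(mu, sigma)

# Questions about the distribution are now answered without the raw list.
p_under_64 = office.cdf(64.0)
p_between = office.cdf(71.0) - office.cdf(65.0)
median_height = office.inv_cdf(0.5)

print(f"mean={mu:.2f}in, sd={sigma:.2f}in")
print(f"P(height < 64in) = {p_under_64:.3f}")
print(f"P(65in < height < 71in) = {p_between:.3f}")
print(f"median = {median_height:.2f}in")
```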

For the coin toss, the sufficient statistic is the probability of 50%, which fully describes the underlying Bernoulli distribution. For the die, it is the range [1,6], which characterizes the discrete uniform distribution in question. When the list of potential outcomes deviates from well-known distributions, we have two options:

  1. Work with the unknown distribution
  2. Approximate the unknown distribution with a well-known one that has similar properties

While it seems like option 1 is the best choice, it can be a dangerous one. Recall that we may not actually know what the underlying distribution looks like; all we have is a picture based on its shadows. If we made mistakes creating that picture, we’ll have trouble making informed decisions later. Moreover, we will likely be stuck with a branch of statistics called “nonparametric analysis” which can be difficult to make good use of.

Option 2 is likely the better choice, provided that we can glean enough information about the underlying distribution to make an informed choice for the approximating distribution. There is a tendency to always choose a normal distribution, but I think the anti-Gaussian media has beaten that horse to death. Alternatively, there are many families of distributions available; we just want to pick one which describes the investment’s outcomes well while retaining a simplicity that makes any math tractable (and, hopefully, easy).

Option 2 also lets us come up with sufficient statistics for the investment. If all investments were normally distributed, then our portfolio analysis would boil down to their means and standard deviations (and correlations with each other, because the portfolio is a multivariate distribution). This assumption drove the mean-variance finance paradigm that was pioneered by Harry Markowitz in the 1950s. Today we try to use more sophisticated distributional assumptions, but the idea remains the same: come up with a simple set of numbers that summarize your data and use them to analyze the whole.

Returning for a second to the height example, imagine I asked you to estimate the probability of a colleague being over 6’5″. If you retained the original dataset (option 1), you would start by counting tall people, divide them by the total count and give me your probability estimate. If you used an approximation (option 2), you’d pop the sufficient statistics into a well-known and exhaustively studied equation and know immediately not just the probability but also a measure of confidence in that number. More complicated analyses might be simply impossible without the distributional assumption. When we are unsure of the best approximation, some compromise of options 1 and 2 will result.
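The two options can be put side by side in a short Python sketch. The heights are simulated (mean 68 inches, standard deviation 3 inches, both invented for illustration), and 6’5″ is written as 77 inches.

```python
import random
from statistics import NormalDist, mean, stdev

random.seed(1)
heights = [random.gauss(68.0, 3.0) for _ in range(200)]  # simulated, in inches
threshold = 77.0  # 6'5" expressed in inches

# Option 1: keep the data and count the tall people.
empirical = sum(h > threshold for h in heights) / len(heights)

# Option 2: keep only the mean and standard deviation and assume normality.
fitted = NormalDist(mean(heights), stdev(heights))
parametric = 1.0 - fitted.cdf(threshold)

print(f"counting:      {empirical:.4f}")
print(f"normal approx: {parametric:.4f}")
```

With a sample this small, counting can only return multiples of 1/200 and may well report zero for a rare event, while the fitted curve still assigns it a small positive probability. Whether that second number deserves trust depends entirely on whether the normal assumption holds.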

It’s very important to note that in describing the distributions or risk of these investments we made no judgments about quality. Surprisingly, we can’t even say whether they are “risky” or “safe”! Despite my claiming that “we know the risk of the investment,” all we’ve done is describe the outcomes; subjective and qualitative assessments are yet to come.

Risk as a metric

Once we have some idea of what an investment’s distribution of outcomes looks like, we have identified its “risk”. But as I’ve mentioned, we can’t yet do anything with that information. We need to create some sort of measurement that allows us to make comparisons and decisions. Risk metrics are those measurements.

Risk metrics are usually summary statistics of the underlying risk distribution. Summary statistics give information about the distribution, but, unlike sufficient statistics, they may not provide enough detail to recreate the distribution entirely. For example, the mean by itself or the standard deviation by itself or the minimum value all give some insight into the distribution but fail to characterize it completely. Frequently, estimates of these summary statistics are the “shadows” from which a picture of the true distribution is formed. When you measure the heights of everyone in your office, the observed mean and standard deviation constitute two of the clues you would use to construct the representative bell curve.

We have now learned enough to understand that the risks I listed earlier were actually summary statistics of an investment’s true distribution, or underlying risk. At the risk of redundancy, here they are again with explanations (note that some of these refer to the distribution of returns, others to the distribution of portfolio values; it is easy enough to convert between the two):

  • The most you can lose on an investment (the minimum of the distribution)
  • The most you can lose on an investment, with some confidence level alpha (the 1 - alpha quantile of the distribution, also referred to as Value at Risk)
  • The average return of the investment (the mean of the distribution)
  • The market value of an investment (the most recent observation from the distribution)
  • The notional value of an investment (the minimum or maximum of the distribution)
  • A one-standard deviation loss (one standard deviation of the distribution)
  • A six-standard deviation loss (six times the standard deviation of the distribution)
  • The chance that a company goes bankrupt (a specific outcome from the distribution and its associated probability)
  • The chance that a counterparty goes bankrupt (a specific outcome from the distribution and its associated probability)
  • The chance that you go bankrupt (a specific outcome from the distribution and its associated probability)
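Several of the items in this list can be read straight off a sample of outcomes. The Python sketch below does so for a set of hypothetical daily returns; the normal draw, the 95% confidence level and the 2% loss threshold are all assumptions chosen for illustration.

```python
import random
from statistics import mean, quantiles, stdev

random.seed(2)
# Hypothetical daily returns; a normal draw is used purely for illustration.
returns = [random.gauss(0.0005, 0.01) for _ in range(1000)]

tail_percent = 5  # 1 - alpha, in percent, for a confidence level alpha of 95%

worst = min(returns)      # the most that was lost in this sample
average = mean(returns)   # the mean of the distribution
vol = stdev(returns)      # one standard deviation

# quantiles(n=100) returns the 99 cut points for the 1st..99th percentiles.
percentiles = quantiles(returns, n=100)
var_level = percentiles[tail_percent - 1]  # the 1 - alpha quantile (VaR)

# The probability of one specific outcome: losing more than 2% in a day.
p_big_loss = sum(r < -0.02 for r in returns) / len(returns)

print(f"min={worst:.4f} mean={average:.4f} sd={vol:.4f}")
print(f"VaR(95%)={var_level:.4f} P(loss > 2%)={p_big_loss:.3f}")
```

Note that the sample minimum is only an estimate of the true worst case: a sample can never show a loss larger than the ones that happened to be drawn.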

It is clear that without knowledge of the underlying distribution, none of these quantities can be known. I want to hammer home the difference between knowing risk, the distribution, and risk metrics, summary statistics of that distribution. The distinction is even more important — and confusing — because sometimes the summary statistics are observed first and the distribution is inferred thereafter.

I mentioned earlier that volatility is frequently used to describe risk, because of its tie to uncertainty. We can now view it as just one more summary statistic (specifically, standard deviation). However, volatility has a special place in the risk paradigm because it was explicitly labeled as such in the mean-variance paradigm (its counterpart, return, is played by the mean). That legacy has held and is in many ways justified: more stable returns (less volatility) are associated with return distributions that are well-known and usually characterized by a lack of large losses. As volatility increases, the probability of losses generally increases as well. The distribution becomes more dispersed and various risk metrics take turns for the worse. Thus, volatility is a risk bellwether: easy to calculate and usually indicative of most other metrics.
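The bellwether idea is easy to see under a simple assumption. The sketch below holds a hypothetical mean return fixed at 5%, assumes normally distributed returns, and shows how a few other metrics move as volatility rises. The function name `loss_metrics` is mine, and the normal assumption is a convenience, not a claim about real markets.

```python
from statistics import NormalDist

MEAN_RETURN = 0.05  # hypothetical annual mean return, held fixed


def loss_metrics(vol):
    """Return (P(any loss), P(loss worse than 10%), 5% quantile) for a normal."""
    dist = NormalDist(MEAN_RETURN, vol)
    p_loss = dist.cdf(0.0)
    p_big_loss = dist.cdf(-0.10)
    q05 = dist.inv_cdf(0.05)
    return p_loss, p_big_loss, q05


for vol in (0.05, 0.10, 0.20, 0.40):
    p_loss, p_big_loss, q05 = loss_metrics(vol)
    print(f"vol={vol:.2f}  P(loss)={p_loss:.3f}  "
          f"P(loss>10%)={p_big_loss:.3f}  5% quantile={q05:+.3f}")
```

In this setup every metric deteriorates as volatility grows, which is the sense in which one number can stand in for many. The relationship need not be this tidy for skewed or fat-tailed distributions.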

(Another way to think of risk metrics is as low-dimensional projections of the underlying (and potentially high-dimensional) distribution.)

Choosing the metric

And now I’d like you to forget everything we just discussed. In practice, when we talk about “risk” we’re referring to risk metrics rather than the underlying distribution. The reason for that is pragmatic: what good does it do to tell someone what the distribution is? Returning to the heights example, knowing the distribution doesn’t give you any answers. In fact, if you’re a statistician it probably gives you a bunch of questions. Summary statistics (and more advanced results) provide answers. They take the large risk distribution and condense it into a useable form. The appeal is clear: I could tell you every possible outcome of the stock you’re about to buy, or I could tell you that you’re 90% likely to never lose more than 20%. Which is more useful (putting aside all arguments of whether the latter can truly be known)?

So when we talk about risk we’re talking about metrics. How do we choose those metrics? Well, if part 1 of the risk manager’s job is to model the underlying distribution, then part 2 is deciding which metrics are useful and calculating them. Needless to say, this part is more art than science. Contrary to popular belief, there is no magic number that contains all risk information and lets you make investment decisions without further analysis. You may have heard of these holy grails; they go by names like “value at risk”, “Sharpe ratio”, “Sortino ratio”, “return over maximum drawdown”, “omega ratio”, and so forth. These are like weight loss pills: they make promises grounded in just enough math to either convince or confuse (depending on the customer) and appear to work as advertised on the surface. Caveat emptor.

We have already learned why there is no “one number” solution: because risk metrics are summary statistics and not sufficient statistics. Now, even if they were sufficient statistics for the risk distribution, there still wouldn’t be a silver bullet, because the risk distribution does not allow qualitative judgments. It is merely a list of outcomes. If you could condense it to one number, you’d have a number that represented all your outcomes, good and bad, and not necessarily one which would provide an indication of value.

What’s really necessary is to look at many of these metrics together. Each one provides some information about the risk distribution, like various shadows from different light sources. By considering many of them at once, our understanding of risk (and equivalently, our picture of the underlying distribution) is enhanced.

There are a few risk metrics which are always useful.

  • The most you can lose is an important one: investors need to bear in mind that zero is a real possibility. For most cash investments, this will be equal to the market value of the investment. Why isn’t this enough? If you bought a million shares of stock and sold a million puts on the same, the max loss on the stock would be greater than that of the options, and you might conclude that the stock was the riskier play. However, I don’t know anyone who would agree that buying stock is riskier than selling puts. We reach that conclusion by considering other outcomes of the respective distributions, or other summary statistics.
  • A reasonable upside estimate is also key. This may not fit the traditional intuition behind a “risk measure”, but it would help differentiate between the stock and option portfolios just described. The stock has large potential for gains; the puts are capped. Thus, the downside in the stock is mitigated by the positives but the put’s downside, though almost equal to the stock’s, is not similarly offset. The decision of what constitutes a “reasonable” upside is in the art category rather than science, so unfortunately I can’t provide an algorithm.
  • An understanding of an investment’s volatility. Volatility, as mentioned, is like a risk bellwether. As it increases, so does the uncertainty about the future outcomes. Another way to express this idea is to say that the entropy of the risk distribution rises as the volatility increases (this idea hasn’t been explored nearly enough in the literature). Popular metrics like the Sharpe ratio try to capitalize on this idea by expressing the “return per unit of risk [volatility]”. Presumably, the more risk one takes through an investment, the greater the return that should be received. (This notion took a disastrous turn when, in late 2008, angry investors wondered why they lost money in stocks as compared to bonds. The answer, that stocks are more risky, was staring them in the face, but they were accustomed to that risk resulting in greater yields and refused to accept any alternatives.)
  • Event-driven idiosyncrasies. Is your investment subject to legal/regulatory risk? Operational risk? Other highly-targeted risks unique to that security? If so, the risk distribution becomes much harder to estimate accurately because these characteristics distort it to the point that approximations fail to capture it fully. It is important to understand not only what these idiosyncrasies are, but how they can impact your estimates of risk. As a simple example, consider an illiquid stock which doesn’t trade except for a few times a year, when it jumps up or down 15%. Any distributional assumptions should be tossed out the window here; stick with more “nonparametric” qualifications like maximum loss and rely on an excellent understanding of the risk specific to the investment.
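The stock-versus-puts comparison from the first two bullets can be sketched with simple payoffs at expiry. The purchase price, strike and premium below are hypothetical, and the sketch ignores financing, dividends and early exercise.

```python
def long_stock_pnl(price_at_expiry, purchase_price):
    """Profit or loss per share from buying the stock and holding it."""
    return price_at_expiry - purchase_price


def short_put_pnl(price_at_expiry, strike, premium):
    """Profit or loss per share from selling a put and holding to expiry."""
    return premium - max(strike - price_at_expiry, 0.0)


# Hypothetical figures: stock bought at 100, at-the-money put sold for 5.
purchase_price, strike, premium = 100.0, 100.0, 5.0

for price in (0.0, 50.0, 100.0, 150.0, 200.0):
    stock = long_stock_pnl(price, purchase_price)
    put = short_put_pnl(price, strike, premium)
    print(f"price={price:6.1f}  stock P&L={stock:+7.1f}  short put P&L={put:+7.1f}")
```

The worst case for the stock is slightly larger than for the short put (the premium cushions the put seller), so maximum loss alone ranks the stock as riskier. The upside column tells the rest of the story: the put seller never earns more than the premium, while the stock’s gains are not capped.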

No discussion of risk metrics would be complete without addressing value at risk. Value at risk, or VaR, was once a celebrated risk metric, introduced to the public by J.P. Morgan in 1994. More recently, it has become demonized and blamed for its contributions to excess risk-taking and the collapse of many financial institutions. VaR has a clear definition: it represents a level of returns which will only be exceeded some small percent of the time, typically 5% or 1%. In a strict statistical sense, VaR defines the beginning of a distribution’s tail. Unfortunately, it provides no information about what happens when returns actually exceed VaR and make it into the tail. As more financial institutions came to see VaR as a minimum return, rather than an unlikely-but-still-possible return, they increased the level of risk they were willing to accept. On days when returns exceeded VaR (and they tended to do so by quite a bit), those institutions took losses far greater than they ever anticipated were even possible. In other words, they failed to consider that the risk distribution extended past the VaR level.

In a statistical sense beyond the scope of this writing, VaR does not satisfy certain axioms that good risk metrics require (see Artzner et al.’s 1999 paper on coherent risk measures). Nonetheless, when used in compliance with its strict definition, it serves as just another summary statistic and can give limited insight into the risk distribution. It is useful to observe the evolution of VaR over time, for example (if VaR increases, risk is increasing, even if the absolute level of VaR is uninteresting). Extensions of VaR like expected shortfall (the average loss, conditional on that loss exceeding VaR in the first place) are also quite useful. An institution is not doing something “wrong” by calculating a VaR; it may be a red flag if they rely solely on the number, however.
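To see how VaR and expected shortfall relate, here is a minimal Python sketch using a simple historical (sort-and-count) estimate. The returns are simulated as mostly calm days with an occasional large loss mixed in; that mixture, the function names and the 5% tail are all assumptions for illustration, and real systems use more careful estimators.

```python
import random

random.seed(3)
# Hypothetical daily returns: mostly calm, with an occasional large loss.
returns = [
    random.gauss(-0.05, 0.03) if random.random() < 0.02 else random.gauss(0.0005, 0.01)
    for _ in range(5000)
]


def historical_var(sample, tail=0.05):
    """Return level at roughly the `tail` quantile of the sample."""
    ordered = sorted(sample)
    index = max(round(tail * len(ordered)) - 1, 0)
    return ordered[index]


def expected_shortfall(sample, tail=0.05):
    """Average return over the observations at or below the VaR level."""
    var_level = historical_var(sample, tail)
    tail_returns = [r for r in sample if r <= var_level]
    return sum(tail_returns) / len(tail_returns)


var_95 = historical_var(returns)
es_95 = expected_shortfall(returns)
print(f"VaR (5% tail):                {var_95:.4f}")
print(f"Expected shortfall (5% tail): {es_95:.4f}")
print(f"Worst observed return:        {min(returns):.4f}")
```

VaR marks where the tail begins; expected shortfall averages what lies inside it, so it is never milder than VaR. The gap between the two, and the distance to the worst observation, is the information that VaR alone leaves out.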

The risk management process

What I’ve laid out here is a rather dry blueprint of the risk management process. The procedure is initiated by searching for clues to an investment’s underlying distribution. This could be any combination of quantitative (historical or modeled outcomes) and qualitative (fundamental analysis, opinions about the future) factors which provide the “shadows” of the distribution. From these, a complete picture of the distribution is constructed, either through the use of sufficient statistics or tailored models (if the distribution defies simple approximation). Finally, the distribution is used to generate risk metrics that allow investments to be assessed and compared. Those outputs become a critical input for the investment process, as decisions must be made in the context of the portfolio risk, and that risk must not be outsized relative to expected returns.

Once the investment is made, the risk manager will continue to exert influence on the portfolio distribution. For example, if the left tail becomes too big, he may take steps to reduce it by taking offsetting positions, or hedging. If exposure to a specific market force (such as interest rates, or currencies) becomes too large or too small, he may buy or sell securities to bring it back in line. This monitoring process is very important: the risk of an investment continues to change long after the investment is put on (in fact, you should hope it does, for otherwise nothing has happened at all!).

There are a few key lessons that can be taken from this process.

  • First, an appreciation for the lack of a silver bullet: there is no magic risk number that will protect your portfolio. I’m sorry.
  • Second, a grasp of the constantly changing nature of an investment’s risk. There is no “set it and forget it” in this process.
  • Third, an understanding of noise vs signal: investments will tend to sample from all over their distributions, both on the upside and down. It is important to observe whether or not the observed returns (themselves summary statistics, or “shadows”) match your understanding of the underlying distribution. If they deviate too much, be prepared to consider that your original assumption was wrong and start over.
  • Fourth, but most important, an understanding that the forest must not be lost for the trees. Seizing on one or two risk measures will inevitably lead to ignorance of the complete distribution (with possibly disastrous consequences). Conversely, trying to compute every summary statistic there is will lead to information overload and indecision. Risk metrics are tools which provide insight; there’s a healthy balance between sparsity and indulgence. Thinking of the metrics as shadows from different lights really is a useful metaphor: too few and some details won’t be resolved; too many and the data’s redundancy will overwhelm any chance of learning from it.

Aside from these tips, I can’t stress enough the importance of practicing good risk management. Many investors do it implicitly, as simply understanding each investment is usually tantamount to intuiting its distribution. It doesn’t have to be a burdensome regime of additional steps, though many investors will find it useful to ask themselves, as an exercise, “What is the largest loss I can sustain and what is the likelihood of that event? What is the volatility of my portfolio, and am I earning enough to justify that allocation?” and so forth.

The risk management process is not unlike solving a puzzle by piecing together clues and constantly checking that the emerging picture matches up with expectations. I hope this explanation has been satisfactory and not too mathy (you don’t want to see me when I’m mathy). There’s a richness to the process which I’m afraid I won’t be able to describe here — for your sake and mine — but I think this should serve as a good jumping-off point for further discussion.

In conclusion, the Hitchhiker’s Guide to the Galaxy has this to say on the subject of tail risk:

The major difference between a thing that might go wrong and a thing that cannot possibly go wrong is that when a thing that cannot possibly go wrong goes wrong it usually turns out to be impossible to get at or repair.


In the latest in a series of articles on the topic, Mike Loukides of O’Reilly Radar asks, “What is data science?”:

We’ve all heard it: according to Hal Varian, statistics is the next sexy job. Five years ago, in What is Web 2.0, Tim O’Reilly said that “data is the next Intel Inside.” But what does that statement mean? Why do we suddenly care about statistics and about data?

The article is excellent, insightful, and long. It’s not just an overview; it’s an in-depth discussion of the who’s, how’s, what’s and why’s of data science – and required reading for anyone curious about what we data scientists actually do.

A few phrases that really stood out to me:

CDDB views music as data, not as audio, and creates new value in doing so.

One of the keys to data science is the realization that data is data is data; it doesn’t really matter what that data represents. A computer (read: algorithm, test, procedure) is content-agnostic. It just does what it’s told. It is up to the scientist — the human — to impose meaning and context on the results of the data manipulation. You might run two distinct analyses on the same dataset; or use the same analysis for two very different datasets. The procedure doesn’t care and — critically —  has no way of inferring its own success without a meta-algorithm layered on top of it. It’s easiest to let the data scientist be that top layer.

The question facing every company today, every startup, every non-profit, every project site that wants to attract a community, is how to use data effectively — not just their own data, but all the data that’s available and relevant. Using data effectively requires something different from traditional statistics, where actuaries in business suits perform arcane but fairly well-defined kinds of analysis. What differentiates data science from statistics is that data science is a holistic approach. We’re increasingly finding data in the wild, and data scientists are involved with gathering data, massaging it into a tractable form, making it tell its story, and presenting that story to others.

This goes hand-in-hand with my last point: there’s no definition of the “right” analysis. Data science is a two-stage process: first, an exploration and second, an implementation (or communication). Repeat.

Once you’ve parsed the data, you can start thinking about the quality of your data. Data is frequently missing or incongruous. If data is missing, do you simply ignore the missing points? That isn’t always possible. If data is incongruous, do you decide that something is wrong with badly behaved data (after all, equipment fails), or that the incongruous data is telling its own story, which may be more interesting?

There’s a nice section, including the above paragraph, on the life-cycle of data itself. The one thing I would add is that data frequently needs to be transformed before it becomes usable. Too many applications today just take data in its raw form and try to correlate it (I’m looking at you, every-application-that-counts-words-in-tweets!). Standardization, whitening, dimension reduction and transformation are crucial steps in getting informed results. If I gave you audio data, you wouldn’t just use it as it appears; you’d probably run it through an FFT first. I suppose you could argue that this step of the analysis is actually part of the analysis itself, and not part of the data preparation.
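Standardization is the simplest of these transformations to show. The Python sketch below rescales two made-up features that live on very different scales so that each has zero mean and unit standard deviation. The feature names and values are invented for illustration, and the function `standardize` is my own helper.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardize a constant series")
    return [(v - mu) / sigma for v in values]


# Hypothetical features on very different scales.
follower_counts = [120, 4500, 89, 150000, 730]
tweets_per_day = [1.5, 12.0, 0.3, 40.0, 4.2]

z_followers = standardize(follower_counts)
z_tweets = standardize(tweets_per_day)

for raw, z in zip(follower_counts, z_followers):
    print(f"{raw:>7} followers -> z = {z:+.2f}")
```

After this step the two features can be compared or combined without the larger-scaled one dominating purely because of its units.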

The problem with most data analysis algorithms is that they generate a set of numbers. To understand what the numbers mean, the stories they are really telling, you need to generate a graph.

Sometimes, sometimes not. The data-visualization/infographic movement is one of the best things that has happened to data science in a long time. Unfortunately, it has also trained us that “pictures are good; simple pictures are better.” There’s nothing more communicative than a good chart, true, but some datasets defy graphic communication. Multi-dimensional datasets are certainly hard to draw without some process like MDS or projection pursuit. I would argue that for many data applications, visualizations are part of the exploratory process but would/should not be considered a final product. For complex data, visualizations show you the question and how the data relates to it; they may not actually show you the answer.

According to DJ Patil, chief scientist at LinkedIn, the best data scientists tend to be “hard scientists,” particularly physicists, rather than computer science majors. Physicists have a strong mathematical background, computing skills, and come from a discipline in which survival depends on getting the most from the data. They have to think about the big picture, the big problem. When you’ve just spent a lot of grant money generating data, you can’t just throw the data out if it isn’t as clean as you’d like. You have to make it tell its story. You need some creativity for when the story the data is telling isn’t what you think it’s telling.

This is a really interesting point — being able to code does not a data scientist make (though it certainly doesn’t preclude the possibility). Data science is about creative thinking as much as it is about creative implementation.

Data scientists combine entrepreneurship with patience, the willingness to build data products incrementally, the ability to explore, and the ability to iterate over a solution. They are inherently interdisciplinary. They can tackle all aspects of a problem, from initial data collection and data conditioning to drawing conclusions. They can think outside the box to come up with new ways to view the problem, or to work with very broadly defined problems: “here’s a lot of data, what can you make from it?”

I’ve actually used exactly the same question to describe the field. It is the central, driving objective behind data science and its simplicity speaks to the incredible diversity of projects and pursuits that the field allows.


The NYT is running a great article about the influx of data in today’s world. The prime argument borrows from a quote often attributed to Einstein: “Not everything that can be counted counts, and not everything that counts can be counted.”

I think this speaks volumes and should be heeded by the sites that persist in churning out infographics that do little to educate (or illustrate) about anything, except maybe how easy it is to draw monochromatic pie charts. A notable (and humorous) exception may be seen here.

One of the article’s most salient points is that it is not enough to take raw data, run it through a battery of statistical tests, and publish the results. And yes, pie charts are a statistical test. The data must be understood and interpreted – and statisticians will use a first set of tests to illuminate the nature of the data, even before we begin testing hypotheses. After all, how can you answer a question without truly understanding what it is? Remember that any statistical test involves a null hypothesis and an alternative – without understanding exactly what the data represents, it is impossible to properly express those options.

But the statistician’s work is not done once the data is understood and the tests are performed – the results of those tests must be interpreted as well. “Lies, damn lies and statistics” isn’t just an anecdote – it’s truth! Show me a result from a dataset, and I’ll show you a convincing way to present an alternative conclusion. It is only by ensuring the integrity of the data and the tests, by knowing exactly what questions are being asked and the manner in which they will be answered, that we can have confidence in our results.

I think it’s wonderful that the tools of statistics have become democratized. But we need to make sure that statistical thinking is as widely disseminated as that math. Tools aren’t much use without the knowledge to wield them. I can hold a hammer and screwdriver, sure, but I’m no master carpenter. Until we can be confident that our statistics come from statisticians, it will remain necessary to question all analyses. As I write that, I’m well aware that even in that scenario we’d probably need a healthy dose of skepticism, just the same. Who better to disguise meaning than the master statisticians themselves?

Beware statisticians bearing gifts.

{ 0 comments }

Alice in Numberland

March 9, 2010 in Math

Fascinating… far from being a psychedelic tour of the imagination, one graduate student argues that Alice in Wonderland is actually a satire of Victorian mathematics:

Yet Dodgson [Lewis Carroll] most likely had real models for the strange happenings in Wonderland, too. He was a tutor in mathematics at Christ Church, Oxford, and Alice’s search for a beautiful garden can be neatly interpreted as a mishmash of satire directed at the advances taking place in Dodgson’s field.

In the mid-19th century, mathematics was rapidly blossoming into what it is today: a finely honed language for describing the conceptual relations between things. Dodgson found the radical new math illogical and lacking in intellectual rigor. In “Alice,” he attacked some of the new ideas as nonsense — using a technique familiar from Euclid’s proofs, reductio ad absurdum, where the validity of an idea is tested by taking its premises to their logical extreme.

{ 0 comments }

The Sortino ratio has emerged as a popular risk measure when evaluating investments. It is a modification of the Sharpe ratio, a workhorse indicator of mean/variance economics.

The Sharpe ratio is constructed like this:

S = \frac{E(r)-r_b}{\sigma}

where E(r) is the expected return, r_b is a benchmark hurdle, and \sigma is the standard deviation of the returns. If you buy into a Gaussian mean/variance paradigm, then the Sharpe ratio tells you how many units of excess return you receive per unit of risk you take.

The Sortino ratio is constructed similarly:

S = \frac{E(r)-r_b}{\sigma_D}

Here, \sigma_D is the downside deviation, or the standard deviation of returns below the benchmark. The intuition behind this statistic is that people do not penalize investments for positive volatility (i.e. unpredictable but beneficial returns); they only care about negative volatility.

And here lies the rub: it’s very easy to calculate a misleading Sortino ratio. The popular method – you’ll see it floating around the web – is to take any positive (or above-benchmark) return, change it to a zero, and calculate a standard deviation as one normally would, across all returns.

To me, that’s not right. You are artificially introducing a steady stream of zeros into your calculation, depressing the volatility calculation. A more proper way is to throw out any positive returns, and calculate the standard deviation of the negative returns (it should not be surprising that this method complies with the intuition for using the Sortino in the first place).

So the next time you’re presented with a Sortino ratio, take care to understand whether it includes zeros or not – if it does, the denominator is typically pulled toward zero, and the ratio is overstated.
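To make the difference concrete, here is a minimal Python sketch of the two denominators described above. The function names and the sample returns are made up for illustration, and both functions work on returns measured relative to the benchmark, which is an assumption on my part. It follows the two descriptions in this post literally; formal definitions of downside deviation differ again.

```python
from statistics import mean, stdev

def downside_dev_zero_filled(returns, benchmark=0.0):
    """'Popular' method: above-benchmark returns become zero, then an ordinary stdev."""
    clipped = [min(r - benchmark, 0.0) for r in returns]
    return stdev(clipped)

def downside_dev_negatives_only(returns, benchmark=0.0):
    """Alternative: drop above-benchmark returns, stdev of what remains."""
    below = [r - benchmark for r in returns if r < benchmark]
    return stdev(below)  # needs at least two below-benchmark returns

def sortino(returns, benchmark, downside_dev):
    """Excess mean return divided by the chosen downside deviation."""
    return (mean(returns) - benchmark) / downside_dev(returns, benchmark)

# Hypothetical periodic returns, for illustration only.
sample = [0.04, -0.02, 0.03, -0.05, 0.01, 0.06, -0.01, 0.02]

zero_filled = sortino(sample, 0.0, downside_dev_zero_filled)
negatives_only = sortino(sample, 0.0, downside_dev_negatives_only)
print(f"zero-filled: {zero_filled:.2f}, negatives only: {negatives_only:.2f}")
```

For this particular sample the zero-filled deviation is the smaller of the two, so the ratio built on it is the larger one. That ordering depends on the data, so it is worth checking on your own return series.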

{ 6 comments }

The mathematician’s lens

January 25, 2010 in Data,Math

A beautiful article in the NYTimes contrasts abstract mathematics with the chilling reality of the Mexican drug cartel wars:

I was born in Mexico City, in a world that seems less and less familiar to me. I live now in the opposite corner of the continent. I am training to be a political scientist at Harvard. My passion has remained the afflictions of my homeland, but at Harvard I have found new ways to address them, to use mathematical models — matrices, vectors, equations, regressions — to understand the Mexican drug crisis.

The cartel wars are extremely violent, and the gangs are responsible for reprehensible kidnappings and deaths. They rank among the most deadly periods of organized crime in human history. The author’s goal isn’t to explain how she can analyze the wars from up in an ivory tower; it’s to describe how her mindset and toolkit inform her understanding of the world in any situation.

The article captured me because it never mentions what the author actually models. Instead, it presents her frightened thoughts and her efforts to calm herself by looking at the world through a mathematical lens. But it’s not what you think; there are no emotionally-distant mathematicians here. The author communicates her fascination with tying reality to abstract models, expecting and preempting the protest that reality is too complex and math too simple:

In this violent world, with the man in the blue Chevy whispering at me behind the window, math is my shield. Speaking up about drugs is in these parts a dangerous game. But not if you speak in the language of sigma and conditional expectations. Math protects me from the immediacy of the violence, and it protects me from them.

The beauty of my method lies in its simplicity. With mathematics I’m able to codify and simplify reality to make it manageable and, more important, malleable. I represent each possible individual as an equation in which each term symbolizes tastes, goals, profession and abilities. All people get portrayed: Policemen, politicians, citizens and drug cartels start living in this mathematical world as planes and hyperplanes and, as in real life, they interact and affect one another, sometimes colluding, sometimes colliding, sometimes neither.

I then use optimization to predict the form of interaction that will be the most probable to emerge and remain over time. Math starts speaking. It tells me, for example, under what conditions the outcome would be a drug war; when would the government prefer to cooperate with cartels; or when cruel intra-cartel purges will become the norm.

There is a part of every modeler’s mind which is constantly teasing out variables from constants. The statisticians among us may take a frequentist view, and wonder what would happen if a scene played itself out a million times; the programmers will deduce the underlying algorithms from the fuzzy result; the pure mathematicians will see manifolds everywhere:

In this abstract microcosmos, reality can be frozen or just slightly changed. I move and look at my hyperplanes from different angles. Let’s change the penalty code. No, let’s increase patrolling. Or reduce wages. Allow less contact between policemen and dealers. Assume the police force is corrupt. Assume it is not. I solve the equations and there it is. My answers come as Greek letters and probabilities.

But we all admit:

I know, I know, this is weird.

Ultimately, “free will” becomes the clarion of the independent. At least, it’s the best response to this explanation:

It may seem strange to examine this shadowy world with equations. But mathematics is transforming the social sciences. In the same way that physicists can predict the movement of atoms in space, we can use mathematics to model how individuals and groups will make decisions and interact in a society.

But free will has a (somewhat tentative) analogue in Heisenberg’s uncertainty principle, and with that, philosophy and math (or theology and physics) are combined; but there’s been plenty of pop-sci written on that topic.

I found this brief article remarkable in how it was able to demonstrate the overlay of mathematical thought on an extremely “human” subject without ever needing to explain either one.

(Via Drew Conway)

{ 0 comments }

011110

January 11, 2010 in Math

Today’s date represents a binary string.

So did yesterday’s. So will November 1’s. This is not news.

But today’s is a palindrome. This is slightly more newsworthy.

Geeks, rejoice.
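For the skeptical, a quick Python check. It assumes dates are written US-style as zero-padded MMDDYY, which is how the title of this post reads; the helper names are mine.

```python
from datetime import date

def date_string(d):
    """Format a date as zero-padded MMDDYY (assumed convention)."""
    return d.strftime("%m%d%y")

def is_binary(s):
    """True if the string contains only the digits 0 and 1."""
    return set(s) <= {"0", "1"}

def is_palindrome(s):
    """True if the string reads the same in both directions."""
    return s == s[::-1]

for d in (date(2010, 1, 10), date(2010, 1, 11), date(2010, 11, 1)):
    s = date_string(d)
    print(s, is_binary(s), is_palindrome(s))
```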

{ 0 comments }

More mainstream Bayesians

December 20, 2009 in Math

The NYT recently ran an article on the math behind the recent and controversial mammogram advisory change. Unsurprisingly, it is heavily centered on a Bayesian argument. Of course, the key point here is not that the statistics dictated the change, but that budgets and political agendas dictated an acceptable level, which the statistics subsequently informed:

Let’s suppose 100,000 screenings for this cancer are conducted. Of these, how many are positive? On average, 500 of these 100,000 people (0.5 percent of 100,000) will have cancer, and so, since 95 percent of these 500 people will test positive, we will have, on average, 475 positive tests (.95 x 500). Of the 99,500 people without cancer, 1 percent will test positive for a total of 995 false-positive tests (.01 x 99,500 = 995). Thus of the total of 1,470 positive tests (995 + 475 = 1,470), most of them (995) will be false positives, and so the probability of having this cancer given that you tested positive for it is only 475/1,470, or about 32 percent! This is to be contrasted with the probability that you will test positive given that you have the cancer, which by assumption is 95 percent.
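The arithmetic in that passage is Bayes’ rule applied to three inputs, and it is easy to reproduce. A minimal Python sketch, using the figures quoted above (0.5 percent prevalence, 95 percent sensitivity, 1 percent false-positive rate); the function and parameter names are my own:

```python
def posterior_given_positive(prevalence, sensitivity, false_positive_rate):
    """P(disease | positive test) via Bayes' rule."""
    true_pos = prevalence * sensitivity                 # P(disease and positive)
    false_pos = (1 - prevalence) * false_positive_rate  # P(healthy and positive)
    return true_pos / (true_pos + false_pos)

# Figures from the quoted passage.
p = posterior_given_positive(prevalence=0.005,
                             sensitivity=0.95,
                             false_positive_rate=0.01)
print(f"P(cancer | positive test) = {p:.1%}")
```

Lowering the prevalence while holding the test’s accuracy fixed pushes the posterior down further, which is why screening a low-risk population yields so many false alarms.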

{ 0 comments }

Professor Risk

December 13, 2009 in Math,Risk

David Spiegelhalter is the Professor of the Public Understanding of Risk at Cambridge University. He has recently produced the following video to encourage better practices in the casual perception of risky behaviors:

[Embedded YouTube video]

I think it’s a brilliant video and would love to have been one of Professor Spiegelhalter’s students. I firmly believe that the study of risk, and of statistics more generally, suffers more than anything from a particularly awful and dare I say boring curriculum, not to mention one which many teachers choose to render in terms beyond the grasp of many students. Efforts like this go a long way toward alleviating that obstacle and I applaud the professor for his work.

{ 0 comments }

Parallel processing

December 11, 2009 in Math

Via Spontaneous Symmetry, a fascinating story about parallel processing and the power of blogging:

Normally, when [a mathematician] seeks a proof, he locks himself in a room with a chalkboard for long periods of time. He may consult his peers at his university, he may read books, he may look through papers, but the majority of thinking takes place within one brain. It’s serial. Gowers had a better idea. Instead of retreating to a dark room, he posted a section on his blog asking for help with the proof. Anyone from around the world could contribute to the idea by posting a comment. He hoped, in this fashion, to link together the brains of people from all around the world. Gowers eventually received hundreds of comments and, over the course of a few weeks, using the ideas in these comments, he was able to piece together a simple proof.

Though SS aptly notes:

I’m afraid to ask how many inane “comments” the poor mathematician had to wade through between each substantive remark.

{ 2 comments }

Great expectations

November 27, 2009

I’ve previously covered the danger of attributing meaning to a forecast which is obviously based on little or no information. In that case, it was the manufacturing survey, which one might dismiss as a more obscure measure. Recently, however, Ken Houghton has written a pair of posts on inflation forecasts that bring me back to [...]

0 comments Read the full post →

He clearly didn’t give it 110%

November 17, 2009

Silicon Alley Insider is running a series of posts called “15 _______ questions that will make you feel stupid.” The blank has been filled twice with “Google interview” and most recently with “management consultant interview.” I particularly enjoyed one of the Google questions: If the probability of observing a car in 30 minutes on a [...]

0 comments Read the full post →

Choropleths in R (yes, “choropleths”)

November 12, 2009

Using R to recreate color-indexed maps of US unemployment data.

7 comments Read the full post →

Math is hard!

November 11, 2009

Via Spontaneous Symmetry, it appears that some people are a bit rusty on their math. The town of Truro, MA recently voted on a proposed zoning measure which required a two-thirds approval to pass. Out of 206 people, 136 voted in favor – just shy of the required two-thirds. Or was it? The exact count [...]

2 comments Read the full post →

Ten statisticians every psychologist should know

November 11, 2009

Psychologist Daniel Wright has published a list of ten statisticians every psychologist should know. The list is comprised of The Founding Fathers: 1. Karl Pearson – who established statistics as an academic discipline 2. Ronald Fisher – who developed much of statistics’ mathematical foundation, including ANOVA and maximum likelihood, and the importance of p-values 3. [...]

0 comments Read the full post →

Living in a Bayesian world

October 30, 2009

Increasingly, I’ve noted in my discussions with statisticians and practitioners a reliance on Bayesian methods. Bayesian statistics rely on an understanding of the uncertainty of a hypothesis. For example, Bayesian hypotheses are literally updated as new information becomes available. Bayesian analyses will also rely heavily on conditional probabilities, or the understanding of likelihoods that depend [...]

0 comments Read the full post →

How Shazam works

October 28, 2009

Ever wondered how song-identifying iPhone app Shazam works? Now you know. (For the link-averse: it’s a pretty cool implementation of pattern matching across song spectrograms, and the key insight was to first reduce the spectrograms by including only peak frequencies. Simple, yet genius.) (via Revolutions)

1 comment Read the full post →

The Yom Kippur effect

October 1, 2009

You may have noticed a minor obsession with traffic amidst TGR’s usual fare, which is why I was especially interested in a recent Freakonomics piece called “A Gut Yontif for L.A. Drivers” (Gut Yontif is a traditional Yiddish greeting used on Yom Kippur – it literally means “good holiday”). The post’s motivation is purely anecdotal [...]

0 comments Read the full post →

An interview with Mandelbrot

October 1, 2009

The FT has posted a lengthy video interview with the brilliant mathematician Benoit Mandelbrot, whose book The (Mis)behavior of Markets first inspired me to enter finance and risk management in particular. I do find that some of John Authers’ questions mar an otherwise interesting (but extremely high-level) overview of Mandelbrot’s thoughts on finance. Right from [...]

0 comments Read the full post →

Suspicious poll distributions

September 25, 2009

I’ve covered Benford’s method for first-digit fraud analysis before, and now Nate Silver has applied a similar method to polling results. He looked at the last digit of various polls (e.g. a 48% McCain, 49% Obama, 3% undecided poll would be recorded as an 8 and a 9) and compiled histograms of their frequencies. Following [...]

0 comments Read the full post →

Lottery math is not so easy

September 23, 2009

Carl Bialik has written about lottery coincidences in his WSJ print column and on The Numbers Guy blog, inspired of course by the recent consecutive draws in the Bulgarian lottery. Addressing my recent confusion, he sheds a little light on why likelihood estimates varied so much: The probability of Bulgaria’s repeated winning numbers became a [...]

0 comments Read the full post →

Adventures in probability

September 17, 2009

Calculating the probability of the Bulgarian lottery drawing the exact same numbers in consecutive weeks.

5 comments Read the full post →

Enlightenment

September 11, 2009

A relatively new program has been devised, with the blessing of the Dalai Lama, to instruct Tibetan monks and nuns in science and math. These students have little or no formal education in that area, but are adept learners and take to the material quickly and with interest. My favorite quote from the NYT’s report: [...]

0 comments Read the full post →

Junk Maths

September 10, 2009

Via Andrew Gelman, I’ve learned that the BBC has a radio programme (as they would write) called More or Less which is dedicated to statistics. The first bit of the most recent one is called “Junk Maths” (and again, I wish I could have taken a class called “maths”) with the following synopsis: Spurious [...]

0 comments Read the full post →

Search forecasts

September 8, 2009

Google Insights recently rolled out a new feature: 12 month search forecasts. The forecast comes from a relatively simple decomposition of the search volume into trend, seasonal and residual components. The model’s out-of-sample performance is tested on the most recent 12 month period; if that prediction proves accurate, then the model is accepted. Here’s what [...]

0 comments Read the full post →

Tanning in perspective

September 2, 2009

Information is Beautiful tipped me off to this poster from GOOD: Skeptic that I am, I immediately questioned the headline as propaganda. The Sun very well may produce that much energy, but how much of it reaches the Earth – in other words, how much of it can actually be harnessed? This makes for a [...]

1 comment Read the full post →

Modelling interactions

August 18, 2009

Andrew Gelman’s latest post highlights the importance of interactions. He includes this breakdown of where people fall depending on political party, ideology, and income: Consider the income dimension. Among liberals, the income curve is flat no matter whether the person is a Democrat, Independent or Republican. For conservatives, however, income has a large effect – [...]

0 comments Read the full post →

R style guide

August 13, 2009

Google has posted a style guide for R which is being used throughout the organization. It’s mostly in line with what I learned once upon a time, but it’s nice to see such an authoritative body coming out with a set of standards. Universal coding benefits everyone, and R is growing so rapidly that some [...]

0 comments Read the full post →

Deconstructing the Gaussian copula, part III

August 11, 2009

The intuition behind copula models: dependence, correlation, single factors and more.

0 comments Read the full post →

Deconstructing the Gaussian copula, part II and a half

August 11, 2009

An aside on static recovery assumptions in CDO pricing.

2 comments Read the full post →