Google has introduced software that allows non-programmers to create relatively simple Android applications. The program wraps pre-written pieces of code in bite-sized visual representations that can be linked together to create complex behaviors. The software can tap many areas of the Android API, including hardware functions like the accelerometer, and can autonomously respond to stimuli like incoming calls or texts.

The process appears very similar in concept to Apple’s Automator, a visual scripting program for Mac OS X. Automator lets users string together a series of actions, essentially creating “to-do lists” for the computer. However, its action library is relatively limited and is best suited to batch operations which a human could do, but wouldn’t want to because of time or tedium. Google’s application, by contrast, allows scripting to enter the realm of actions humans can’t perform – like auto-replying to texts or responding automatically to changing conditions in physical or digital space.

For all its benefits, though, I don’t see this program gaining widespread attention immediately. There’s still a relatively small overlap between “people who want to build apps” and “people who don’t know how to build apps”. Outside that intersection, Google’s application has little value.

On the other hand, the former category (people who want to build apps) is probably growing every day, and could reach critical mass where the motivation to have X functionality will be answered with Google’s easy software solution. From the other direction, as software like this continues to become more advanced and incorporates more complicated scenarios, developers who do know how to build an app may migrate toward the prepack solution. If it produces the same result in less time, why wouldn’t they? A majority of smartphone applications are nothing more than distilled tables of a larger database; if Google has an easy way of creating that product then more power to them.

This marks another big step forward in programming literacy. As more of programming’s nuances can be wrapped up and handled behind the scenes, it becomes accessible to more people in an imperative form. A person can literally tell his computer/smartphone to do X, Y, and Z, and it will — provided that the actions conform to the library of easy commands that Google has exposed. Outside those wrappers, developers are on their own — but just as the visual GUI replaced the command line, simplified programming will become a critical skill as we require our computers to do more than just store files and load the internet.

I remember the first time I “asked” a computer to do something and the pleasant surprise when it returned an answer (no, it wasn’t “hello world”). Hopefully applications like Google’s will allow that feeling to become democratized and let a larger number of people create applications and extend their computers as active tools. There’s no reason a processor has to be stuck loading the internet and connecting calls; it can do just about anything we can imagine, as long as we have some way to express that desire.

{ 0 comments }

If you visit The Huffington Post using Google Chrome, you’ll see this alert bar appear at the top of your screen:

It looks just like a standard Chrome alert, sharing the same coloring, fonts and icons as the browser’s notification bar. But it isn’t. It’s generated by a piece of code on huffingtonpost.com and is just a <div> like any other on the site. There are only a couple of clues to its true nature: unlike a true Chrome alert, it won’t stay at the top of the page when you scroll (surprising, since that’s an easy CSS property to set) and the text of the alert can be highlighted. Finally, most blatantly, the ruse is revealed by right-clicking the alert and choosing “Inspect Element” from Chrome’s menu.

I think this is pretty awful and irresponsible. We live in a time when online fraud and phishing are rampant: malicious attacks in which a site passes itself off as a different, trusted site in order to fool the user into taking some action. It’s a terrible practice that ensnares millions of people. Usually, such fraud is perpetrated by hackers trying to trick their victims into downloading malware or revealing confidential information. The victim is led to believe that the software they are downloading or the form they are filling out is from a site they trust.

And that brings us to The Huffington Post, which is trying to cajole its readers into downloading software by making it look like the download link was generated by their trusted Google browsers! When I first saw the alert, I wondered if Google and The Huffington Post had entered into some sort of partnership, but they haven’t (although the extension in question is a “featured extension” on the true Chrome extension site). Then I wondered if the alert bar was being generated by some suspect third party, but quickly determined it originated from The Huffington Post itself.

I think it’s insane that this idea was implemented. The only good news here is that the software in question is not malicious. But the means by which it is being advertised is fraudulent. The Huffington Post is completely misrepresenting Google and its browser by stealing the browser’s look and feel for the purpose of harvesting clicks. At the very least, borrowing the look and feel of another application or site is an infringement of intellectual property. I’m stunned by the lack of commentary on it: either people don’t realize, don’t care or, most likely, haven’t equated this version of phishing with its more dangerous analogues.

And HP isn’t the only one — the well-known site DownloadSquad was fooled by a similar scam at The Independent.

I see no difference between an email spoofing my bank and a web site spoofing my browser. I depend on both to provide me with information that I can rely on, and any attempt to hijack that trust is contemptible. The decision to spoof my browser bar should have been accompanied by a highly visible disclaimer that the message did not originate from Google or, preferably, been scrapped altogether.

{ 0 comments }

In the last few weeks, I’ve been asked more questions about risk and risk management than I recall hearing in the last year, and at no time has that been more clear than on a day that saw global indices fall 4%. For something we refer to so often, “risk” has proved an elusive concept. Still, it appears every day in the media, not to mention our own conversations. But what is “risk”, exactly?

What is “risk”?

We can’t even begin to discuss risk management without a clear understanding of the underlying concept itself. (To be clear, I’m going to talk about financial risk: that which is associated with a specific investment or portfolio. This includes risk due to market forces as opposed to operational or liquidity constraints.) Many possible definitions of “risk” may spring to mind:

  • The most you can lose on an investment
  • The most you can lose on an investment, with some confidence level alpha
  • The average return of the investment
  • The market value of an investment
  • The notional value of an investment
  • A one-standard deviation loss
  • A six-standard deviation loss
  • The chance that a company goes bankrupt
  • The chance that a counterparty goes bankrupt
  • The chance that you go bankrupt

These are all very useful ideas — we’ll talk about why in a second — but they dance around the issue. They are merely shadows or projections of financial risk. I list them here because ultimately “risk” must be defined in a way which is consistent with all of these projections; in fact it must actually encompass them all. In order to complete that definition, we’ll need to borrow some statistical thinking — but no math, don’t worry.

I propose that “risk” is a distribution of probable outcomes. Specifying “probable outcomes” is somewhat redundant because, in a statistical sense, a distribution is a catalogue of every possible outcome as well as its associated probability. Nonetheless I state it explicitly here because it’s important to realize that we must consider all outcomes, even those which are extremely unlikely.

Risk as a distribution

What does it mean to say risk is a distribution? Put another way, this suggests that if I truly know the risk of an investment, I know the probability of any given outcome. I think that’s a fairly broad characterization which satisfies both the requirement of encompassing the examples I listed earlier and an intuitive understanding of the concept. Volatility is frequently substituted for risk because investors interpret volatility as uncertainty, and risk, when viewed as a distribution, represents uncertainty in future outcomes.

We can now discuss the nature of distributions and their study. In some cases, it’s actually possible to know the true distribution. Flipping a fair coin is the canonical example, but we can also consider rolling a die or drawing a card. In fact, it should come as no surprise that the entire gambling industry is premised on the idea that the public will only be comfortable putting their money at risk if they feel fully informed about possible outcomes. With a coin, there are two outcomes, for argument’s sake let’s say 0 and 1, and each has a 50% probability of being realized. That’s it, we just fully characterized the risk in this investment with a simple Bernoulli distribution. How about the die? There are six outcomes — for simplicity let’s say {1, 2, 3, 4, 5, 6} — and each one has a 16.67% chance of realization. Thus, the risk of the investment is fully captured by a six-part uniform distribution.
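To make the coin and die concrete, here is a minimal sketch in Python that treats a distribution as nothing more than a table of outcomes and their probabilities. The names `coin`, `die`, `is_valid`, `mean` and `prob` are my own illustrative choices, not anything standard.

```python
from fractions import Fraction

# A distribution is a catalogue of every outcome and its probability.
coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}          # fair coin (Bernoulli)
die = {face: Fraction(1, 6) for face in range(1, 7)}   # fair six-sided die


def is_valid(dist):
    """Probabilities must be non-negative and sum to exactly 1."""
    return all(p >= 0 for p in dist.values()) and sum(dist.values()) == 1


def mean(dist):
    """Probability-weighted average of the outcomes."""
    return sum(outcome * p for outcome, p in dist.items())


def prob(dist, event):
    """Probability that the outcome satisfies the predicate `event`."""
    return sum(p for outcome, p in dist.items() if event(outcome))


if __name__ == "__main__":
    print("coin is a valid distribution:", is_valid(coin))
    print("average die roll:", mean(die))
    print("chance of rolling 5 or 6:", prob(die, lambda face: face >= 5))
```

Once the table is written down, any question about the "investment" (the average, the chance of a particular outcome) is a lookup or a sum over it, which is what I mean by fully characterizing the risk.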

Coins and dice are nice illustrations, but they are only toy examples. In the real world, the full list of outcomes may be difficult to ascertain and their respective probabilities even harder. This is where statistics enters the picture. At its core, statistics is the study of distributions. All I’ve received in years of studying is a bunch of tools for analyzing and describing these lists of potential outcomes. If an investment lacks an easily described set of outcomes, we search for clues as to what the underlying distribution could look like. This could include the type of security, its sensitivities to various external shocks, its historical movements, our expectations of the future, etc. From these indications, we can put together an arbitrarily complex picture of an investment’s underlying distribution.

Or at least, we think we can. Creating that picture is a little like trying to draw an object based only on its shadow. In statistics, we refer to this as a hidden or latent factor: one which cannot be observed directly. By sifting the data (the clues) in the right way, we can gain insight into what characteristics the distribution must have and, subsequently, its general form.

Choosing the distribution

Many distributions have properties called sufficient statistics. These quantities fully characterize the distribution, allowing it to be perfectly (or sometimes approximately) reconstructed without needing to carry around all the data which originally led to its discovery. Some of these summary statistics lurk in plain sight: mean and standard deviation are two of the most obvious. A dataset which follows a normal distribution, or standard bell curve, can be perfectly summed up with these two quantities. For example, if you made a list of the heights of everyone in your office, it would likely lie on a normal distribution (and for example’s sake, let’s say that it does). Without that knowledge, if you wanted to work with the distribution or build any sort of measurement of it, you would need to keep a list of all (say) 200 people and their heights. But if you know it’s a normal distribution, all you need is the mean (average) and standard deviation (dispersion around the mean). Those two numbers give you enough information to know the probability of observing any height in your original dataset, without the need to consult the data itself. They are sufficient statistics for the distribution.

For the coin toss, the sufficient statistic is the probability of 50%, which fully describes the underlying Bernoulli distribution. For the die, it is the range [1,6], which characterizes the discrete uniform distribution in question. When the list of potential outcomes deviates from well-known distributions, we have two options:

  1. Work with the unknown distribution
  2. Approximate the unknown distribution with a well-known one that has similar properties

While it seems like option 1 is the best choice, it can be a dangerous one. Recall that we may not actually know what the underlying distribution looks like; all we have is a picture based on its shadows. If we made mistakes creating that picture, we’ll have trouble making informed decisions later. Moreover, we will likely be stuck with a branch of statistics called “nonparametric analysis” which can be difficult to make good use of.

Option 2 is likely the better choice, provided that we can glean enough information about the underlying distribution to make an informed choice for the approximating distribution. There is a tendency to always choose a normal distribution, but I think the anti-Gaussian media has beaten that horse to death. Alternatively, there are many families of distributions available; we just want to pick one which describes the investment’s outcomes well while retaining a simplicity that makes any math tractable (and, hopefully, easy).

Option 2 also lets us come up with sufficient statistics for the investment. If all investments were normally distributed, then our portfolio analysis would boil down to their means and standard deviations (and correlations with each other, because the portfolio is a multivariate distribution). This assumption drove the mean-variance finance paradigm that was pioneered by Harry Markowitz in the 1950s. Today we try to use more sophisticated distributional assumptions, but the idea remains the same: come up with a simple set of numbers that summarize your data and use them to analyze the whole.
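As a rough illustration of how far means, standard deviations and correlations can take you under the normality assumption, here is a small sketch for a two-asset portfolio. The function name `portfolio_mean_stdev` and all of the input numbers are hypothetical; the formula is the standard one for the variance of a weighted sum of two correlated variables.

```python
import math


def portfolio_mean_stdev(w1, mu1, sigma1, mu2, sigma2, rho):
    """Mean and standard deviation of a two-asset portfolio.

    w1 is the weight in asset 1 (asset 2 gets the remainder) and rho is
    the correlation between the two assets' returns.
    """
    w2 = 1.0 - w1
    mean = w1 * mu1 + w2 * mu2
    variance = (
        (w1 * sigma1) ** 2
        + (w2 * sigma2) ** 2
        + 2.0 * w1 * w2 * rho * sigma1 * sigma2
    )
    # Guard against tiny negative values caused by floating-point rounding.
    return mean, math.sqrt(max(variance, 0.0))


if __name__ == "__main__":
    # Made-up inputs: two assets with 20% volatility, split 50/50.
    for rho in (1.0, 0.0, -1.0):
        mean, stdev = portfolio_mean_stdev(0.5, 0.08, 0.20, 0.08, 0.20, rho)
        print(f"correlation {rho:+.1f}: mean {mean:.3f}, stdev {stdev:.3f}")
```

The expected return is the same in every case, but the dispersion depends on the correlation, which is why the correlations belong in the list of summary numbers alongside the means and standard deviations.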

Returning for a second to the height example, imagine I asked you to estimate the probability of a colleague being over 6’5″. If you retained the original dataset (option 1), you would start by counting tall people, divide them by the total count and give me your probability estimate. If you used an approximation (option 2), you’d pop the sufficient statistics into a well-known and exhaustively studied equation and know immediately not just the probability but also a measure of confidence in that number. More complicated analyses might be simply impossible without the distributional assumption. When we are unsure of the best approximation, some compromise of options 1 and 2 will result.
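Here is a hedged sketch of the two approaches to the height question. The office data are simulated (I am assuming a mean of 69 inches and a standard deviation of 3 inches purely for illustration), and `normal_tail_prob` is my own helper built on the standard library's complementary error function.

```python
import math
import random
import statistics


def normal_tail_prob(threshold, mu, sigma):
    """P(X > threshold) when X is normal with mean mu and stdev sigma."""
    z = (threshold - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2))


# Simulated office of 200 people; heights in inches (assumed parameters).
random.seed(0)
heights = [random.gauss(69.0, 3.0) for _ in range(200)]
threshold = 77.0  # 6'5" expressed in inches

# Option 1: keep the raw data and count the tall people.
empirical = sum(h > threshold for h in heights) / len(heights)

# Option 2: keep only the two sufficient statistics and use the normal curve.
mu = statistics.mean(heights)
sigma = statistics.stdev(heights)
parametric = normal_tail_prob(threshold, mu, sigma)

if __name__ == "__main__":
    print(f"counting estimate:   {empirical:.4f}")
    print(f"normal approximation: {parametric:.4f}")
```

With only 200 observations the counting estimate can easily come out as exactly zero, while the normal approximation still produces a small positive probability. That is the practical appeal of option 2, and also its danger if the normal assumption is wrong.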

It’s very important to note that in describing the distributions or risk of these investments we made no judgments about quality. Surprisingly, we can’t even say whether they are “risky” or “safe”! Despite my claiming that “we know the risk of the investment,” all we’ve done is describe the outcomes; subjective and qualitative assessments are yet to come.

Risk as a metric

Once we have some idea of what an investment’s distribution of outcomes looks like, we have identified its “risk”. But as I’ve mentioned, we can’t yet do anything with that information. We need to create some sort of measurement that allows us to make comparisons and decisions. Risk metrics are those measurements.

Risk metrics are usually summary statistics of the underlying risk distribution. Summary statistics give information about the distribution, but, unlike sufficient statistics, they may not provide enough detail to recreate the distribution entirely. For example, the mean by itself or the standard deviation by itself or the minimum value all give some insight into the distribution but fail to characterize it completely. Frequently, estimates of these summary statistics are the “shadows” from which a picture of the true distribution is formed. When you measure the heights of everyone in your office, the observed mean and standard deviation constitute two of the clues you would use to construct the representative bell curve.

We have now learned enough to understand that the risks I listed earlier were actually summary statistics of an investment’s true distribution, or underlying risk. At the risk of redundancy, here they are again with explanations (note that some of these refer to the distribution of returns, others to the distribution of portfolio values; it is easy enough to convert between the two):

  • The most you can lose on an investment (the minimum of the distribution)
  • The most you can lose on an investment, with some confidence level alpha (the 1 - alpha quantile of the distribution, also referred to as Value at Risk)
  • The average return of the investment (the mean of the distribution)
  • The market value of an investment (the most recent observation from the distribution)
  • The notional value of an investment (the minimum or maximum of the distribution)
  • A one-standard deviation loss (the standard deviation of the investment)
  • A six-standard deviation loss (six times the standard deviation of the investment)
  • The chance that a company goes bankrupt (a specific outcome from the distribution and its associated probability)
  • The chance that a counterparty goes bankrupt (a specific outcome from the distribution and its associated probability)
  • The chance that you go bankrupt (a specific outcome from the distribution and its associated probability)
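The list above can be sketched in code: given a sample of outcomes standing in for the distribution, each metric is a one-line summary of it. Everything here is an assumption for illustration: the function name `risk_metrics`, the simulated returns, the 20% loss level, and the simple "nearest rank" way of picking the quantile.

```python
import random
import statistics


def risk_metrics(returns, alpha=0.95, loss_level=-0.20):
    """Summary statistics of a sample of returns (a stand-in for the distribution)."""
    ordered = sorted(returns)
    # Position of the (1 - alpha) quantile, using a simple nearest-rank rule.
    k = int((1 - alpha) * len(ordered))
    return {
        "worst_case": ordered[0],                 # minimum of the distribution
        "value_at_risk": ordered[k],              # (1 - alpha) quantile
        "mean": statistics.mean(ordered),         # average return
        "stdev": statistics.stdev(ordered),       # one-standard-deviation move
        # Probability of one specific bad outcome (a loss beyond loss_level).
        "prob_big_loss": sum(r < loss_level for r in ordered) / len(ordered),
    }


# Simulated annual returns; the 5% mean and 15% volatility are made up.
random.seed(1)
simulated = [random.gauss(0.05, 0.15) for _ in range(1000)]
metrics = risk_metrics(simulated)

if __name__ == "__main__":
    for name, value in metrics.items():
        print(f"{name:>14}: {value: .4f}")
```

Every number printed comes from the same underlying sample, and none of them alone would let you rebuild it.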

It is clear that without knowledge of the underlying distribution, none of these quantities can be known. I want to hammer home the difference between knowing risk, the distribution, and risk metrics, summary statistics of that distribution. The distinction is even more important — and confusing — because sometimes the summary statistics are observed first and the distribution is inferred thereafter.

I mentioned earlier that volatility is frequently used to describe risk, because of its tie to uncertainty. We can now view it as just one more summary statistic (specifically, standard deviation). However, volatility has a special place in the risk paradigm because it was explicitly labeled as such in the mean-variance paradigm (its counterpart, return, is played by the mean). That legacy has held and is in many ways justified: more stable returns (less volatility) are associated with return distributions that are well-known and usually characterized by a lack of large losses. As volatility increases, the probability of losses generally increases as well. The distribution becomes more dispersed and various risk metrics take turns for the worse. Thus, volatility is a risk bellwether: easy to calculate and usually indicative of most other metrics.

(Another way to think of risk metrics is as low-dimensional projections of the underlying (and potentially high-dimensional) distribution.)

Choosing the metric

And now I’d like you to forget everything we just discussed. In practice, when we talk about “risk” we’re referring to risk metrics rather than the underlying distribution. The reason for that is pragmatic: what good does it do to tell someone what the distribution is? Returning to the heights example, knowing the distribution doesn’t give you any answers. In fact, if you’re a statistician it probably gives you a bunch of questions. Summary statistics (and more advanced results) provide answers. They take the large risk distribution and condense it into a useable form. The appeal is clear: I could tell you every possible outcome of the stock you’re about to buy, or I could tell you that you’re 90% likely to never lose more than 20%. Which is more useful (putting aside all arguments of whether the latter can truly be known)?

So when we talk about risk we’re talking about metrics. How do we choose those metrics? Well, if part 1 of the risk manager’s job is to model the underlying distribution, then part 2 is deciding which metrics are useful and calculating them. Needless to say, this part is more art than science. Contrary to popular belief, there is no magic number that contains all risk information and lets you make investment decisions without further analysis. You may have heard of these holy grails; they go by names like “value at risk”, “Sharpe ratio”, “Sortino ratio”, “return over maximum drawdown”, “omega ratio”, and so forth. These are like weight loss pills: they make promises grounded in just enough math to either convince or confuse (depending on the customer) and appear to work as advertised on the surface. Caveat emptor.

We have already learned why there is no “one number” solution: because risk metrics are summary statistics and not sufficient statistics. Now, even if they were sufficient statistics for the risk distribution, there still wouldn’t be a silver bullet, because the risk distribution does not allow qualitative judgments. It is merely a list of outcomes. If you could condense it to one number, you’d have a number that represented all your outcomes, good and bad, and not necessarily one which would provide an indication of value.

What’s really necessary is to look at many of these metrics together. Each one provides some information about the risk distribution, like various shadows from different light sources. By considering many of them at once, our understanding of risk (and equivalently, our picture of the underlying distribution) is enhanced.

There are a couple of risk metrics which are always useful.

  • The most you can lose is an important one: investors need to bear in mind that zero is a real possibility. For most cash investments, this will be equal to the market value of the investment. Why isn’t this enough? If you bought a million shares of stock and sold a million puts on the same, the max loss on the stock would be greater than that of the options, and you might conclude that the stock was the riskier play. However, I don’t know anyone who would agree that buying stock is riskier than selling puts. We reach that conclusion by considering other outcomes of the respective distributions, or other summary statistics.
  • A reasonable upside estimate is also key. This may not fit the traditional intuition behind a “risk measure”, but it would help differentiate between the stock and option portfolios just described. The stock has large potential for gains; the puts are capped. Thus, the downside in the stock is mitigated by the positives, but the puts’ downside, though almost equal to the stock’s, is not similarly offset. The decision of what constitutes a “reasonable” upside is in the art category rather than science, so unfortunately I can’t provide an algorithm.
  • An understanding of an investment’s volatility. Volatility, as mentioned, is like a risk bellwether. As it increases, so does the uncertainty about the future outcomes. Another way to express this idea is to say that the entropy of the risk distribution rises as the volatility increases (this idea hasn’t been explored nearly enough in the literature). Popular metrics like the Sharpe ratio try to capitalize on this idea by expressing the “return per unit of risk [volatility]”. Presumably, the more risk one takes through an investment, the greater the return that should be received. (This notion took a disastrous turn when, in late 2008, angry investors wondered why they lost money in stocks as compared to bonds. The answer, that stocks are riskier, was staring them in the face, but they were accustomed to that risk resulting in greater yields and refused to accept any alternatives.)
  • Event-driven idiosyncrasies. Is your investment subject to legal/regulatory risk? Operational risk? Other highly-targeted risks unique to that security? If so, the risk distribution becomes much harder to estimate accurately because these characteristics distort it to the point that approximations fail to capture it fully. It is important to understand not only what these idiosyncrasies are, but how they can impact your estimates of risk. As a simple example, consider an illiquid stock which doesn’t trade except for a few times a year, when it jumps up or down 15%. Any distributional assumptions should be tossed out the window here; stick with more “nonparametric” qualifications like maximum loss and rely on an excellent understanding of the risk specific to the investment.
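For readers who want to see the “return per unit of risk” idea from the volatility bullet in concrete terms, here is a minimal sketch. The two return series are invented, the risk-free rate is assumed to be zero, and `sharpe_ratio` is an unannualized, simplified version of the metric rather than a production calculation.

```python
import statistics


def sharpe_ratio(returns, risk_free=0.0):
    """Average excess return divided by its standard deviation (not annualized)."""
    excess = [r - risk_free for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess)


# Two made-up return streams with the same average return of 1.5%.
steady = [0.01, 0.02, 0.01, 0.02]
choppy = [0.10, -0.07, 0.10, -0.07]

if __name__ == "__main__":
    print(f"steady: {sharpe_ratio(steady):.2f}")
    print(f"choppy: {sharpe_ratio(choppy):.2f}")
```

Both streams earn the same on average, but the choppy one pays for it with far more volatility and so scores much lower. Note that the ratio says nothing about maximum loss, upside, or idiosyncratic events, which is why it belongs alongside the other metrics rather than in place of them.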

No discussion of risk metrics would be complete without addressing value at risk. Value at risk, or VaR, was once a celebrated risk metric, introduced to the public by J.P. Morgan in 1994. More recently, it has become demonized and blamed for its contributions to excess risk-taking and the collapse of many financial institutions. VaR has a clear definition: it represents a level of returns which will only be exceeded some percent of the time, typically 5% or 1%. In a strict statistical sense, VaR defines the beginning of a distribution’s tail. Unfortunately, it provides no information about what happens when returns actually exceed VaR and make it into the tail. As more financial institutions came to see VaR as a minimum return, rather than an unlikely-but-still-possible return, they increased the level of risk they were willing to accept. On days when returns exceeded VaR (and they tended to do so by quite a bit), those institutions took losses far greater than they ever anticipated were even possible. In other words, they failed to consider that the risk distribution extended past the VaR level.

In a statistical sense beyond the scope of this writing, VaR does not satisfy certain axioms that good risk metrics require (see Artzner’s 1999 paper on coherent risk measures). Nonetheless, when used in compliance with its strict definition, it serves as just another summary statistic and can give limited insight to the risk distribution. It is useful to observe the evolution of VaR over time, for example (if VaR increases, risk is increasing, even if the absolute level of VaR is uninteresting). Extensions of VaR like expected shortfall (the average loss, conditional on that loss exceeding VaR in the first place) are also quite useful. An institution is not doing something “wrong” by calculating a VaR; it may be a red flag if they rely solely on the number, however.
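A small sketch shows why VaR alone can mislead and what expected shortfall adds. The two portfolios below are invented so that they share the same 5% VaR while hiding very different tails; `var_and_es` is a hypothetical helper using a simple empirical (sorted-sample) estimate.

```python
def var_and_es(returns, tail=0.05):
    """Empirical VaR and expected shortfall at the given tail probability.

    VaR is taken as the best return among the worst `tail` fraction of
    observations; expected shortfall is the average of that worst fraction.
    """
    ordered = sorted(returns)
    k = max(1, round(tail * len(ordered)))  # number of observations in the tail
    tail_returns = ordered[:k]
    value_at_risk = tail_returns[-1]
    expected_shortfall = sum(tail_returns) / k
    return value_at_risk, expected_shortfall


# Two made-up portfolios of 100 outcomes each with identical 5% VaR.
mild = [-0.10] * 5 + [0.01] * 95
wild = [-0.60, -0.40, -0.30, -0.20, -0.10] + [0.01] * 95

if __name__ == "__main__":
    for name, returns in (("mild", mild), ("wild", wild)):
        var, es = var_and_es(returns)
        print(f"{name}: VaR {var:.2f}, expected shortfall {es:.2f}")
```

An institution looking only at VaR would consider these two portfolios equally risky. The expected shortfall, which averages over the tail instead of marking where it begins, tells them apart.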

The risk management process

What I’ve laid out here is a rather dry blueprint of the risk management process. The procedure is initiated by searching for clues to an investment’s underlying distribution. This could be any combination of quantitative (historical or modeled outcomes) and qualitative (fundamental analysis, opinions about the future) factors which provide the “shadows” of the distribution. From these, a complete picture of the distribution is constructed, either through the use of sufficient statistics or tailored models (if the distribution defies simple approximation). Finally, the distribution is used to generate risk metrics that allow investments to be assessed and compared. Those outputs become a critical input for the investment process, as decisions must be made in the context of the portfolio risk, and that risk must not be outsized relative to expected returns.

Once the investment is made, the risk manager will continue to exert influence on the portfolio distribution. For example, if the left tail becomes too big, he may take steps to reduce it by taking offsetting positions, or hedging. If exposure to a specific market force (such as interest rates, or currencies) becomes too large or too small, he may buy or sell securities to bring it back in line. This monitoring process is very important — the risk of an investment continues to change long after the investment is put on (in fact, you should hope it does, for otherwise nothing has happened at all!)

There are a few key lessons that can be taken from this process.

  • First, an appreciation for the lack of a silver bullet: there is no magic risk number that will protect your portfolio. I’m sorry.
  • Second, a grasp of the constantly changing nature of an investment’s risk. There is no “set it and forget it” in this process.
  • Third, an understanding of noise vs signal: investments will tend to sample from all over their distributions, both on the upside and down. It is important to observe whether or not the observed returns (themselves summary statistics, or “shadows”) match your understanding of the underlying distribution. If they deviate too much, be prepared to consider that your original assumption was wrong and start over.
  • Fourth, but most important, an understanding that the forest must not be lost for the trees. Seizing on one or two risk measures will inevitably lead to ignorance of the complete distribution (with possibly disastrous consequences). Conversely, trying to compute every summary statistic there is will lead to information overload and indecision. Risk metrics are tools which provide insight; there’s a healthy balance between sparsity and indulgence. Thinking of the metrics as shadows from different lights really is a useful metaphor: too few and some details won’t be resolved; too many and the data’s redundancy will overwhelm any chance of learning from it.

Aside from these tips, I can’t stress enough the importance of practicing good risk management. Many investors do it implicitly, as simply understanding each investment is usually tantamount to intuiting its distribution. It doesn’t have to be a burdensome regime of additional steps, though many investors will find it useful to ask themselves, as an exercise, “What is the largest loss I can sustain and what is the likelihood of that event? What is the volatility of my portfolio, and am I earning enough to justify that allocation?” and so forth.

The risk management process is not unlike solving a puzzle by piecing together clues and constantly checking that the emerging picture matches up with expectations. I hope this explanation has been satisfactory and not too mathy (you don’t want to see me when I’m mathy). There’s a richness to the process which I’m afraid I won’t be able to describe here — for your sake and mine — but I think this should serve as a good jumping-off point for further discussion.

In conclusion, the Hitchhiker’s Guide to the Galaxy has this to say on the subject of tail risk:

The major difference between a thing that might go wrong and a thing that cannot possibly go wrong is that when a thing that cannot possibly go wrong goes wrong it usually turns out to be impossible to get at or repair.

{ 6 comments }

No cash for clunkers

June 28, 2010 in Finance

I’m a big fan of Emanuel Derman’s work, his book and, most recently, his blog. In his latest post, Derman shows his lighter side with an entry in a contest for “pithy personal ads”:

“FANNIE MAE with troubled assets, bored with Freddie Mac, seeks well-regulated stimulus package from counterparty too big to fail. No cash for clunkers.”

His submission won an Honorable Mention.

{ 0 comments }

James Surowiecki writes an excellent piece for the New Yorker on the state of financial illiteracy, concluding 1) we have it and 2) we have to get rid of it. Ultimately, he concludes that some form of basic financial education should be mandated before people can partake in purchasing financial products.

The government’s new consumer-protection agency has the authority to “review and streamline” financial literacy programs, but that’s not enough. We really need something more like a financial equivalent of drivers’ ed. There’s evidence that just improving basic calculation skills and inculcating a few key concepts could make a significant difference.

This point is particularly frightening:

Critics also argue that financial education may make people overconfident, and therefore more likely to make bad decisions. In fact, the reverse is true: the less people know, the more overconfident in their abilities they tend to be. In a German study, eighty per cent of those surveyed described themselves as confident in their answers on a questionnaire, yet only forty-two per cent got even half the questions right. This is known as the Dunning-Kruger effect: people who don’t know much tend not to recognize their ignorance, and so fail to seek better information. No wonder, then, that the least knowledgeable people in the Atlanta Fed study were also the least likely to do research before getting a mortgage.

I’ve thought for some time that every expansion of financial markets has led to a market crash, as uninformed users of new financial innovations are caught unawares when the floor falls out. This supports that theory: the democratization of markets (through margin trading, portfolio insurance, pooling funds, ETFs, online trading, etc.) lets in new investors who are overconfident and underinformed. Worse, they don’t seek to correct their lack of knowledge in any way. Market expansions are certainly not bad things in and of themselves, but the people marketing those innovations must feel some obligation to ensure that the newest wave of market participants is prepared for whatever may come its way.

{ 0 comments }

As if responding to my thoughts on communicating with machines, Isaac Asimov’s classic novel Second Foundation provides the following:

Speech, originally, was the device whereby Man learned, imperfectly, to transmit the thoughts and emotions of his mind. By setting up arbitrary sounds and combinations of sounds to represent certain mental nuances, he developed a method of communication — but one which in its clumsiness and thick-thumbed inadequacy degenerated all the delicacy of the mind into gross and guttural signaling.

[...]

Grimly, Man had instinctively sought to circumvent the prison bars of ordinary speech. Semantics, symbolic logic, psychoanalysis — they had all been devices whereby speech could either be refined or bypassed.

(I’m taking this half-seriously.)

{ 0 comments }

The NYT has published the second article in their “Smarter Than You Think” series on artificial intelligence (TGR covered the first here and again here). This time, the focus is on speech recognition and natural language processing.

A couple of passages really stood out to me in this more abbreviated overview of the technology:

Computers with artificial intelligence can be thought of as the machine equivalent of idiot savants. They can be extremely good at skills that challenge the smartest humans, playing chess like a grandmaster or answering “Jeopardy!” questions like a champion. Yet those skills are in narrow domains of knowledge. What is far harder for a computer is common-sense skills like understanding the context of language and social situations when talking — taking turns in conversation, for example.

Today’s artificial intelligences are extremely narrow in scope. That’s not a bad thing; it’s part of the development process. To draw a hardware analogy, we don’t yet have a “complete” robot, but we do have lots of robots that are very good at small tasks: walking, running, grasping, lifting, expressions, recognition, speech, etc. The challenge in both spheres will be to construct a gestalt device capable of doing all things well. Until then, I’m afraid C-3PO will remain fiction.

A machine capable of complete interaction with our world will draw from a host of intelligence systems, and will have to incorporate some form of meta-intelligence in order to make sense of them all. Sony’s PlayStation 3 has a “Reality Synthesizer” chip, and though the current tech doesn’t quite live up to its name (marketing is what marketing is, after all), future generations of smart machines will indeed need processors that can produce complete characterizations of the real world.

There’s also a note in line with my observation yesterday that AI is very literally in its infancy:

The AT&T researchers worked with thousands of hours of recorded calls to the Panasonic center, in Chesapeake, Va., to build statistical models of words and phrases that callers used to describe products and problems, and to create a database that is constantly updated. “It’s a baby, and the more data you give it, the smarter it becomes,” said Mazin Gilbert, a speech technology expert at AT&T Labs.

Finally, there’s mention of people adjusting their speech to address the computers:

Some callers, especially younger ones, also make things easier for the computer by uttering a key phrase like “plasma help,” Mr. Szczepaniak said. “I call it the Google-ization of the customer,” he said.

This is really interesting. While it is no doubt important for speech recognition software to handle everyday speech, I believe that in the future we will interact with computers “differently” than we do with people. This will be for convenience more than anything else; some part speed and some part efficient phrasing. I don’t type natural language queries into Google; I type a series of keywords that best represent my queries. I’ve learned what sort of keywords get the best search results through experience. In a sense, I do Google’s parsing for it — I choose the most statistically interesting words and present those (no need for “the” or “is” or other words unlikely to enhance my results). Can I imagine a fully natural-language Google? Of course. But I’d still (if possible) just give it the fragmented keywords. Why waste the time and risk confusion?
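The keyword habit described above can be sketched in a few lines of Python. This is only an illustration of the idea of dropping low-information words; the stop-word list and the `to_keywords` helper are made up for this example and say nothing about how Google actually parses queries.

```python
# Hypothetical stop-word list, chosen only for this illustration.
STOP_WORDS = {"the", "is", "a", "an", "of", "in", "what", "who", "for", "to", "on"}

def to_keywords(question: str) -> list[str]:
    """Lowercase each word, strip surrounding punctuation, and drop stop words."""
    words = (w.strip("?.,!\"'").lower() for w in question.split())
    return [w for w in words if w and w not in STOP_WORDS]

print(to_keywords("What is the batting average of Ichiro?"))
# → ['batting', 'average', 'ichiro']
```

The full question and the three surviving keywords carry roughly the same information, which is the whole point of typing fragments instead of sentences.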

I know that I look like an idiot – I write these posts about how amazing artificial intelligence is and how it’s going to change everything, and then I insist that we will still treat it as if it were “stupid,” using keywords instead of complete sentences. It’s a matter of efficiency. Until the gestalt computer is born (and I think that’s a long way away), we will have to keep compensating for each AI’s weaknesses with our own intelligence. I think Google does a great job of retrieving search results; I’m not that impressed with its natural language parsing. Therefore, I do the parsing myself. This is why I think it’s silly that the NYT article mentions programming virtual assistants to ask about the Mariners game: that conversation is doomed to be unsatisfying. The assistant is primed for speech recognition, not speech generation; it can only respond with relatively few predetermined phrases. Unless I’m asking for Ichiro’s batting average with runners in scoring position in the second half of the game, I’m not going to get much utility out of a speech recognition device. A machine capable of holding a conversation is yet a step further away. And a machine capable of faithfully executing spoken instruction (without a set of preprogrammed directives – thank you very much, iPhone voice control) is yet to be conceived.

But lest I sound like an AI bear – I couldn’t be happier that the NYT is running this series and I’m looking forward to part three.

p.s. The comments on the article read like a collection of the most paranoid, tinfoil-hat, anti-machine delusions I’ve ever had the displeasure of reading. The educational push it’s going to take to get society to embrace artificial intelligence is significant… and we thought CDOs were a tough pill to swallow!

{ 0 comments }

The language of statistics

June 24, 2010 in Data

Joseph Rickert has written a piece calling R “the language of statistics,” which I feel is a deserved title. As he puts it:

I don’t just mean that R “is spoken” by many or even most statisticians. R’s superiority for statistics is deeper than that. R is a language with syntax and structure that have been explicitly designed to formulate expressions about statistical objects. At this time, it may be le premier langue for statistical thinking that enables the formulation of ideas, and notions about statistical models and data that are difficult to express succinctly in other languages including mathematical notation.

Unfortunately, the rest of the article is a bit difficult to swallow unless (or, in my case, even if) the reader is well-versed in R. The examples are a bit too complicated or special-case to really demonstrate the language’s power and scope. That’s not to take away from the message, but I worry that it will be lost on its target audience.

Having defended the choice of R to colleagues in both academic and professional roles (and personal ones too, in a few especially nerdy cases), I’ve found that the entire pitch rests on the richness of the syntax “clicking” in someone’s head. Otherwise, the whole idea is shot down by a few “Why wouldn’t I just use program X?” rejoinders. Ultimately, I’ve found that the multi-disciplinary aspects of R are what seal the deal — there’s no need (usually) to recast datatypes or write new procedures, because it’s almost guaranteed that someone out there has done that work for you (in a succinct and compatible way, to boot!).

R should take a cue from Apple’s ads: Do you need to run summary statistics, followed by a linear regression, followed by a cluster analysis, followed by a genetic exploration… and then a series of publication-quality graphs? There’s a package for that.

{ 0 comments }

I recently wrote about IBM’s Watson, a machine capable of competing against humans at Jeopardy! The machine represents a pretty phenomenal leap in artificial intelligence, as it parses key bits of information out of natural language queries in real time. But here come the detractors, the “humanists” who are either too scared or too closed-minded to acknowledge the magnitude of this accomplishment.

In a piece called “What’s So Great About IBM’s Jeopardy!-Playing Machine?”, Niraj Chokshi argues that the machine isn’t actually “thinking” at all:

Rather than develop a machine that can decipher semantics — what do these words mean and how do they relate? — IBM took a shortcut of sorts and developed a high-speed computer that “thinks” in probabilities….

That’s quite a feat, but it’s not emulating human thought. It has side-stepped the problem altogether, relying on massive computing power and storage, as well as probabilistic number-crunching to approximate how we parse language.

And so we get to my favorite part of artificial intelligence – the philosophy. Too often, this element gets ignored or brushed aside but it really is fascinating. As we create devices that can “learn” (defined loosely as developing a consistent response to a stimulus) and “think” (combining stimuli – known or unknown – into new responses), we must investigate parallels with our own minds. Just as we learn about our own physiology from “lesser” organisms, so can we infer our mental processes from relatively simple models of learning.

But before we can address philosophy, let’s talk physics. Let’s think about Watson from the perspective of our own brains. Despite Chokshi’s claims, IBM isn’t “cheating” by using massive computing power. On the contrary, they’re playing catch-up.

Estimates of the human brain’s capacity range from 1 to 1,000 terabytes. It’s a wide range, to be sure, but for reference consider that the entire Library of Congress is estimated to contain only 20 TB of text. One-terabyte computer drives were only introduced three years ago; a cutting-edge supercomputer like Watson’s Blue Gene hardware might employ a database ranging from 600 to 1,000 TB, on par with a human brain. A Blue Gene implementation scheduled for next year will use 1,600 TB.

But hardware alone does not a brain make! We still need software to access and interpret the data, as well as compression/access algorithms to store it efficiently. In this regard, modern technology doesn’t even begin to approach the human brain. Our compression algorithms are without peer – the brain stores only what it deems important and disregards extraneous details. Witness too how difficult it is to recall a perfect memory. It is well known that the brain uses cues to “fill in” what it doesn’t retain explicitly. So we may have the same storage capacity as Watson, but our use of that space is much more efficient, making it possible for humans to store far more data than the raw numbers suggest.

For further evidence, consider that Watson occupies thousands of square feet – your brain, needless to say, fits comfortably within your own head.

And how about speed (“massive computing power”)? Ray Kurzweil estimates that the human brain operates at about 20 petaflops (or 20 quadrillion operations per second). Blue Gene was designed to operate at petaflop speeds but currently sustains 0.5 petaflops. The latest computer to be designed has a theoretical speed of 20 petaflops, but will take years to reach that mark – and will not go into production until next year. So the human brain is operating roughly 40 times faster than IBM’s supercomputer currently sustains, even before its superior access algorithms and storage capacity are utilized.

And as final evidence for the brain’s superiority, consider this metric of efficiency: your brain operates on about 20 watts. The next (and most power-efficient) Blue Gene supercomputer will draw 6 megawatts. That’s 6 million watts, or 300,000 times as much power as your brain.
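For anyone who wants to check the arithmetic, here is a quick Python sketch using only the figures quoted in this post (Kurzweil’s 20-petaflop estimate, Blue Gene’s sustained 0.5 petaflops, 20 watts versus 6 megawatts). The variable names are mine, and the inputs are the post’s estimates rather than measured values.

```python
# Figures as quoted in the post; treat them as rough estimates.
BRAIN_WATTS = 20
BLUE_GENE_WATTS = 6_000_000        # 6 megawatts
BRAIN_PETAFLOPS = 20               # Kurzweil's estimate
BLUE_GENE_PETAFLOPS = 0.5          # currently sustained

# How many times more power the supercomputer draws than the brain.
power_ratio = BLUE_GENE_WATTS / BRAIN_WATTS

# How many times faster the brain is than Blue Gene's sustained speed.
speed_ratio = BRAIN_PETAFLOPS / BLUE_GENE_PETAFLOPS

print(f"Power: {power_ratio:,.0f}x, speed: {speed_ratio:.0f}x")
# → Power: 300,000x, speed: 40x
```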

Kurzweil has a nice graph that sums up the path of computers to human brain-like power:

Note that “Human Brain Functional Simulation” implies that the inner workings of the brain can be run at full speed – it does not mean any algorithms or intelligence will be present (presumably that must wait until 2025).

It’s silly to claim that Watson out-classes humans in speed, storage, or “smartness.” It is actually at a disadvantage because it lacks algorithms sophisticated enough to decide, at acquisition time, what information is important enough to retain and what can be ignored. On top of all those physical advantages, humans have some sense of “meta-learning” which we don’t yet know how to implement in hardware.

So we can turn to the second part of Chokshi’s claim – “probabilistic number-crunching [that] approximates how we parse language.” I wonder how he thinks we parse language, if not by probabilistic number-crunching? Like it or not, our brains are networks which pass and process information – they’re doing “math” in a very computer science-esque manner of speaking.

How does a computer know when it is right? It judges its response to be the most likely answer. Do humans act any differently? That’s “probabilistic”. How is that judgement performed? The answer is compared to other possible answers – perhaps faster than we are aware, but obviously the comparison takes place. As humans, we are fortunate to have replaced neurons firing with “sensations”; we don’t perceive that “x, y and z neurons fired – therefore ‘red’ is the right answer.” Instead, we are overwhelmed by a sense of “red” and answer in kind. Computers don’t get the benefit of those “sensations.” They really do say “x, y and z factor receptors fired – therefore the answer is ‘red’.” Masked or not, that is a “number-crunching” operation. (“Number-crunching” is rapidly becoming the crutch of people who don’t want to bother with understanding an underlying process. The term’s ubiquity is indicative of the prevalence of math in our daily lives.)
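As a toy illustration of “judging a response to be the most likely answer,” the Python sketch below turns some made-up evidence scores into probabilities and picks the top candidate. The `best_answer` function and the scores are hypothetical; this is not a description of how Watson actually ranks its answers.

```python
def best_answer(scores: dict[str, float]) -> tuple[str, float]:
    """Normalize raw evidence scores into probabilities and return the top candidate."""
    total = sum(scores.values())
    probabilities = {answer: score / total for answer, score in scores.items()}
    winner = max(probabilities, key=probabilities.get)
    return winner, probabilities[winner]

# Invented evidence: how strongly each candidate answer was supported.
evidence = {"red": 6.0, "orange": 3.0, "blue": 1.0}
answer, confidence = best_answer(evidence)
print(answer, round(confidence, 2))
# → red 0.6
```

The comparison against the other candidates is explicit here; a human presumably does something similar without ever seeing the numbers.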

The actual mechanics of artificial intelligence are a bit beyond the scope of this article – but it’s safe to say they mirror cognitive processes closely, and a “thinking computer” isn’t too far off from a “thinking human” in terms of process and execution.

We could sum up this entire post by simply writing: “Your brain is an extremely powerful, extremely efficient supercomputer coupled with an extremely optimized unsupervised learning program.” But for some reason, that point of view is resisted by “humanists.” Is it fear of Skynet? Fear of playing God?

In order to truly understand these algorithms, we have to forget everything we think we know about computers as mindless data processors. These aren’t calculators, where an input leads strictly to a deterministic output. These are actual thinking machines which, given a set of input stimuli, weight and evaluate a set of likely outcomes – in some cases hundreds of times over – to produce informed guesses which, depending on the algorithm, might not even be the same each time they see the data! Data processors (or calculators) require instructions at every step; thinking machines figure out the rules on their own. Make no mistake – computers are best at floating-point operations (read: number-crunching); but artificial intelligence relies on moving away from simple deterministic outcomes and leveraging those floating-point operations in a wildly different context.
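The contrast between a calculator and an algorithm whose guesses can differ from run to run can be sketched in Python. Both functions and the weights are invented for illustration, and sampling in proportion to a weight is just one simple way an algorithm can be non-deterministic; real systems are far more elaborate.

```python
import random

def calculator(x: float) -> float:
    """Deterministic: the same input always yields the same output."""
    return 2 * x

def informed_guess(weights: dict[str, float], rng: random.Random) -> str:
    """Stochastic: sample one answer in proportion to its weight."""
    answers = list(weights)
    return rng.choices(answers, weights=[weights[a] for a in answers], k=1)[0]

# Invented weights standing in for how likely each outcome looks.
weights = {"red": 0.6, "orange": 0.3, "blue": 0.1}
rng = random.Random()  # unseeded, so the guesses vary between runs
print([informed_guess(weights, rng) for _ in range(10)])
print(calculator(3), calculator(3))  # always 6 and 6
```

Run it a few times: the calculator line never changes, while the list of guesses usually does, even though “red” dominates.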

In my mind, Watson is an incredible accomplishment. Moreover, it’s just a child – its intelligence is measurably a fraction of our own. In the coming years, these brains will become commonplace and we will see them not as competition to fear but as assistants. In the meantime, let’s not forget that the technology is in its infancy and WILL make mistakes – often ones which seem bizarre to our practiced brains.

Chokshi’s final argument takes a specific example which the computer got wrong and actually tries to make the argument that on that basis, the entire program is flawed. It’s a flawed tactic, to be sure, but since he claims that it is evidence of the computer’s failure, it seems reasonable to assume that he believes the reverse would be true of humans. Are we really so confident as to think that just because information is “in our database”, we won’t give a wrong answer? If that were true, then humans with access to Google would never get a Jeopardy! question wrong – as they would always, one way or another, be able to locate the “correct” information. Intelligence is not merely information storage and retrieval; it is the process of data interpretation.

In a recent dinner conversation about the philosophy of artificial intelligence, a friend pointed out that one of the theories I was discussing was almost a direct retelling of Plato’s theory of Forms. The idea was surprising, but reinforces the idea that artificial intelligence researchers are not inventing a new discipline as much as they are trying to reinterpret a set of biological algorithms with modern hardware. I’m very excited to see what’s next.

{ 2 comments }

Here’s an amusing chart showing the percent of stocks that sell-side analysts have rated “sells”, on average:

There are a million junk-chart bloggers who will tell you how much is wrong with this graph (myself included) – starting with the left-hand scale, which should go up to 10% rather than 100%. But in a rare twist, the left-hand scale is the graph. The message here is that “sells” are a minuscule component of the whole universe, and that the SarbOx legislation did little to affect that proportion. The tall left-hand scale and bold “Enactment” line highlight the incongruity of the rest of the graph.

(via Paul Kedrosky)

{ 0 comments }