From an NYT article on Google’s translation services, this excerpt sums up the most critical transition in machine learning that has happened thus far:
Creating a translation machine has long been seen as one of the toughest challenges in artificial intelligence. For decades, computer scientists tried using a rules-based approach — teaching the computer the linguistic rules of two languages and giving it the necessary dictionaries.
But in the mid-1990s, researchers began favoring a so-called statistical approach. They found that if they fed the computer thousands or millions of passages and their human-generated translations, it could learn to make accurate guesses about how to translate new texts.
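To make the statistical idea concrete, here is a toy sketch of IBM Model 1, a classic early word-alignment model. It learns word-translation probabilities from nothing but paired sentences. This is my own illustration, not Google's system: the three-sentence German/English corpus is made up, and the model leaves out the NULL word and everything else a real translator needs.

```python
from collections import defaultdict


def train_ibm_model1(sentence_pairs, iterations=10):
    """Estimate t(e | f) with EM, IBM Model 1 style (no NULL word, no smoothing).

    sentence_pairs: list of (foreign_tokens, english_tokens).
    Returns a dict mapping (english_word, foreign_word) -> probability.
    """
    english_vocab = {e for _, es in sentence_pairs for e in es}
    # Uniform start: every English word is an equally likely translation.
    t = defaultdict(lambda: 1.0 / len(english_vocab))

    for _ in range(iterations):
        counts = defaultdict(float)  # expected counts c(e, f)
        totals = defaultdict(float)  # expected counts c(f)
        # E-step: share each English word among the foreign words in its
        # sentence pair, in proportion to the current t(e | f).
        for fs, es in sentence_pairs:
            for e in es:
                z = sum(t[(e, f)] for f in fs)
                for f in fs:
                    share = t[(e, f)] / z
                    counts[(e, f)] += share
                    totals[f] += share
        # M-step: turn expected counts back into probabilities.
        t = defaultdict(float, {(e, f): c / totals[f] for (e, f), c in counts.items()})
    return dict(t)


corpus = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]
t = train_ibm_model1(corpus)
for f in ("das", "haus", "buch", "ein"):
    best_e = max((e for (e, g) in t if g == f), key=lambda e: t[(e, f)])
    print(f, "->", best_e, round(t[(best_e, f)], 2))
```

No sentence pair ever says which word translates which. The probability mass still drifts toward das→the, haus→house, buch→book and ein→a, simply because those words keep showing up together.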
LOST is back tonight! And what better way to prepare than an interactive timeline from the excellent NYT graphics team? A good infographic should communicate otherwise-complex ideas in a simple and intuitive manner… oh, never mind, LOST is back and that’s really what matters. Check out the timeline here!
A beautiful article in the NYTimes contrasts abstract mathematics with the chilling reality of the Mexican drug cartel wars:
I was born in Mexico City, in a world that seems less and less familiar to me. I live now in the opposite corner of the continent. I am training to be a political scientist at Harvard. My passion has remained the afflictions of my homeland, but at Harvard I have found new ways to address them, to use mathematical models — matrices, vectors, equations, regressions — to understand the Mexican drug crisis.
The cartel wars are extremely violent, and the gangs are responsible for reprehensible kidnappings and deaths; the wars rank among the deadliest periods of organized crime in human history. The author’s goal isn’t to explain how she can analyze the wars from up in an ivory tower; it’s to describe how her mindset and toolkit inform her understanding of the world in any situation.
The article captured me because it never mentions what the author actually models. Instead, it presents her frightened thoughts and her efforts to calm herself by looking at the world through a mathematical lens. But it’s not what you think; there are no emotionally distant mathematicians here. The author communicates her fascination with tying reality to abstract models, expecting and preempting the protest that reality is too complex and math too simple:
In this violent world, with the man in the blue Chevy whispering at me behind the window, math is my shield. Speaking up about drugs is in these parts a dangerous game. But not if you speak in the language of sigma and conditional expectations. Math protects me from the immediacy of the violence, and it protects me from them.
The beauty of my method lies in its simplicity. With mathematics I’m able to codify and simplify reality to make it manageable and, more important, malleable. I represent each possible individual as an equation in which each term symbolizes tastes, goals, profession and abilities. All people get portrayed: Policemen, politicians, citizens and drug cartels start living in this mathematical world as planes and hyperplanes and, as in real life, they interact and affect one another, sometimes colluding, sometimes colliding, sometimes neither.
I then use optimization to predict the form of interaction that will be the most probable to emerge and remain over time. Math starts speaking. It tells me, for example, under what conditions the outcome would be a drug war; when would the government prefer to cooperate with cartels; or when cruel intra-cartel purges will become the norm.
There is a part of every modeler’s mind which is constantly teasing out variables from constants. The statisticians among us may take a frequentist view, and wonder what would happen if a scene played itself out a million times; the programmers will deduce the underlying algorithms from the fuzzy result; the pure mathematicians will see manifolds everywhere:
In this abstract microcosmos, reality can be frozen or just slightly changed. I move and look at my hyperplanes from different angles. Let’s change the penalty code. No, let’s increase patrolling. Or reduce wages. Allow less contact between policemen and dealers. Assume the police force is corrupt. Assume it is not. I solve the equations and there it is. My answers come as Greek letters and probabilities.
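For the curious, here is a minimal sketch of the "change a parameter, re-solve, read off the outcome" loop she describes. Her actual models aren't given, so everything below is an assumption. It uses a hypothetical two-player game between a government and a cartel with invented payoffs. Only the cartel's penalty for fighting varies, and the game is solved by brute-force search for pure-strategy Nash equilibria.

```python
from itertools import product


def pure_nash_equilibria(payoffs):
    """Return the pure-strategy Nash equilibria of a two-player game.

    payoffs maps (row_action, col_action) -> (row_payoff, col_payoff).
    """
    rows = sorted({r for r, _ in payoffs})
    cols = sorted({c for _, c in payoffs})
    equilibria = []
    for r, c in product(rows, cols):
        # An equilibrium: neither player gains by changing only their own action.
        row_best = all(payoffs[(r, c)][0] >= payoffs[(alt, c)][0] for alt in rows)
        col_best = all(payoffs[(r, c)][1] >= payoffs[(r, alt)][1] for alt in cols)
        if row_best and col_best:
            equilibria.append((r, c))
    return equilibria


def hypothetical_game(penalty):
    """Invented (government, cartel) payoffs; only the cartel's cost of fighting
    an enforcing government depends on `penalty`."""
    return {
        ("enforce", "fight"): (-3, -penalty),
        ("enforce", "comply"): (1, -1),
        ("cooperate", "fight"): (-5, 4),
        ("cooperate", "comply"): (0, 2),
    }


for penalty in (0.5, 3):
    print(penalty, pure_nash_equilibria(hypothetical_game(penalty)))
```

With these made-up numbers, a penalty of 0.5 leaves ('enforce', 'fight') as the only equilibrium, which is a drug war. At 3 the equilibrium flips to ('enforce', 'comply').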
But we all admit:
I know, I know, this is weird.
Ultimately, “free will” becomes the clarion of the independent. At least, it’s the best response to this explanation:
It may seem strange to examine this shadowy world with equations. But mathematics is transforming the social sciences. In the same way that physicists can predict the movement of atoms in space, we can use mathematics to model how individuals and groups will make decisions and interact in a society.
But free will has a (somewhat tentative) analogue in Heisenberg’s uncertainty principle, and with that, philosophy and math (or theology and physics) collide; but there’s been plenty of pop-sci written on that topic.
I found this brief article remarkable in how it demonstrates the overlay of mathematical thought on an extremely “human” subject without ever needing to explain either one.
That’s news in and of itself. Once upon a time, system requirements (at least, ones that anyone paid attention to) were strictly for high-end professional software, cutting-edge games and the like: software that actually needed powerful hardware. But the real news here is that Office 2010 requires a DirectX-compatible graphics card.
Now, I don’t think Word is going to be offloading word counts to a GPU anytime soon. But Microsoft’s announcement is making waves nonetheless, and I think it’s actually great. It means computing has matured to the point where even our mass-market word processors are sophisticated enough that we need to check their hardware compatibility. That’s exciting!
Certainly, Excel is an obvious candidate for hardware acceleration. Besides speeding up simple tasks like opening large files and parallel tasks like recalculating many formulas, it could finally bring true vector operations to the versatile software.
But there is bad news. I’ll let Microsoft break it to you:
If your computer has a GPU, it lets us perform graphics rendering tasks (like drawing charts in Excel, or transitions in PowerPoint) in the GPU instead of in the CPU, which parallelizes work and speeds up performance. This is particularly relevant for users of PowerPoint 2010, which will introduce some awesome new graphics and video integration features (more info at the PowerPoint team blog).
Yes, the true motivation behind the graphics upgrade is supercharging those awful 3D pie charts we know and despise.
(If you click the PowerPoint link, you’ll notice that PowerPoint 2010 looks a lot like Keynote. Just sayin’.)
I especially love “The HDR Hole.” Presumably the y-axis is measured in percent of personal potential… there must be all sorts of Bayesian self-reflection stuff going on there.
Air Force drones collected nearly three times as much video over Afghanistan and Iraq last year as in 2007 — about 24 years’ worth if watched continuously. That volume is expected to multiply in the coming years as drones are added to the fleet and as some start using multiple cameras to shoot in many directions.
A very interesting read for the dataheads among us. The comparison to football broadcasts also caught my eye – televised sports are so frequently compared to battles and war, and here we see the military turning to sports broadcasters for advice:
But while the biggest timesaver would be to automatically scan the video for trucks and armed men, that software is not yet reliable. And the military has run into the same problem that the broadcast industry has in trying to pick out football players swarming on a tackle.
So Cmdr. Joseph A. Smith, a Navy officer assigned to the National Geospatial-Intelligence Agency, which sets standards for video intelligence, said he and other officials had climbed into broadcast trucks outside football stadiums to learn how the networks tagged and retrieved highlight film.
Alex Lundry, Vice President and Director of Research of the consulting firm Target Point, has published a talk called Chart Wars that is simply brilliant: an excellent, brief (5 minutes!) overview of how easy it is to manipulate infographics and which tricks to be wary of. His specific focus is a chart (which was covered on TGR previously) whose designs – and it went through many iterations – were politically motivated. While there is no doubt about which charts are clearer, his implicit question – which charts are right? – resonates philosophically.
Walmart is running ads right now which claim that shoppers who spend more than $100 per week at the supermarket would save $650 a year by purchasing their groceries at the giant retailer instead.
That’s quite a jumble of conditionals and varying metrics: you first have to meet the requirements of shopping at a supermarket and spending over $100 per week; the savings are then presented over a completely different timeframe of one year. The $650 works out to $12.50 a week, or a still sizable 12.5% discount on a $100 weekly bill.
Why not present it as 12.5%? The simple answer is that “$650” is a substantial figure, and the marketing folks wanted to make people feel like they were saving more; conversely, $5,200 a year on groceries sounds like a lot – better restate that as $100 per week. Depressingly, it occurs to me that many Americans may not know what to do with percentages.
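The unit juggling is easy to check. Here is a quick sketch, assuming 52 weeks a year and a shopper spending exactly the $100 minimum:

```python
WEEKS_PER_YEAR = 52  # assumption: ignore the extra day or two in a year


def implied_discount(weekly_spend, annual_savings):
    """Convert the ad's mixed units into weekly savings, annual spend and a percentage."""
    weekly_savings = annual_savings / WEEKS_PER_YEAR
    annual_spend = weekly_spend * WEEKS_PER_YEAR
    return weekly_savings, annual_spend, annual_savings / annual_spend


weekly_savings, annual_spend, discount = implied_discount(100, 650)
print(f"${weekly_savings:.2f}/week on ${annual_spend:,}/year = {discount:.1%}")
# prints: $12.50/week on $5,200/year = 12.5%
```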
Another key point is found in the wording of the ad – why target shoppers who spend more than $100 a week? If Walmart’s prices are really lower, then all shoppers should reap a benefit, not just the high rollers. Since I do not think Walmart is price discriminating (offering discounts only to people spending more than $100), I have to conclude that they restricted their dataset to increase the dollar value of the average person’s savings. If every shopper saved 12.5%, then the average annual dollar savings per person might be, say, $250. But if we consider only people who spend more than $100, the average dollar savings jumps to $650 even though the percent savings remains 12.5%. I would guess that $100 was chosen as a cutoff because a) it’s a round, friendly number which b) creates a relatively high average dollar savings while c) remaining low enough to be in reach of many American families. This, of course, is further evidence that Americans don’t understand percentages well (or at least, that marketers think they can fool us by avoiding them).
Note also that all of my calculations use the stated minimum figure of $100 vs the average figure of $650 to get the 12.5% discount. That’s not a real discount – someone spending $100 wouldn’t get $650 in savings, as that is the average of all the people spending more than $100. That person would realize a smaller dollar savings, and the real discount rate must therefore be less than 12.5%.
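A small simulation makes both points. Assume, hypothetically, that every shopper saves exactly the same 12.5%; the weekly bills below are invented, not Walmart data. Restricting to people spending more than $100 raises the average dollar savings. Pairing that average with the $100 minimum then overstates the discount.

```python
RATE = 0.125          # assumption: every shopper saves exactly the same 12.5%
WEEKS_PER_YEAR = 52

# Hypothetical weekly grocery bills, not real Walmart or survey data.
weekly_bills = [30, 50, 70, 90, 110, 140, 180]


def annual_savings(weekly_bill, rate=RATE):
    return weekly_bill * WEEKS_PER_YEAR * rate


def mean(values):
    return sum(values) / len(values)


everyone = mean([annual_savings(b) for b in weekly_bills])
over_100 = mean([annual_savings(b) for b in weekly_bills if b > 100])
at_minimum = annual_savings(100)
# Ad-style "discount": the over-$100 average divided by the $100 minimum's annual bill.
ad_style_rate = over_100 / (100 * WEEKS_PER_YEAR)

print(f"everyone: ${everyone:.0f}/yr, over $100/wk: ${over_100:.0f}/yr, "
      f"a $100/wk shopper: ${at_minimum:.0f}/yr")
print(f"ad-style discount {ad_style_rate:.1%} vs. true discount {RATE:.1%}")
```

The percentage never changes, but the dollar average does. The shopper sitting right at the cutoff saves less than the advertised group average.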
We all know that you can get some funny/interesting responses by typing the first part of a question into a major search engine’s search box and letting it suggest the remainder. The NYT has gone so far as to investigate those suggestions themselves. I particularly enjoyed their description of search engines as “modern confessionals:”
This labor-saving device — part fortuneteller, part shrink? — has opened a window into our collective soul. With millions of people pouring their hearts into this modern-day confessional, we get a direct, if mysterious, glimpse into the heads of our fellow Web surfers.
And some nice visualizations of the questions people are asking don’t hurt, either:
I’d love to see an interactive tool for creating these diagrams.
Nate Silver writes about the dropping cost of air fares – yes, you read that correctly – over at Five Thirty Eight. His writing, as always, is excellent – I only want to point out a chart he uses and how it can be dangerous to draw conclusions at a glance (or, if you prefer, [...]
The NYT has published an infographic showing the top recipe searches on Allrecipes.com. Searches are broken out by state, allowing some interesting comparisons. (Local dialects and preferences are an interest of mine, and when combined with maps I can’t resist… see also various words for soda.)
Here’s the chart for “apple pie”, the 5th most popular [...]
Via FlowingData, I found this amusing pie chart from a local Fox News broadcast:
The survey plainly allowed people to give more than one answer, resulting in responses that were not mutually exclusive. It’s tiresome but bears repeating: pie charts are only suited to data which adds up to 100% (and then, only if there are [...]
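The 100% rule is easy to encode as a sanity check. This sketch uses hypothetical survey numbers (not the figures from the Fox chart). Summing to 100% is necessary but not sufficient, since totals alone can't prove the categories are mutually exclusive.

```python
def pie_chart_ok(shares_percent, tolerance=0.5):
    """Necessary (not sufficient) check before drawing a pie chart: no negative
    slices and a total of 100% up to rounding. Totals alone can't prove the
    categories are mutually exclusive; that depends on how the question was asked."""
    shares = list(shares_percent)
    return all(s >= 0 for s in shares) and abs(sum(shares) - 100) <= tolerance


# Hypothetical survey results, not the numbers from the Fox News chart.
single_choice = {"Candidate X": 45, "Candidate Y": 35, "Candidate Z": 20}
multiple_choice = {"Candidate X": 60, "Candidate Y": 55, "Candidate Z": 40}  # several picks allowed
print(pie_chart_ok(single_choice.values()))    # True
print(pie_chart_ok(multiple_choice.values()))  # False
```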
For a while, I’ve been following development of Indiemapper, a forthcoming web tool from the folks at Axis Maps. It should allow for easy map creation, including – yes – choropleths galore. However, the data analytics that will be available remain to be seen.
The WSJ asks, “Is It Time to Retire the Football Helmet?” With the debate about football head injuries and CTE swirling, some are wondering if wearing helmets is actually exposing players to greater danger than if they played bareheaded. Though seemingly counter-intuitive, the argument follows well-established moral hazard reasoning that some have perceived in, [...]
Psychologist Daniel Wright has published a list of ten statisticians every psychologist should know.
The list begins with the Founding Fathers:
1. Karl Pearson – who established statistics as an academic discipline
2. Ronald Fisher – who developed much of statistics’ mathematical foundation, including ANOVA and maximum likelihood, and popularized the use of p-values
3. Jerzy Neyman [...]
Ben Fry has created a stunning image consisting of the 26 million roads in the United States:
Nothing other than roads (asphalt, gravel, dirt…) has been drawn here, but geographic and political features emerge nonetheless. In a very real sense, the geography is a latent feature of the roads dataset, as it creates boundary [...]
Recently, there have been countless ads for auto insurance all making a similar claim: drivers who switch to that firm save significant amounts of money. How can every major insurance company make a similar statement? They can’t all be cheaper than every other company, on average.
As a particularly egregious example, Allstate’s website declares it via [...]
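One common explanation is selection, though the full post may or may not go there. People only switch when the new quote is cheaper, so the switchers every company advertises are, by definition, the ones who saved. This sketch rests on loudly stated assumptions. There are three hypothetical companies whose quotes all come from the same normal distribution ($1,000 mean, $150 standard deviation), so none is actually cheaper on average.

```python
import random


def simulate_switchers(n_drivers=10_000, companies=("A", "B", "C"), seed=1):
    """Average savings reported by each company's new customers.

    Assumption: every company's quotes come from the SAME normal distribution,
    so no company is cheaper on average. Drivers switch only when a quote beats
    what they already pay, taking the first cheaper quote in `companies` order.
    """
    rng = random.Random(seed)
    savings = {c: [] for c in companies}
    for _ in range(n_drivers):
        current = rng.choice(companies)
        current_price = rng.gauss(1000, 150)
        for other in companies:
            if other == current:
                continue
            quote = rng.gauss(1000, 150)
            if quote < current_price:
                savings[other].append(current_price - quote)
                break
    return {c: sum(s) / len(s) for c, s in savings.items() if s}


for company, avg in simulate_switchers().items():
    print(f"Company {company}: switchers save ${avg:.0f} on average")
```

Every company gets to run the same ad truthfully, even though none of them is cheaper on average.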
Ever wondered how song-identifying iPhone app Shazam works?
Now you know.
(For the link-averse: it’s a pretty cool implementation of pattern matching across song spectrograms, and the key insight was to first reduce the spectrograms by including only peak frequencies. Simple, yet genius.)
(via Revolutions)
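Here is a toy version of that peak-matching idea, written from the one-line description above rather than from Shazam's code. It keeps only the loudest frequency bin in each frame, hashes pairs of nearby peaks as (bin, bin, time gap), and scores a clip by how many hashes line up at a single time offset. The pair hashing is my own simplification of the matching step, the spectrograms are synthetic, and a real system would keep several peaks per time slice.

```python
import random
from collections import Counter


def peak_constellation(spectrogram):
    """Keep only the loudest frequency bin of each frame: [(frame_index, peak_bin), ...]."""
    return [(t, max(range(len(frame)), key=frame.__getitem__))
            for t, frame in enumerate(spectrogram)]


def fingerprints(peaks, fan_out=3):
    """Hash each peak with its next `fan_out` peaks as (bin1, bin2, time_gap)."""
    hashes = []
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1:i + 1 + fan_out]:
            hashes.append(((f1, f2, t2 - t1), t1))
    return hashes


def match_score(song_peaks, clip_peaks):
    """Largest number of hash hits that agree on one song-minus-clip time offset."""
    index = {}
    for h, t in fingerprints(song_peaks):
        index.setdefault(h, []).append(t)
    offsets = Counter()
    for h, t_clip in fingerprints(clip_peaks):
        for t_song in index.get(h, []):
            offsets[t_song - t_clip] += 1
    return max(offsets.values(), default=0)


def toy_spectrogram(peak_bins, n_bins=16, seed=0):
    """Synthetic spectrogram: low random noise plus one loud bin per frame."""
    rng = random.Random(seed)
    return [[rng.random() * 0.1 + (1.0 if b == peak else 0.0) for b in range(n_bins)]
            for peak in peak_bins]


rng = random.Random(42)
song_a = [rng.randrange(16) for _ in range(60)]
song_b = [rng.randrange(16) for _ in range(60)]
clip = peak_constellation(toy_spectrogram(song_a[20:35], seed=1))  # noisy excerpt of song A
for name, song in (("A", song_a), ("B", song_b)):
    print(name, match_score(peak_constellation(toy_spectrogram(song)), clip))
```

Because the noise never outweighs the peak, the clip's fingerprints all line up with song A at an offset of 20 frames. Song B only produces scattered chance hits.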
ReadWriteWeb’s coverage of a new study on webmail demographics contains one sentence that left me a little confused:
Gmail, for instance, includes more females (53%) than males (47%). If those were election poll results, we would call it “too close to call,” but in terms of tens of thousands of users, these percentage point differences have [...]
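The arithmetic behind "too close to call" depends on sample size. This sketch uses the textbook 95% margin of error for a proportion and assumes a simple random sample, which the study may not be. In a two-way split the lead is 2p - 1, so its margin is twice that of p.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)


def lead_is_significant(p, n, z=1.96):
    """In a two-way split the lead is 2p - 1, whose margin is twice that of p."""
    return abs(2 * p - 1) > 2 * margin_of_error(p, n, z)


for n in (1_000, 10_000, 50_000):
    print(n, f"±{margin_of_error(0.53, n):.1%}", lead_is_significant(0.53, n))
```

At a typical poll size of 1,000, a 53/47 split sits just inside the margin. At 10,000 users it is well outside it.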
I’ve covered first-digit fraud analysis with Benford’s law before, and now Nate Silver has applied a similar method to polling results. He looked at the last digit of various polls (e.g. a 48% McCain, 49% Obama, 3% undecided poll would be recorded as an 8 and a 9) and compiled histograms of their frequencies. Following [...]
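One standard way to ask whether last digits are "too uneven" is a chi-square test against a uniform distribution over 0-9. This is a sketch of that general test, not necessarily Silver's exact procedure, and the example numbers are made up.

```python
from collections import Counter

# 5% critical value of the chi-square distribution with 9 degrees of freedom.
CHI2_CRITICAL_9DF = 16.919


def last_digit_chi_square(values):
    """Chi-square statistic of the last-digit histogram against a uniform 0-9 spread."""
    counts = Counter(abs(int(v)) % 10 for v in values)
    expected = len(values) / 10
    return sum((counts[d] - expected) ** 2 / expected for d in range(10))


def looks_non_uniform(values):
    """True if the last digits are more uneven than chance would explain at the 5% level."""
    return last_digit_chi_square(values) > CHI2_CRITICAL_9DF


# Made-up poll numbers, not Silver's data.
even_spread = list(range(40, 60)) * 5   # every last digit appears equally often
lopsided = [42, 48, 52, 58] * 25        # only 2s and 8s
print(last_digit_chi_square(even_spread), looks_non_uniform(even_spread))  # 0.0 False
print(last_digit_chi_square(lopsided), looks_non_uniform(lopsided))        # 400.0 True
```

As a rule of thumb, the test needs enough polls that each digit is expected at least about five times.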
Once again, the self-proclaimed “experts” of social media are revealed to be not much more than some anecdotes and a keyboard. The latest is Dan Zarrella, who has written a vitriolic attack on Twitter’s planned adoption of the retweet as an official mechanism. Zarrella does some excellent work in other areas, but I find him completely [...]
I don’t usually have anything nice to say about Twitter (though I still ignore my mother’s advice and say it anyway), but the company is finally taking steps to improve one of the most glaring faults with their service: retweets.
Previously, retweets were simply new tweets that happened to contain old information. This created clutter and [...]
Finally, a radial visualization which serves a purpose rather than just looking cool. Getting Genetics Done has a tutorial on using clustering functions in R. In it, they show how this analysis:
is much better represented like this:
There’s nothing wrong with making a chart which looks good – in fact it’s encouraged – so long as the visual [...]
I read this morning about the drama at last night’s MTV video awards (does anyone actually watch this stuff?), but the episode was overshadowed in my mind by a quirky accident of rankings: if Taylor Swift beat Beyonce for the “Best Female Video”, how can Beyonce go on to win “Video of the Year”? Presumably, video [...]
BMW is actively researching the use of augmented reality for servicing cars:
Augmented reality (AR) has been getting a lot of press for recent advancements on the iPhone and Android platforms. While it’s nice to see these developments, thus far I’ve thought the excitement is a bit premature. It’s as if we all know how amazing [...]
Maybe there’s something in the water today – no sooner had I finished estimating the Earth’s solar radiation than this popped up on Cool Infographics:
The map was created by the Land Art Generator Initiative to show the amount of solar panel coverage required to power the Earth for one year. Very interesting, and this has [...]
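Here is the back-of-envelope version: the total sunlight intercepted by Earth's disk, plus the panel area needed for a given annual demand. The physical constants are standard. The demand, average irradiance and efficiency passed in below are illustrative assumptions, not the Land Art Generator Initiative's inputs.

```python
import math

SOLAR_CONSTANT_W_M2 = 1361   # mean irradiance at the top of the atmosphere
EARTH_RADIUS_M = 6.371e6
HOURS_PER_YEAR = 8766        # 365.25 days


def intercepted_solar_power_w():
    """Sunlight intercepted by Earth's cross-sectional disk, before atmospheric losses."""
    return SOLAR_CONSTANT_W_M2 * math.pi * EARTH_RADIUS_M ** 2


def panel_area_km2(annual_demand_twh, mean_irradiance_w_m2, efficiency):
    """Panel area needed to meet an annual energy demand; every input is an assumption."""
    demand_wh = annual_demand_twh * 1e12
    yield_wh_per_m2 = mean_irradiance_w_m2 * efficiency * HOURS_PER_YEAR
    return demand_wh / yield_wh_per_m2 / 1e6


print(f"{intercepted_solar_power_w():.2e} W intercepted")
# Purely illustrative inputs (hypothetical demand, irradiance and panel efficiency):
print(f"{panel_area_km2(annual_demand_twh=150_000, mean_irradiance_w_m2=200, efficiency=0.2):,.0f} km^2")
```

The area scales linearly with demand and inversely with irradiance and efficiency. Doubling panel efficiency halves the footprint.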