Only human writers can distill a heap of sports statistics into a compelling story. Or so we human writers like to think.
StatSheet, a Durham, NC, company that serves up sports statistics in monster-size portions, thinks otherwise. The company, with nine employees, is working to endow software with the ability to turn game statistics into articles about college basketball games.
Established in 2007, StatSheet.com provides statistical analysis of college football and basketball, Nascar and other sports. It dices data in more ways than any fan could possibly absorb. But charts, graphs and rankings alone cannot replace words that tell a story. We humans love stories; a craving for narrative seems part of our nature.
This month, StatSheet unveiled StatSheet Network, made up of separate Web sites for each of the 345 N.C.A.A. Division I men's basketball teams. Beyond statistics galore, each site has what the company calls "automated content," stories written entirely by software, including write-ups of the team's games, past and future. With a joking wink, StatSheet's founder, Robbie Allen, refers to these sites as the "Robot Army."
Each team's StatSheet website is located at a freestanding Web address, conveying the sense that it is wholly invested in the interests of that school's fans.
The software is imbued with the smarts to flatter each particular team. The same statistics, documenting the same game, produce an entirely different write-up and headline at the opposing team's page.
A team like No. 1-ranked Duke -- whose StatSheet Network Web site is at BlueDevilDaily.com -- does not lack for attention from human sports writers. But StatSheet expects that the sports programs of smaller schools will appreciate the advent of robot journalism.
"There are at least 200 Division I schools that the large sports media companies give no attention to," says Mr. Allen at StatSheet. "Once we have the algorithm in place, there's no cost to adding the Lamars and Elons to the Dukes and UNC's."
Small schools are less likely to have large alumni bases and to draw significant traffic, Mr. Allen said, so he is knocking on their doors to explore licensing partnerships.
Mr. Allen explains that his story-writing software does not perform linguistic analysis; it just uses template sentences and a database of phrases that numbers about 5,000 for now.
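For readers curious what a template-and-phrase approach might look like in practice, here is a minimal sketch in Python. It is not StatSheet's code: the phrase lists, the 20-point "blowout" threshold, the `write_recap` function and the layout of the game record are all assumptions made up for illustration. It shows only the general idea that the same box score can yield a different sentence depending on whose fans are reading.

```python
import random

# Hypothetical phrase database, keyed by how lopsided the game was.
# A real system would hold thousands of phrases; these are invented.
WIN_VERBS = {
    "blowout": ["waxed", "routed", "crushed"],
    "close": ["edged", "held off", "slipped past"],
}
LOSS_VERBS = {
    "blowout": ["were overwhelmed by", "fell hard to"],
    "close": ["came up just short against", "narrowly lost to"],
}

# One template sentence with slots to be filled from the game data.
TEMPLATE = "On {date}, the {subject} {verb} the {opponent}, {score}."


def write_recap(game, perspective, rng=None):
    """Fill the template from the point of view of the 'home' or 'away' team."""
    rng = rng or random.Random()
    home_pts, away_pts = game["home_pts"], game["away_pts"]

    # Assumed cutoff: a margin of 20 or more points counts as a blowout.
    kind = "blowout" if abs(home_pts - away_pts) >= 20 else "close"

    subject_is_home = perspective == "home"
    subject = game["home"] if subject_is_home else game["away"]
    opponent = game["away"] if subject_is_home else game["home"]

    # Basketball games cannot end in a tie, so one side always wins.
    subject_won = (home_pts > away_pts) == subject_is_home
    verbs = WIN_VERBS if subject_won else LOSS_VERBS

    # The score is always written with the winning total first.
    score = f"{max(home_pts, away_pts)}-{min(home_pts, away_pts)}"
    return TEMPLATE.format(
        date=game["date"],
        subject=subject,
        verb=rng.choice(verbs[kind]),
        opponent=opponent,
        score=score,
    )


game = {
    "date": "November 12th",
    "home": "Buckeyes",
    "away": "Aggies",
    "home_pts": 102,
    "away_pts": 61,
}
print(write_recap(game, "home"))  # told for Buckeyes fans
print(write_recap(game, "away"))  # same statistics, told for Aggies fans
```

Because the verb is drawn at random from the phrase list, repeated runs vary the wording while the facts stay fixed. No linguistic analysis is involved; the program simply picks a phrase and fills in the blanks.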
"My goal was that 80 percent of readers wouldn't question that the content was written by a human," he says, "and now that we've launched, I think the percentage is higher."
In the battle between human writers and robots, I'm not a dispassionate observer: I own up to rooting for the home team. So I asked Michael W. White, an assistant professor of linguistics at Ohio State and a specialist in the field of natural language generation, for his professional opinion of the quality of the robot army's writing.
We visited the StatSheet site for Ohio State's team -- BuckeyesBeat.com -- and looked at the write-up of its November 12 season opener, written expressly for Buckeyes fans: "Ohio State Gets 102-61 Monster Win Over North Carolina A&T." "Ohio State has already started living up to monumental expectations with a good first game," it began. "On November 12th on their home court, the Buckeyes waxed the Aggies, 102-61."
The story had 10 sentences and 156 words. Over all, Professor White said, it read "pretty well." He praised the first sentence as very good and said the use of "waxed" in the second was a nice touch.
Then he pointed out some glitches. In one passage, the software lost track of who had beaten whom and forgot to use "the" when referring to Ohio State's supposed victory "over Buckeyes." A note that followed the write-up was a bit overeager to show that Ohio State was undefeated so far this season, when its record was still just 1-0.
The StatSheet Network's Web sites are plainly labeled "beta," and these minor bugs can easily be eliminated with tweaks. The bigger problem, the professor said, is that StatSheet's software lacks the awareness of linguistic structures that would let it generate more complex sentences: "Getting from remarkably good to human level is very hard. The more you cram into a sentence, the more difficult it is to do it well."
The StatSheet software avoids those difficulties by using simple sentences and swapping in particular details. "This makes it perfectly readable, but it also can make it seem stilted," he said.
Mr. Allen of StatSheet says he believes that what some readers regard as "stilted" will be appreciated by others who say, "I don't like personality -- I just want the straight facts."
He sees opportunities to extend robo-journalism not only to other sports but also to other subject areas -- like financial news -- in which there is an abundance of readily available data.
Will StatSheet's robo-journalists replace business columnists? Not a chance: we humans are undefeated.
Only problem is, just like the Buckeyes two weeks ago, we're just 1-0.