Let’s say your city releases a list of all trees planted on its public property. It would be a godsend—at least in theory. You could filter the data into a list of all the fruit and nut trees in the city, transfer it into an online database, and create a smartphone app that helps anyone find free food.
Such is the promise of “open data”: the massive troves of public information our governments now post to the net. The hope is that, if governments share enough of this data with the world at large, hackers and entrepreneurs will find a way of putting it to good use. But although so much of this government data is now available, the revolution hasn’t exactly happened.
In far too many cases, the data just sits there on a computer server, unseen and unused. Sometimes, no one knows about the data, or no one knows what to do with it. Other times, the data is just too hard to work with. If you’re building that free food app, how do you update your database when the government releases a new version of the spreadsheet? And if you let people report corrections to the data, how do you contribute that data back to the city?
These are the sorts of problems that obsess 25-year-old software developer Max Ogden, and they’re the reason he built Dat, a new piece of open source software that seeks to restart the open data revolution. Basically, Dat is a way of synchronizing data between two or more sources, tracking any changes to that data, and handling transformations from one data format to another. The aim is a simple one: Ogden wants to make it easier for governments to share their data with a world of software developers.
That’s just the sort of thing that government agencies are looking for, says Waldo Jaquith, the director of the U.S. Open Data Institute, the non-profit that is now hosting Dat. “We get calls every week from federal agencies asking how they should give people access to their two petabyte data sets,” he says, referring to information troves that span millions of gigabytes. “They want to give people slices of that data, but they don’t know how to do it.” Dat can step into the breach and potentially bootstrap a whole new world of online applications.
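The update problem raised above (what to do when the government releases a new version of the spreadsheet) can be made concrete with a small sketch. This is not Dat's code or API; it is a minimal illustration, assuming each row carries a stable identifier, of how two releases of a data set might be compared and the changed rows converted to another format. The column names (`tree_id`, `species`, `street`), the sample data, and the helper functions are all hypothetical.

```python
import csv
import io
import json


def load_rows(csv_text, key):
    """Parse CSV text into a dict mapping each row's key value to the row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key]: row for row in reader}


def diff_rows(old, new):
    """Return the keys added, removed, or changed between two versions."""
    added = sorted(k for k in new if k not in old)
    removed = sorted(k for k in old if k not in new)
    changed = sorted(k for k in new if k in old and new[k] != old[k])
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical data: two releases of a city tree inventory.
V1 = "tree_id,species,street\n1,Apple,Oak St\n2,Maple,Pine St\n3,Walnut,Elm St\n"
V2 = "tree_id,species,street\n1,Apple,Oak St\n3,Black Walnut,Elm St\n4,Pear,Main St\n"

old = load_rows(V1, "tree_id")
new = load_rows(V2, "tree_id")
changes = diff_rows(old, new)
print(changes)

# Convert only the rows that need updating into another format (JSON),
# so a downstream app can apply them without reloading the whole file.
updates = [
    json.dumps(new[k], sort_keys=True)
    for k in changes["added"] + changes["changed"]
]
for line in updates:
    print(line)
```

An app built on the first release would only need to apply the rows listed in `updates` and delete the keys listed under `removed`, rather than rebuilding its database from scratch.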
Years in the Making
Ogden has been working on open-data problems since he was 19. In 2009, he started experimenting with open data sets from the City of Portland, Oregon, trying to build applications such as a map of all the bike racks in the city. But he quickly found that the city’s data wasn’t as open as he would have liked. It was often in formats that would require expensive enterprise software to work with, so he started translating the data sets into more open formats. That culminated in Ogden creating PDX API, a custom version of the open source database system CouchDB loaded up with many data sets from the City of Portland to make them easier for developers to work with.
His work in Portland landed him a 2010 fellowship with Code for America, a non-profit dedicated to helping the public sector make better use of technology. As part of the fellowship, he worked with the city of Boston, where he saw that many cities were struggling with the same issues he faced in Portland. That gave him the idea for DataCouch, an attempt to make it easy for any government or organization to build their own system along the same lines as PDX API. But he ran into some problems. CouchDB was a full-blown database management system, which was a bit too heavy for what Ogden wanted to do. The project never really took off. The tools he needed just weren’t there.
After his year-long fellowship, Ogden took a detour into consumer mobile apps and 3D graphics, but he was haunted by the idea of DataCouch. “The open data thing seemed like the place where I could make a bigger impact,” he says. He started thinking about where things went wrong with his ambitious project, and how he could do things differently if he started over from scratch.
What Dat Is
Ogden’s original inspiration for DataCouch was GitHub, the popular code-hosting and collaboration service. Using GitHub, developers can copy open-source projects so that they can make their own versions, known as forks, and submit those changes for approval by the original developers. He wanted to inspire a similar spirit in data, enabling developers to copy and modify data sets, and submit changes back to the government. But he realized that he was missing a big part of what makes GitHub work: Git itself.
Git is a piece of software originally written by Linux creator Linus Torvalds. It keeps track of code changes and makes it easier to integrate code submissions from outside developers. Ogden realized what developers needed wasn’t a GitHub for data, but a Git for data. And that’s what Dat is.
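To give a rough sense of what “a Git for data” could mean, here is a minimal sketch of an append-only history of data set snapshots, where each version is identified by a hash of its content and its parent version, loosely in the way Git identifies commits. It is an illustration only and says nothing about how Dat is actually implemented; the `DatasetHistory` class, its methods, and the sample rows are hypothetical.

```python
import hashlib
import json


class DatasetHistory:
    """Append-only history of data set snapshots, keyed by content hash."""

    def __init__(self):
        self.snapshots = {}  # version id -> rows stored for that version
        self.log = []        # (version_id, parent_id, message), oldest first

    def commit(self, rows, message):
        """Store a snapshot of `rows` and return its version id."""
        payload = json.dumps(rows, sort_keys=True).encode("utf-8")
        parent = self.log[-1][0] if self.log else None
        # The id depends on both the content and the parent version,
        # so the same data committed at a different point gets a new id.
        digest = hashlib.sha1(payload + str(parent).encode("utf-8"))
        version_id = digest.hexdigest()[:12]
        # Round-trip through JSON to keep an independent copy of the rows.
        self.snapshots[version_id] = json.loads(payload)
        self.log.append((version_id, parent, message))
        return version_id

    def checkout(self, version_id):
        """Return the rows exactly as they were at the given version."""
        return self.snapshots[version_id]


history = DatasetHistory()
v1 = history.commit({"1": {"species": "Apple"}}, "Initial city release")
v2 = history.commit(
    {"1": {"species": "Apple"}, "2": {"species": "Pear"}},
    "Add newly planted pear tree",
)

for version_id, parent, message in history.log:
    print(version_id, parent, message)
```

Because every version records its parent, anyone holding a copy of the log can tell which release a correction was made against, which is the property that makes it possible to send changes back to the original publisher.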
Dat to the Future
Ogden built a prototype of Dat with funding from the Knight Foundation, a non-profit dedicated mainly to media and journalism initiatives, and is now an employee of the U.S. Open Data Institute. Most of the project’s development is currently funded by the Alfred P. Sloan Foundation, a non-profit founded by the former General Motors CEO to fund scientific education and research.
Although Ogden’s background is in city government, the Dat team is now squarely focused on the needs of scientists. That’s largely because of the Sloan Foundation’s focus. “I don’t come from a scientific background and wasn’t even thinking about science data,” he says. “But they convinced me that I should.” He explains that scientists have to deal with many of the same issues with formats and tracking changes that city governments do. Using Dat, Ogden says, much of this complexity could be abstracted away, at least for some users of the data.
The new round of funding has allowed Ogden to actually hire other developers to help build the tool. But he’s not sure how long it will last. That means he might eventually have to try his hand at running a startup again, but he says it might be possible to just keep raising funds indefinitely.
Of course, Ogden still thinks that Dat will be useful to governments. And he’s not discouraged by how slow going the open data movement has been so far. “We’re not trying to take over the government right away, just put out tools for the early adopters, and create a new generation of open data people,” he says. “It takes a long time.”