From left, Michael Chui, Alex Alsup, David Behen, Joel Gurin, and Tony Scott
David Behen of Michigan's Department of Technology, Management, and Budget
VMware's Tony Scott
Alex Alsup of LOVELAND Technologies
Joel Gurin of NYU's Governance Lab
Michael Chui of McKinsey & Company
Alex Alsup, Chief Product Officer, LOVELAND Technologies
David Behen, Chief Information Officer, State of Michigan
Joel Gurin, Senior Advisor, The Governance Lab, New York University
Tony Scott, Chief Information Officer, VMware Inc.
Michael Chui, Partner, McKinsey Global Institute, McKinsey & Company
The digitization of everything and access to vast databases of information means we can measure more of the world. It also means we can change what we measure—about city functions, services, and the economy. New opportunities are emerging to cultivate innovation, build new services, and unlock economic value. Agencies across the country are opening doors to a data treasure trove hidden for years.
Kirkpatrick: Data is one of those things that we can’t get away from if you talk about technology’s impact on society. So this next session is going to really grapple with that, not only from the standpoint of how it’s transforming business, but how it’s transforming government, organizations more broadly. And we have, to lead the session, Michael Chui of the McKinsey Global Institute, who really understands how data, and open data in particular, is transforming the way that organizations, and even countries can operate.
Chui: I’m joined by a terrific panel here. We have, to my left, Alex Alsup, who’s the chief product officer at LOVELAND Technologies; David Behen, who’s the chief information officer of the State of Michigan—small job; Joel Gurin, who’s the senior advisor of the gov lab at NYU; and then Tony Scott, who’s the CIO at VMware. So thanks so much, all of you, for taking some time to talk about something which is important, which is data. We’ve done some research with data at the McKinsey Global Institute, big data, open data. We found that open data, for instance, could create over $3 trillion of value globally on an annual basis, which is a big number.
But why don’t we turn first to our panel. So you’re all from very different types of organizations, all of which touch data, and open data specifically. But open data is sort of an open topic in terms of how you define it. So I’d love to hear from each of you, when you think about open data, what is it that you think about, and then how is it that you all are actually taking advantage of it and making it useful for other people. So, Alex, let’s start with you.
Alsup: Sure. Well, thanks for having me here. Excited to be here. Let’s see, when I think of open data, what do I think of? I think of parcel maps, because LOVELAND Technologies, we take property data, take the legal boundaries of every property in the city and put it online for people to explore. And I think of not just making government and public data open, having policies that make it available, but making it usable. There’s a lot of data that is sort of publicly available, but not publicly useful. So when I think of open data, I think of making that information helpful and—
Chui: What does it mean to be useful?
Alsup: Let’s see, for instance, being able to understand—say in Detroit, being able to understand, beyond what you owe in property taxes, what the picture of property taxes is around you. So taking a very specific instance of data that may be relevant to you and making it publicly available so you can see the context of that information on a citywide scale. That would be one instance for me.
Chui: So visualizing it—
Alsup: Exactly. Yes.
Chui: Okay. David?
Behen: Yes, first of all, it’s great to be here. I was born and raised inner city Detroit, and it’s just great to see the city of Detroit having such a strong rebound. And so I’m thrilled to be part of that, and so it’s great to be back in Detroit. So let’s give it up for Detroit.
Always have to give a shoutout for Detroit. But, so for us at the State of Michigan, we’re really looking at, we call it customer-centric government: how do we bring the services to our citizens, instead of our citizens having to come to the service, like, you know, kind of the old school government, where you had to go to a building and get your services. We believe that data will fundamentally change the way we do business. So this customer-centric model and having the data accessible to our citizens to make our services seamless to them, that’s how we look at how we’re going to use our data. Not just open data—and we have a ways to go at the State of Michigan around open data, but we have Michigan.gov/opendata, where you can go and look at the datasets we have out there—but how are we going to use all the data that the State of Michigan has to fundamentally change the way we provide customer service? We do it every day in our private lives, right? So how can we use that data in a very positive way to really change the way we provide service? And that’s what big data and open data mean to me.
Chui: So we probably have a relatively unique crowd in here, you know, those of us who will actually want to go to the open data portal and download a bunch of data and munge it up and all that. But that’s probably not true of most Michiganders, right? So how do you think about, you make data available, but then you talked about transforming citizen services. How do you connect those two?
Behen: Couple ways, right? One would be we have Code Michigan, where we take all of our open data, we put it out for people, and we have a weekend kind of hackathon, where that is actually—this is my public service announcement message now: That is actually October 3rd, 4th, and 5th in three locations in the State of Michigan. So we invite you all to attend and participate. Actually, last year, one of the winners from Code Michigan was LOVELAND Technologies.
So we try to promote that. We push that out there, have those opportunities for citizens to get involved in that. We talk about it all the time. We talk about how government needs to be more transparent and how we need to be accountable for the things we do. So we put things out there, like we push people to our website, we talk about it all the time. We have a transparency site that shows how we’re spending the dollars. You know, those kinds of things. And it’s always something that we’re pushing, because we know we need to get better at it, and the only way we can get better at it is by having dialogue with folks on it.
Chui: Great. Joel?
Gurin: Well, I think open data means a lot of things, but the part that we’re focusing on at the gov lab at NYU is the intersection really of government, private sector, and civil society. So in a sense, it’s transparency plus. When you think about open government data, I think a lot of it is about government accountability, transparency, really helping citizens understand how their city or state or federal government works. But we really are looking a lot at how open data becomes a public asset and a public resource that can do all kinds of things.
I run a study at NYU called the Open Data 500, where we’ve been identifying 500, actually, now closer to 600 US-based companies that use open government data as a key business driver. And I would say two things are really striking about what we’ve found so far in this context. One is that these companies are showing up in every sector of the economy: energy, education, healthcare, finance. We have about 15 categories. So this is not just an asset that can be used in a couple of specific ways. It really has economy-wide implications.
And the other thing is that we’re finding, interestingly, that a number of these for-profit companies actually also have a civic mission, and that they’re using data in ways that not only are profitable to them, but that actually can help citizens or a city infrastructure. So just to take one very well-known example, NextBus, which began a couple of years ago and is now all over the country, very well known for using municipal transportation data to help commuters figure out when the next bus is coming. And on one level, that’s simply making a service easier to use. But if you look at their mission statement, they’re very much about improving public transportation and making public transportation a more attractive alternative to people than driving their car, which helps the environment, it helps avoid traffic congestion, etcetera, etcetera. And over and over again, we’re seeing companies that are starting, not only with a profit motive, but with a civic mission, where civil society and the private sector and government really do come together around open data.
Chui: Must be a reason why a lot of people are here.
Chui: Tony, you’re at one of the leading technology companies in the world. How do you think about open data and how does it create value?
Scott: Well, I think it’s a pretty powerful force, actually, used correctly. And then you also worry about the flipside, you know, some of the inherent risks or dangers, from either bad data itself, or corrupt data, or else data that’s inaccurate. And the provenance of data, where it came from, is it a trustworthy source, all those things become pretty important.
The other thing that I’ve observed, that I’m probably getting more and more sensitive to in the current role, is the world is really moving from a sort of—there’s a new model for how things are being linked together. And we see it in social media all the time, but we’re really becoming a network of things. You know, we think of, for example, the terror network, or we think of the airline network, or the communications network. Industries are linking together, cities and government institutions are linking together and so on. And I think one of the biggest challenges with the whole open data movement is how do we really take advantage of these ecosystems of organizations and link their open data together in more powerful ways.
I’ll give you one example. I’m on the committee for our company that plans real estate and workplace. We’re the group that decides where we should put facilities. We’re also the group that worries about disaster recovery and business continuity in the event of some sort of a disaster. And today’s model requires that I tap into databases of information created by, in some cases, relatively small institutions, and there’s no place I can go where that’s really aggregated together in a way that becomes useful, so I still have to do a bunch of work, and just one example where an ecosystem could probably help.
Chui: Yes. Well, Tony mentioned risks, and, Alex, I know that in prior conversations you’ve talked about sometimes open data can cause discomfort—discomfort on the part of government officials or others. What are some of the things that actually become a little bit challenging, and how do you think about overcoming them?
Alsup: Sure. Well, I mean I think the question of data integrity is really important. One thing I like to say about open data is that there’s kind of this notion that people carry around that data is gold. Data is not gold, it’s more like ore. It’s crude, unrefined sludge that you need to put through a laborious process to make usable, make useful, build an interface around that can present and exploit the value of data.
One of the things that we encounter all the time is government departments who are concerned about the integrity of their own data that’s been kind of siloed away and not cross-referenced with any other city or state or federal datasets in a long time, and they worry about making it publicly available.
Chui: What do you mean by integrity? What are the things that people are concerned about?
Alsup: Just that the data’s wrong, that it doesn’t match reality anymore, that the person who the assessor says lives at that address no longer lives at the address, that the assessment rate is wrong, that the tax rate is wrong. All of that is possible. But there’s no higher authority than the city or the state, whoever holds the data. They are the god of that data, and if god is flawed, the only thing you can do is put it out to the public and solicit them to tell you that there’s something wrong with it. What’s really important then—and I think this is where we need to formalize relationships; this is the reason to formalize relationships—is that when a company like LOVELAND puts property data out for people to explore and people start coming to us saying, “Hey, this is wrong. I know what the real truth is here,” we then need a formal relationship with the relevant department to update that data inside of their system, as well as ours, so that we don’t get—
Chui: They don’t just—
Alsup: Exactly, right. “I look at LOVELAND for my property tax information, not the city.” No, you need to make sure that there’s version control, that you’re feeding back information and you’re building up that muscle memory between private companies that are collecting this information, and government, which ultimately has to be the keeper of that information.
Gurin: I think that whole question of how do you improve the data is really central. I mean we think a lot of open data as like the two steps are release the data, use the data, but clean and improve the data is really in the middle there. A lot of the work that we’re doing at the gov lab is now with federal agencies, and what we’ve done with the Open Data 500 study is we’ve used that as a basis for roundtables, where we bring together data providers and data users for a real structured dialogue about which of your datasets are really working for us, which ones are not. And I think there’s a potential to do much more than just put data out there and have sort of a comments section or a feedback loop where an individual can say, “This particular thing is not really right.” I think there’s a lot of opportunities for really doing structured programmatic interactions between—again, between civil society, the private sector, and government to improve these datasets. And our experience on the federal level is that these agencies would love that. I mean what you have is really smart, really committed tech people taking over the data functions in agencies where these horrible kludgy systems have sort of been built by accretion over 30 years.
Gurin: Sludge. There’s a high sludge factor. So really, everybody’s kind of on the same side here, and I think what we need is a kind of approach that sort of takes the people that are now responsible for old data off the hook, so that somebody in a government agency can come out and say, “Yes, I know we have problems here. Why don’t we figure out how you can help us fix this.”
Scott: How do you fix mistakes, though? I ran across one example recently—maybe you guys have a solve for this, but property lines used to be a surveyor going out and putting a stake in the ground and having a measurement and a distance, and there were these old paper maps and so on. They sometimes disagree. I can imagine a world in the future where there’s no map anymore, it’s just data in a database that says it’s 40 feet long, and it could be wrong, and where do you go adjudicate that sort of thing?
Behen: Yes, it’s interesting, never been referred to as the god of data. That’s pretty cool. I’m going to change my business card. But one of the things that Michigan—because, all great points, right? And I wouldn’t call them all old and stodgy. Is that what you called all the data systems in government? They’re not that bad yet, right?
Gurin: Oh, no, just kludgy.
Behen: Right, kludgy. Let me say this. In Michigan, what we’re doing, because we want to address the issues you all just brought up, is we have an enterprise information management program. The governor did an executive directive to all the agencies that, “You have to share data.” Now, to your point about the silos, we had silos. I mean there wasn’t a month that went by where someone wasn’t emailing me saying, “I need you to talk to so-and-so so they can share the data.” Well, quite frankly, we like to tell folks, it’s not their data. It’s the State of Michigan’s data, right? So how do we work together at State of Michigan to look at that data, make sure it’s the correct data—
Chui: You mean employees, you’re telling that it’s not the employees’ data?
Behen: Yes, employees. And I don’t blame them, right? They just feel like they’re the good stewards of public data. And we tell them we’re not going to break federal law, state law, we’re not going to break IRS rules or guidelines with this data, but we need to use that data. We need to make sure the data’s correct, right? And then we need to be able to use the data in a very positive way. And so I think that’s what you’re seeing across the State of Michigan. And we’re trying to—the city of Jackson, Michigan, from what I understand, is a leader in data. We’re working with the city of Detroit on their data. So how do we make sure that all these organizations are working together to make sure that the data’s correct? I mean because there’s issues. Like in the State of Michigan, some of the issues are how do you define—what is the definition of a data point, right? What is an address? Well, you’d be amazed at how many different areas think an address is something different.
So it’s a long process, but it’s a process that we’re engaging our private sector partners in, our educational partners, and our local government partners. And, again, it’s a marathon, so it’s going to take some time, but eventually we’ll get there. Because the data is—what did you call it? It’s iron ore, right? It’s ore. There’s a lot of things that need to happen to it to make it very useful. And I think the great work you guys are doing, you know, it’s very helpful, because we need that kind of think tank to come in and assist us with that. So it’s good stuff.
Chui: It’s easy to think about data from the geeky standpoint, the miners who are going to do it. But how important is leadership, you know, leadership from the governor’s office, when you’re talking about making data open?
Behen: Well, and I’ll tell you, you know, Governor Snyder’s made this a priority for him. He’s the one who put out the executive directive to say, “You will share the data. We will bring the data together.” All the agency directors have all agreed to it, are on our steering committee, our governing body. When you have the director of the Department of Human Services, Maura Corrigan, pounding on us to make sure that we can use that data in a positive way, to find the cracks in the system, where somebody who may be less fortunate than others, who may be on a free lunch program, how do we make sure that that family’s not missing other programs? And it’s by using that data and making sure that data cross matches so we can find those programs. But the data’s got to be right. So the leadership piece, you know, having the executive sponsorship in these programs is critical. If you don’t have that, I think it’s an uphill battle.
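The cross-matching Behen describes can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the State of Michigan's system: the program names, the household IDs, and the `find_gaps` helper are all hypothetical, and the sketch assumes every agency already identifies a household by the same ID, which is exactly the "data's got to be right" problem raised in the discussion.

```python
# Hypothetical enrollment sets keyed by a shared household ID.
# Program names and IDs are invented for illustration only.
free_lunch = {"H001", "H002", "H003"}
other_programs = {
    "food_assistance": {"H002"},
    "heating_assistance": {"H001", "H002"},
}


def find_gaps(anchor, programs):
    """For each household in the anchor program, list the other
    programs in which that household does not appear."""
    gaps = {}
    for household in sorted(anchor):
        missing = [name for name, enrolled in programs.items()
                   if household not in enrolled]
        if missing:
            gaps[household] = missing
    return gaps


gaps = find_gaps(free_lunch, other_programs)
for household, missing in gaps.items():
    print(household, "is not enrolled in:", ", ".join(missing))
```

A gap found this way only means the household is absent from another program's records. Whether the family is actually eligible would still have to be checked against each program's own rules.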
Chui: So we’ve talked a little bit about the supply side, how do you get the data out there, how do you make it open, what are some of the challenges. I’d love to talk a little bit about the demand side, people who are using data. And Tony made reference to an ecosystem, right? And so, Joel, I know that Open Data 500, you’ve actually been looking at a lot of the uses of this data, not only for accountability and transparency, but also for creating businesses, and creating jobs and economic activity. Can you talk a little bit more about the things that you’re seeing there and how does data become an ore that gets mined and then actually creates—
Gurin: Sure. Well, I will say, it’s all on OpenData500.com. So I will not go one through 500 here, but I’ll give you a couple of the examples. I think some of the ones that are most relevant to cities, which in some cases use city data around the country, and in other cases use federal data as well, we’re seeing a number of different kinds of application. So one I would call really sort of civic engagement and accountability and municipal management. So we have companies like OpenGov.com or Govini, which are taking city data, which can be very, very different, you know, between Palo Alto and San Mateo, and putting them on common platforms, which helps town hall meetings, and it also helps city managers, for example, say, you know, “We have a level of police overtime here that’s twice as high as our neighboring city. Why is that? What can we learn from that?” So those become very important dashboards, I think, for city management.
SpotCrime I think is a very interesting application that is really mapping crime levels and types of crime around the country, again, both for city comparability and for city management. We’re seeing a number of applications in the transportation sector. I mentioned NextBus, but there are a number that are looking at traffic patterns and traffic flow and private transportation as well. And we’re seeing a number of interesting applications—New York City, for example, just did this BigApps competition that I was an evaluator for, and you’re seeing people come up with apps now for things like employment, not only new ways to link employers and employees, but things like looking at trends of employment in the city, what’s likely to be a hot industry. A new app called RentHacker, where it uses crowdsourcing to supplement city open data and really help people understand what is that apartment really going to cost, and how much are you really going to have to pay off the super to let you live there, and is it a good deal and so on. And a lot that are emerging around emergency services, and a number that are evolving around small businesses and really helping accelerate small businesses, like OnDeck—it used to be called OnDeck Capital—that is using open data to help small businesses get loans by doing the due diligence that the major lenders don’t have time to do. So they can sort of connect small businesses with lenders and say, “We’ve already vetted this guy. This is why you should lend the money.” So all of these different applications, as I said, all different parts of the economy, and many of these really with the potential to improve city services and improve the way cities work, while at the same time, building a viable business off of city data.
Chui: So you mentioned how this can aid small entrepreneurial businesses. You mentioned some larger ones. But, Tony, a lot of your customers are the biggest enterprises in the world, and we’ve found in our research, when you combine data from multiple sources, both internal as well as external, you can often create more value. How does a large enterprise think about it? I mean here in Detroit, we have big businesses as well as smaller ones.
Scott: I think it’s certainly a data age, there’s no question about it. Some may know, I worked at Microsoft as CIO for about five years, and I was onstage with Steve Ballmer on a panel like this, interviewing him, and he said, “You know, it may come as a surprise to you, Tony,” and he put his hand on my shoulder, and he said, “Most of the data I use in my job doesn’t come from your internal systems.” You know, and he thought he was shocking me, maybe, somehow. And it didn’t surprise me at all, because, you know, if we think about just what we all do in our daily lives, if you want to know the stock price of your company or something you’ve invested in, you don’t go to that company’s website, you go to data that comes from somewhere else. If you’re at dinner and you want to look up something, you don’t tap into your own private database, you go to Google or Bing or someplace like that. And companies I think do the same thing, whether it’s researching new products, finding new cures in the field of medicine and so on. The reliance is on huge amounts of public and open data and shared research data. If we’re looking for a new location, that’s certainly the source that we go to, and then we ingest it and do different things with it that probably you would never have done five years ago, or been able to do.
Here’s one just simple example of how much things have changed. Years ago, I was working at a pharmaceutical company, and it took us—we would get data about the sale of the particular medicines once a month. It took almost a month to crunch the data, and then it took another month for people to sort of analyze it and figure out what was going on. In a three-year period, that went from a month, in a month, in a month, to the data would come in two days after the month was over, it would take a day to crunch the data, and it still took a month for people to analyze what was going on—people being the long pole in the tent there. But the advantage that gave the company—and this was all data that could be shared among a variety of pharmaceutical companies—was much greater information about what doctors were prescribing, what they were prescribing it for, that aided in the formulation of new drugs. And the whole lifecycle of business in every industry is sped up that way every couple of years. So the half-life is much shorter now, and I think that’s the biggest boon to business is that we’re faster and there’s a lot more data that’s available to do even better things.
And I would just conclude on that point by saying, like one of the first panelists here, I am really optimistic about the economic opportunities that data, and the availability of it and the tools for using it will provide. I think it’s created the greatest economic opportunity of anything that we’ve seen in the last hundred years, probably. So I’m pretty excited about it.
Chui: Terrific. We’re going to come to questions from the audience as well, so there are microphones here. But let me also ask, Tony, as you mentioned, the technology’s advanced, we’re seeing data come, you know, volume, and then it’s coming faster cycle times. But that also raises the bar in terms of the skills necessary in order to deal with open data and big data, etcetera. Alex, can you talk a little bit about what that means in terms of being a user of the data itself?
Alsup: Sure. I mean I think the challenge for companies like us is to build tools that allow for the exploration of this information. I read something recently that I really liked. You know, everybody talks about intuitiveness and, you know, technology systems and apps that are easy to understand at first glance, and I read something that was kind of pushing back on that and saying that you want to make platforms and tools that make challenges to the user visible, but allow them to explore them on their own, to engage with difficult—not difficult, but challenging parts of an interface or a system or an app, whatever, and that ultimately, the user, by struggling through multiple steps, can come to a much deeper understanding and get greater utility out of what you’re building. Which I think really informs what we do at LOVELAND is take really kind of complicated and archaic things, like property taxes and parcel boundaries, and try to build tools that enable exploration of the data and allow users to use the system to their own ends, not try to define a single point that they should get to and make it as easy as possible for them to get there, but take them somewhere interesting that goes where they want to go.
Chui: Life is struggle, so is data.
Behen: You know, one of the things that we’re fortunate about—and I’m state government, right? So we’re fortunate—we like to look at the power of the crowd, as you were touching on earlier. Quite frankly, there’s an app being built right now—you know, we have an app called MyPage, which you download and it’s our first step towards that vision of customer-centric government, pushing the services there to you and you can do everything on your smart device. But one of the apps we were thinking about doing was a trails app. Michigan has great trails. I think you’ve all used them. But you know what? We’re probably not the best ones to make that trail app. DNR has a lot of data to build that app, so we might build an app and put it out there, and then keep it open so that the crowd can come in and make changes to that app, make that app better, you know, strike up some of the entrepreneurship that you’re talking about. Because I do believe the open data piece, if you use the power of the crowd, I think you get a better product. And also, I think it does inspire and breed entrepreneurship. I mean I started a company off Grants.gov. I hope it’s in the New York 500. We’ll see.
Gurin: I’ll check.
Behen: We’ll check it out, right?
Gurin: Yes. No, and I just want to underscore that point, because I think that’s true at every level of government data, that we shouldn’t be thinking of open data as just government data. The potential to enhance it and transform it, both with data provided from crowdsourcing, with some experiments that have been done by some federal agencies of using crowdsourcing to actually improve the quality of data, plus data from the private sector—I mean if we really look at developing that ecosystem, it becomes very, very powerful, with government as a kind of central data source and as a driver, but in a way that builds something much more powerful.
Chui: What we’ve talked about also is liquid data. That’s when sludge becomes even better.
Suresh: My name is Shanti Suresh and I’m the CIO and co-founder of Saras LLC. We are a technology and science company based in Ann Arbor, Michigan. Recently, I was looking up to find out players in, for example, the solar energy area. And there’s a lot of good information in the SBDC—I forget the online database I was looking at. But it was hard for me to come up with a specific set of players locally in that area. So thank you for putting the data out, but as Tony was saying, and all of you were saying, linking the data, classifying it needs to be improved somewhat. For example, when a company’s registering themselves, maybe have them tag their own company in a few different areas right when they’re registering, and that way we can play well together when we need to connect with others. I don’t know, how important is classification? I think searching is so important. We depend on Google for everything. But you are clearly their competitors now, so do you have any efforts going in classifying? I guess that’s part of the sanitizing of the data, isn’t it?
Chui: David, maybe it’s a state registration of business—
Behen: Why does it always have to be the state? No, I’m just kidding. Yes, it could be. I mean some of those things, you know, it’s how do we put that process—I mean there’s got to be a process around the data too, so how do we put that in there so we can start tagging that data and then exposing it so people can utilize it.
Overholser: Hi, I’m Michelle Overholser. I have a question about the accessibility of the data. When we’re thinking about private corporations, they have no inherent responsibility to the entire population to make their data available. It’s the people who choose to download the apps, it’s the nerds in this room who go to the open data website. But when it comes to the government recognizing the importance of this data and the fact that people don’t necessarily go to a building anymore, they do expect to have this information available online, I think it’s really important that the government does catch up to that understanding. I notice here in Detroit, if you get a parking ticket, they don’t put the website on that ticket. Twenty thousand Detroiters, at least, are facing foreclosure in the Wayne County Auction, and that entire auction takes place online, and none of them know the website, nor do they know it even happens online, and perhaps they don’t even have access to internet. So these are really important questions, as open data and data become more available, to make sure that it’s not becoming discriminatory to those who don’t have access to internet.
Behen: Yes. Again, it is a challenge. That piece is a challenge. And, you know, I think we need to become—all government, actually, all industry needs to become better at showing where their data’s at, advertising that, pushing it out to all different segments of the community. I mean what we don’t want to do is start another digital divide, right? We still have some of that, obviously, right?
And the other piece to that too is—you brought up the private sector companies and some of the public sector companies and the data. One of the things I talk about all the time when I talk about our data is that state governments, local government, we have to be great stewards of that data, and that means that we need to make sure that the information that needs to be kept private is kept private. And so all of our cybersecurity measures have to be as best as we can do it. That doesn’t mean, though, we shouldn’t be advertising more, putting our URL wherever we can, and making sure that it’s accessible to everybody. You know, we still have work to do around the ADA, Americans with Disabilities Act, on websites and things like that. We still need to get better at that. And that’s a dialogue we like having with the community. We like having people like you come up and challenge us on how do we get better here. We always like people to say too, “You’re doing pretty well here, but we’d like to get better here.”
Henker: Hi, my name is Gillian Henker. I’m an entrepreneur, and I’m a founder of Sisu Global Health. And I was wondering what your thoughts are about using open data in healthcare specifically, because you just mentioned privacy and things like that, but also, all the benefits that can come from knowing this data. So it’s kind of an issue of privacy, but also kind of reaping those benefits.
Gurin: Healthcare I think is probably the area more than any other where open data may be really, really disruptive. And it’s such an interesting area, because it involves individual patient data, which is HIPAA protected and really needs to be. It involves open data about things like Center for Medicare and Medicaid Services releasing more information about the actual Medicare charges and different providers and so on. And it involves the attempt to figure out how to take all that private patient data and anonymize it so that it can become a dataset that has enough granular information in it that you can do really interesting analyses of, for example, the relationship between different treatments and different outcomes, but at the same time, is anonymized to a degree where you can’t identify who these individual people are. And I’m not a technologist, but just from what I hear from people, it seems to me that that particular question of whether or not you can create large, anonymous, but very useful datasets is still being very actively debated and is probably more important in healthcare than anywhere else.
So I would say if that problem is solved to a degree that we can use all this kind of data in a way that people are comfortable is anonymous and secure, it’s going to open up huge opportunities. If that problem is not solved and everybody knows it’s not solved, there’s still a lot of opportunity in healthcare, but not as much. And if people think they’ve solved that problem and then somebody cracks the code and suddenly releases a ton of private patient records, it will not be pretty. So I think there are a number of different possible paths forward for healthcare data, and I think a lot of them really rely on that technological and social question.
Behen: There is hope there. There is hope. I know of one example—and I don’t know all the details, but I can give you the information for someone who might—where someone used open health data to look at asthma in kids, and then you put a chip on an inhaler and that’s how they gather all that public information, right? And I think it was actually one of the White House’s top 100 healthcare innovations in the past year or something.
Gurin: It’s Propeller Health, I think. That’s a great example. And actually, one of the solutions to this is where people are voluntarily sharing information, and you are seeing a lot more of that, even on sites like PatientsLikeMe, where people share their experiences. I think that’s right, I think that actually is a very, very powerful potential source of open data that doesn’t have those kinds of privacy concerns, yes.
Behen: Yes, so I think it’s coming, but it might be difficult.
Scott: I want to endorse one particular notion. I’m on the board of directors of the City of Hope, one of the national cancer centers, and it turns out there’s no better data than having information over a long period of time about health and the things that lead up to health conditions and so on. The idea’s been put forth, just like you can be an organ donor on your driver’s license, to volunteer your data, anonymized, so it can’t be traced back to you, but every time you go to the doctor, put that in a database, where the history of large groups of people could be tracked over some period of time to look for these long-term kinds of things. And I love that idea. It has to be done the right way and so on, but we all don’t just get sick one day. There’s a whole bunch of things that lead up to it, including, you know, your DNA data and all of the other things that go with it. So if you can promote that idea, I think it would help society as a whole.
Chui: Our research found that the use of open data could create over $300 billion of annual value in US healthcare alone.
We’ve got to wrap up, but just one final question for everybody, very quickly. What’s the most exciting application of open data that has not yet been exploited? Alex?
Alsup: Oh, boy. Let’s see, most exciting? I think that there are—most interesting. I won’t say exciting. It’s a little scary. But one of the interesting implications, as government pushes more and more data into the public realm, comes to me from the similarities, or the potential for similarities that we see with something like the NSA’s PRISM surveillance program. I think PRISM, to some extent, was enabled by the fact that the NSA was sopping up data from private companies whose commercial success depended on organizing their users’ data very well, making it indexible, searchable, usable. As more companies rush into the field of organizing, indexing, sorting, making usable government data, I wonder if there’s a possibility for a kind of corollary, you know, people’s PRISM, which is actually using government data to triangulate and understand, deduce the activities, secret activities potentially, of government, using data that in and of itself in a silo doesn’t seem to have any value, but when triangulated with other data sources, in fact can shed a lot of light on what’s going on.
Chui: People’s PRISM, okay.
Behen: What he said. No, real quick, you know, I really look at what I said earlier, how can we take the data, use it in a very positive way, and have a customer-centric government, where our government services are going to our citizens in an easy, effective and efficient way.
Gurin: I would say it’s actually an area we haven’t talked about. It’s open science. It’s getting to a point where scientific researchers, both in academia and in the private sector and so on, have ways that they can share their data early and often that don’t work against their own interests. We saw this with the Human Genome Project. If we could do that across the board in science, we’d really accelerate progress.
Scott: I said mine, health donor data.