Big Data and Deep Data

I’m officially done with my dissertation. It’s been handed in to my committee, and I couldn’t change anything even if I wanted to. This puts me in an odd position: for the past 24 months, most of my days were spent working on my dissertation, whether analyzing my interviews, outlining my ideas, writing, or editing. Being done with it has left a pretty big hole in my daily schedule.

I’ve started work on a few other projects to fill this gap, projects that have me working with entirely different types of data from those in my dissertation. My dissertation research was interview-based. I conducted 110 interviews, which produced something like 70 hours of tape and over 3,000 pages of transcripts. I have lots of detail on the 80 entrepreneurs I talked to. I know how and why they started their companies, how they raised money from investors or why they’ve avoided it, the challenges they’ve faced and what they did to overcome them, and whether they’ve networked with other entrepreneurs and what they talked about.

This data is amazingly deep, but in the grand scheme of things it’s very small. I talked to about a third of the high-tech entrepreneurs in each city who were listed in a business directory I used. So when I found really cool things in my interviews, like the fact that most entrepreneurs in Waterloo actively searched through their own social networks to find mentors while those in Ottawa mostly relied on their parents or former business partners for business advice, it’s hard to say whether this is true for everyone in the city or just a coincidence. A few statistical tests can help separate what’s real from what’s an illusion, but they can only go so far.
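One standard tool for a small-sample comparison like the Waterloo/Ottawa one is Fisher's exact test, which asks how likely a split at least this lopsided would be if the city made no difference. Below is a minimal pure-Python sketch of the two-sided version. The counts are invented for illustration and do not come from the interviews, and nothing here implies this is the test used in the dissertation.

```python
from math import comb


def table_probability(a: int, row1: int, col1: int, n: int) -> float:
    """Hypergeometric probability that the top-left cell equals `a`,
    given row total `row1`, column total `col1`, and grand total `n`."""
    return comb(col1, a) * comb(n - col1, row1 - a) / comb(n, row1)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are the two cities; columns are 'did it' / 'did not'.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    p_observed = table_probability(a, row1, col1, n)
    # Every table with the same margins has a top-left cell in this range.
    lowest, highest = max(0, row1 + col1 - n), min(row1, col1)
    p_value = 0.0
    for k in range(lowest, highest + 1):
        p = table_probability(k, row1, col1, n)
        # Count tables no more likely than the observed one;
        # the small tolerance guards against floating-point ties.
        if p <= p_observed * (1 + 1e-9):
            p_value += p
    return p_value


# Made-up counts, NOT from the actual interviews:
# "Waterloo": 12 of 20 searched their networks for mentors; "Ottawa": 4 of 20.
print(f"p = {fisher_exact_two_sided(12, 8, 4, 16):.3f}")
```

A small p-value says the split is unlikely under "city makes no difference", but the test still assumes the people interviewed are a fair sample of each city, which is exactly where it runs out of road.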

The new project I’m working on gives me access to fantastic datasets about innovation and economic development in Canada. These include the famous Dun and Bradstreet directory, which is the biggest dataset I’ve ever played with. Clocking in at 1.5 gigabytes, it contains information on more than 1.5 million Canadian firms. I would consider this to be on the very small end of ‘big data.’ For someone studying entrepreneurship, this is a godsend. I can now tell you, for instance, that between 2001 and 2006, 669 new high-tech firms were founded in Toronto* and that the average sales of these firms are around $360,000. I can also make really cool pictures like this one, which shows a positive relationship between the proportion of immigrants in a region and the proportion of high-tech firms in every province except Saskatchewan and New Brunswick.
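Counts like the Toronto figure above come down to a filter plus an average over firm records. The sketch below assumes a hypothetical, simplified record layout. The `Firm` fields, the `high_tech` flag, the helper name, and the sample firms are all made up for illustration. It is not the real D&B schema or the procedure behind the numbers above.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Firm:
    # Hypothetical fields loosely mirroring what a directory record holds.
    name: str
    city: str
    year_founded: int
    high_tech: bool      # assumption: a flag derived from the firm's industry code
    employees: int
    est_sales: float     # estimated annual sales, in dollars


def new_high_tech_summary(firms, city, first_year, last_year):
    """Count high-tech firms founded in `city` between the two years
    (inclusive) and return (count, mean estimated sales or None)."""
    matches = [
        f for f in firms
        if f.city == city and f.high_tech and first_year <= f.year_founded <= last_year
    ]
    avg_sales = mean(f.est_sales for f in matches) if matches else None
    return len(matches), avg_sales


# Made-up records for illustration only.
firms = [
    Firm("Example Software Ltd.", "Toronto", 2003, True, 4, 200_000),
    Firm("Sample Networks Inc.", "Toronto", 2005, True, 6, 500_000),
    Firm("Placeholder Consulting", "Toronto", 2004, False, 2, 90_000),
    Firm("Old Code Corp.", "Toronto", 1998, True, 30, 2_000_000),
]

count, avg = new_high_tech_summary(firms, "Toronto", 2001, 2006)
print(count, avg)
```

The hard part in practice is not the arithmetic but deciding what counts as "high tech" and how complete the directory is, which no amount of filtering can fix.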

What’s up with NB and SK?

But as I work more and more with this data, I’m beginning to see its limitations. I know a few things about a whole lot of firms, but I don’t know much about any one of them. With the D&B data, I essentially know a firm’s name, its address, what year it was founded, what industry it’s in, how many employees it has, and a guess at its sales. In aggregate, these data can tell me many things: which regions have the most startups, which industries seem to grow the fastest, and what the relationship is between workers and sales across the entire country. But they also raise lots of questions that the data can never answer.
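As a rough illustration of the "relationship between workers and sales" kind of question, the sketch below computes a Pearson correlation between employee counts and sales. The records are hypothetical, and taking logs of both variables is an assumption made here because firm sizes are usually heavily skewed.

```python
import math
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical (employees, estimated sales) pairs, not real directory records.
records = [(1, 100_000), (3, 250_000), (10, 900_000), (40, 5_000_000), (200, 30_000_000)]

# Compare on a log scale so a handful of very large firms don't dominate.
log_emp = [math.log(e) for e, _ in records]
log_sales = [math.log(s) for _, s in records]
print(f"r = {pearson(log_emp, log_sales):.2f}")
```

A number like this summarizes the whole country in one figure, which is precisely the trade-off: it says nothing about why any single firm looks the way it does.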

Looking at one record at random, I know that Bait Consulting Inc. of Thornhill is a consulting company that was formed in 2001 and has one employee and an estimated $120,000 in sales. But unlike in my dissertation research, I don’t know anything more. I don’t know why the company was founded, or why it was founded in Thornhill instead of Toronto or Mississauga or Cambridge. I don’t know how its founder learns about the market or finds new customers. From this data, it’s difficult to figure out whether a government policy is working, or how an entrepreneur is affected by where they live.

That’s the big difference between big data and what I’d call deep data. Big data can tell you a few things about a whole lot of firms or people. You can do a whole lot with this, but you always need to be aware of what it’s not telling you. Only so many questions can be asked on a survey: the more you ask, the fewer people will respond.

Qualitative data collected through long, semi-structured interviews is deep data. I know a lot about the people I talked to. Not everything, and many of the responses are biased by the respondent wanting me to think they are really skilled entrepreneurs. But I know more than a binary variable can capture: I know what they did, why they did it, and what it led to. I can understand what steps they took to start and grow their firms and relate those back to their larger cultural context. But again, there’s that trade-off: I know a lot about a very small number of people. And I have it easy; people doing ethnography or observational research will have hundreds and hundreds of hours of recordings or notes about an even smaller group of people.

It would be nice to think that we can meet in the middle, but working with big qualitative datasets requires a totally different set of skills than working with big quantitative datasets. Very few people are equally able to produce a grounded analysis of a collection of interviews and a Bayesian analysis of a census dataset. But there is value in each, and the challenge is figuring out the right way to collect data to solve a problem. The Platonic ideal is for quantitative and qualitative data to be used together to prove a larger point, but this kind of research is expensive and rare. Yet it might be the only way to get a real sense of what’s going on in the world around us.

*This seems really low to me, and I’m already working with librarians and others to figure out what proportion of all firms the D&B directory accounts for.


Book Review: Startup Communities: Building an Entrepreneurial Ecosystem in Your City

I just finished reading Startup Communities. It dovetails nicely with what I’ve been thinking about: that entrepreneurship relies on an entire community surrounding the entrepreneur. Here’s my mini-review for all you busy business people: I agree with the first part of the title and disagree with the second part. I believe startup communities are vitally


Perspective

This Waterloo startup was launched, grew, and was bought up by Google in less time than it took me to research Waterloo’s entrepreneurial environment.


Wither Waterloo?

Research in Motion is not a healthy company. It makes a product no one particularly wants for a price no one particularly wants to pay. The reason for the company’s decline will no doubt be chronicled in a thousand MBA case studies, but I imagine at the end of the day it will simply be


To New York I go

I’m heading down to New York City tomorrow for the 2012 Association of American Geographers conference. 7,000 geographers. 5 days. 1 city. It’s always an interesting time. Here’s the presentation I’ll be giving. It’s based on some of my newer work that looks at the connections between local entrepreneurial cultures and the reasons why entrepreneurs


New article: The sources of regional variation in Canadian self-employment

I just got the final version of my new paper in the International Journal of Entrepreneurship and Small Business (Vol 15, issue 3, pages 340-361 for those keeping track at home. E-mail me for a copy). This is my first solo paper and the first paper that I controlled from start to finish. It’s not


Angels in the back field

I love it when newspapers provide great examples of economic geography. In just the past few weeks, I’ve seen a cornucopia of great articles that really exemplify why economic geography is so amazing. We have Adam Davidson’s Making it in America cover story in the Atlantic (which I’m currently forcing 200 students to read and


New Article: The Spatial Economy of North American Trade Fairs

I’ve been busy over the last few weeks teaching my first class ever, but I got a pleasant surprise: an article I wrote last year has finally been published in The Canadian Geographer. The Spatial Economy of North American Trade Fairs uses a unique dataset to track the location, size, and types


Well, that was quick.

I’m mostly posting this here so I can find it again in the future. This is classroom discussion gold. Maybelline is already running Occupy Wall Street-themed ads. Whatever you might think about capitalism, it will adapt to anything.


A quick thought

I don’t want to do many of these short questions designed only to provoke, but I’m reading the Steve Jobs biography and it’s hard not to feel somewhat philosophical. Here it is. There is only one question that matters when studying the geography of entrepreneurship: If John and Clara Jobs had not moved back from Wisconsin