‘Big Data’ has become a catchy term that, for the
moment, retains some mystique and persuasive impact in use. For example,
one could imagine hearing in a sales presentation, or at a conference
or cocktail party, ‘big data is the key to success; harness big data and
you harness the world; if you are not using big data, you are doomed;
big data changes everything’. Similar to the way we perceive Twitter,
online shopping, customer relationship management (CRM) systems and many
other technology-related phenomena that have had their moments of hype,
we believe that there is meaningful value in big data (BD). However, we do not
believe that BD’s impact is as near to solving the world’s problems as
some might suggest. We expect that BD will continue to be a cool term to
use in the short term.
We are not trying to shut
down that practice. However, we are curious about what people associate
with the term BD. Certainly it must communicate and reflect notions that
people rally around and act on. What are these notions? Are they good?
Bad? Ugly? Or, are they the Good, the Bad and the Ugly? Do they frame a
valuable way to look at the world? Does BD, whatever it means to
managers, scientists, researchers and others, motivate and enable
actions that may become, or are, worthwhile and better than the status
quo or rival alternatives?
Etymology
Before sharing the perspectives of the leading scholars whom we
interviewed, we would be remiss not to mention some research into the
etymology of the expression BD. Lohr (2013), who was curious about the
origins of the term, finds roots in direct marketing and in technology.
He cites a Harper’s Magazine
article written by Erik Larson in 1989 that makes reference to junk
mail and the direct-marketing industry, ‘The keepers of big data say
they do it for the consumer’s benefit. But data have a way of being used
for purposes other than originally intended’. However, Lohr rejects
this as the origin of BD, writing, ‘Prescient indeed. But not, I don’t
think, a use of the term that suggests an inkling of the technology we
call Big Data today’.
Lohr considers it more appropriate to credit John Mashey, who was Chief
Scientist at Silicon Graphics during the 1990s, because his use of the
term reflected ‘not just a lot of data, but different types of data
handled in new ways’. In
addition, Lohr (2013)
references Francis X. Diebold, who is the Paul F. Miller, Jr and E.
Warren Shafer Miller Professor of Social Sciences, a Professor of
Economics, and a Professor of Finance at the University of Pennsylvania,
as the first to have published an academic research paper that includes
the term ‘Big Data’.
A tale of two disciplines
Well,
Lohr certainly explained how he arrived at his conclusion. A definition
with a bias toward computer science and technology informs it. The
scholars whom we interviewed explained that, indeed, multiple
definitions of BD exist as a function of the discipline in which one
operates. There appears to be a difference between what boils down to
two groups – and probably some other groups as well. Those who are
technology focused (for example, computer scientists, computer
engineers) are fascinated more with the amount of data that can be
stored; the speed with which data can be stored, accessed or
manipulated; and the structure of data, among a variety of
technology-related aspects. One might suggest that to them, it is all
about the data. Managerial problem solvers (for example, marketing
scientists, management scientists) are focused more on testing theory,
making discoveries and solving problems. With respect to data, they
might think, ‘just show me the data!’
Feit makes reference to some of those with a technology focus:
People who are experts in storing and managing data, they mean something very specific by big data. They mean a very specific set of tools, and a different approach to structuring data. So they think of any kind of big table database, as opposed to a relational database, as big data. So for them, big data isn’t necessarily size, but it’s the structure. It turns out that one structure is more useful when the data gets large than another, and so they think of big data as intimately tied to how you structure it. And they structure it in one big table, which facilitates parallel computation.
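Feit's point about structure can be illustrated with a minimal sketch. It assumes a tiny, hypothetical 'one big table' of purchase records in which every row is self-contained, so the rows can be split into chunks, summarized independently and then combined. The table contents and the function names (`partition`, `spend_by_channel`, `merge_totals`, `parallel_spend_by_channel`) are illustrative assumptions, not anything described by the interviewees, and a thread pool stands in for the cluster of machines a real system would use.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical denormalized "big table": each row carries every field it
# needs, so no join against another table is required to process it.
BIG_TABLE = [
    {"customer": "a", "channel": "web", "spend": 20.0},
    {"customer": "b", "channel": "store", "spend": 35.5},
    {"customer": "a", "channel": "store", "spend": 12.0},
    {"customer": "c", "channel": "web", "spend": 8.5},
    {"customer": "b", "channel": "web", "spend": 4.0},
]


def partition(rows, n_parts):
    """Split rows into n_parts chunks by taking every n-th row."""
    return [rows[i::n_parts] for i in range(n_parts)]


def spend_by_channel(rows):
    """Summarize one chunk on its own: total spend per channel."""
    totals = {}
    for row in rows:
        totals[row["channel"]] = totals.get(row["channel"], 0.0) + row["spend"]
    return totals


def merge_totals(partials):
    """Combine the per-chunk summaries into one overall summary."""
    merged = {}
    for partial in partials:
        for channel, amount in partial.items():
            merged[channel] = merged.get(channel, 0.0) + amount
    return merged


def parallel_spend_by_channel(rows, n_parts=2):
    """Summarize chunks concurrently, then merge the partial results.

    Threads are used only to keep the sketch self-contained; a real
    deployment would spread the chunks over processes or machines.
    """
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(pool.map(spend_by_channel, partition(rows, n_parts)))
    return merge_totals(partials)


if __name__ == "__main__":
    print(parallel_spend_by_channel(BIG_TABLE))
```

Because each chunk is summarized without reference to any other chunk, the number of chunks can grow with the data, which is one plausible reading of why this structure 'is more useful when the data gets large'.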
Thompson
elaborates further, suggesting an important distinction between those
who promote or sell ‘data-related’ technology and those focused on the
problems for which data can be used to help solve:
There’s a very strong tendency to basically blur the distinction between what you’re talking about and the underlying technology. SAS has a page up on what are big data, and SAS, of course, is the original master of big data. When the human genome project was being done in the late 90’s, the result was going to be the largest database that had ever been constructed to date. SAS software was chosen for that, because it was the only software package that could handle, at that time, data files of that size, that much data, and actually process it. It was considered to be the paragon, the ultimate big data project, of course outside of what the National Security Agency does. So the thing about this is, is that if you go and you read, what is big data, much of that is written by the technology firm. The software companies, the hardware companies that are selling it. And so what they do is, is that they blur the distinction between the technology and the actual concept that the technology is designed to address.
You see the same thing when we talk about communities. (For example,) companies selling software for online portals, basically treat the software as synonymous with the (meaning of a) community. (They see) community (as,) essentially, a software package, (as) a website. But of course in practice, community is a sociological construct. All the software does is, it provides a communications means for a community to potentially use to communicate. But just because people are using the software doesn’t mean they actually form a community.
So what happens is, a lot of the definitions you see of big data are defined, not by big data or the managerial issue, or the concept itself, but rather, they define big data in terms of the product they’re selling, which is software and hardware … I would argue that one of the things that’s derailing the conversation about big data, both within the organization and academia, is a failure to distinguish between essentially the marketing speak of the firms selling the software and the hardware, and the actual underlying problem the software is meant to address. (Technology providers) want to define the problem in technological terms.
Important components and attributes of BD
Perspectives
aside, we all agree that data are part of a process that informs theory
development and decision making. We believe that data should be
considered only one part of the BD concept. The perspectives that we
have heard from the marketing scholars whom we interviewed led us to
believe that it could be beneficial to think of BD as a term that
represents a period of time or era, a process, and some new wrinkles
associated with components of that process. In this section, we draw
attention to these components and new wrinkles.
Purpose
A
first critical consideration of BD is the purpose of it. Little says,
‘I’m problem driven’. Lehmann quotes T.S. Eliot, ‘Where is the
information that we have lost in the information’, in making the point
that ‘people may get so obsessed with analyzing the data that they’ll
forget why they are analyzing it, and why they want to know about
particular relationships’. Moe states, ‘Think about what you’re trying
to accomplish’. Duparcq adds, ‘All the new connected platforms in the
world won’t make a difference if they can’t justify their existence by
the value they create. This is no doubt the core challenge’. And
Thompson emphasizes, ‘You need to start with a theory. Theory is what
allows you to frame a meaningful question. If you start looking for
answers, and you don’t know what the question is, you’ve got a problem’.
With more and richer data available through today’s technologies, BD
enables the conceptualization of new problems and their associated
purposes, or a deeper exploration of pre-existing problems.
Data attributes
Indeed,
it is important to be aware of certain characteristics, or nature, of
the data associated with BD. Yes, there is the attribute of size.
However, the ‘bigness’ of BD is, as Feinberg calls it,
… a moving target. Remember when scanner data came out? That was big data. We were stunned. Like wow, tens of thousands of households and dozens of purchases. And now, you can fit all of that data on anyone’s phone. You may not be able to analyze it on someone’s phone, but you can certainly do it on any PC. It’s kind of a joke by today’s standards.
Lehmann concurs:
When I was in school, I ran a five variable, one-hundred observation regression that took more than a day to run. That was BD at the time. Now, it is considered quite small. As computing power has increased, BD has become bigger. The problem is the same today as it was years ago.
Winer agrees as well:
When people first started using the term big data, I kind of laughed a little bit, because big data sets, huge data sets have been around for a long time. If you take a look at the kind of data that, say, telecom companies have on all their customers, or airlines from loyalty programs, banks on their customers, it’s not new. I think what the term tends to be used for is maybe, at least my perception is, the integration of customer databases, which as I said have been around for a while, and other information that they may be able to get about customers through social media, through user generated content, trying to integrate the two. Even though the sort of cyberspace data is not generally at the individual level, it’s usually anonymous, I think that the incremental availability of huge amounts of information that are flowing across the internet is what really people are referring to as big data. So maybe the airline and banking and telecom data were big data, and now we’re in the era of really big data or huge data or something like that. But as I said, I don’t see it being revolutionary, as much as evolutionary.
And, similar to Winer,
Duparcq likes to use a definition of BD that involves database
integration and vast amounts of online data:
The entirety of data that is collected on a singular individual, in a multitude of platforms (online traditional, online mobile, online new, internet capable connected devices, offline stores, offline media & credit card data), connected (all online and offline datasets in one database), correlated, and interpreted from both a single and larger group perspective.
Little
commented, ‘In fact, I had an email from my co-author (of) years ago,
Peter Guadagni, and he said, I think we were using big data’ (n.b., Guadagni and Little’s (1983) seminal paper was based on the analysis of scanner data).
Bradlow provides a more generalized notion of size that considers norms:
One way I think about it, just from most users’ perspective … (and) I don’t want to call this a technical definition, but it’s kind of a technical layman’s definition, is, data that’s bigger than you can open up with standard software. To me, that’s big data. Can I not even open up the file using … Excel, or do I really need Hadoop and massive big data skills even to open up the file? So to me, the technical definition of big data is just, does it require specialized training just to kind of observe and look at the file? Now that’s not what I think most people mean. But to me, that’s my first definition. If I have to call over to my computer science department just to open up the file, that’s big.
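Bradlow's layman's test can be sketched in a few lines. The sketch assumes the data arrive as a CSV file and takes 'standard software' to mean a spreadsheet with Excel's documented worksheet limits of 1,048,576 rows and 16,384 columns; the function name `exceeds_spreadsheet_limits` is hypothetical, and a real assessment would also depend on memory and file format.

```python
import csv
import io

# Worksheet limits documented for Excel 2007 and later.
EXCEL_MAX_ROWS = 1_048_576
EXCEL_MAX_COLS = 16_384


def exceeds_spreadsheet_limits(lines, max_rows=EXCEL_MAX_ROWS,
                               max_cols=EXCEL_MAX_COLS):
    """Return True if the CSV has more rows or columns than the limits.

    `lines` is any iterable of text lines, such as an open file. Rows are
    streamed one at a time, so the check itself never loads the whole file.
    """
    for row_number, row in enumerate(csv.reader(lines), start=1):
        if row_number > max_rows or len(row) > max_cols:
            return True
    return False


if __name__ == "__main__":
    # A three-line file fits easily; the lowered limit below is only
    # there to show the check firing on a small example.
    sample = "household,purchases\n1,12\n2,7\n"
    print(exceeds_spreadsheet_limits(io.StringIO(sample)))
    print(exceeds_spreadsheet_limits(io.StringIO(sample), max_rows=2))
```

With an actual file, the call would be `exceeds_spreadsheet_limits(open(path, newline=""))`, where `path` is whatever file is being inspected.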
Pitt has a similar perspective:
I think it’s data that’s too big to fit into even the largest single database, or, if you want to think of it in simple terms, it’s too big to fit into even the very largest spreadsheet you could find. And I’d also say that it’s probably not analyzable by even the biggest computer in an organization. It probably has to be analyzed on something like clusters of different computers running special applications.
The data of BD are also different
in terms of structure and form. Sawhney adds, ‘The data is coming from
lots of unstructured sources’. Pitt points out that
What makes this different to most of the data that marketing people are generally used to, is that it also has lots and lots of text to it. So it’s not just about numbers about what people are doing. It’s also text about what people are saying, and what they’re writing, and what they’re indicating by means of words.
Feinberg takes it a step further:
So I think that the whole idea has been around for a long time … electronic data capture by the internet, and the ability to create databases from every single thing we do, that’s what’s really fueled this huge cusp, like all of a sudden, everyone’s talking about big data, because every single thing is captured. So if you capture an enormous amount of information, suddenly it’s big.
The reason we’re really in the era of big data is that we’re getting a lot of information on every single thing that every person does. And the ability to store that at low cost, and analyze it later, is what really is, to me, the functional definition of big data.
Duparcq provides some texture to the availability of consumer data:
Consumers’ propensity to generate data will largely be determined by two important factors, (including) the amount of access points available (number of connected platforms and consumer’s willingness to adopt those platforms). The average time users spend online rising constantly, data generated during that time is steadily increasing as well. In the US, usage has increased from an average of 5.2 hours a week in 2001 to 19.6 hours in 2012. Part of that increase can be attributed to consumers finding more ways to make interactive media an important part of their daily habits: shopping, information, news, entertainment, social interaction, etc.
In addition to highlighting
data attributes that partly distinguish or define BD, scholars note that
a key distinction of BD may be in the analytical tools or techniques
that are used to monitor, share and process the data and convert it into
information. In concert with this, they call attention to the ‘speed’
of the data and a need to act on it swiftly. Winer remarks:
It’s not so much the data that’s new. It’s the kind of tools that we have to analyze those data and react to the data in real time, or close to real time. So if a company is monitoring its comments about it on Twitter, they can immediately try to take corrective action by starting their own Twitter feeds, or putting out a message on a Facebook site, or that kind of thing. So, I think that that’s where the change is, not so much the fact that we’ve got more data.
For more information about this article, please see the Journal of Marketing Analytics or e-mail Bostongarden@gmail.com.
By Bruce D Weinberg, Lenita Davis & Paul D Berger
Repost by Acarre Community Media