The most common mistake a researcher can make is to confuse association with causality... big ole post here
Causality, Closet Nazis, and the Metrics of Criteria
A few semesters into my undergraduate studies, I decided to move away from campus in order to escape the incessant party atmosphere. I chose to rent an old but roomy apartment in Kitchener, Ontario. I read somewhere that this city was formerly called New Munich. One day in the dead of morning, I heard loud banging and smashing downstairs. It seemed that neither I nor my cat could sleep, so I went to investigate. I saw the owner of the building at the front entrance of the apartment holding a bat and blocking the doors. I heard him say, "I'm not looking for any trouble. I'm just trying to run a business." Hiding behind him was a female tenant of the building, apparently the target of threatening comments. Slowly from the shadows, there emerged four young men of remarkably similar appearance - attire, physique, and cleanliness. These rather well-groomed men weren't hurling sexual comments at the lady but rather ethnic. I heard one of the young men say that the lady certainly looks like a Jew. I was in my pyjamas and slippers, and I seemed to be the only non-white person downstairs. At the time it seemed perfectly normal that I ask these men what they expected to accomplish by smashing garbage containers and breaking bottles in the early morning. I said that it takes a great deal of effort to create such a disturbance; and, regardless of their sentiments towards the lady, such harassment seems to burden them more than anybody else. One of the young men, apparently the leader, thought over this comment. He admitted that he agreed with me. The group went their way although with some additional smashing as they left. I thought that these men were probably Nazis although not necessarily by political membership. Now many years later, I share this experience as an entry point to discuss certain pathological aspects of criteria-driven metrics. By pathological, I mean that the use of metrics - the methodology and resulting data - seems structurally deficient.
I routinely come across articles emphasizing the importance of examining causality rather than association. The usual criticism is as follows: perhaps the most common mistake a researcher can make is to confuse association with causality. I think this criticism is important - although perhaps more as an academic concept than actionable insight. In real life, I suppose that something can contribute to or cause another thing; but the likelihood of nothing else being causally relevant seems remote. For instance, what causes children to be intelligent? If we could clearly define the meaning of intelligence - and I doubt that we can - the issue might be examined purely in terms of demographics. Belonging to a certain ethnic background might be said to contribute to greater intelligence. However, intelligence is a complicated issue. A person who succeeds in one particular environment (such as the world of finance) might do miserably in another setting (such as academics). Intelligence is likely related to access to education. There are also broader social contexts: for example, discrimination can make some individuals seem intellectually weak although there might simply be systemic barriers at play. Causalities derived from outside a laboratory are debatable; indeed, even the causalities discovered from within a laboratory might persist poorly under more realistic conditions. Yet data is accumulated often with the intent of establishing causality; or its inference might be necessary in order to make use of the data in the intended manner. I suggest that in many routine situations, data can become insulated from reality; this occurs as complex and dynamic causalities are gradually replaced by simple and static assertions. Alienation can become entrenched or systematized in data systems, which then furthers its spread. We lose the ability to detect the imposition of disabling environments.
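The gap between association and causality described above can be made concrete with a toy simulation. What follows is only a sketch under stated assumptions: the data is synthetic, the hidden factor is a stand-in for something like access to education, and the names `hidden`, `measure_a`, `measure_b`, and `correlation` are hypothetical labels chosen for illustration. Neither measurement causes the other, yet they appear related until the hidden factor is taken into account.

```python
import random


def correlation(xs, ys):
    """Pearson correlation computed from first principles."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)  # fixed seed so the sketch is repeatable
n = 5000

# Assumption: one hidden factor drives both measurements.
hidden = [rng.gauss(0, 1) for _ in range(n)]
measure_a = [h + rng.gauss(0, 1) for h in hidden]  # does not influence measure_b
measure_b = [h + rng.gauss(0, 1) for h in hidden]  # does not influence measure_a

# Association when the hidden factor is ignored.
raw = correlation(measure_a, measure_b)

# Subtract the hidden factor's contribution from both series.
# (Its coefficient is 1 here only because the data was built that way.)
adjusted = correlation(
    [a - h for a, h in zip(measure_a, hidden)],
    [b - h for b, h in zip(measure_b, hidden)],
)

print(f"association ignoring the hidden factor: {raw:.2f}")
print(f"association after accounting for it:    {adjusted:.2f}")
```

In real settings the hidden factor is rarely measured, and there is seldom only one, which is part of why causal claims made outside a laboratory remain debatable.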
I want to develop my arguments further by continuing to discuss Nazis, which to me seem like interesting agents of social construction. I refer to Nazis found both in science fiction and in history. Many of us have been exposed to Nazis informally through stories - such as my own at the opening of this blog. Maybe some readers have never heard of Nazis while others have encountered the more militant versions in real life. The National Socialist Party of Germany was led by Adolf Hitler. I suspect that the term "Nazi" is shortened from "National Socialist" (Nationalsozialistische). The relevance of the term in films and books is more extraordinary than its origins. I find in the movement a tremendous emphasis on central control and prescriptive criteria; search for social conformance and adherence to rules; preoccupation with the separation of superior from inferior; and belief in movement towards perfection. For Nazis, causality seems to extend from their perceptions and preconceptions of reality. This is precisely my point in regard to causality more generally. More often than not, except in the most controlled laboratory environments and simplest dynamics, causality is the product of social construction. Why would this be an important concern in terms of data? Both the data system and the data it contains can be regarded as instruments of inculcation and promulgation - to control the expansion of knowledge and therefore power in society. In an organizational setting, the data can serve to commandeer structural capital and impair decision-making.
Clones, Nazis, and the Subservience of Data
I was not entirely surprised to discover in the Clone Wars that Darth Vader's storm troopers are actually clones. While I am no expert in genetics, I believe that I can make an assertion about clones that most people would readily accept: they are all the same. Imagine the benefits of having a "one size fits all" situation: the same uniform can fit everybody; all of the chairs and tables can be the same size; there only has to be a single educational system since everybody learns exactly the same way and at the same pace. The storm troopers in real life - in Nazi Germany - were not actually clones although many themes seem consistent. We find a move towards ideal fit or minimal differentiation. Nazis regarded in more conceptual terms would likely believe in improved productivity through greater uniformity and compliance. They would tend to promote viral behavioural normatives for the purpose of achieving specific outcomes. I know there is a perception that Nazis lost the war; but I feel that they persist in the vernacular through management practices and quasi-intellectual organizational theories. It is difficult to ignore the sameness prevalent in many organizations and the viral nature of their conformity. Their data systems are so similar as to be purchased off the shelf, indicating a comparable if not uniform treatment of data. The data gathered tends to bind a company to its chosen identity, which upon closer analysis reveals many similarities to other players in the market. I suggest therefore that it is not so much self-identity that controls an organization. The data system is evidence of incursion by an external power. The causalities extracted serve as cogs to support and reinforce prescriptive decision-making. What is this external power? It must be similar to gravity. It draws people near. It causes them to fall, slip, and trip. It is theocratic more than scientific - a belief in "progress." For there to be progress, there must also be direction.
In environmental studies, the idea of sameness can be found in monocultures. A monoculture might be a species of plant found to have superior properties; so rather than allow a natural diversity to exist, large fields might be deliberately set aside for a limited number of varieties. There might be increased yield on one hand; on the downside, any weakness affects the entire population. For example, a particular type of wheat might be susceptible to rotting. If the wheat planted all comes from the same strain, the entire population can be wiped out by a single disease. Sameness can be found in organizations among decision-makers. A colourful expression from a human resources instructor that I once had is "corporate inbreeds." Executives might hire people of similar persuasion not necessarily because it is in the best interest of the company but rather for the sake of organizational congruence - the idea being that sameness is superior. In the recruitment of workers, "similar-to-me" hiring seems to occur: in this case, the selection process might favour people who are similar to the person doing the selecting. So although a large amount of data might be compiled during screening, the selection process might nonetheless disassociate an organization from its underlying needs. I have spent a number of blogs exploring differences in data: 1) the metrics of criteria resulting in data to ensure that operations conform; and 2) the metrics of phenomena helping to highlight the realities that face an organization. In an organizational setting, the need to control is supported by the metrics of criteria; and the need to understand by the metrics of phenomena.
Sameness is a rather "industrial" concept that helped us to transform and modernize production; to usher in an age of progress. I offer an analogy. One of the important developments of the industrial revolution was the introduction of standardized machinery in production environments; this created a need for standard parts that could then be replaced in the event of failure. Sameness enables replacement. It is important to recognize how this ideal can influence our perceptions of causality. Janice's inability to stay at her seat "caused" the office to fall behind schedule. The office's poor ventilation caused Janice to leave her seat for fresh air more often. The employer's unwillingness to perform regular maintenance caused the ventilation to operate poorly. Lack of revenues contributed to cut-backs that caused less maintenance to be performed on the ventilation system. Consequently, lack of revenues among other things caused the office to fall behind schedule. So what is the true causality? I believe that most organizations would have little difficulty making Janice the scapegoat. If Janice were a machine or part of a machine, she could be blamed for her adaptive failures and then replaced. Replacement is made possible through the commodification of labour where large pools of individuals can offer the same work at similar levels of quality. The market has moved in the direction of commodified labour not by accident. In the metrics that determine causality, we set aside the interconnectedness of things and elevate only particular aspects of the data that fulfill our instrumental needs. While causality is something extremely complex, instrumentalism poses it in the most simplistic terms; this simplicity is actually part of the production system. If the case were different, there wouldn't be such extensive cloning of systems, labour, management, and even products. We have come to expect uniformity not just in what we do but also how we think about production.
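The chain of causes in the Janice example can be written down as a small directed graph, which makes it easier to see how much is set aside when blame stops at the first link. This is only an illustrative sketch: the event labels paraphrase the example above, and the names `CAUSES` and `upstream_causes` are assumptions made for illustration rather than part of any real system.

```python
# Each key is an event; each value lists the events said to have caused it.
# Labels paraphrase the Janice example and are illustrative only.
CAUSES = {
    "office behind schedule": ["Janice leaves her seat"],
    "Janice leaves her seat": ["poor ventilation"],
    "poor ventilation": ["no regular maintenance"],
    "no regular maintenance": ["cut-backs"],
    "cut-backs": ["lack of revenues"],
}


def upstream_causes(event, causes=CAUSES):
    """Return every cause reachable from `event`, nearest first."""
    found = []
    queue = list(causes.get(event, []))
    while queue:
        current = queue.pop(0)
        if current not in found:
            found.append(current)
            # Keep walking back through the causes of this cause.
            queue.extend(causes.get(current, []))
    return found


chain = upstream_causes("office behind schedule")
print("proximate cause:", chain[0])
print("most distant cause in this sketch:", chain[-1])
print("links ignored if blame stops at the first:", len(chain) - 1)
```

A metric that records only the proximate cause keeps the first entry of the chain and discards the rest, which is the simplification the paragraph above describes.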
Elements of Taylorism in Nazism
I originally had the heading "Elements of Nazism in Scientific Management," but I was reluctant to draw a connection. Similarly, because Frederick Taylor has been regarded by some as the father of scientific management, I find it difficult to write about him in critical terms. However, consider Taylorism more as an abstract concept. In Taylor's testimony before the Special House Committee, he said that there can only be one right way to shovel coal. Taylor had some success and also failures applying a highly scientific approach to improve the productivity of workers. He studied how workers shovel coal and gravel. Equipped with a stopwatch, notebook, and it seems a measuring stick to obtain exact shovel dimensions, he arrived at specific ways to shovel that seemed to improve output. Since labourers come in all different shapes and abilities, I find myself questioning his findings or at least doubting the usefulness of his conclusions. Nonetheless, Taylor's influence remains felt even today. Big data for some people might mean heading down the same road paved by Taylor's early investigations. If a person had no choice but to shovel coal, the idea of using a scientific approach to evaluate repetitive movements probably seems like a reasonable route. Computing technology didn't exist during Taylor's time. Perhaps, he would have been an avid supporter given his tendency to accumulate and evaluate large amounts of data.
I believe that Taylor was what many people today would describe as a "data scientist." He was a scientist in how he arrived at his conclusions. But whether or not the conclusions really mattered was a separate issue. An astute company could have ignored his findings pertaining to manual labour, turning instead to heavy machinery. Machine labour increases production, worker safety, and profitability while improving delivery times. Taylor's basic assertion as indicated by the literature was that workers often wasted time by standing around. I believe that Taylor used the term "soldiering." He felt that soldiering contributed to poor production. In response, he aspired to find ideals - that is to say, perfect ways to get the job done. If everybody performs their duties in the perfect way, production would be at an optimal level. His assertions were prescriptive in nature; the metrics were entirely criteria driven. Personally, I find myself more focused on how the workers must have felt being labelled as lazy by an outsider; being told in the smallest detail how to bend, dig, and lift; being judged by metrics of performance perhaps unrelated to effort and commitment. A person who works every day might not appear as hard-working as a person who works a few days a week. The portrayal of reality in the metrics is highly constrained. There are recent cases where testing standards have been found to be dissociated from the reality of the work - literally that the metrics did not reasonably mean what the proponents claimed.
It is possible to give the illusion of progress perhaps by paying some employees to dig holes and others to fill holes. It is possible to schedule high-level meetings: meetings to schedule meetings; meetings to discuss meetings. The fact there is movement does not necessarily mean that it leads to production. Conversely, lack of movement or the metrics-evasiveness of behaviours should not lead us to believe that the behaviours are necessarily non-productive. I spend a lot of time programming and writing; it is often difficult to quantify the results. I wrote some of the software that I now use routinely more than 20 years ago. I guess it would be easy to dismiss my efforts as non-productive. In any case, I believe that the Nazis embraced aspects of Taylorism. While I "question" the causalities asserted by Taylor - e.g. that following prescriptive criteria is the best route to improvements - the Nazis made assertions on an entirely different level. The Nazis didn't pose their concerns in relation to laziness but rather racial superiority and inferiority. They seemed less concerned about telling workers how to shovel coal and more interested in imposing dominance and oppressing other groups. Nonetheless, I would suggest that the differences are not in substance but rather magnitude and texture. The Nazis were probably applying production management techniques on a broad social rather than factory level. While Taylor gathered production data, the Nazis conducted questionable research to support their movement. It seems that Nazis found for instance that many races were inferior to Germans. I am reminded of an article that I read during my graduate studies showing different rankings of mental incapacity. The rankings seemed to be used substantively to divide people and perhaps forcibly impose institutional care. I believe similar principles can be applied in a setting of mass production to oppress people.
Big Data as Structural Evidence of Control
Why have I bothered making the connection between Nazism and Taylorism in this blog? In an age of big data, I consider it tempting to slip into one of these powerful mindsets: Taylorism in relation to production environments and Nazism in broader social settings. Recently there has been an explosion of surveillance tactics both by companies and governments on a global level. I believe it would be fair to say that Taylor and also the Nazis spent a lot of time gathering data about people and placing their findings into structures determined not by the data itself but their preconceptions of reality. Michel Foucault in "Surveiller et punir" describes in rather graphic detail the rise of the penitentiary system in Europe. I believe it would be fair to say from the French text, punishment (from "punir" to punish) is a horrifically embodied experience. The other component of the title resembling "surveillance" (from "surveiller" to watch) denotes a disembodied condition. It is possible to monitor distant areas of concern remotely from security and intelligence offices. Data-loggers such as black-box devices can compile information without human involvement. We therefore have the disembodied and systematized imposition of monitoring versus the embodied and lived experiences of those being monitored. I would describe surveillance as a type of punishment in itself. Most of us have a reasonable expectation of privacy. Privacy is evidence of autonomy and freedom. Therefore the presence of vast amounts of data regarding our day-to-day lives causes us to question our autonomy. Why would this necessarily be so? As I have already pointed out, data in the context of Taylorism supports a paradigm of external control over our smallest behaviours. In terms of Nazism, data seems to bring about social delineations apparently to distance some segments of society from the benefits of membership. 
So while having massive amounts of data might not in itself mean anything, it can easily mean the wrong thing.
Foucault in his book mentions J. Bentham's Panopticon. A Panopticon is a distinctive type of prison facility that allows for quick viewing of prisoners. I would describe a Panopticon as a turret surrounded by holding cells: from the middle of the building, a guard can see the prisoners sprawled all around. Moreover, the prisoners are all able to see each other. Those detained in such a jail have little privacy. Humiliation is a collective experience. The idea of a "jail" is really quite a concept: in order to keep somebody behind bars for long periods of time, there has to be some cohesive rationale. One interesting reason that I picked up during my graduate studies involved a belief in the following: people tend to conform and do what is expected of them if they are made aware that they are being watched. Jails can therefore serve as a place and form of behavioural conditioning. These days, although they don't appear to be in jail, everyday citizens are constantly reminded that they are being watched - that enormous amounts of data are being gathered about them all the time. Recall that Taylor had declared war on laziness; to me, having such an individual nearby with a stopwatch watching every move is certainly comparable to how a prisoner might be treated in a penitentiary. The move towards higher levels of surveillance represents a social trend. There must be a presumption of "criminality" to justify the collection of data. The crimes are not crimes in a legal sense but rather in a Nazi sense: failure to conform to predefined normative behaviours makes a person a criminal. The intent of surveillance is not necessarily to prosecute members of the general public but to instill within them a fear of non-conformity.
The idea that we manage what we measure implies that everything we measure is destined to be managed, which is not necessarily the case at all. However, the current context in which we collect data almost seems to necessitate control over the underlying phenomena giving rise to the data. What are we comparing the metrics against? In order to assess something as erroneous or inferior, it is necessary to have a basis of comparison. Here I borrow from my background in environmental studies by raising the concept of a "climax": e.g. a climax forest. The argument in support of a climax is as follows: many things including human society naturally gravitate towards a superior form. It is part of our evolutionary development. In this context, there can indeed be only one right way to do things or to exist - all else being inferior or simply transient states en route to perfection. We hear the climax expressed in comments such as, "The children are getting taller and taller every day!" and "They are teaching calculus to kids at a younger age each year!" The presence of criteria presumes an ultimate superior form or condition. From a point of perfection, diversity would be illogical. It only makes sense to have clones and monolithic approaches.
Causality in Data
The accumulation of massive amounts of data can support subjective and potentially destructive processes of organizational and social engineering. Interpreting data from a Nazi standpoint (although not a "Nazi's standpoint"), any data collected serves to satisfy the metrics of criteria. Since it is impossible to find fault in perfection, there is never any need to adapt an organization to changes in the environment. One only has to ensure conformity and compliance. The data therefore serves to demonstrate level of conformity; it becomes a tool to project ideals and identify distance from ideals. Wrongness from the standpoint of perfection can be measured by evaluating those things that exhibit non-conformity. I don't claim to be an expert in Nazis by the way, so I apologize if my portrayal misrepresents Nazis in real life . . . in case there are any Nazis reading this blog. "What caused John's poor performance? Since John belongs to an inferior demographic group, poor performance is likely due to his connection to that group." "What caused the death of this patient? The surgeon that performed the operation failed to follow rules of professional conduct; this non-professionalism caused the death of the patient." Lack of sanitary tools and supplies had nothing to do with the death. Unrealistic schedules and deadlines imposed by the hospital on the surgical team did not kill the patient. The problem with the Nazi perspective is that it might be entirely wrong; yet there is no means of recovery from bad decisions. The causality has been placed on a pedestal; and it is difficult to question perfection. Data can be and has been used to define people and impose behavioural requirements. I would say that an important role of causality has been to colonize and subjugate indigenous populations, making them primitive by labelling them as such. Causality can be used to make people inferior; establish criminality; and silence reality.
"Anybody who drinks alcohol is a criminal." To me the statement tends to reflect a Nazi mindset. In Canada, a federal minister once said something to the effect, "If you don't allow us to openly access your data, you support child molestation." (He didn't use these exact words. Many people interpreted his comments to mean that they were personally being accused of child molestation.) A logical extension of this assertion is that people concerned about big government and loss of personal privacy are more likely to be child molesters. Thus we encounter a legitimization of a surveillance society not to understand more about people but rather to ensure their control, adherence to rules, and conformance to standards of conduct. I feel that Nazism has been revived in broader social settings by our technological capabilities - particularly the ability to accumulate massive amounts of information. Unlike the Nazis, I don't know what the final form of our society should be. Unlike Taylor, I don't believe there is one right way to perform industrial processes. However, I do want to take a moment to examine the non-adaptive nature of the metrics of criteria. When some segments of society start slanderously and neurotically labelling others to put them in their place, criminalizing them almost to push them off their place, we really start to encounter the structural entrenchment of control. We witness the use of surveillance and data to justify modes of incursion and social disablement.
To suggest that people should not confuse association with causality assumes that understanding exists a priori: i.e. we already know when a relationship is nothing more than an association, and when within this association there are no aspects of causality. A follower of Taylor might say, "You know, there is only one right way to shovel coal. You aren't doing it right. This is why you aren't getting much work done." So this is a person who professes to have an authoritative understanding. But is it really necessary to dismiss an association just because it doesn't quite fall within our expectations of substantive determinants - i.e. fit our view of how things should be controlled? For example, we are now starting to encounter drug-resistant bacteria. If prescriptions stop working, it becomes necessary to seek options outside the existing framework. I question how far the search can go when information only flows in one direction - from the epistemological authority to everybody else. Contextual associations are not necessarily without value. While substantive analysis might legitimize the use of particular therapies, this is not to say that the same analysis delegitimizes contextual associations. The fact that I can take a pill to relieve a headache does not diminish the relevance of life events associated with headaches - such as lack of sleep, stress in the workplace, and nutrition. So even if a pill stops working, there are other options to deal with the pain. There is no single right way to deal with the problem.
Organizational Clones and Pathologies of Structure
I have encountered comments to the effect that companies are not using their existing data resources well or effectively; so expansion into big data might not lead to particularly worthwhile outcomes. Why might the use of data not lead to clear returns? What does it mean to make effective use of a resource? I choose to blame the problem on clones, yes indeed. It has become evident to me that business as we know it is built around conformity. Companies hardly ever develop their own systems. They customarily import or acquire applications as opposed to developing their own. The strategy is similar in terms of making use of people with similar competencies, management methodologies, metrics, and data systems. It has become common to acquire and replace people and resources to achieve the "best fit" rather than adapt to changing circumstances. I point to how similar most office environments are along with how authority is delegated. I suggest that organizations are so similar, in fact, that their internal structures are forced to handle data in a manner that might not be ideal in order to conform to industry practice. The instigation of change occurs through the metrics of criteria irrespective of environmental circumstances. By this I mean that the process of change is more a matter of form than substance - projection of design rather than the articulation of contextual relevance. It is easy to produce a product that nobody wants because this is merely a matter of design; the metrics of criteria mostly serve the cause of conformance. By the way, I am certainly not expressing any problems with the sale of commercial software and systems: I am more focused on the resulting lack of adaptation and the potential limits imposed by standardized solutions and methods. I consider the involvement of data scientists part of a creative process to help organizations recover from evolutionary dead-ends.
Even a company that adheres to rules faithfully can nonetheless fail. Ford for example was surprised to discover that consumers had become disinterested in the Model T. The company could have continued producing the same vehicle regardless of what the market wanted. At some point, Ford embraced the need to start producing different types of cars. Consumers came to expect diversity and change. It is the company that had to conform to the market rather than the other way around. There is no such thing as a "perfect" car. If there were a perfect car, and a factory could produce it, there would never be a need to retool facilities to build other types of vehicles. Imagine being a Nazi, perhaps not necessarily insisting on a single perfect vision of reality but certainly insulating one's sense of reason from the consequences of one's actions. This is something an alcoholic might do. Organizations can become structurally pathological in a similar manner: they might decide to use the data to support convoluted perceptions. This is not to say that the data is "wrong," but rather its presence might be part of the illness. To wish to become all alike, this is how a disease spreads from one company to another.
A data system does not affect data merely by the design of the system but also by its interaction with other parts of the organization especially people. For instance, when a system is predisposed to reductionism, this diminishes the need for employees feeding the system with data to acquire complicated information. Yet it is in this complexity where answers to troubling questions might be found - such as where a company should be headed. The decisions rendered from the data likewise become shaped not so much by the underlying phenomena of the data but the instrumental aspects of the data that conform to the system. The metrics of criteria aren't meant to guide change at all but simply ascertain the extent of compliance - in a manner of speaking, how much things remain on track even if an organization is completely off track. The search for the climax or ideal has caused organizations to import solutions and methodologies, and make decisions that are increasingly disassociated from their business settings. Therefore, I would expect the only stable path towards greater use of big data to incorporate both the metrics of criteria and phenomena. I would expect the truth of relationships to be fashioned more by the underlying phenomena of the data itself rather than the abstract disembodied philosophies and assertions of inbreeds.
You're right -- big post!
I love this phrase: "underlying phenomena of the data itself".