@will_mccants posted an interesting Slate article yesterday questioning the statistical underpinnings of Nicholas Christakis and James Fowler’s social network analysis as outlined in their popular book Connected. Here’s a snippet:
Two other recent papers raise serious doubts about their conclusions. And now something of a consensus is forming within the statistics and social-networking communities that Christakis and Fowler’s headline-grabbing contagion papers are fatally flawed. Andrew Gelman, a professor of statistics at Columbia, wrote a delicately worded blog post in June noting that he’d “have to go with Lyons” and say that the claims of contagious obesity, divorce and the like “have not been convincingly demonstrated.” Another highly respected social-networking expert, Tom Snijders of Oxford, called the mathematical model used by Christakis and Fowler “not coherent.” And just a few days ago, Cosma Shalizi, a statistician at Carnegie Mellon, declared, “I agree with pretty much everything Snijders says.”
Despite the statistical weaknesses of Christakis and Fowler’s argument, their book and discussion provide a useful perspective for understanding social networks. Christakis and Fowler’s recent fall likely represents the needed plateau of social network analysis (SNA): a useful analytical method that is most effective when combined with other research approaches rather than treated as an analytic panacea. After 9/11, funding flowed like water to any outfit producing a cool-looking SNA diagram with a big “Bin Laden” bubble in the center. Ten years later, I still like and use SNA, but I recognize some of its limitations as well. Here are my thoughts on SNA as used in counterterrorism.
- Self-Fulfilling Prophecy- Many analysts who rely exclusively on SNA routinely fall into the trap of using the method to confirm their preferred theory. An analyst begins with a seemingly logical story and then searches out bits of data and cobbles together connections to ‘prove’ the theory. Sought-after data points get put on the diagram, and other evidence fails to make the chart. The analysis satisfies the analyst’s theory and appears convincing, but quickly falls apart when tested on the ground. I once argued repeatedly with an analyst who would vigorously trace all information back to Bin Laden. With sufficient time, this analyst could link every person you’ve ever met to Bin Laden and do it with a convincing chart. Never mind that almost any person can be linked to any other person in an average of six or so connections. (I think I read this somewhere, maybe in Connected, so you might want to check my facts.) See Patternicity for a common example of how SNA is misapplied.
- Confusing data samples for populations- Analysts often believe that their data represents the population they are studying when it is really only an unrepresentative sample of that population. Coupling unrepresentative data samples with SNA results in a focus on hubs and links of dubious importance. This misstep leads analysts to miss outlying actors who are in fact key but for whom too little data exists to evaluate them properly. For example, I recently (2011) saw an unclassified, academic SNA showing AQ in Iraq as the key hub of AQ activity globally. This SNA advocated that AQ in Iraq be the new counterterrorism focus. I knew this to be off base and then realized that all of the data used in the SNA came from open-source conflict reports, 80% of which originated in Iraq. However, no one mentioned this in the briefing.
- Can be overly complex- As technology improves, analysts have increased the amount of data displayed in SNA, producing diagrams that look remarkably similar to my iPhone headphones when removed from my pocket: a big mess. Rather than creating clarity, the diagrams become almost indecipherable, which leads to faulty conclusions.
- Centrality measured by links can be misleading- Most SNA suggests that the actors with the most links, or the actors linking different hubs, are the most important. For the most part, this is true. However, key people (bad guys in my work) often deliberately stay in the shadows, push low-hanging fruit forward, and appear tertiary in SNA. Here’s an important new article discussing alternative perspectives to the commonly touted centrality notion: “Networks dominated by rule of the few.”
- SNA represents cultural factors and strength of relationships poorly- SNA provides a more quantitative method for diagramming relationships. Unless an analyst has good software and training, all links can easily appear equal. However, strength of connections and the cultural reasons for their existence usually prove most important in understanding complexity. Analysts relying almost solely on SNA will miss this.
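To make the link-counting and six-connections points above concrete, here is a minimal Python sketch on an invented toy network. Everything in it is an assumption for illustration: the node names, the links, and the helper functions `build_graph`, `degree_ranking`, and `hops` are hypothetical and do not come from any real data set or SNA tool. It ranks actors by raw link count (degree) and measures how many hops separate two actors.

```python
from collections import defaultdict, deque

# Hypothetical toy network: every name and link here is invented.
EDGES = [
    ("leader", "courier"),      # the leader talks to exactly one person
    ("courier", "cell_a"),
    ("courier", "cell_b"),
    ("courier", "cell_c"),
    ("cell_a", "cell_b"),
    ("cell_c", "shopkeeper"),
    ("shopkeeper", "bystander"),
]


def build_graph(edges):
    """Build an undirected adjacency map from a list of (a, b) links."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def degree_ranking(graph):
    """Return actors sorted by link count, highest first (ties by name)."""
    return sorted(graph, key=lambda node: (-len(graph[node]), node))


def hops(graph, start, goal):
    """Breadth-first search: fewest links between two actors, or None."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


if __name__ == "__main__":
    g = build_graph(EDGES)
    print(degree_ranking(g))               # courier first, leader last
    print(hops(g, "bystander", "leader"))  # 4
```

In this toy case the courier tops the link-count ranking while the leader, who has a single link, lands at the bottom, which is the "appears tertiary" problem in miniature. The uninvolved bystander is also only four hops from the leader, so a short chain of connections on a chart says little by itself.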
Despite my cautions above, Christakis and Fowler’s work is beneficial and I think SNA remains particularly useful. Here are some caveats and endorsements.
- Just because the math doesn’t work doesn’t necessarily mean it’s wrong- Unfortunately for statisticians, not all human behavior has yet been explained by numbers. Christakis and Fowler’s analysis of “friends of one’s friends” making one fat, divorced, angry, etc. may in fact be true. These correlations may just be better explained by something other than math, such as cultural factors specific to certain populations.
- SNA is still great for mapping complex relationships- Law enforcement has been doing SNA with yarn and pictures for decades because it works and helps sort out complex problems.
- Technology makes SNA easy- Current tools provide a simple way to track data and relationships, creating a central repository for locating information and seeing how it connects.
In the end, I am enjoying the challenges to Christakis and Fowler’s approach, but I imagine that Internet-enabled social networks will not be keen to promote discussions that might undermine their own strength.
Lastly, for those really interested in the mechanics of social networking, collective intelligence and current research, I highly recommend four places:
- Network Science Center, U.S. Military Academy
- Center for Collective Intelligence, MIT
- Center for Complex Network Research, Northeastern University
- Center for Computational Analysis of Social and Organizational Systems, Carnegie Mellon University