This post is available as a PDF document.
Why do we study social networks?
Definition from: http://en.wikipedia.org/wiki/Social_network: A social network is a social structure made up of a set of social actors (such as individuals or organizations) and a set of the dyadic ties between these actors. Social networks and the analysis of them is an inherently interdisciplinary academic field which emerged from social psychology, sociology, statistics, and graph theory.
From http://en.wikipedia.org/wiki/Sociometry: “Sociometric explorations reveal the hidden structures that give a group its form: the alliances, the subgroups, the hidden beliefs, the forbidden agendas, the ideological agreements, the ‘stars’ of the show”.
In online social networks such as Facebook and Twitter, sociometry can help us understand how information diffuses and how word of mouth works (virality).
Installing NodeXL (Microsoft Excel required)
NodeXL Template 2014
The SocialNetImporter extends NodeXL, mainly by adding the ability to extract data from Facebook. To install:
- Download the latest version of the social importer plugins from http://socialnetimporter.codeplex.com
- Open the Zip file and save the files into a directory you choose, e.g. c:\social
- Open the NodeXL template (you can click on the Windows Start button and type its name to search for it)
- Open the NodeXL tab, Import, Import Options (see screenshot below)
- In the import dialog, type or browse for the directory where you saved your social importer files (screenshot below):
- Close and open NodeXL again
For older versions:
- Visit http://nodexl.codeplex.com
- Download the latest version of NodeXL
- Unzip the files to a temporary folder
- Close Excel if it’s open
- Run setup.exe
- Visit http://socialnetimporter.codeplex.com
- Download the latest version of the socialnetimporter plugin
- Extract the files and copy them to the NodeXL plugin directory, which defaults to C:\Program Files\Social Media Research Foundation\NodeXL Excel Template\PlugIns
First steps with NodeXL
The following table is a matrix showing trust within a group of 6 people. An “X” in a cell means that the person whose name is in the cell’s row trusts the person whose name is in the cell’s column.
Because trust is not automatically reciprocated (Ann trusting Bob does not necessarily mean that Bob trusts Ann), the graph that we will build will be directed.
In a directed graph, the edges (links) between two vertices (in our example, people) have a direction: A → B. An example of a directed graph is Twitter, where you can follow a person who does not follow you, and vice versa.
In an undirected graph, the edge is reciprocated. This means that, if you have a connection to a person, this person has the same connection towards you. An example of an undirected graph is Facebook: if A is a friend of B then, automatically, B is a friend of A.
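The difference between the two kinds of graph can be sketched in a few lines of Python. This is only an illustration: the names and ties below are hypothetical (they are not the trust matrix from this post), and the helper functions `is_reciprocated` and `to_undirected` are made up for the example.

```python
# Hypothetical ties for illustration only (not the trust matrix from this post).
# Each pair (a, b) is a directed edge: a -> b.
directed_edges = {("Ann", "Bob"), ("Bob", "Claire"), ("Claire", "Bob")}

def is_reciprocated(edges, a, b):
    """Return True if both a -> b and b -> a are present."""
    return (a, b) in edges and (b, a) in edges

def to_undirected(edges):
    """Drop the direction: each tie becomes an unordered pair."""
    return {frozenset(edge) for edge in edges}

print(is_reciprocated(directed_edges, "Ann", "Bob"))     # False
print(is_reciprocated(directed_edges, "Bob", "Claire"))  # True
print(len(to_undirected(directed_edges)))                # 2
```

In the directed version, Bob → Claire and Claire → Bob are two separate edges. Once the direction is dropped, they collapse into a single tie, which is how a Facebook-style friendship behaves.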
- In the Windows Start menu, click “All Programs”, then “NodeXL,” then “Excel Template.” (in Windows 8, open the tile menu and type “nodexl” to search and find the program)
- Notice the new “NodeXL” tab in the Ribbon:
Drawing your first network graph
Open the vertices sheet and enter the names of the people from the trust matrix above:
Go to the edges sheet and enter the name of the trusting person in the Vertex 1 column and the name of the trusted person in Vertex 2.
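If the matrix is large, typing the edges by hand is tedious. The following is a minimal sketch of how a trust matrix could be turned into Vertex 1 / Vertex 2 rows. The 3-person matrix is hypothetical (it is not the matrix from this post), and `matrix_to_edges` is a helper name invented for the example.

```python
# Hypothetical 3-person trust matrix for illustration only.
# "X" means the person in the row trusts the person in the column.
names = ["Ann", "Bob", "Claire"]
matrix = [
    ["",  "X", ""],   # Ann trusts Bob
    ["",  "",  "X"],  # Bob trusts Claire
    ["X", "X", ""],   # Claire trusts Ann and Bob
]

def matrix_to_edges(names, matrix):
    """Return (Vertex 1, Vertex 2) rows, where Vertex 1 trusts Vertex 2."""
    edges = []
    for i, row in enumerate(matrix):
        for j, cell in enumerate(row):
            # Ignore the diagonal: a person trusting themselves is not a tie.
            if cell.strip().upper() == "X" and i != j:
                edges.append((names[i], names[j]))
    return edges

# Print tab-separated rows, one edge per line.
for vertex1, vertex2 in matrix_to_edges(names, matrix):
    print(f"{vertex1}\t{vertex2}")
```

Tab-separated rows like these can usually be pasted into two adjacent spreadsheet columns, here the Vertex 1 and Vertex 2 columns of the edges sheet.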
On the new NodeXL tab, define the graph as “directed”.
Click the “show graph” button for a visual representation of your graph:
Copy the names into the Label column and refresh your graph to have the names displayed next to the vertices:
To replace vertices with images, copy and paste photo links (from Facebook profile photos or elsewhere) into the image column and define the shape (in the shape column) as “Image”:
Refresh the graph:
To calculate the metrics from the graph, go to the NodeXL tab and click on “Graph Metrics”:
Click on the “Select All” button and “calculate metrics”:
In the Vertices sheet, scroll to the right to display the metrics columns:
- In degree: most trusted person (Edna, Bob). It counts the “arrows” that point to the person (the people who trust the vertex).
- Out degree: most trustful (Edna, Fred, Claire). It counts the “arrows” that go out from the person (the people the vertex trusts).
- Prestige is a metric you can calculate yourself: it is the number of links to the person divided by the total number of possible links (in this example, the maximum is 5 links, that is, 5 people trusting the 6th person). A prestige of 1 means that everyone trusts the vertex.
- Betweenness centrality: A high number means that the person is in a central position in the graph. This metric is based on the shortest paths between all people in the graph. It measures how important a node is by counting the number of shortest paths that it is a part of.
How many pairs of individuals would have to go through the vertex in order to reach one another in the minimum number of hops?
- Closeness centrality: who has the fastest access to information (in the case of information diffusion). This number is based on the average distance from the vertex to all other nodes in the network.
What if it is not so important to have many direct friends or to be “between” others, yet one still wants to be in the “middle” of things, not too far from the center? Closeness centrality captures this.
- Eigenvector centrality: Eigenvector centrality tries to measure the importance or influence of a node in the network by giving a relative weight to each link. The idea is that the centrality of a node depends on the centrality of the nodes it is linked to, so the importance of its connections is taken into account. It differs from betweenness centrality in that it considers all paths between nodes, not only the shortest ones.
Eigenvector centrality measures the importance of a node by the importance of its neighbors.
Eigenvector centrality is a measure of the influence of a node in a network. It assigns relative scores to all nodes in the network based on the concept that connections to high-scoring nodes contribute more to the score of the node in question than equal connections to low-scoring nodes.
- Katz centrality and PageRank: Whereas degree centrality counts the number of adjacent links, Katz centrality takes into account all nodes that can be reached through a path, while penalizing distant nodes.
Katz centrality and PageRank are a middle ground between degree centrality (direct links only) and eigenvector centrality (all paths): they count all nodes that can be reached through a path, while the contributions of distant nodes are penalized.
- Clustering coefficient: The clustering coefficient measures an “all-my-friends-know-each-other” property. If the value of the clustering coefficient is equal to 1, each of my friends is a friend of all the others.
- Reciprocated vertex pair ratio: for a vertex, the number of adjacent vertices it is connected to in both directions, divided by its total number of adjacent vertices (only meaningful in directed graphs).
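To make a few of these metrics concrete, here is a minimal Python sketch that computes them by hand. It is not how NodeXL works internally, and NodeXL may scale or define some values differently. Several things are assumptions made for the example: the edges are hypothetical (not the trust matrix from this post), the average distance ignores edge direction, and the reciprocated vertex pair ratio is taken as the share of a vertex's neighbors that are tied to it in both directions.

```python
from collections import deque

# Hypothetical directed trust ties (source trusts target); not the post's matrix.
edges = [("Ann", "Bob"), ("Bob", "Ann"), ("Bob", "Claire"),
         ("Claire", "Ann"), ("Dan", "Ann")]
vertices = sorted({v for edge in edges for v in edge})

def in_degree(v):
    """Number of people who trust v."""
    return sum(1 for _, target in edges if target == v)

def out_degree(v):
    """Number of people v trusts."""
    return sum(1 for source, _ in edges if source == v)

def prestige(v):
    """In-degree divided by the number of other people who could trust v."""
    return in_degree(v) / (len(vertices) - 1)

def reciprocated_vertex_pair_ratio(v):
    """Share of v's neighbors tied to v in both directions (assumed definition)."""
    trusted_by_v = {t for s, t in edges if s == v}
    trusting_v = {s for s, t in edges if t == v}
    neighbors = trusted_by_v | trusting_v
    return len(trusted_by_v & trusting_v) / len(neighbors)

def average_distance(v):
    """Mean shortest-path length from v to every vertex it can reach.

    Assumption: edge direction is ignored, so a tie can be walked both ways.
    """
    adjacent = {u: set() for u in vertices}
    for s, t in edges:
        adjacent[s].add(t)
        adjacent[t].add(s)
    # Breadth-first search gives the minimum number of hops to each vertex.
    distance = {v: 0}
    queue = deque([v])
    while queue:
        current = queue.popleft()
        for nxt in adjacent[current]:
            if nxt not in distance:
                distance[nxt] = distance[current] + 1
                queue.append(nxt)
    others = [d for u, d in distance.items() if u != v]
    return sum(others) / len(others)

for v in vertices:
    print(v, in_degree(v), out_degree(v), f"{prestige(v):.2f}",
          f"{reciprocated_vertex_pair_ratio(v):.2f}",
          f"{average_distance(v):.2f}")
```

In this toy network Ann is trusted by all three other people, so her prestige is 1. She trusts only Bob back, so one of her three neighbors is reciprocated. A lower average distance means a vertex is closer to everyone else, which is the idea behind closeness centrality.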