Facebook Friends Social Graph Using NetVizz and NodeXL

This post is available as a PDF document.

 

  • In Facebook, search for the NetVizz application or follow this link: https://apps.facebook.com/netvizz/

 

  • Accept the permissions required to extract data from your timeline and pages

  • Select “personal network”:

  • Click on “start”:
  • Right click on “gdf file” and save it
  • Download and install the NetVizz to NodeXL converter
  • Run the NetVizz to NodeXL converter
  • Select the file to import from (GDF) and the file to export to (GraphML)
  • Make sure you have the right file names and click start
  • Open a new NodeXL file
  • From the NodeXL tab, select Import, then “from GraphML file”, and select the file generated by the NetVizz to NodeXL converter
  • Generate the NodeXL graph.

Match IDs with Names in the Edges Sheet

NetVizz uses Facebook IDs instead of names. To display the names next to the IDs in the Edges sheet, create two new columns and use the following formulas to turn them into lookup columns:

– =VLOOKUP([@[Vertex 1]],Vertices[[Vertex]:[Label]],8,FALSE)

– =VLOOKUP([@[Vertex 2]],Vertices[[Vertex]:[Label]],8,FALSE)

Explanation: Excel looks for the value of Vertex 1 (or Vertex 2) in the first column of the selected range of the Vertices table (from the Vertex column to the Label column) and returns the value found in the 8th column of that range. VLOOKUP always searches the first column of the range. FALSE means that Excel should look for an exact match.
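
If you prefer to see the lookup logic outside Excel, here is a minimal Python sketch of what the two formulas do. The IDs, the names and the add_names helper are made up for illustration; they do not come from a real NetVizz export.

```python
# Hypothetical Vertices table: Facebook ID -> label (display name).
vertices = {
    "1001": "Ann",
    "1002": "Bob",
    "1003": "Claire",
}

# Hypothetical Edges sheet rows: (Vertex 1, Vertex 2), both stored as IDs.
edges = [("1001", "1002"), ("1003", "1001")]


def add_names(edge_rows, id_to_label):
    """Return each edge with two extra columns holding the matching labels.

    Like VLOOKUP with FALSE, only exact matches count: an unknown ID
    gives "#N/A" instead of a guessed value.
    """
    named = []
    for v1, v2 in edge_rows:
        name1 = id_to_label.get(v1, "#N/A")
        name2 = id_to_label.get(v2, "#N/A")
        named.append((v1, v2, name1, name2))
    return named


named_edges = add_names(edges, vertices)
for row in named_edges:
    print(row)
```

A dictionary plays the role of the Vertices table here, so each lookup is an exact match on the ID, just as in the Excel formulas.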

 

 

 

Introduction to Social Graph and NodeXL

This post is available as a PDF document.

Why do we study social networks?

Definition from: http://en.wikipedia.org/wiki/Social_network: A social network is a social structure made up of a set of social actors (such as individuals or organizations) and a set of the dyadic ties between these actors. Social networks and the analysis of them is an inherently interdisciplinary academic field which emerged from social psychology, sociology, statistics, and graph theory.

 

From http://en.wikipedia.org/wiki/Sociometry: “Sociometric explorations reveal the hidden structures that give a group its form: the alliances, the subgroups, the hidden beliefs, the forbidden agendas, the ideological agreements, the ‘stars’ of the show”.

 

In social networks (like Facebook and Twitter), sociometry can help us understand the diffusion of information and how word-of-mouth works (virality).

Installing NODEXL (Microsoft Excel required)

NodeXL Template 2014

 

The SocialNetImporter extends the capabilities of NodeXL, mainly by extracting data from Facebook. To install:

  • Download the latest version of the social importer plugins from http://socialnetimporter.codeplex.com
  • Open the Zip file and save the files into a directory you choose, e.g. c:\social
  • Open the NodeXL template (you can click on the Windows Start button and type its name to search for it)
  • Open the NodeXL tab, Import, Import Options (see screenshot below)

  • In the import dialog, type or browse for the directory where you saved your social importer files (screenshot below):

 

  • Close and open NodeXL again

 

For older versions:

  • Visit http://nodexl.codeplex.com
  • Download the latest version of NodeXL
  • Unzip the files to a temporary folder
  • Close Excel if it’s open
  • Run setup.exe

 

  • Visit http://socialnetimporter.codeplex.com
  • Download the latest version of the socialnetimporter plugin
  • Extract the files and copy them to the NodeXL plugin directory, which defaults to C:\Program Files\Social Media Research Foundation\NodeXL Excel Template\PlugIns

First steps with NodeXL

The following table is a matrix showing trust within a group of 6 people. An “X” in a cell means that the person whose name is in the cell’s row trusts the person whose name is in the cell’s column.

[Image: trust matrix for the group of 6 people]

Because trust is not automatically reciprocated (Ann trusting Bob does not necessarily mean that Bob trusts Ann), the graph that we will build will be directed.

 

A directed graph implies that the edges (links) between two vertices (in our example, people) have a direction: A → B. An example of a directed graph is Twitter. In Twitter, you can follow a person who does not follow you, and vice versa.

 

In an undirected graph, the edge is reciprocated. This means that, if you have a connection to a person, this person has the same connection towards you. An example of an undirected graph is Facebook: if A is a friend of B then, automatically, B is a friend of A.

Running NodeXL

  • In the Windows Start menu, click “All Programs”, then “NodeXL,” then “Excel Template.” (in Windows 8, open the tile menu and type “nodexl” to search and find the program)
  • Notice the new “NodeXL” tab in the Ribbon:

Drawing your first network graph

Open the Vertices sheet and enter the names of the people from the social matrix provided (above):

Go to the edges sheet and enter the name of the trusting person in the Vertex 1 column and the name of the trusted person in Vertex 2.
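
If the matrix is large, typing the edges by hand gets tedious. The sketch below shows one way to turn a trust matrix into (Vertex 1, Vertex 2) rows that can be pasted into the Edges sheet. The three names and the “X” marks are a made-up example, not the matrix from this post, and matrix_to_edges is a hypothetical helper.

```python
# Hypothetical trust matrix: rows are the trusting person,
# columns are the trusted person, in the same order as `names`.
names = ["Ann", "Bob", "Claire"]
matrix = [
    ["", "X", ""],   # Ann trusts Bob
    ["", "", "X"],   # Bob trusts Claire
    ["X", "X", ""],  # Claire trusts Ann and Bob
]


def matrix_to_edges(names, matrix):
    """Turn each "X" at (row, column) into a (Vertex 1, Vertex 2) pair."""
    edges = []
    for i, row in enumerate(matrix):
        for j, cell in enumerate(row):
            # Ignore the diagonal: trusting yourself is not an edge here.
            if i != j and cell.strip().upper() == "X":
                edges.append((names[i], names[j]))
    return edges


edge_list = matrix_to_edges(names, matrix)
for vertex1, vertex2 in edge_list:
    # Tab-separated output pastes directly into two Excel columns.
    print(f"{vertex1}\t{vertex2}")
```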

On the new NodeXL tab, define the graph as “directed”.

Click the “show graph” button for a visual representation of your graph:

Copy the names into the Label column and refresh your graph to have the names displayed next to the vertices:

To replace vertices with images, copy and paste photo links (from Facebook profile photos or elsewhere) into the image column and define the shape (in the shape column) as “Image”:

Refresh the graph:

Analyzing Data

To calculate the metrics from the graph, go to the NodeXL tab and click on “Graph Metrics”:

Click on the “Select All” button and “calculate metrics”:

Understanding Metrics

In the Vertices sheet, scroll to the right to display the metrics columns:

  • In degree: the number of “arrows” that point to the person (how many people trust the vertex). The highest values identify the most trusted people (Edna, Bob).
  • Out degree: the number of “arrows” that go out from the person (how many people are trusted by the vertex). The highest values identify the most trusting people (Edna, Fred, Claire).
  • Prestige: a metric you can choose to calculate yourself. It is the number of links to the person divided by the total number of possible links (in this example, the maximum would be 5 links, or 5 people trusting the 6th person). A prestige of 1 means that everyone trusts the vertex.
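
To make the three definitions above concrete, here is a small Python sketch that computes them from an edge list. The edges are a made-up example (not the matrix from this post), and the sketch assumes each edge appears only once.

```python
from collections import Counter

# Hypothetical directed edges: (trusting person, trusted person).
edges = [
    ("Ann", "Bob"),
    ("Claire", "Bob"),
    ("Bob", "Claire"),
    ("Claire", "Ann"),
]
people = sorted({name for edge in edges for name in edge})

# In degree: arrows pointing to the person.
in_degree = Counter(trusted for _, trusted in edges)
# Out degree: arrows going out from the person.
out_degree = Counter(truster for truster, _ in edges)


def prestige(person):
    """In degree divided by the number of other people in the group."""
    return in_degree[person] / (len(people) - 1)


for person in people:
    print(person, in_degree[person], out_degree[person], prestige(person))
```

In this toy group of three, Bob is trusted by both other people, so his prestige is 2 / 2 = 1.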

  • Betweenness centrality: A high number means that the person is in a central position in the graph. This metric is based on the shortest paths between all people in the graph. It measures how important a node is by counting the number of shortest paths that it is a part of.

In other words: how many pairs of individuals would have to go through the vertex in order to reach one another in the minimum number of hops?
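
As a rough illustration of the idea, the sketch below counts, for every pair of people, the share of shortest paths that pass through each other vertex. It is a plain-Python sketch for a small directed, unweighted graph with made-up names; NodeXL may normalize or treat direction differently, so do not expect identical numbers.

```python
from collections import deque

# Hypothetical directed graph: each person lists the people they link to.
graph = {
    "Ann": ["Bob"],
    "Bob": ["Claire"],
    "Claire": ["Dan"],
    "Dan": [],
}


def shortest_path_counts(graph, source):
    """BFS from source: distance and number of shortest paths per reachable node."""
    dist = {source: 0}
    count = {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                count[neighbor] = 0
                queue.append(neighbor)
            if dist[neighbor] == dist[node] + 1:
                # Every shortest path to node extends to neighbor.
                count[neighbor] += count[node]
    return dist, count


def betweenness(graph):
    """Unnormalized betweenness: sum over pairs (s, t) of the fraction of
    shortest s-t paths that go through each intermediate vertex v."""
    info = {node: shortest_path_counts(graph, node) for node in graph}
    scores = {node: 0.0 for node in graph}
    for s in graph:
        dist_s, count_s = info[s]
        for t in dist_s:
            if t == s:
                continue
            for v in graph:
                if v in (s, t) or v not in dist_s:
                    continue
                dist_v, count_v = info[v]
                # v lies on a shortest s-t path if the distances add up.
                if t in dist_v and dist_s[v] + dist_v[t] == dist_s[t]:
                    scores[v] += count_s[v] * count_v[t] / count_s[t]
    return scores


scores = betweenness(graph)
print(scores)
```

In this chain, Bob and Claire each sit on the shortest path of two pairs, while Ann and Dan are only endpoints.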

 

 

  • Closeness centrality: who has the fastest access to information (in the case of information diffusion). This number is based on the average distance from the vertex to all other nodes in the network.

What if it is not so important to have many direct friends, or to be “between” others, yet one still wants to be in the “middle” of things, not too far from the center? This is what closeness centrality captures.
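
A minimal sketch of the idea, assuming an undirected friendship graph with made-up names: closeness is computed here as the inverse of the average distance to every reachable person. This is one common convention; NodeXL may scale the value differently.

```python
from collections import deque

# Hypothetical undirected graph: friendships are listed on both sides.
graph = {
    "Ann": ["Bob"],
    "Bob": ["Ann", "Claire"],
    "Claire": ["Bob"],
}


def distances_from(graph, source):
    """Number of hops from source to every reachable node (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def closeness(graph, node):
    """Inverse of the average distance to all other reachable nodes."""
    dist = distances_from(graph, node)
    reachable = [d for other, d in dist.items() if other != node]
    if not reachable:
        return 0.0  # an isolated node is close to nobody
    return len(reachable) / sum(reachable)


for person in graph:
    print(person, closeness(graph, person))
```

Bob is one hop away from everybody, so he gets the highest score; Ann and Claire are two hops away from each other and score lower.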

 

 

  • Eigenvector centrality: Eigenvector centrality tries to measure the importance or influence of a node in the network by giving a relative weight to each link in the network. The idea is that the centrality of a node depends on the centrality of the nodes it is linked to, so the importance of its connections is taken into account. It is different from betweenness centrality in the sense that it considers all paths between nodes, not only the shortest ones.

Eigenvector Centrality measures the importance of a node by the importance of its neighbors.

 

Eigenvector centrality is a measure of the influence of a node in a network. It assigns relative scores to all nodes in the network based on the concept that connections to high-scoring nodes contribute more to the score of the node in question than equal connections to low-scoring nodes
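
One way to see how this works is power iteration: start with equal scores, repeatedly replace each score by the sum of the neighbors' scores, and rescale. The sketch below does this for a made-up undirected graph. Adding the node's own score at each step is a common trick to keep the iteration from oscillating; it is an implementation choice of this sketch, and NodeXL's scaling may differ.

```python
# Hypothetical undirected graph: a triangle (Ann, Bob, Claire) plus Dan,
# who is only connected to Ann.
graph = {
    "Ann": ["Bob", "Claire", "Dan"],
    "Bob": ["Ann", "Claire"],
    "Claire": ["Ann", "Bob"],
    "Dan": ["Ann"],
}


def eigenvector_centrality(graph, iterations=100):
    """Power iteration; scores are rescaled so the top node gets 1.0."""
    scores = {node: 1.0 for node in graph}
    for _ in range(iterations):
        new_scores = {
            # Own score + neighbors' scores (iterating on A + I instead of A).
            node: scores[node] + sum(scores[n] for n in graph[node])
            for node in graph
        }
        largest = max(new_scores.values())
        scores = {node: value / largest for node, value in new_scores.items()}
    return scores


scores = eigenvector_centrality(graph)
for person, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{person}: {score:.3f}")
```

Dan and Bob are both connected to Ann, the best-connected person, but Bob also knows Claire, so Bob ends up with a higher score than Dan.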

 

  • Katz centrality and PageRank: Whereas degree centrality measures the number of adjacent links, Katz centrality measures all the nodes that can be reached through a path, while penalizing distant nodes.

 

Katz and PageRank centrality are a middle ground between degree centrality (direct neighbors only) and eigenvector centrality (all paths): they take into account all the nodes that can be reached through a path, while the contributions of distant nodes are penalized.

 

  • Clustering coefficient: The clustering coefficient is a measure of an “all-my-friends-know-each-other” property. If the value of the clustering coefficient is equal to 1, this means that each of my friends is friends with all the others.

 

  • Reciprocated vertex pair ratio: the share of a vertex’s neighbors that are connected to it in both directions, that is, with both an incoming and an outgoing connection (only valid in directed graphs).
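
The last two metrics are easy to compute by hand on a small graph. The sketch below uses two made-up graphs. The reciprocated ratio follows my understanding of the metric (neighbors linked in both directions divided by all neighbors); treat that definition as an assumption rather than NodeXL's exact formula.

```python
from itertools import combinations

# Hypothetical undirected friendships, stored as sets of neighbors.
friends = {
    "Ann": {"Bob", "Claire", "Dan"},
    "Bob": {"Ann", "Claire"},
    "Claire": {"Ann", "Bob"},
    "Dan": {"Ann"},
}


def clustering_coefficient(graph, node):
    """Share of pairs of the node's friends who are friends with each other."""
    neighbors = sorted(graph[node])
    if len(neighbors) < 2:
        return 0.0  # no pair of friends to check
    pairs = list(combinations(neighbors, 2))
    linked = sum(1 for a, b in pairs if b in graph[a])
    return linked / len(pairs)


# Hypothetical directed edges: (source, target).
directed_edges = {("Ann", "Bob"), ("Bob", "Ann"), ("Ann", "Claire")}


def reciprocated_ratio(edges, node):
    """Neighbors linked in both directions divided by all neighbors."""
    outgoing = {target for source, target in edges if source == node}
    incoming = {source for source, target in edges if target == node}
    neighbors = outgoing | incoming
    if not neighbors:
        return 0.0
    return len(outgoing & incoming) / len(neighbors)


print(clustering_coefficient(friends, "Ann"))
print(reciprocated_ratio(directed_edges, "Ann"))
```

Ann has three friends but only one of the three possible pairs (Bob and Claire) know each other, so her clustering coefficient is 1/3. In the directed example, only one of Ann's two neighbors links back to her, so her ratio is 0.5.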

 

 

 

 

NetVizz to NodeXL converter

This is a small conversion tool I have developed to convert NetVizz files (GDF) to NodeXL files (GraphML).

 

This tool works on all versions of Windows and has been tested with the “personal network” friends list extraction option in NetVizz.
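
For the curious, here is a minimal Python sketch of the kind of conversion involved. It is not the code of the converter itself: it only assumes the usual GDF layout (a “nodedef>” header, node rows, an “edgedef>” header, edge rows) and writes a bare-bones GraphML document. The sample data, parse_gdf and to_graphml are made up for illustration, and the output has not been checked against every NodeXL version.

```python
import csv
import xml.etree.ElementTree as ET

# Hypothetical GDF content, following the usual nodedef>/edgedef> layout.
SAMPLE_GDF = """nodedef>name VARCHAR,label VARCHAR
1001,Ann
1002,Bob
edgedef>node1 VARCHAR,node2 VARCHAR
1001,1002
"""


def parse_gdf(text):
    """Split GDF text into (node_fields, node_rows, edge_rows)."""
    node_fields, node_rows, edge_rows = [], [], []
    section = None
    for row in csv.reader(text.splitlines()):
        if not row:
            continue
        if row[0].startswith("nodedef>"):
            section = "nodes"
            row[0] = row[0][len("nodedef>"):]
            # Field titles look like "label VARCHAR"; keep the name only.
            node_fields = [cell.split()[0] for cell in row]
        elif row[0].startswith("edgedef>"):
            section = "edges"
        elif section == "nodes":
            node_rows.append(row)
        elif section == "edges":
            edge_rows.append(row)
    return node_fields, node_rows, edge_rows


def to_graphml(node_fields, node_rows, edge_rows):
    """Build a GraphML string; the first node field is used as the node id."""
    root = ET.Element("graphml", xmlns="http://graphml.graphdrawing.org/xmlns")
    for field in node_fields[1:]:
        ET.SubElement(root, "key", {"id": field, "for": "node",
                                    "attr.name": field, "attr.type": "string"})
    # Facebook friendships are reciprocal, hence an undirected graph.
    graph = ET.SubElement(root, "graph", edgedefault="undirected")
    for row in node_rows:
        node = ET.SubElement(graph, "node", id=row[0])
        for field, value in zip(node_fields[1:], row[1:]):
            ET.SubElement(node, "data", key=field).text = value
    for row in edge_rows:
        ET.SubElement(graph, "edge", source=row[0], target=row[1])
    return ET.tostring(root, encoding="unicode")


graphml = to_graphml(*parse_gdf(SAMPLE_GDF))
print(graphml)
```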

 

NetVizz is a free Facebook App located at https://apps.facebook.com/netvizz/. NetVizz extracts Facebook data from personal timelines and pages.

 

NodeXL is a free, open-source template for Microsoft® Excel® 2007, 2010 and 2013 that makes it easy to explore network graphs. It can be downloaded from nodexl.codeplex.com.

 

NetVizz (GDF) to NodeXL (GraphML) is freeware.

Current Version:

NetVizz2NodeXL 2.0 Setup file for Windows

– Extracts field titles from GDF file

– Now with a progress bar

Earlier Versions:

NetVizz2NodeXL 1.0 setup file for Windows