The Philosophical Underclass Network

by @raimondiand

I was working on network data recently and, just as I was about to try Gephi, a paper request notification flashed on my fb page. Usually I spend ~15-20 min a day reading the group timeline to spot whether any request is still pending. The Philosophical Underclass Network, for those of you outside the zoo of Facebook private groups, is a great place where philosophers help each other get access to paywalled content. To quote:

A group for those of us who have less-than-adequate library access to journals and e-holdings to request them of the more privileged.

Some of the folks asked me how to analyse fb data to get a general picture of the group topology and the relations between its members, especially when it comes to exchanging papers. Going into the details of network theory is beyond the scope of this post. So here’s a simple tutorial for working with (virtually any) set of network data.

Equipment: 1/2hour – Gephi – FB group data

First, let’s download the data. Go to NetVizz group data and insert 533403256770678, which is the group ID (you can retrieve someone's ID by checking here). The app will start querying data, so be patient. When done, right-click and download the file. Notice that the data are anonymised, so questions like “wow cool, is the blue team the ancient philosophy team??” or “who is that guy” cannot be answered by this data package alone. Make sure the file has the .gdf extension; otherwise, change it.
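If you want to peek inside the .gdf file before opening Gephi, here is a minimal sketch of a reader in plain Python. It assumes the usual GDF layout (a `nodedef>` header row followed by node rows, then an `edgedef>` header row followed by edge rows). The sample text and its columns are made up for illustration and are not the real group data.

```python
import csv
import io

def parse_gdf(text):
    """Parse GDF text into (nodes, edges): two lists of dicts keyed by column name."""
    nodes, edges = [], []
    columns, target = None, None
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        first = row[0]
        if first.startswith(("nodedef>", "edgedef>")):
            # A header row switches the section we are filling.
            target = nodes if first.startswith("nodedef>") else edges
            row[0] = first.split(">", 1)[1]
            # Header cells look like "name VARCHAR"; keep only the column name.
            columns = [cell.strip().split(" ")[0] for cell in row]
        elif target is not None:
            target.append(dict(zip(columns, (cell.strip() for cell in row))))
    return nodes, edges

# Made-up sample in GDF layout (hypothetical columns, not real group data).
SAMPLE = """nodedef>name VARCHAR,label VARCHAR,sex VARCHAR
1,user_1,female
2,user_2,male
3,user_3,male
edgedef>node1 VARCHAR,node2 VARCHAR
1,2
2,3
"""

nodes, edges = parse_gdf(SAMPLE)
print(len(nodes), "nodes,", len(edges), "edges")
```

To try it on the downloaded file, read the file's contents into a string and pass that to `parse_gdf` instead of `SAMPLE`.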

Now open Gephi, open *project*, and at the top you will see 3 tabs. Click on the *datalab* and import the data through the *import* button. Once the data are loaded you can switch between tabs. Alternatively, you can go to file/open and select the graph file. Go check the first network blob in the first tab. What you see is the raw network data. Specifically, it’s the philosophical underclass network.

Second, we need to optimise the layout of the graph. To do so, select one algorithm from the class of force-directed graph drawing algorithms. Among those available, today we run Force Atlas. Stop the algorithm when you start observing separate clusters. Notice that, depending on the nature of the data, we can use several other algorithms.

Facebook data, for instance, constitute an undirected graph, that is, a graph in which every connection is mutual. On Facebook, in order to interact, we have to be friends with each other. Contrast Twitter, where one user can follow another without necessarily being followed back, so the graph is directed. So, for instance, I prefer Fruchterman-Reingold on Twitter data to exploit the asymmetry between users.

It might be useful to colour the graph by the degree of each node, that is, the number of connections it has. In the ranking tab, choose *degree* from the drop-down menu. Use the bar and the triangle to set a gradient of colours. Then Apply. Now, at least, we are far away from the confusing blob we started with.
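For the curious, the degree that Gephi ranks by is just a count of the edges touching each node. A minimal sketch on a made-up edge list (the node names are placeholders):

```python
from collections import Counter

def degrees(edges):
    """Count how many edges touch each node in an undirected edge list."""
    counts = Counter()
    for a, b in edges:
        counts[a] += 1
        counts[b] += 1
    return counts

# Toy undirected graph: "a" is friends with everybody else.
toy_edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
deg = degrees(toy_edges)

# Print nodes from most to least connected.
for node, count in deg.most_common():
    print(node, count)
```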

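Before any further step we need to measure the average path length of the network. So go to Statistics, click run on *average path length*, and select *undirected*, since Facebook connections are mutual. This run also computes the betweenness centrality we use below. Close the reports at the end.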
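What this statistic computes can be sketched in a few lines: take the shortest-path distance between every pair of connected nodes and average them. The version below is a rough illustration on a toy graph using breadth-first search, not Gephi's own implementation.

```python
from collections import deque

def build_adjacency(edges):
    """Turn an undirected edge list into {node: set of neighbours}."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def bfs_distances(adj, source):
    """Shortest-path length (in hops) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist

def average_path_length(edges):
    """Mean shortest-path length over all pairs of distinct, connected nodes."""
    adj = build_adjacency(edges)
    total, pairs = 0, 0
    for source in adj:
        for target, d in bfs_distances(adj, source).items():
            if target != source:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0

# Toy graph: a simple chain a - b - c - d.
path_edges = [("a", "b"), ("b", "c"), ("c", "d")]
apl = average_path_length(path_edges)
print(round(apl, 3))
```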

Now we want a measure of the importance of a node in the network. This is called centrality. You can use Gephi to size the nodes by either eigenvector centrality or betweenness centrality.

The first scores a node by how well connected its neighbours are: a node is central if it is linked to other central nodes. In the ranking panel select the red diamond and, from the drop-down menu, select *eigenvector centrality*. Click apply to see which nodes have the highest centrality. You can play with the range.
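To get a feel for eigenvector centrality, here is a rough sketch using power iteration: every node repeatedly takes the sum of its neighbours' scores, and the scores are rescaled so the top node gets 1. The rescaling and the fixed number of iterations are my assumptions for the example. Also, plain power iteration can oscillate on bipartite graphs, so the toy graph contains a triangle.

```python
def eigenvector_centrality(adj, iterations=100):
    """Power iteration on {node: set of neighbours}; scores scaled so the max is 1."""
    score = {n: 1.0 for n in adj}
    for _ in range(iterations):
        # Each node's new score is the sum of its neighbours' current scores.
        new = {n: sum(score[nb] for nb in adj[n]) for n in adj}
        top = max(new.values())
        if top == 0:
            break  # graph without edges: nothing to propagate
        score = {n: value / top for n, value in new.items()}
    return score

# Toy graph: a triangle a-b-c, plus "d" hanging off "a".
toy_adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
ev = eigenvector_centrality(toy_adj)
for node in sorted(ev, key=ev.get, reverse=True):
    print(node, round(ev[node], 3))
```

Here "d" scores lowest even though it is attached to the best-connected node, because it has only that one link.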

Instead, I’ve used the latter, betweenness centrality, which measures how often a node lies on the shortest paths between other nodes, and hence its influence on the transfer of an item through the network. This choice should come as no surprise, since we are analysing a crowdsourced group for retrieving papers. Again in the ranking panel select the red diamond and, from the drop-down menu, select *betweenness centrality*. Click apply to see which nodes have the highest influence. You can play with the range.
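If you are wondering how betweenness is actually counted, below is a minimal sketch of Brandes' algorithm for an unweighted, undirected graph. It is an illustration on a toy graph, not a copy of what Gephi runs, and the numbers are unnormalised, so they may be scaled differently from what Gephi reports.

```python
from collections import deque

def betweenness_centrality(adj):
    """Brandes' algorithm on {node: set of neighbours} (unweighted, undirected)."""
    score = {n: 0.0 for n in adj}
    for source in adj:
        stack = []                      # nodes in order of increasing distance
        preds = {n: [] for n in adj}    # predecessors on shortest paths
        sigma = {n: 0 for n in adj}     # number of shortest paths from source
        sigma[source] = 1
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:       # first time we reach w
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        delta = {n: 0.0 for n in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # In an undirected graph every pair was counted from both ends.
    return {n: s / 2 for n, s in score.items()}

# Toy graph: two triangles (a, b, c) and (d, e, f) bridged by node "x".
bridge_adj = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "x"},
    "x": {"c", "d"},
    "d": {"x", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
bc = betweenness_centrality(bridge_adj)
for node in sorted(bc, key=bc.get, reverse=True):
    print(node, bc[node])
```

The bridge node "x" comes out on top: every path between the two triangles has to pass through it.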

Anyhow, you should see some big guys popping out. I was particularly interested in b-centrality because it shows a node’s ability to bridge different sub-networks. And since the group exchanges papers, I was interested in grouping by this criterion.

Don’t forget that, if you want a clearer picture, you can go to the layout tab and click on *Adjust by size*.

Now we have to find out how modular the network is, that is, how many sub-communities there are. It is no surprise that Facebook data reveal community structure: clustering is a property of such systems. Modularity measures the degree to which a system is organised into densely connected groups.

First we have to create a modularity class value for the nodes, so that we can picture them. Go to Statistics and click run next to modularity: this runs Gephi’s modularity-based community-finding algorithm and assigns each node a modularity class. Then go to *partition*, click on the refresh arrow, choose modularity class from the drop-down and apply. You can play with colours as well.
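To make the modularity score less of a black box, here is a small sketch that computes it for a partition you already have, using the standard definition (for each community, the fraction of edges inside it minus the fraction you would expect from the degrees alone). Note that this only scores a given grouping. It does not search for the best one, which is the harder part that Gephi does for you.

```python
def modularity(edges, community):
    """Modularity Q of a partition of an undirected, unweighted graph.

    edges: list of (node, node) pairs; community: {node: community label}.
    """
    m = len(edges)
    inside = {}      # number of edges with both ends in the community
    degree_sum = {}  # sum of the degrees of the community's nodes
    for a, b in edges:
        ca, cb = community[a], community[b]
        degree_sum[ca] = degree_sum.get(ca, 0) + 1
        degree_sum[cb] = degree_sum.get(cb, 0) + 1
        if ca == cb:
            inside[ca] = inside.get(ca, 0) + 1
    return sum(
        inside.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )

# Toy graph: two triangles joined by a single edge c-d.
two_triangles = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("d", "e"), ("e", "f"), ("d", "f"),
    ("c", "d"),
]
by_triangle = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_blob = {node: 0 for node in by_triangle}

q_split = modularity(two_triangles, by_triangle)
q_blob = modularity(two_triangles, one_blob)
print(round(q_split, 3), round(q_blob, 3))
```

Splitting the graph along the two triangles scores higher than lumping everything into one community, which is what we would hope for.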

What you see now is the network of phil-underclass, grouped by communities tied together by high-centrality nodes. Now that you have the overall topology on the screen you can, before switching to preview and exporting, filter the graph, perhaps eliminating minor nodes to emphasise the major structure. Go to Filters and click on *Topology*. Drag *degree range* into the bottom grey space. Click on *degree range* and adjust as you prefer. Remember to apply. Go to preview and adjust the final details of the network.
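The idea behind the degree range filter can be sketched as follows: compute every node's degree, keep the nodes above a threshold, and drop any edge that loses an endpoint. This is a single-pass version and only my approximation of what the filter does. The threshold name `min_degree` is made up for the example.

```python
def filter_by_degree(edges, min_degree):
    """Keep only edges whose two endpoints both have degree >= min_degree.

    Degrees are computed once on the full graph (single pass, not repeated).
    """
    deg = {}
    for a, b in edges:
        deg[a] = deg.get(a, 0) + 1
        deg[b] = deg.get(b, 0) + 1
    keep = {node for node, d in deg.items() if d >= min_degree}
    return [(a, b) for a, b in edges if a in keep and b in keep]

# Toy graph: a triangle a-b-c with a minor node "d" attached to "c".
toy_edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
major = filter_by_degree(toy_edges, min_degree=2)
print(major)
```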

The actual dimension of the network

Now you can export the network and share it with the rest of us. Hope you enjoyed the ride.

If you want to practice more, here are some ideas.

(1) The dataset comprises several other attributes of the group members. You can use them, instead of the modularity class, to partition the nodes.

(2) Go to NetVizz, download the other dataset for this group, which is about members’ interaction, and visualise it with Gephi.