Analyse Online Networks with VOSON Lab Tools

Analysing online networks with VOSON Lab tools is an online tutorial that we held in March 2022. Here we share the webinar video with you. We have also added the questions that were asked during the live session and their responses. If you have a question, please send it through using the form below, and we will follow up with a response and any other resources.

This session provided an overview of methods and research used to study online networks of political discussion on social media (Twitter, Hyperlinks, Reddit), using data collected with the VOSON Lab suite of open-source R tools: vosonSML, VOSON Dashboard and voson.tcn. A live demo of VOSONDash, an interactive R Shiny web application for the collection (via vosonSML), visualisation and analysis of social media network data, was also presented.

About the Tool

The VOSON Lab (Virtual Observatory for the Study of Online Networks) is located in the Research School of Social Sciences at the Australian National University. We are advancing the Social Science of the Internet through an innovative programme of research, research tool development, and teaching and training. The VOSON tools have been publicly available since 2006. The current R tools are available on CRAN and GitHub, with over 61K downloads to date and more than 1K downloads per month.

About the Speakers

Prof Robert Ackland - VOSON Lab, School of Sociology and ANU Centre for Social Research and Methods, Australian National University.

Robert works at the intersection of empirical social science and computer science, developing new approaches (involving information retrieval, data visualisation and social network analysis) for studying networks on the World Wide Web. He has been a chief investigator on five Australian Research Council grants and under a 2005 ARC Special Research Initiative (e-Research Support) grant, he established the Virtual Observatory for the Study of Online Networks project. Robert has co-organised symposia focusing on e-Social Science (2004) and the social impact of nanotechnology (2006) and in 2007, he spent six months at the Oxford Internet Institute under a UK National Centre for e-Social Science Visiting Fellowship and a University of Oxford James Martin Visiting Fellowship. 

Robert has degrees in economics from the University of Melbourne, Yale University (where he was a Fulbright Scholar) and the ANU, where he completed his PhD in economics (on index number theory and international comparisons of income) in 2001. Prior to commencing his PhD, Robert gained extensive experience in applied economic and statistical analysis in the government and non-government sectors. From 1991 to 1993, he worked as a senior researcher in the Bureau of Immigration Research (Commonwealth Department of Immigration). He worked as a World Bank consultant (based in Washington DC, 1995-1997) in the area of poverty analysis and has also consulted on AusAID and Asian Development Bank projects in this area. Robert teaches courses on the social science of the Internet and online research methods in the Master of Social Research, and his book Web Social Science: Concepts, Data and Tools for Social Scientists in the Digital Age was published by SAGE in 2013.

Francisca Borquez - Research Assistant, VOSON Lab

Francisca graduated from the ANU Master of Social Research in 2010. She has been involved in various research-related work for the last 11 years in both academia and industry, with a focus on Social Network Analysis (SNA), computational social science, and quantitative and qualitative methods. As part of the VOSON Lab, she has assisted in diverse research projects and has collaborated on open-source software developed at the lab. Her research interests are online social and organisational networks, online behaviour, computational methods and experimental social research.

Bryan Gertzel - Research Programmer, VOSON Lab

Bryan is an information technologist with interests in the Internet, cyber security and online social networks. He graduated from the ANU Master of Social Research in 2012 and is a Research Programmer for the VOSON Lab. Bryan is the main developer and maintainer of the VOSON Lab suite of tools. He has collaborated in large-scale data collection projects and is involved in the research project Unbiased Bots That Build Bridges (U3B): Technical Systems that Support Deliberation and Diversity as a Chance for Political Discourse, led by the University of Bielefeld, Germany.

Additional Resources

GitHub

Code Blog

vosonSML: Documentation - GitHub page and vosonSML vignette

VOSONDash: Documentation - GitHub page and VOSONDash Userguide

voson.tcn: Guide for Collecting and Constructing Twitter Conversation Networks

Teaching and Training

• ANU undergraduate and master's courses in Online Research Methods, Social Science of the Internet, Economic Analysis of the Digital Economy

• PhD studies

• Online short courses and masterclasses via ACSPRI

References

Analyzing Social Networks Using R, by Stephen P. Borgatti et al.

Web Social Science, by Robert Ackland.


Q&A

+ Are data able to be archived and reused? Or is the data collected dynamically as part of the analysis?

Collected data can be downloaded as an R data frame saved to an RDS file, or exported from the data tables, for example as CSV. Networks can be downloaded as data frames of nodes and edges, or in GraphML format. The current state of a network graph from the analysis section can also be downloaded as GraphML and then imported later via the open GraphML control. Data collection is performed prior to analysis; however, the VOSONDash interface makes it easy to iteratively collect and examine data as part of exploratory analysis.

+ Best practice in using Reddit datasets, especially API data and Pushshift.io data

vosonSML retrieves JSON data from subreddit threads using unauthenticated requests. The data retrieved is public; however, this method is very limited and may be removed by Reddit at any time. Until we support authenticated access in vosonSML, best practice would be to use another library that supports authenticated access to the API. We hope to support an authenticated API approach with more comprehensive, standardised data collection and network creation in the future. It is possible to generate networks from JSON data retrieved from Pushshift.io: this can be done using igraph in R, and we are planning a blogpost on this topic.

+ In what ways is VOSON helpful in literary studies and research?

VOSON is designed to enable research into online networks. The main reason for using VOSON is if you are interested in understanding how actors are interacting with one another via, e.g., replies or retweets on Twitter, or comments on Reddit, and where it is useful from a research perspective to use network analysis to study this behaviour. Another reason for using VOSON is if you are interested in collecting and analysing text data from social media, and where you would like to know the actors who are authoring the text, and how these actors connect with one another. If you are not interested in networks, then there are other complementary open-source tools available for text analysis within the R environment, such as quanteda and tidytext, which are used for the quantitative analysis of text data.

+ Cost of VOSON? Is there a free option?

VOSON R tools are Free and Open-Source Software (FOSS) released under the GPL-3 licence. They are publicly available via The Comprehensive R Archive Network (CRAN) and from VOSON Lab GitHub repositories.

+ What's the difference between the VOSON tools and other social network analysis tools (e.g., NodeXL)?

The VOSON tools are released as open-source R packages and hence they make use of, and can be used in addition to, other packages within the R environment such as rtweet, igraph, statnet, visNetwork, quanteda and tidytext. VOSON is complementary to the R packages igraph and statnet and indeed, VOSON makes extensive use of igraph for network analysis functionality. But igraph and statnet do not enable data collection from social media or the web (that is VOSON’s speciality). While we think NodeXL is great software (and indeed there used to be a VOSON plugin to NodeXL, for hyperlink network collection), some users may find it a limitation that NodeXL only runs in Windows. R works on the major operating systems: Windows, macOS and Linux. We would also like to draw your attention to Gephi for large-scale network visualisation. It is possible to create a network in VOSONDash, export it to GraphML and import it into Gephi for visualisation. Why would you do this, and not simply make use of the visualisation capabilities of VOSONDash? Well, while we are very proud of network visualisation in VOSONDash (and we build on igraph and visNetwork for this), the fact is that since VOSONDash is a web application, it is not capable of visualising very large networks. Gephi is the specialist tool for network visualisation, and we use it extensively in the VOSON Lab. Finally, we’d like to mention two other software tools that are very prominent in social network analysis (SNA): UCINET and Pajek. Again, these tools do not provide functionality for collecting network and text data from social media, but it is possible to use UCINET and Pajek to analyse networks created in VOSON.

+ Is there a possibility to filter the data related to a profile of users, for example concerning their age or location?

The fields available for analysis are those provided by the APIs, and they differ for each data source. After you have collected your data, you will be able to see what fields are available in the data table. For Twitter, for example, there are around 80 fields that are available including profile information (such as location, if the user provided it). Note that by default, only a subset of available fields is included in the network as node or edge attributes. It is possible to include additional fields as node/edge attributes, but that will require some simple R/igraph coding.

+ Can this software be used for analysing social media contents other than political discussion?

Yes. In the VOSON Lab we tend to focus on analysis of political discussion, but it is possible to use VOSON tools to study any public social media activity. By “public” we mean that users have not changed their privacy settings such that their behaviour is hidden (such private activity is not available for collection via APIs). So VOSON can be used to study activity on social media related to any topic that you are interested in, as long as there are people on social media who are talking about it.

+ How reliable is data collection online?

VOSON will collect whatever the APIs allow it to collect. APIs have well-known restrictions or limitations. For example, with the Twitter API there might be limitations associated with sampling of data (when you are collecting on a hashtag that has high volume), and collection of historical Twitter data is only available if you have Academic Track access to the API. If a user deletes their tweets or Twitter suspends a user account, then the data will no longer be available via the API. Another issue of “reliability” of social media data is: how representative is the data of the population you are interested in studying? Social media data are typically not representative of the general public. Appropriate research design can help address such restrictions and limitations. For example, in our analysis of the 2020 US presidential debate Twitter data we do not make claims about what the US voting public thought about the candidates; rather, our population of interest is people who were on Twitter talking about the debates.

+ Does text analysis (sentiment analysis) in Dash work in different languages or just in English?

We have spent a lot of time ensuring that the VOSON tools collect and store text data in an appropriate manner. So, for example, if the Twitter data you collect contains non-ASCII characters (e.g., Chinese language) then the text data will be stored correctly for further analysis. However, the VOSONDash text analysis tools (frequency analysis, word clouds, sentiment analysis) may not handle the text correctly. With regard to the frequency analysis and word clouds, the approach we use relies on spaces for tokenisation of words, and that is not appropriate for all languages. The sentiment analysis in VOSONDash uses an English lexicon. So, our recommendation is that if you want to conduct text analysis for a language other than English, then you are probably best using VOSON just for the data collection and network construction. Then you can export your data (including networks, if useful to you) and analyse your text using R packages that are designed for handling the language you are working with. That is the beauty of working in the R environment: there is almost certainly going to be an R package to help you.
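To illustrate why space-based tokenisation breaks down for some languages, here is a minimal sketch. It is written in Python using only the standard library and is not the VOSONDash implementation; the function name and sample sentences are made up for illustration.

```python
# Illustrative only: a generic whitespace tokeniser, NOT the VOSONDash code.
from collections import Counter


def space_tokenise(text):
    """Lower-case the text and split it into tokens on whitespace."""
    return text.lower().split()


english = "Networks connect people and people build networks"
chinese = "我喜欢网络分析"  # "I like network analysis", written without spaces

# Word frequencies work as expected when words are separated by spaces.
english_counts = Counter(space_tokenise(english))

# With no spaces to split on, the whole sentence becomes one "word".
chinese_tokens = space_tokenise(chinese)

print(english_counts.most_common(2))
print(chinese_tokens)
```

A frequency table or word cloud built on the second result would treat the entire sentence as a single term, which is why a language-aware tokeniser is the better choice in that case.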

+ In which format can you download the data?

The raw data (what VOSON collects from the APIs) can be downloaded as a data frame (RDS format for storing R objects). Network data can also be downloaded as a data frame, or in CSV or Excel format. Network graphs can be downloaded as GraphML files.

+ Is there a way to override the API restrictions via brute force scraping?

If you are wanting to scrape a social media platform, then VOSON is not the tool for you. VOSON allows you to collect via APIs (for Twitter, Reddit, and YouTube) and hence whatever the API will allow you to collect, then you can collect it via VOSON. By supporting collection via APIs (rather than web scraping) we contend that VOSON is a tool for ethical research into online behaviour. However, there will still be ethical considerations with what you do with the data collected using VOSON, and this will be something that the human research ethics committee at your university will have something to say about. Finally, while we do provide a web crawler within vosonSML (WWW hyperlink networks are one of the data sources that you can collect on), it is designed to crawl websites, not social media platforms, and further: it obeys the robots.txt protocol, so it only collects the data that webmasters are making visible to crawlers.
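As a rough illustration of what obeying robots.txt means for a crawler, the sketch below uses Python's standard-library robots.txt parser. It is not the vosonSML crawler; the robots.txt rules, the user-agent name and the URLs are hypothetical.

```python
# Illustrative only: checking URLs against robots.txt rules before crawling.
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content published by a webmaster.
robots_lines = [
    "User-agent: *",
    "Disallow: /private/",
]

parser = RobotFileParser()
parser.parse(robots_lines)  # parse the rules directly, no network access needed


def allowed(url, user_agent="example-crawler"):
    """Return True if the rules permit this (hypothetical) crawler to fetch url."""
    return parser.can_fetch(user_agent, url)


public_ok = allowed("https://www.example.org/index.html")
private_ok = allowed("https://www.example.org/private/report.html")
print(public_ok, private_ok)
```

A polite crawler runs a check of this kind for every page and skips any URL that the rules disallow.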

+ Could you please share some sample research articles, which employed the VOSON app?

There are some research examples in the slides we are providing as part of this webinar. Also, please see the VOSON Lab website for nearly twenty years of research using and producing the VOSON tools. If you want to find research by other people where VOSON tools are used, note that authors sometimes forget to cite us, but a Google search for VOSON can turn up some papers.

+ Can you do search query without the hashtag? Can we search for certain words in the tweet, for example?

Yes, it is possible to search tweets for any term (a word, a hashtag) or a combination of terms (including boolean searches). Twitter allows for sophisticated search queries (see the standard search operators: https://developer.twitter.com/en/docs/tweets/rules-and-filtering/overview/standard-operators), and whatever the API allows, you can do in VOSON. Additionally, the collection may be filtered by, for example, type of Twitter activity (e.g., to include retweets only), number of collected tweets, or language of tweet. See our vignette for more information: https://cran.r-project.org/web/packages/vosonSML/vignettes/Intro-to-vosonSML.html

+ What is the maximum number of tweets you can collect in a network?

It depends on the Twitter API rate limits. With the standard v1.1 Twitter API, there is a limit of 18,000 tweets collected per 15-minute period. If your collection is going to exceed this rate limit, it is possible to set VOSON so that the collection will pause or sleep when the limit is reached, and then automatically start up again. The VOSON Lab conducted large-scale Twitter collections (over 1 million tweets collected) during the debates of the 2020 U.S. presidential election. For details, see this blog post: https://vosonlab.github.io/posts/2021-06-03-us-presidential-debates-2020-twitter-collection/
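To get a feel for what this rate limit means in practice, here is a back-of-the-envelope calculation sketched in Python. It is a lower bound only: it assumes each 15-minute window's full quota is used and ignores request time, and the function names are our own for illustration.

```python
# Illustrative arithmetic only; not part of the VOSON tools.
import math

TWEETS_PER_WINDOW = 18_000  # limit quoted above for the standard v1.1 API
WINDOW_MINUTES = 15


def windows_needed(target_tweets):
    """Number of rate-limit windows required to collect target_tweets."""
    return math.ceil(target_tweets / TWEETS_PER_WINDOW)


def minimum_wait_minutes(target_tweets):
    """Lower bound on time spent paused: one full window between batches."""
    return max(windows_needed(target_tweets) - 1, 0) * WINDOW_MINUTES


target = 1_000_000
print(windows_needed(target), "windows,", minimum_wait_minutes(target), "minutes paused at minimum")
```

Under these assumptions, a collection of one million tweets spans 56 windows, so it runs for many hours and is best left to pause and resume automatically.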

+ Can you look at changes over time? Is it possible to build an author network scraping data in reddit based on date? For example, from day x to day y?

For three of the data sources (Twitter, Reddit, and YouTube) there is timestamp information indicating when a tweet was authored, or when a comment on Reddit or YouTube was written. VOSON includes the timestamp data in the network as a node or edge attribute. Hence, it is possible to conduct dynamic network analysis. Also, it is quite common to undertake Twitter collections over a period of time e.g., collecting on a particular hashtag every week. This leads to a series of dynamic networks which can be analysed separately or merged into a single large dynamic network. We are currently exploring ways to integrate dynamic network analysis and visualisation into VOSONDash, but the data for dynamic network analysis are being collected and are available. It is currently not possible to use VOSON to collect comments that were authored during a particular time period. What you would need to do is collect the entire thread (or post) and then you can later filter out comments based on date of creation (this would require that you download the data and work with it directly in R).
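The collect-then-filter approach for comments can be sketched as follows. This is an illustration in Python using made-up records rather than real VOSON output; the field names (`id`, `user`, `created`) are assumptions, and the same logic can be written in R against the downloaded data frame.

```python
# Illustrative only: filter already-collected comments to a date range.
from datetime import datetime, timezone

# Made-up comments standing in for a fully collected thread.
comments = [
    {"id": "c1", "user": "alice", "created": "2022-03-01T10:15:00+00:00"},
    {"id": "c2", "user": "bob", "created": "2022-03-02T08:00:00+00:00"},
    {"id": "c3", "user": "carol", "created": "2022-03-03T23:59:00+00:00"},
    {"id": "c4", "user": "alice", "created": "2022-03-04T00:00:00+00:00"},
]


def in_period(comment, start, end):
    """True if the comment was created in the half-open interval [start, end)."""
    created = datetime.fromisoformat(comment["created"])
    return start <= created < end


# "From day x to day y": 2 March up to, but not including, 4 March (UTC).
start = datetime(2022, 3, 2, tzinfo=timezone.utc)
end = datetime(2022, 3, 4, tzinfo=timezone.utc)

kept = [c for c in comments if in_period(c, start, end)]
kept_ids = [c["id"] for c in kept]
print(kept_ids)
```

The network for the period can then be built from the retained comments only.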

+ What types of training/ workshops do you offer? For researchers and educators?

  • The VOSON Lab contributes to undergraduate and master’s courses at the Australian National University in the following areas: Online Research Methods, Social Science of the Internet, Economic Analysis of the Digital Economy.
  • We encourage applications from suitably qualified students to undertake PhD studies in the School of Sociology, where the VOSON Lab is located.
  • We run online short courses and masterclasses via the Australian Consortium for Social and Political Research Inc. (ACSPRI).

+ Currently VOSON searches Twitter, YouTube, and Reddit. Is one able to search Facebook, Instagram, LinkedIn, and Tik Tok? Might this be possible in the future?

If a social media platform affords networked behaviour (e.g., conversations, commenting, liking of posts, sharing of posts) and has a publicly available API, then the VOSON Lab might be interested and available to extend the VOSON tools to collect the data. In the past VOSON was able to collect data from both Facebook and Instagram, but the changes to the API that Facebook enacted after the Cambridge Analytica data scandal meant that it was no longer possible to collect network data from these platforms. We are always looking to integrate other data sources that can be used for social network analysis via their APIs, but please remember we are a small team, so we might need to seek resourcing for any major software development.

+ Is VOSON tool able to crawl the hyperlink and content of a website/page?

Yes. Hyperlink collection is available via vosonSML. See the following blogpost: https://vosonlab.github.io/posts/2021-03-15-hyperlink-networks-with-vosonsml/

+ Is this only for user networks? Or can I use this for co-hashtag network visualization?

The VOSON software is designed for the analysis of networks, and it currently produces the following networks:

  • Reddit: actor network (nodes are Reddit users who have commented, and the author of the post); activity network (nodes are the comments, and the top-level post).
  • YouTube: actor network (nodes are users who have commented on a YouTube video, and the channel that uploaded the video); activity network (nodes are the comments, and video).
  • Twitter: actor network (nodes are Twitter users who have e.g. authored tweets containing a hashtag or are mentioned/replied to/retweeted in tweets containing a particular hashtag); activity network (nodes are the tweets); two-mode network (two actor types – user and hashtag – and there is an edge from user i to hashtag j if user i authored a tweet containing hashtag j); semantic network (nodes are entities extracted from the tweet text - words, hashtags and usernames; edges reflect co-occurrence, i.e. there is an edge between entities i and j if they both occurred in the same tweet).
  • WWW hyperlink: actor network (nodes are website domains e.g., www.anu.edu.au); activity network (nodes are web pages).

More details on the network types can be found in our vignette: https://cran.r-project.org/web/packages/vosonSML/vignettes/Intro-to-vosonSML.html

But remember: if there is a particular network type that you wish to work with, and it is not currently provided by VOSON, then it is always possible to export the VOSON network as GraphML and then use igraph in R to construct whatever network type you would like. We do this in the VOSON Lab, and we are planning future blogposts on this topic.
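For the co-hashtag case raised in the question, the underlying logic is simple enough to sketch. The example below is written in Python with made-up tweets and is not VOSON or igraph code; it only shows one plausible way to derive weighted co-hashtag edges, where two hashtags are linked each time they appear in the same tweet.

```python
# Illustrative only: build weighted co-hashtag edges from made-up tweets.
from collections import Counter
from itertools import combinations

tweets = [
    {"user": "alice", "hashtags": ["debate", "election", "vote"]},
    {"user": "bob", "hashtags": ["Election", "debate"]},
    {"user": "carol", "hashtags": ["vote"]},
]


def co_hashtag_edges(tweets):
    """Count how often each unordered pair of hashtags shares a tweet."""
    edges = Counter()
    for tweet in tweets:
        # Normalise case and drop duplicates within a single tweet.
        tags = sorted({tag.lower() for tag in tweet["hashtags"]})
        for a, b in combinations(tags, 2):
            edges[(a, b)] += 1
    return edges


edges = co_hashtag_edges(tweets)
for (a, b), weight in sorted(edges.items()):
    print(a, b, weight)
```

The resulting edge list (pair of hashtags plus a weight) can be loaded into any network package for analysis or visualisation.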

