The purpose of this report is to investigate current solutions for creating web interactive data visualisations in R before designing a more flexible approach for customising interactions onto plots.

1.1 The need for interactive graphics

Interactive graphics have become popular in helping users explore data freely and explaining topics to a wider audience. As Murray (2013) suggests, static visualisations can only ‘offer pre-composed views of data’, whereas interactive plots can provide us with different perspectives. Being able to interact with a plot allows us to explore data, discover trends and relationships that cannot be seen with a static graph. The importance of using interactive graphics to enhance exploratory data analysis is evident across different fields, from identifying missing values and pinpointing outliers to cluster analysis and classification problems (Cook and Swayne 2007).

The term “interactive graphics” can have different meanings. Theus (1996) and Unwin (1999) (as cited in Unwin, Theus, and Hofmann (2006)) have suggested that there are 3 broad components: querying, selection and linking, and varying plot characteristics. Querying involves finding out more about features that the user may be interested in, selection and linking involves subsetting a certain group and linking to different displays of the same data set, while varying plot characteristics involve changing parts of the plot to get more information which could include “rescaling, zooming, reordering and reshading” (Unwin, Theus, and Hofmann 2006). Together, these encapsulate the concept of interactive graphics with a certain goal of informing the user more about patterns and relationships that they may be interested in.

In this work, we will also make a distinction between interactions that are done directly on to the plot and those that are controlled by an indirect component. We can refer this to on-plot and off-plot interactivity. On-plot interactivity refers to when a user can interact directly using capabilities that are incorporated within the plot itself to query, select and explore the data. These include clicking and creating drag-boxes to select components of a plot. Off-plot interactivity refers to interactions that are driven from outside of the plot, such as a slider to control certain plot characteristics and using dropdown menus to filter and select groups.

R (Ihaka and Gentleman 1996) is a powerful open source tool for generating flexible static graphics. However, it is not focused on interactivity. Previously, there have been different programs to help create interactive plots to aid analysis including ggobi (Cook and Swayne 2007), iplots (Urbanek and Wichtrey 2013), and Mondrian (Theus 2002). Despite their capabilities, all these require installation of software which makes it difficult to share and reproduce results. More recently, new visualisation tools have begun to use the web browser to render plots and drive interactivity.

1.2 The web and its main technologies

The web is an ideal platform for communicating and exchanging information in the present day. It has become accessible to everyone without the worries of device compatibility and installation. Web interactive visualisations are becoming more commonly used in areas including data journalism, informative dashboards for business analytics and decision making, and education. They will be increasingly in demand in the future.

The main web technologies are HTML, CSS and JavaScript. Hyper Text Markup Language (known as HTML) is the language used to describe content on a webpage and cascading style sheets (known as CSS) is the language that controls how elements look and are presented on a web page such as colour, shape, strokes and fills, borders (W3C 2016). These can be used to define how specific types of elements are rendered on the page. JavaScript is the main programming language for the web (Crockford 2008), which is used to add interactivity to web pages. Whenever we interact with a website that has a button to click on or hover over text, these are driven by JavaScript.

The Document Object Model (known as the DOM) is the ‘programming interface for HTML and XML documents’ (W3C 2009). A single web page can be considered as a document made up of nodes and objects with a certain structure. We can use the DOM to refer to specific elements, attributes and nodes on the page that we wish to modify. We can use JavaScript and the DOM to create and change a dynamic web page.

Application programming interfaces (APIs) are defined as a set of tools that help developers connect and build applications (Jacobson, Brail, and Woods 2011). For example, when we see a map from Google Maps embedded in a web page, that web page is calling the GoogleMaps API to provide the map. Through the context of this report, APIs generally refer to JavaScript libraries that are called upon and used to render plots.

Many interactive visuals on the web are generally rendered using Scalable Vector Graphics (known as SVG). This XML based format is widely used because it is easy to attach events and interactions to certain elements and sub-components through the DOM. This cannot be done with a raster image such as a PNG or JPEG format, as a raster image is treated as an entire element.

1.3 Project motivation

The main motivation for this project arises from whether there are more intuitive ways to generate simple interactive visuals from R without the user having to learn many tools. One of many downstream users would like to advance features in iNZight (Elliott and Kuper 2017), a data visualisation software from the University of Auckland.

The approach taken by this dissertation is first to identify and assess the limitations of existing tools for creating web interactive visuals in R (Chapters 2 and 3). Key limitations that were found were (1) a tendency to reproduce entire plots, (2) the inability to customise interactions and add certain layers to a plot and (3) a need for learning web technologies and respective APIs. This led us to a design and prototype a viable solution (discussed in Chapters 4 and 5) that could potentially solve these limitations and the overall problem.

References

Cook, Dianne, and Deborah F. Swayne. 2007. Interactive and Dynamic Graphics for Data Analysis - with R and Ggobi. Use R. Springer. doi:10.1007/978-0-387-71762-3.

Crockford, Douglas. 2008. JavaScript: The Good Parts. O’Reilly Media, Inc.

Elliott, Tom, and Marco Kuper. 2017. INZight: INZight Gui for Data Exploration and Visualisation. https://www.stat.auckland.ac.nz/~wild/iNZight/index.php.

Ihaka, Ross, and Robert Gentleman. 1996. “R: A Language for Data Analysis and Graphics.” Journal of Computational and Graphical Statistics 5 (3): 299–314.

Jacobson, Daniel, Greg Brail, and Dan Woods. 2011. APIs: A Strategy Guide. O’Reilly Media, Inc.

Murray, Scott. 2013. Interactive Data Visualization for the Web. O’Reilly Media, Inc.

Theus, Martin. 2002. “Interactive Data Visualization Using Mondrian.” Journal of Statistical Software, Articles 7 (11): 1–9. doi:10.18637/jss.v007.i11.

Unwin, Antony, Martin Theus, and Heike Hofmann. 2006. Graphics of Large Datasets: Visualizing a Million (Statistics and Computing). Secaucus, NJ, USA: Springer-Verlag New York, Inc.

Urbanek, Simon, and Tobias Wichtrey. 2013. Iplots: IPlots - Interactive Graphics for R. https://CRAN.R-project.org/package=iplots.

W3C. 2009. Document Object Model (Dom). https://www.w3.org/DOM/.

———. 2016. HTML and Css. https://www.w3.org/standards/webdesign/htmlcss.