Scatterplot Matrices

Scatterplot matrices are a great way to roughly determine if you have a linear correlation between multiple variables. This is particularly helpful in pinpointing specific variables that might have similar correlations to your genomic or proteomic data. If you already have data with multiple variables, load it up as described here.

If not, no worries, because R comes with various presaved datasets for practice (some are more interesting than others). To view these datasets, input the following.

data()

For this tutorial, we will be looking at the datasets trees and ChickWeight. First, load these datasets.

data(trees)
data(ChickWeight)

To see the actual data contained by these datasets, just write the title of the dataset.

trees
ChickWeight
  • The trees dataset seems to contain three columns of measurements: Girth, Height and Volume.
  • The ChickWeight dataset seems to involve little chicklets getting fed different diets and being weighed at various time points.
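Printing a whole dataset can fill the console, so here is a minimal sketch of a shorter way to peek at the same information. It only uses base R functions (head, str, names, dim) on the two built-in datasets; the choice of these particular functions is my suggestion, not a required step.

```r
# Load the built-in practice datasets
data(trees)
data(ChickWeight)

# Show only the first six rows instead of the whole table
head(trees)
head(ChickWeight)

# Column names and dimensions (rows, columns)
tree_columns  <- names(trees)
chick_columns <- names(ChickWeight)
tree_size     <- dim(trees)

print(tree_columns)
print(chick_columns)
print(tree_size)

# str() gives a compact summary of each column's type
str(trees)
```

If the column names match what you guessed from reading the raw output, you are on the right track.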

To find out more information about the datasets and to confirm our observations, put a question mark before the title of the dataset.

?trees
?ChickWeight

Now, are you ready for the scatterplot?

pairs(trees)

Dataset Trees Scatterplot Matrix

This is an example of a scatterplot matrix. The variable names run along the diagonal from top left to bottom right, and each variable is plotted against every other variable. For example, the middle square in the first column is a scatterplot of Girth and Height, with Girth on the X-axis and Height on the Y-axis. The middle square of the top row shows the same pair with the axes swapped. In essence, the boxes on the upper right-hand side of the matrix are mirror images, across the diagonal, of the plots on the lower left-hand side.

In this scatterplot matrix, it is probably safe to say that there is a strong linear correlation between Girth and Volume (Go data! Confirming the obvious) because the points fall close to a line. The correlations between Height and Girth and between Height and Volume look weaker. More statistical analyses would be needed to confirm or deny this.
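One quick follow-up to the eyeball test is to compute Pearson correlation coefficients with base R's cor(). This is only a rough sketch of a first check, not the full statistical analysis mentioned above; the variable names girth_volume, girth_height and height_volume are made up for this example.

```r
# Load the built-in trees dataset
data(trees)

# Pearson correlation matrix for all three columns
tree_cor <- cor(trees)
print(round(tree_cor, 2))

# Pull out the individual pairs discussed in the text
girth_volume  <- tree_cor["Girth", "Volume"]
girth_height  <- tree_cor["Girth", "Height"]
height_volume <- tree_cor["Height", "Volume"]

# Values near 1 or -1 suggest a strong linear relationship;
# values near 0 suggest a weak one
print(c(girth_volume, girth_height, height_volume))
```

The correlation matrix is symmetric, just like the scatterplot matrix: the value for Girth and Volume appears both above and below the diagonal.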

Now for ChickWeight.

pairs(ChickWeight)

Dataset ChickWeight Scatterplot

This scatterplot matrix is unfortunately not as clean as the last plot because it contains discrete data points for Time, Chick and Diet. However, much can still be extracted from this scatterplot matrix (think about BS exercises you might have done for English or Art) about experimental design and possible outcomes.

  • Scatterplots related to Time are evenly distributed into columns or rows, suggesting that data was actually collected in a regimented fashion. (As in, data was collected at the times it should have been for all the Chick samples).
  • There were about 50 chicks. The first 20 were on diet 1 and then the next three groups of 10 were given diet 2, 3 or 4.
  • Looking at Row 4, Column 1, there is a possibility that chicks on diet 3 gained more weight than chicks on diets 1, 2 or 4.
  • Looking at Row 2, Column 1, it seems that chicks weighed about the same amount at the beginning of the experiment but variation increased as time passed. In general, there is an increase in weight.
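The observations above can be checked directly with a few lines of base R. This is a minimal sketch, assuming that comparing mean weights at the last time point (day 21) is a fair way to compare diets; the variable names are invented for this example, and it is no substitute for a proper statistical test.

```r
# Load the dataset and treat it as a plain data frame
data(ChickWeight)
cw <- as.data.frame(ChickWeight)

# How many chicks were there in total?
n_chicks <- length(unique(cw$Chick))

# How many chicks were on each diet? (one row per chick, then count by diet)
chick_diet      <- unique(cw[, c("Chick", "Diet")])
chicks_per_diet <- table(chick_diet$Diet)

# Spread of weight at each time point (standard deviation)
sd_by_time <- tapply(cw$weight, cw$Time, sd)

# Mean weight per diet at the final time point (day 21)
final            <- cw[cw$Time == 21, ]
final_mean_diet  <- tapply(final$weight, final$Diet, mean)
heaviest_diet    <- names(which.max(final_mean_diet))

print(n_chicks)
print(chicks_per_diet)
print(round(sd_by_time, 1))
print(round(final_mean_diet, 1))
print(heaviest_diet)
```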

There you have it!

In conclusion,

  • Scatterplot matrices are good for determining rough linear correlations of metadata that contain continuous variables.
  • Scatterplot matrices are not so good for looking at discrete variables.

Soup up your R environment: how to install packages

Today we are going to make additions to our R environment in a common process called installing packages. The transition won’t be as long, drastic or emotional as an episode of Extreme Makeover: Home Edition, but it does add more capabilities to your R environment.


Extreme Makeover: Home Edition in Houston as published by Houston’s Real Estate Landscape Swamplot

A package is a bundle of code that is distributed through the different R mirrors/servers. Packages usually serve a specific function such as analyzing certain types of data or assisting in multi-server computing. Today, we are going to install ggplot2 and gplots, both of which are commonly used in creating different figures.

I will briefly enumerate the installation steps here which fall roughly into two parts. If you would like screenshots of the process in addition to the instructions, click here.

Method 1 (less typing)

Part 1-Getting the Package onto Your Computer

  1. Open R via  your preferred method (icon on desktop, Start Menu, dock, etc.)
  2. Click “Packages” in the top menu then click “Install package(s)”. 
  3. Choose a mirror that is closest to your geographical location.
  4. Now you get to choose which packages you want to install. If you would like to install multiple packages, click on each one while holding the CTRL key or the cloverleaf-looking key. For now just highlight “ggplot2”.
  5. You will know the package has been downloaded onto your computer when another greater-than symbol (“>”) appears.

Part 2-Loading the Package into R

  1. Type “library(ggplot2)” and then press the Enter/Return key.
  2. All done.

You will only need to do Part 1 once on your computer. From now on, you only need to do Part 2 each time you close and restart R.

Method 2 (Quicker)

Use this method once you get more acquainted with the whole copy and paste business. It is much quicker than Method 1.

Part 1-Getting the Package onto Your Computer

  1. Type install.packages("gplots") (with plain straight quotes) and then press the Enter/Return key.
  2. If you have already chosen a mirror in this R session, R will install the package right away. If not, R will prompt you to choose a mirror. Again, choose one close to you unless you want to watch a loading bar slowly inch its way to completion.

Part 2-Loading the Package into R

  1. Type “library(gplots)” and then press the Enter/Return key.
  2. All done. If Part 1 spat out a lot more output than you expected, that is because R also had to install the other packages required for gplots.

Again, you will only need to do Part 1 one time on your computer. From now on, you only need to do Part 2 each time you close and restart R.
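Since Part 1 only has to happen once and Part 2 happens every session, the two parts can be combined into one small helper. This is a rough sketch, not an official R feature: load_or_install is a name I made up, and it relies on base R's requireNamespace(), install.packages() and library(). The example call uses the stats package, which ships with R, so nothing is downloaded when you try it.

```r
# Hypothetical helper: install a package only if it is missing (Part 1),
# then load it into the current session (Part 2)
load_or_install <- function(pkg) {
  # requireNamespace() returns FALSE if the package is not installed
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # character.only = TRUE lets library() accept the name as a string
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

# "stats" comes with R, so this call skips the install step
loaded_ok <- load_or_install("stats")
print(loaded_ok)
```

Swap "stats" for "ggplot2" or "gplots" to use it with the packages from this post; the first call will install the package and later calls will only load it.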

I thought R was a letter…intro/installation


I will make a confession. This past summer, I didn’t spend my spare time watching relentlessly addicting TV shows or clubbing in San Francisco. Instead, I checked out figures. No, not the sort of figures you’re probably thinking about. The ones that are included in research papers and have the potential to be beautiful works of art (such as the ones found on the Figured Foundation). And you guessed it, R can be used to generate awesome figures for your last-minute presentation or end-of-term paper.


Some examples of R graphs found on the R website.

So what the heck is R?

R is a free “environment” that can be used to run statistical analysis and create amazing graphics (a more comprehensive list of R’s superpowers can be found here). It can be installed on Unix, Windows and Apple operating systems. It is considered an “environment” because there are tons of add-ons that you can load or write to expand the capabilities of R. In essence, R is a free sports car to which you can add rocket power and flying capabilities while still looking cool.

If this is your first time using a form of code, don’t be afraid. Once you get over the hump that R (or any code in general) is relentless about spelling, the process of entering code is no different from choosing the different components of a sundae (do we want strawberries or blueberries as the fruit topping?). And once you get a little more involved, you can start improvising your own crazy creations by writing your own functions. This past summer I met a guy who used R to scour Travelocity, Priceline, and other airplane ticket finding websites for his ideal plane tickets.

So how do I install this?

My apologies ahead of time, but this will be a Windows-centric tutorial. If you would like step by step instructions with screenshots, click here.

  1. Go to the R website and click “Download R” under “Getting Started”
  2. Choose a place to download R. Even though we’re on the limitless and borderless interweb, choosing a location close to you helps speed things up.
  3. Choose which R package to download based on your operating system in the first box. If you are a Unix or Mac user, I apologize but this is where we now go our separate ways.
  4. Click on “install R for the first time” then download the file with the biggest font on the top. Then open that puppy up.
  5. Windows might be paranoid and say the publisher could not be verified. Click “run”. Then choose your language.
  6. Click “next” to start the installation, agree to all their legal writings, and select an installation location.
  7. Select “Core Files” and then either 32-bit or 64-bit files depending on your computer system. (To check, hit Start, right click Computer and select Properties. Look at System Type).
  8. Now you have a choice for Startup Options. I prefer to view the program in multiple separate windows so that I can arrange them on my screen while also having an internet browser or a notepad type program open as well.
    If you like what you see in the photo above, click “Yes (customized setup)”. If you prefer to have one window with all the components of the program viewed inside that window click “No (accept defaults)” and skip to Step 11.
  9. If you said yes to Step 8, click “SDI (separate windows)”. Next, you can specify plain text or HTML help. I would suggest HTML help because it is easier to view than plain text, which appears in the window.
  10. If you are at an institution that utilizes Internet2.dll, select “Internet 2.” If not or if you are unsure, select “Standard”.
  11. Go ahead and create a program shortcut by clicking “Next”.
  12. Choose if you want to have another icon clutter your desktop and/or Quick Launch toolbar. I suggest leaving the two options under “Registry Entries” selected.
  13. Let it do its thing. Go on Facebook, write a Tweet or run to the bathroom really quick.
  14. Things should be all done! Go update your status or Tweet how excited you are to have installed R.
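To double check that everything worked, open R and type a couple of lines at the “>” prompt. This is just a small sanity check I would suggest, using values that base R already provides (R.version.string and .Machine); the variable names are made up for this example.

```r
# Which version of R is running?
installed_version <- R.version.string
print(installed_version)

# Is this a 64-bit or 32-bit build of R?
# A pointer is 8 bytes on a 64-bit build and 4 bytes on a 32-bit build
pointer_bytes <- .Machine$sizeof.pointer
build_bits    <- pointer_bytes * 8
print(build_bits)
```

If R prints a version string and a number, the installation is up and running.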