R

From Biowerkzeug Wiki

R (also known as the R project) is a free software environment and language for statistical computing and graphics. It compiles and runs on a wide variety of UNIX platforms, Windows and macOS. R is a free implementation of the statistical programming language S, which was developed at Bell Labs starting in the 1970s; S-PLUS is a commercial implementation of the same language. R is object oriented and provides many tools for statistical analysis, but it can equally be used for data manipulation, calculation and graphical display.

Notably, R can be used from Python through rpy, which makes R functions available to Python code.

Documentation

Essential reading

Additional reading

Installation

First obtain R from one of the CRAN mirrors, or alternatively use one of the following links:

Windows and Mac users most likely want the precompiled binaries listed above, not the source code. The sources have to be compiled before you can use them. If you do not know what this means, you probably do not want to do it!

Then follow the instructions in:

Using R

Probably the coolest feature in R is its object oriented design: almost everything in R is an object. Inheritance, operator and function overloading, and OOP in general are used extensively and elegantly, so if you are familiar with OO languages you'll have a lot of fun.
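As a minimal sketch of what function and operator overloading look like in practice, the following uses R's S3 system. The class name "temperature" and its constructor are hypothetical and exist only for this illustration; print(), format() and "+" are standard generics that dispatch on the class of their argument.

```r
# Constructor for a hypothetical S3 class "temperature" (not part of R itself).
temperature <- function(celsius) {
  structure(list(celsius = celsius), class = "temperature")
}

# Method for the generic format(): chosen because x has class "temperature".
format.temperature <- function(x, ...) {
  paste0(x$celsius, " degrees C")
}

# Method for the generic print(): reuses format() and returns x invisibly.
print.temperature <- function(x, ...) {
  cat(format(x), "\n")
  invisible(x)
}

# Overload the "+" operator for two temperature objects.
"+.temperature" <- function(e1, e2) {
  temperature(e1$celsius + e2$celsius)
}

t1 <- temperature(20)
t2 <- temperature(1.5)
total <- t1 + t2   # dispatches to "+.temperature"
print(total)       # dispatches to print.temperature
```

S3 is only one of R's object systems; it is used here because it needs no extra packages and shows dispatch with very little code.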

Package management

R is extensible via third-party packages. These packages can be installed, updated and removed from within R, much as Perl modules are managed from CPAN. First set the option CRAN to your nearest CRAN mirror using chooseCRANmirror(). Then download and install packages pkg1 and pkg2 by

    > install.packages(c("pkg1", "pkg2"))

The essential dependencies of the specified packages will also be fetched. Unless the library is specified (argument lib) the first library in the library search path is used: if this is not writable, R will ask the user (in an interactive session) if the default user library should be created, and if allowed to will install the packages there.

If you want to fetch a package and all those it depends on that are not already installed, use e.g.

    > install.packages("Rcmdr", dependencies = TRUE)

If you want to compile packages from source yourself, use

    > install.packages(c("pkg1",...), type="source")

Removing packages is easy as well. From a running R process they can be removed by

    > remove.packages(c("pkg1", "pkg2"),
                      lib = file.path("path", "to", "library"))

The command update.packages() is the simplest way to ensure that all the packages on your system are up to date.
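The commands above can be combined so that only packages that are actually missing get installed. This is a sketch under assumptions: missing_packages is a hypothetical helper written for this page, and the package names are placeholders chosen because they ship with R, so no download is triggered when the example runs.

```r
# Hypothetical helper: return the names in 'pkgs' that are not yet installed.
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

wanted <- c("stats", "utils")          # placeholders; both ship with R
to_install <- missing_packages(wanted)

if (length(to_install) > 0) {
  # Contacts the configured CRAN mirror, so this branch needs network access.
  install.packages(to_install)
}
```

Checking first avoids reinstalling packages on every run of a script, which is useful when the same analysis is executed on several machines.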

Data structures

There are several different data structures in R, namely:

Numbers: 1, 2, 3, 5, 8, ...
Vectors: myvec <- c(1,2,3)
Matrices: x <- array(1:20, dim=c(4,5)) or x <- matrix(c(1:20),4,5)
Lists: Lst <- list(name="Fred", wife="Mary", no.children=3,
                child.ages=c(4,7,9))

Tables (data frames) and matrices are different structures and cannot be used interchangeably. Individual elements of a vector, matrix or list are accessed with square brackets, much as in C, except that indices start at 1 rather than 0. For the examples above, the individual elements can be accessed as:

myvec[[2]] is the number 2,
x[3,2] is the number 7,
Lst$name is the same as Lst[[1]] and is the string "Fred",
Lst$wife is the same as Lst[[2]] and is the string "Mary",
Lst$child.ages[1] is the same as Lst[[4]][1] and is the number 4.
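The structures and access patterns above can be tried in one short session. The objects myvec, x and Lst are the ones from the text; the data frame df and the matrix m are made up for this sketch to show that a table and a matrix are different kinds of object.

```r
myvec <- c(1, 2, 3)
x <- matrix(1:20, nrow = 4, ncol = 5)   # filled column by column
Lst <- list(name = "Fred", wife = "Mary", no.children = 3,
            child.ages = c(4, 7, 9))

second <- myvec[2]        # indices start at 1, so this is the second element
cell   <- x[3, 2]         # row 3, column 2
col2   <- x[, 2]          # an empty index selects the whole dimension
name   <- Lst[["name"]]   # same element as Lst$name and Lst[[1]]

# A data frame (the kind of table read.table returns) is not a matrix.
df <- data.frame(id = 1:3, label = c("a", "b", "c"))
is.matrix(df)             # FALSE
is.matrix(x)              # TRUE

# Converting between the two has to be done explicitly.
m <- as.matrix(df[, "id", drop = FALSE])
```

Because matrix() fills column by column, the second column of x holds 5, 6, 7, 8, which is why x[3,2] is 7.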

Reading a file

Say you have a two-column file named ufsr.plot. You might then read it as

filename <- "ufsr.plot"
mytable <- read.table(filename, header=FALSE)
myvec1 <- mytable[[1]]
myvec2 <- mytable[[2]]

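or alternatively, if the file holds a plain grid of numbers (here 200 rows of 200 values), read it directly into a matrix with scan:

a <- matrix(scan(filename, what=0), 200, 200, byrow=TRUE)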
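Both approaches can be tried without an existing data file. The sketch below writes three made-up rows to a temporary file (the values and the 3 x 2 shape are assumptions for illustration, not the contents of ufsr.plot) and reads them back with read.table and with scan.

```r
# Write a small two-column file to a temporary location (illustrative data).
filename <- tempfile(fileext = ".plot")
writeLines(c("0.0 1.5", "1.0 2.5", "2.0 4.0"), filename)

# read.table returns a data frame; [[i]] extracts column i as a vector.
mytable <- read.table(filename, header = FALSE)
myvec1 <- mytable[[1]]
myvec2 <- mytable[[2]]

# scan returns one flat numeric vector; matrix() reshapes it row by row.
a <- matrix(scan(filename, what = 0, quiet = TRUE),
            nrow = 3, ncol = 2, byrow = TRUE)

same_columns <- all(a[, 1] == myvec1) && all(a[, 2] == myvec2)

unlink(filename)   # remove the temporary file
```

The byrow = TRUE argument matters here: scan reads the file line by line, so the values arrive in row order, whereas matrix() fills columns first by default.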


Examples

Emacs Speaks Statistics

For Emacs junkies: this package provides syntax highlighting and other cool interfaces to R (and to other statistical analysis packages).