
# .R script showing capabilities of sparklyr R package
# Prerequisites before running this R script: 
# Ubuntu 16.04 LTS 64-bit, r-base (version 3.3.3 or newer), RStudio 64-bit version
install.packages("sparklyr")
install.packages("dplyr")
install.packages("ggplot2")
install.packages("tidyr")
library(sparklyr)
library(dplyr)
library(ggplot2)
library(tidyr)
set.seed(100)
# sparklyr cheat sheet: https://github.com/rstudio/cheatsheets/raw/master/source/pdfs/sparklyr.pdf
# dplyr+tidyr: https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf
# sparklyr currently (2017-04-22) only supports Spark version: 2.0.1 or 2.0.2 (not 2.1.0!!)
# Install Spark locally:
spark_install("2.0.2")
config <- spark_config()
# number of CPU cores to use:
config$spark.executor.cores <- 6
config$spark.executor.memory <- "4G"
# Connect to local version:
sc <- spark_connect(master = "local",
                    config = config, version = "2.0.2")
# Copy data to Spark memory:
import_iris <- copy_to(sc, iris, "spark_iris", overwrite = TRUE) 
# partition data:
partition_iris <- sdf_partition(import_iris,training=0.5, testing=0.5) 
# Create Hive metadata for each partition:
sdf_register(partition_iris,c("spark_iris_training","spark_iris_test")) 
# Create reference to training data in Spark table
tidy_iris <- tbl(sc,"spark_iris_training") %>% select(Species, Petal_Length, Petal_Width) 
# Spark ML Decision Tree Model
model_iris <- tidy_iris %>% ml_decision_tree(response="Species", features=c("Petal_Length","Petal_Width")) 
# Create reference to test data in Spark table
test_iris <- tbl(sc,"spark_iris_test") 
# Bring predictions data back into R memory for plotting:
pred_iris <- sdf_predict(model_iris, test_iris) %>% collect()
# Map the numeric prediction index back to the species label, then plot:
pred_iris %>%
  inner_join(data.frame(prediction = 0:2,
                        lab = model_iris$model.parameters$labels),
             by = "prediction") %>%
  ggplot(aes(Petal_Length, Petal_Width, col = lab)) +
  geom_point()
spark_disconnect(sc)
# Reproducible research .R script to run in RStudio in Ubuntu 14.04 LTS 64-bit
# Prerequisites to install: 
# https://mark911.wordpress.com/2014/11/06/how-to-install-newest-version-of-r-and-rstudio-in-ubuntu-14-04-lts-using-a-bash-script/
# Further prerequisites to install in R or RStudio:
install.packages(c("Quandl", "dplyr", "ggvis", "lubridate"))
# Data set: 
# https://www.quandl.com/data/ODA/MOZ_PPPSH-Mozambique-Share-of-World-GDP-based-on-PPP
library(Quandl)
library(dplyr)
library(ggvis)
library(lubridate)
data <- Quandl("ODA/MOZ_PPPSH", authcode="FiHHoC-Gnx3CHzr9385J")
str(data)
dplyr::glimpse(data)
head(data)
tail(data)
data$year <- lubridate::year(data$Date)
min_year <- min(data$year)
max_year <- max(data$year)
# source: https://github.com/rstudio/ggvis/blob/master/vignettes/ggvis-basics.Rmd
# source: Data Wrangling Cheat Sheet:
# http://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf
# source: http://ggvis.rstudio.com/0.1/quick-examples.html
data %>% 
 ggvis(~Date,~Value) %>% 
 layer_points() %>% 
 layer_model_predictions(model = input_select(
 c("loess" = "loess",
 "lm" = "lm",
 "MASS::rlm" = "MASS::rlm"),
 label = "model")) %>%
 layer_smooths(se = TRUE,
 span = input_slider(min = 0.3, max = 1, value = 0.8, step = 0.1,
 label = "Smoothing span")) %>%
 add_axis("x", title = "Date") %>%
 add_axis("y", title = "Mozambique Share of World GDP based on PPP, %")
# install R 
sudo DEBIAN_FRONTEND=noninteractive add-apt-repository ppa:marutter/rrutter
sudo DEBIAN_FRONTEND=noninteractive add-apt-repository ppa:marutter/c2d4u
sudo DEBIAN_FRONTEND=noninteractive apt-get update
sudo DEBIAN_FRONTEND=noninteractive apt-get --yes --force-yes install r-base-core r-base
# install RStudio :
# Free disk space required: around 5 GB
# Mac OS X users should use RStudio instead of R to avoid the following UNIX child process forking error:
# THE_PROCESS_HAS_FORKED_AND_YOU_CANNOT_USE_THIS_COREFOUNDATION_FUNCTIONALITY_YOU_MUST_EXEC__() to debug.
MACHINE_TYPE=$(uname -m)
cd /tmp
# -f: do not fail when no previous download exists
rm -f rstudio*.deb
rm -f index.html
if [ "${MACHINE_TYPE}" = 'x86_64' ]; then
 # 64-bit stuff here
sudo DEBIAN_FRONTEND=noninteractive apt-get --yes --force-yes install gdebi-core pandoc libssl0.9.8 libapparmor1
wget --no-check-certificate http://www.rstudio.com/products/rstudio/download/
wget --no-check-certificate $(grep -v tar index.html | grep 'amd64\.deb' | cut -d'"' -f2)
sudo dpkg -i rstudio*.deb
sudo DEBIAN_FRONTEND=noninteractive apt-get --yes --force-yes -f install
else
 # 32-bit stuff here
sudo DEBIAN_FRONTEND=noninteractive apt-get --yes --force-yes install gdebi-core pandoc libssl0.9.8 libapparmor1
wget --no-check-certificate http://www.rstudio.com/products/rstudio/download/
wget --no-check-certificate $(grep -v tar index.html | grep 'i386\.deb' | cut -d'"' -f2)
sudo dpkg -i rstudio*.deb
sudo DEBIAN_FRONTEND=noninteractive apt-get --yes --force-yes -f install
fi
cd $HOME
# troubleshooting information to check the rstudio installation:
uname -m
file /usr/lib/rstudio/bin/rstudio
ldd `which rstudio`
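
The two branches of the install script above differ only in the architecture suffix of the .deb file. A minimal sketch of how that choice and the link extraction could be factored into two small helpers. The names deb_arch_for and extract_deb_url are hypothetical and not part of the original script, and the sketch assumes the download page still lists direct .deb links inside double quotes:

```shell
# Hypothetical helpers, not part of the original script.

# Map a `uname -m` value to the Debian package architecture suffix.
# Anything other than x86_64 is treated as 32-bit, as in the script above.
deb_arch_for() {
    case "$1" in
        x86_64) echo "amd64" ;;
        *)      echo "i386" ;;
    esac
}

# Read HTML on stdin and print the first quoted link ending in <arch>.deb.
# Same steps as the grep/cut pipeline above: skip tarballs, keep lines that
# mention the .deb, then take the text between the first pair of double quotes.
extract_deb_url() {
    grep -v tar | grep "$1\.deb" | cut -d'"' -f2 | head -n 1
}

# Usage (after index.html has been downloaded):
#   extract_deb_url "$(deb_arch_for "$(uname -m)")" < index.html
```

The usage line prints the link that the script would pass to wget, so it can be inspected before anything is downloaded.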

Prerequisites for all platforms (Windows, Mac OS X and GNU/Linux):

First make sure that R version 3.1.1 is installed.
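
One way to check the installed R version from a Terminal, sketched here as an aid and not as part of the original instructions. It assumes the first line of `R --version` has the form `R version X.Y.Z (...)`, and it treats 3.1.1 as a minimum version, which is an assumption. The helper names are hypothetical:

```shell
# Hypothetical helpers, not part of the original post.

# Read the output of `R --version` on stdin and print the X.Y.Z version,
# assuming the first line looks like: R version 3.1.1 (2014-07-10) -- "..."
r_version_from_banner() {
    head -n 1 | awk '{ print $3 }'
}

# Succeed when version $1 is the same as or newer than version $2.
# Compares the three numeric components from left to right.
version_at_least() {
    awk -v have="$1" -v want="$2" 'BEGIN {
        split(have, h, "."); split(want, w, ".")
        for (i = 1; i <= 3; i++) {
            if (h[i] + 0 > w[i] + 0) exit 0
            if (h[i] + 0 < w[i] + 0) exit 1
        }
        exit 0
    }'
}

# Usage (only meaningful when R is installed):
#   installed=$(R --version | r_version_from_banner)
#   version_at_least "$installed" 3.1.1 && echo "R $installed is recent enough"
```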

Prerequisites for Windows:

Install the curl binary for Windows before importing List_P_3D_data.csv.

Windows 64-bit users should run the following .cmd script as administrator:
https://courses.edx.org/courses/KIx/KIexploRx/3T2014/discussion/forum/i4x-kiX-KIexploRx-course-2014_Practicalities/threads/5451f72635c79c749c000906

The .cmd script above installs curl, R, RStudio and other applications in Windows. Alternatively, download curl here:

http://www.confusedbycode.com/curl/

I successfully installed and tested the import using this Windows curl binary:

http://www.confusedbycode.com/curl/curl-7.38.0-win64.msi

Make sure to close RGui and RStudio, then restart RGui or RStudio before proceeding with the next steps.

Prerequisites for Fedora 20:

Run the following Terminal commands before importing List_P_3D_data.csv:

sudo yum update
sudo yum install curl curl-devel

Prerequisites for Ubuntu 14.04 LTS/Linux Mint:

Run the following Terminal commands before importing List_P_3D_data.csv:

sudo apt-get update
sudo apt-get install libcurl4-openssl-dev r-cran-rcurl curl

Then upgrade to the newest version of curl using this procedure:

https://mark911.wordpress.com/2015/09/27/how-to-compile-and-install-newest-version-of-curl-from-github-in-ubuntu-14-04-lts-64-bit/
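
The per-distribution commands above can be collected in one place. A minimal sketch, assuming the distribution is identified by the ID field of /etc/os-release (fedora, ubuntu, linuxmint). The function name curl_prereq_commands is hypothetical, and the function only prints the commands listed above without running them:

```shell
# Hypothetical helper, not part of the original post.
# Print the curl prerequisite commands for a distribution ID
# (the ID value from /etc/os-release). Nothing is executed.
curl_prereq_commands() {
    case "$1" in
        fedora)
            echo "sudo yum update"
            echo "sudo yum install curl curl-devel"
            ;;
        ubuntu|linuxmint)
            echo "sudo apt-get update"
            echo "sudo apt-get install libcurl4-openssl-dev r-cran-rcurl curl"
            ;;
        *)
            echo "unsupported distribution: $1" >&2
            return 1
            ;;
    esac
}

# Usage:
#   . /etc/os-release
#   curl_prereq_commands "$ID"
```

Printing the commands instead of running them keeps the sketch safe to try on any machine.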

Then run the following commands in R or RStudio:

install.packages("RCurl")

library(RCurl)

URL <- "https://courses.edx.org/c4x/KIx/KIexploRx/asset/List_P_3D_data.csv"

destfile <- "List_P_3D_data.csv"

download.file(URL, destfile = destfile, method = "curl")

a <- read.csv2(destfile)

This approach improves the portability, traceability and the reproducible research quality of the R code…
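
The download.file(..., method = "curl") call above runs the curl command-line program, so the import fails when curl is not on the PATH. A small Terminal check to run before starting R, sketched with a hypothetical helper name (require_tool):

```shell
# Hypothetical helper, not part of the original post.
# Succeed when the named program is found on the PATH,
# otherwise print a hint to stderr and fail.
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "$1 not found on PATH; install it before running the R commands" >&2
    return 1
}

# Usage:
#   require_tool curl && echo "curl is available"
```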

####################################################################

# how to install openWAR R package in Ubuntu 14.04 LTS

####################################################################
        
        # in Ubuntu 14.04 LTS, first run this Terminal command to install prerequisites:
        # sudo apt-get install libxml2-dev libxslt1-dev libcurl4-gnutls-dev
        # then run the following commands in RStudio:
        install.packages("devtools")
        library(devtools)
        install.packages("stringr")
        library(stringr)
        install.packages("plyr")
        library(plyr)
        install.packages("RCurl")
        library(RCurl)
        install.packages("bitops")
        library(bitops)
        # install Sxslt via the following forked github repository
        # (current devtools expects "user/repo"; the older two-argument
        # form install_github("Sxslt", "cboettig") is deprecated):
        install_github("cboettig/Sxslt")
        install_github("beanumber/openWAR")
        library(openWAR)
        gd = gameday()
        summary(gd)
        # the following command retrieves data for yesterday's games (15 games in the run shown below):
        ds = getData()
        head(ds)
        summary(ds)
        str(ds)
    
    
    Here is the output I got on my computer running Ubuntu 14.04 LTS:
    
    > library(devtools)
    > install_github("openWAR", "beanumber")
    Installing github repo openWAR/master from beanumber
    Downloading master.zip from https://github.com/beanumber/openWAR/archive/master.zip
    Installing package from /tmp/RtmpQjDf5N/master.zip
    arguments ‘minimized’ and ‘invisible’ are for Windows only
    Installing openWAR
    '/usr/lib/R/bin/R' --vanilla CMD build '/tmp/RtmpQjDf5N/devtools4acb2bc694ba/openWAR-master' --no-manual --no-resave-data
    
    * checking for file ‘/tmp/RtmpQjDf5N/devtools4acb2bc694ba/openWAR-master/DESCRIPTION’ ... OK
    * preparing ‘openWAR’:
    * checking DESCRIPTION meta-information ... OK
    * installing the package to build vignettes
    * creating vignettes ... OK
    * excluding invalid files
    Subdirectory ‘demo’ contains invalid file names:
      ‘Ben.Rmd’ ‘Ben.html’ ‘analysis.Rmd’ ‘sanity_checks.Rmd’
    * checking for LF line-endings in source and make files
    * checking for empty or unneeded directories
    * looking to see if a ‘data/datalist’ file should be added
    Warning in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
      EOF within quoted string
    * building ‘openWAR_0.1-1.tar.gz’
    
    '/usr/lib/R/bin/R' --vanilla CMD INSTALL '/tmp/RtmpQjDf5N/openWAR_0.1-1.tar.gz' --library='/home/ulysses/R/x86_64-pc-linux-gnu-library/3.1'  \
      --install-tests
    
    * installing *source* package ‘openWAR’ ...
    ** R
    ** data
    ** demo
    ** inst
    ** preparing package for lazy loading
    Warning: replacing previous import by ‘plyr::count’ when loading ‘openWAR’
    ** help
    *** installing help indices
    ** building package indices
    ** installing vignettes
    ** testing if installed package can be loaded
    Warning: replacing previous import by ‘plyr::count’ when loading ‘openWAR’
    * DONE (openWAR)
    > gd = gameday()
    Error: could not find function "gameday"
    > library(openWAR)
    Warning message:
    replacing previous import by ‘plyr::count’ when loading ‘openWAR’
    > gd = gameday()
    gid_2012_08_12_atlmlb_nynmlb_1
    Loading required package: Sxslt
    Loading required package: stringr
    Loading required package: plyr
    > summary(gd)
           Length Class      Mode     
    gameId  1     -none-     character
    base    1     -none-     character
    url     5     -none-     character
    ds     62     data.frame list     
    > ds = getData()
    Loading required package: RCurl
    Loading required package: bitops
    
    Retrieving data from 2014-07-12 ...
    ...found 15 games
    gid_2014_07_12_anamlb_texmlb_1
    gid_2014_07_12_arimlb_sfnmlb_1
    gid_2014_07_12_atlmlb_chnmlb_1
    gid_2014_07_12_bosmlb_houmlb_1
    gid_2014_07_12_chamlb_clemlb_1
    gid_2014_07_12_detmlb_kcamlb_1
    gid_2014_07_12_miamlb_nynmlb_1
    gid_2014_07_12_minmlb_colmlb_1
    gid_2014_07_12_nyamlb_balmlb_1
    gid_2014_07_12_oakmlb_seamlb_1
    gid_2014_07_12_pitmlb_cinmlb_1
    gid_2014_07_12_sdnmlb_lanmlb_1
    gid_2014_07_12_slnmlb_milmlb_1
    gid_2014_07_12_tormlb_tbamlb_1
    gid_2014_07_12_wasmlb_phimlb_1
    Warning messages:
    1: In readData.gameday(gd) : NAs introduced by coercion
    2: In readData.gameday(gd) : NAs introduced by coercion
    3: In readData.gameday(gd) : NAs introduced by coercion
    4: In readData.gameday(gd) : NAs introduced by coercion
    5: In readData.gameday(gd) : NAs introduced by coercion
    6: In readData.gameday(gd) : NAs introduced by coercion
    7: In readData.gameday(gd) : NAs introduced by coercion
    > head(ds)
      pitcherId batterId field_teamId ab_num inning   half balls strikes endOuts     event actionId
    6    571945   594777          140      1      1    top     1       0       1    Flyout       NA
    7    571945   545361          140      2      1    top     1       1       2   Lineout       NA
    8    571945   405395          140      3      1    top     1       2       3 Groundout       NA
    1    450308   425783          108      4      1 bottom     4       2       0      Walk       NA
    2    450308   462101          108      5      1 bottom     0       1       0  Sac Bunt       NA
    3    450308   425567          108      6      1 bottom     0       1       1  Forceout       NA

How to install rstudio from source code in Ubuntu 13.10 64-bit?
===============================================================
Free disk space required: around 5 GB
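
A quick way to check the free space before starting the build, given as a sketch. The helper name has_free_kb is hypothetical, and checking the home directory is an assumption based on the commands below cloning into it:

```shell
# Hypothetical helper, not part of the original post.
# Succeed when the filesystem holding directory $1 has at least $2
# kilobytes available. Uses POSIX `df -Pk`, whose second output line
# carries the available space in the fourth column.
has_free_kb() {
    avail=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
    [ "$avail" -ge "$2" ]
}

# Usage: 5 GB = 5 * 1024 * 1024 KB = 5242880 KB
#   has_free_kb "$HOME" 5242880 || echo "less than 5 GB free in $HOME"
```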

Copy-paste the following commands into the Terminal one by one:

cd

git clone https://github.com/rstudio/rstudio.git

cd rstudio/

mkdir build

cd build/

sudo DEBIAN_FRONTEND=noninteractive apt-get update

sudo DEBIAN_FRONTEND=noninteractive apt-get --yes --force-yes install libboost-all-dev cmake libqt4-dev build-essential libqtwebkit-dev

cd ~/rstudio/dependencies/common

bash install-common

cd

bash ~/rstudio/dependencies/linux/install-dependencies-debian

cd ~/rstudio/build

cmake .. -DRSTUDIO_TARGET=Desktop -DCMAKE_BUILD_TYPE=Release

sudo make

sudo make install

sudo ln -s /usr/local/lib/rstudio/bin/rstudio /usr/bin

cd