Oracle R - What is it good for?

Wednesday, September 24, 2014

In short: what is R?


R offers a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, ...) and graphical techniques, and is highly extensible.



Basically R can provide you with the ability to perform complex analysis on your data. This has already been combined by the good people at Oracle to provide Oracle R Enterprise, which integrates R with the Oracle Database.

You can read this to help install it. The installation was pretty straightforward, just don’t forget to set these variables correctly:


  1. include the R path in the PATH variable




Once everything is installed, you’re ready to start analysing! The following is a very good tutorial on R for Oracle. I used most of it to do some analysing myself!


I started out by connecting to the oracle database, more specifically the “SCOTT” schema and I attach it to make things easier for me;


> ore.connect("scott", "ORA11G2", "localhost", "tiger", all=TRUE)
> ore.attach("SCOTT")


Lets see what tables we have access to;


[1] "BONUS"    "DEPT"     "EMP"      "SALGRADE"


There’s not that much data to work with, but with some imagination I can try to use the EMP data to have an idea what my salary would look like if I was to get hired this year under comparable circumstances as let’s say a clerk or a salesman. As a reminder (who hasn’t worked with this demo data before?); here how the EMP data is shown using RStudio;




I will use a subset of this data;


> myEMP <- subset(ore.pull(EMP), JOB=='CLERK', select=c(SAL, HIREDATE))


Now, let’s see what this looks like once it’s plotted;


> with(myEMP, plot(HIREDATE,SAL))

> title(main="EMP data")



Ok, it’s not that much but there were only four clerks to begin with. Let’s see where this little experiment takes us! First I will create vectors for the SAL and HIREDATE columns, since this is easier to work with in R.


> salary <- myEMP$SAL

> dates <- myEMP$HIREDATE


Now, I had some trouble working with the dates. The format that they are passed through from the Oracle database is not a standard R format. Plus, my experience teaches me that working with dates can be tricky. Sometimes its easier to use Julian dates, since these can be regarded as just ‘some’ numbers.


> datesJul <- julian(dates)


Plotting this data yields the same graph, but only in Julian dates;


> plot(salary~datesJul, main="EMP data")



Lets try out a linear regression on that data;


> lm.out = lm(salary~datesJul)

> abline(lm.out, col="red")



It seems that my salary could be quite impressive if I was to get a job at that fictional company anytime soon! I use the coefficients function to extract the parameters of the estimated regression equation.


> coeffs = coefficients(lm.out)

> mySal = coeffs[1] + coeffs[2]*julian(Sys.Date( ))


Which returns 1790.859 if I was to begin working there about now! Assuming that it is after tax and in a popular currency (EURO?), it’s not that bad for a clerk job :)


Using that formula, I can fill a small table and return the result back to the database. For this, I wrote a small loop to fill a dataframe;


> output <- matrix(ncol=2, nrow=100)


> for(i in 1:100){

    output[i,] <- c(4000+i*500, coeffs[1] + coeffs[2]*(4000+i*500))


> EMP_forecast <- data.frame(output) #I convert it to a dataframe

> colnames(EMP_forecast) <- c("julian_date", "estimated_salary") #keeping it fresh!


Now, to write these results back to the Oracle database, I found that there are a lot of options. You can use ore.push or or you can write an sql function to do the most of the work. I found the use of the ROracle package the most straightforward;


> library(ROracle)

> drv <- dbDriver("Oracle")

> con <- dbConnect(drv, "SCOTT", "tiger")

> dbWriteTable(con, "EMP_forecast", EMP_forecast)


In sql developer, I can query the data like a normal database table;



Once I have the data in Oracle, I can use APEX, OBI Analytics or any other tool that offers me the ability to visualize the data.


So in short, R is good for absolutely a heap of things! It takes some getting used to, but once you get the hang of the syntax, the analytical possibilities seem endless!


Good luck!