Oracle R - What is it good for?

Wednesday, September 24, 2014

In short: what is R?

 

R offers a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, ...) and graphical techniques, and is highly extensible.

 

 

Basically R can provide you with the ability to perform complex analysis on your data. This has already been combined by the good people at Oracle to provide Oracle R Enterprise, which integrates R with the Oracle Database.

You can read this to help install it. The installation was pretty straightforward, just don’t forget to set these variables correctly:

 

  1. include the R path in the PATH variable

  2. ORACLE_HOME

  3. ORACLE_SID

 

Once everything is installed, you’re ready to start analysing! The following is a very good tutorial on R for Oracle. I used most of it to do some analysing myself!

 

I started out by connecting to the oracle database, more specifically the “SCOTT” schema and I attach it to make things easier for me;

 

> ore.connect("scott", "ORA11G2", "localhost", "tiger", all=TRUE)
> ore.attach("SCOTT")

 

Lets see what tables we have access to;

 

> ore.ls()
[1] "BONUS"    "DEPT"     "EMP"      "SALGRADE"

 

There’s not that much data to work with, but with some imagination I can try to use the EMP data to have an idea what my salary would look like if I was to get hired this year under comparable circumstances as let’s say a clerk or a salesman. As a reminder (who hasn’t worked with this demo data before?); here how the EMP data is shown using RStudio;

 

 

 

I will use a subset of this data;

 

> myEMP <- subset(ore.pull(EMP), JOB=='CLERK', select=c(SAL, HIREDATE))

 

Now, let’s see what this looks like once it’s plotted;

 

> with(myEMP, plot(HIREDATE,SAL))

> title(main="EMP data")

 

 

Ok, it’s not that much but there were only four clerks to begin with. Let’s see where this little experiment takes us! First I will create vectors for the SAL and HIREDATE columns, since this is easier to work with in R.

 

> salary <- myEMP$SAL

> dates <- myEMP$HIREDATE

 

Now, I had some trouble working with the dates. The format that they are passed through from the Oracle database is not a standard R format. Plus, my experience teaches me that working with dates can be tricky. Sometimes its easier to use Julian dates, since these can be regarded as just ‘some’ numbers.

 

> datesJul <- julian(dates)

 

Plotting this data yields the same graph, but only in Julian dates;

 

> plot(salary~datesJul, main="EMP data")

 

 

Lets try out a linear regression on that data;

 

> lm.out = lm(salary~datesJul)

> abline(lm.out, col="red")


 

 

It seems that my salary could be quite impressive if I was to get a job at that fictional company anytime soon! I use the coefficients function to extract the parameters of the estimated regression equation.

 

> coeffs = coefficients(lm.out)

> mySal = coeffs[1] + coeffs[2]*julian(Sys.Date( ))

 

Which returns 1790.859 if I was to begin working there about now! Assuming that it is after tax and in a popular currency (EURO?), it’s not that bad for a clerk job :)

 

Using that formula, I can fill a small table and return the result back to the database. For this, I wrote a small loop to fill a dataframe;

 

> output <- matrix(ncol=2, nrow=100)

>

> for(i in 1:100){

    output[i,] <- c(4000+i*500, coeffs[1] + coeffs[2]*(4000+i*500))

}

> EMP_forecast <- data.frame(output) #I convert it to a dataframe

> colnames(EMP_forecast) <- c("julian_date", "estimated_salary") #keeping it fresh!

 

Now, to write these results back to the Oracle database, I found that there are a lot of options. You can use ore.push or ore.save or you can write an sql function to do the most of the work. I found the use of the ROracle package the most straightforward;

 

> library(ROracle)

> drv <- dbDriver("Oracle")

> con <- dbConnect(drv, "SCOTT", "tiger")

> dbWriteTable(con, "EMP_forecast", EMP_forecast)

 

In sql developer, I can query the data like a normal database table;

 

 

Once I have the data in Oracle, I can use APEX, OBI Analytics or any other tool that offers me the ability to visualize the data.

 

So in short, R is good for absolutely a heap of things! It takes some getting used to, but once you get the hang of the syntax, the analytical possibilities seem endless!

 

Good luck!

 

PVE