In short: what is R?
it’s a language and environment for statistical computing and graphics
it’s free
it’s case sensitive
reminds me of SAS, but it’s free!
comments start with ‘#’
R offers a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, ...) and graphical techniques, and is highly extensible.
Basically R can provide you with the ability to perform complex analysis on your data. This has already been combined by the good people at Oracle to provide Oracle R Enterprise, which integrates R with the Oracle Database.
You can read this to help install it. The installation was pretty straightforward, just don’t forget to set these variables correctly:
include the R path in the PATH variable
ORACLE_HOME
ORACLE_SID
Once everything is installed, you’re ready to start analysing! The following is a very good tutorial on R for Oracle. I used most of it to do some analysing myself!
I started out by connecting to the oracle database, more specifically the “SCOTT” schema and I attach it to make things easier for me;
> ore.connect("scott", "ORA11G2", "localhost", "tiger", all=TRUE)
> ore.attach("SCOTT")
Lets see what tables we have access to;
> ore.ls()
[1] "BONUS" "DEPT" "EMP" "SALGRADE"
There’s not that much data to work with, but with some imagination I can try to use the EMP data to have an idea what my salary would look like if I was to get hired this year under comparable circumstances as let’s say a clerk or a salesman. As a reminder (who hasn’t worked with this demo data before?); here how the EMP data is shown using RStudio;
I will use a subset of this data;
> myEMP <- subset(ore.pull(EMP), JOB=='CLERK', select=c(SAL, HIREDATE))
Now, let’s see what this looks like once it’s plotted;
> with(myEMP, plot(HIREDATE,SAL))
> title(main="EMP data")
Ok, it’s not that much but there were only four clerks to begin with. Let’s see where this little experiment takes us! First I will create vectors for the SAL and HIREDATE columns, since this is easier to work with in R.
> salary <- myEMP$SAL
> dates <- myEMP$HIREDATE
Now, I had some trouble working with the dates. The format that they are passed through from the Oracle database is not a standard R format. Plus, my experience teaches me that working with dates can be tricky. Sometimes its easier to use Julian dates, since these can be regarded as just ‘some’ numbers.
> datesJul <- julian(dates)
Plotting this data yields the same graph, but only in Julian dates;
> plot(salary~datesJul, main="EMP data")
Lets try out a linear regression on that data;
> lm.out = lm(salary~datesJul)
> abline(lm.out, col="red")
It seems that my salary could be quite impressive if I was to get a job at that fictional company anytime soon! I use the coefficients function to extract the parameters of the estimated regression equation.
> coeffs = coefficients(lm.out)
> mySal = coeffs[1] + coeffs[2]*julian(Sys.Date( ))
Which returns 1790.859 if I was to begin working there about now! Assuming that it is after tax and in a popular currency (EURO?), it’s not that bad for a clerk job :)
Using that formula, I can fill a small table and return the result back to the database. For this, I wrote a small loop to fill a dataframe;
> output <- matrix(ncol=2, nrow=100)
>
> for(i in 1:100){
output[i,] <- c(4000+i*500, coeffs[1] + coeffs[2]*(4000+i*500))
}
> EMP_forecast <- data.frame(output) #I convert it to a dataframe
> colnames(EMP_forecast) <- c("julian_date", "estimated_salary") #keeping it fresh!
Now, to write these results back to the Oracle database, I found that there are a lot of options. You can use ore.push or ore.save or you can write an sql function to do the most of the work. I found the use of the ROracle package the most straightforward;
> library(ROracle)
> drv <- dbDriver("Oracle")
> con <- dbConnect(drv, "SCOTT", "tiger")
> dbWriteTable(con, "EMP_forecast", EMP_forecast)
In sql developer, I can query the data like a normal database table;
Once I have the data in Oracle, I can use APEX, OBI Analytics or any other tool that offers me the ability to visualize the data.
So in short, R is good for absolutely a heap of things! It takes some getting used to, but once you get the hang of the syntax, the analytical possibilities seem endless!
Good luck!
PVE