Orange can be smart

Wednesday, September 21, 2016

Either Metal Molly knew what they were singing about, or they have been very popular in Ljubljana, because, as the song goes, Orange seems to be very smart!

Orange is a free, very easy-to-use machine learning and data mining program written in Python. It has a smooth click-and-drag front end for exploratory data analysis and visualization, and it can also be used as a Python library. The program is maintained and developed by the Bioinformatics Laboratory of the Faculty of Computer and Information Science at the University of Ljubljana (Wikipedia).

I’m no statistician and no expert data scientist. But I still remember some stuff from back in the days when there were only two stressful periods in the year (not counting Christmas): exams! I had my fair share of statistics back then, and what I remember is enough to get me started with Orange.

First things first; you can download it here. After that, just go here and get started!

I wanted to do something simple, yet sort of useful. I found a dataset concerning a weather problem. The data is fairly simple; we have several instances in the dataset that are characterized by the values of features, or attributes, that measure different aspects of the instance. In this case there are four attributes: outlook, temperature, humidity, and windy. The outcome is whether or not to go outside and play!

Outlook   Temperature  Humidity  Windy  Play
Sunny     Hot          High      False  No
Sunny     Hot          High      True   No
Overcast  Hot          High      False  Yes
Rainy     Mild         High      False  Yes
Rainy     Cool         Normal    False  Yes
Rainy     Cool         Normal    True   No
Overcast  Cool         Normal    True   Yes
Sunny     Mild         High      False  No
Sunny     Cool         Normal    False  Yes
Rainy     Mild         Normal    False  Yes
Sunny     Mild         Normal    True   Yes
Overcast  Mild         High      True   Yes
Overcast  Hot          Normal    False  Yes
Rainy     Mild         High      True   No

I load the data through a flat file in Orange and I use a ‘Data Table’ to view my data;

I add a ‘Select Columns’ so that I can determine which features I want to use and what my target variable, the variable that I want to predict, will be;
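For those who prefer code to widgets, the same idea (rows are instances, four columns are features, one column is the target) can be sketched in plain Python. This is only an illustration with the standard library, not Orange's own API; the names `RAW`, `rows`, `features` and `target` are made up for this example.

```python
# Plain-Python sketch of the weather table; this is NOT Orange's API.
# RAW, rows, features and target are names invented for this example.
RAW = """\
Outlook Temperature Humidity Windy Play
Sunny Hot High False No
Sunny Hot High True No
Overcast Hot High False Yes
Rainy Mild High False Yes
Rainy Cool Normal False Yes
Rainy Cool Normal True No
Overcast Cool Normal True Yes
Sunny Mild High False No
Sunny Cool Normal False Yes
Rainy Mild Normal False Yes
Sunny Mild Normal True Yes
Overcast Mild High True Yes
Overcast Hot Normal False Yes
Rainy Mild High True No
"""

# First line is the header, every other line is one instance.
header, *lines = RAW.strip().splitlines()
columns = header.split()
rows = [dict(zip(columns, line.split())) for line in lines]

# Rough equivalent of 'Select Columns': four features, one target variable.
target_name = "Play"
feature_names = [c for c in columns if c != target_name]

features = [[row[name] for name in feature_names] for row in rows]
target = [row[target_name] for row in rows]

print("features:", feature_names)
print(len(rows), "instances:", target.count("Yes"), "Yes and", target.count("No"), "No")
```

Note that every value, including the ones in the Windy column, is kept as a string here, which is good enough for a categorical toy dataset like this one.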

The target variable is the attribute ‘Play’. I want to build a decision tree, based on the data above, that I can use to predict whether or not we will go out and play given the weather conditions. To do this, I just drag the ‘Classification Tree’ widget onto the canvas and connect it to the ‘Select Columns’ widget, et voilà! (I also added a ‘Classification Tree Viewer’ in order to see the tree.) And that’s all you have to do to get a result! Seeing as I did this in a couple of minutes, imagine what you could do in a couple of hours! To be fair, you’ll have to brush up on your statistics skills, but it’s still pretty neat to see what the possibilities are!
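To get a feeling for what a tree learner does when it picks a feature to split on, here is a small sketch that computes the information gain of each attribute, in the style of the classic ID3 algorithm. I'm assuming information gain as the criterion purely for illustration; the widget in Orange may use a different measure and different kinds of splits, and the function names `entropy` and `information_gain` are my own.

```python
from collections import Counter
from math import log2

RAW = """\
Outlook Temperature Humidity Windy Play
Sunny Hot High False No
Sunny Hot High True No
Overcast Hot High False Yes
Rainy Mild High False Yes
Rainy Cool Normal False Yes
Rainy Cool Normal True No
Overcast Cool Normal True Yes
Sunny Mild High False No
Sunny Cool Normal False Yes
Rainy Mild Normal False Yes
Sunny Mild Normal True Yes
Overcast Mild High True Yes
Overcast Hot Normal False Yes
Rainy Mild High True No
"""

header, *lines = RAW.strip().splitlines()
rows = [dict(zip(header.split(), line.split())) for line in lines]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(n / total * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target="Play"):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


# ID3-style illustration only; not necessarily how Orange scores splits.
gains = {a: information_gain(rows, a)
         for a in ("Outlook", "Temperature", "Humidity", "Windy")}
best = max(gains, key=gains.get)

for name, gain in sorted(gains.items(), key=lambda kv: -kv[1]):
    print(f"{name:12s} {gain:.3f}")
print("best first split:", best)
```

On this dataset Outlook comes out with the highest gain, which fits with the outlook being the first thing the tree looks at.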

Oh yes, the end result (notice the automatic naming, awesome!);

And the tree;

Looking at the data and the tree, we can see that we’ll go out and play when the outlook is overcast. When the outlook isn’t overcast, the humidity is normal and it’s not windy, we’ll definitely go out and play! If it is windy in that last case, there’s a 50 percent chance that we go out and play!
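You can double-check this reading of the tree by simply counting rows in the dataset. The sketch below does that in plain Python; `summarize` is a helper I made up for this example, and the three conditions are my own translation of the branches described above.

```python
RAW = """\
Outlook Temperature Humidity Windy Play
Sunny Hot High False No
Sunny Hot High True No
Overcast Hot High False Yes
Rainy Mild High False Yes
Rainy Cool Normal False Yes
Rainy Cool Normal True No
Overcast Cool Normal True Yes
Sunny Mild High False No
Sunny Cool Normal False Yes
Rainy Mild Normal False Yes
Sunny Mild Normal True Yes
Overcast Mild High True Yes
Overcast Hot Normal False Yes
Rainy Mild High True No
"""

header, *lines = RAW.strip().splitlines()
rows = [dict(zip(header.split(), line.split())) for line in lines]


def summarize(rows, predicate):
    """Return (number of 'Yes' rows, number of matching rows) for a condition."""
    subset = [r for r in rows if predicate(r)]
    yes = sum(r["Play"] == "Yes" for r in subset)
    return yes, len(subset)


# Values are strings, so Windy is compared with "True" / "False".
overcast = summarize(rows, lambda r: r["Outlook"] == "Overcast")
calm = summarize(rows, lambda r: r["Outlook"] != "Overcast"
                 and r["Humidity"] == "Normal" and r["Windy"] == "False")
windy = summarize(rows, lambda r: r["Outlook"] != "Overcast"
                  and r["Humidity"] == "Normal" and r["Windy"] == "True")

for label, (yes, n) in [("overcast", overcast),
                        ("not overcast, normal humidity, not windy", calm),
                        ("not overcast, normal humidity, windy", windy)]:
    print(f"{label}: {yes} of {n} play")
```

With only a handful of rows in each branch, these fractions say more about this tiny table than about real weather, of course.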

Keep in mind that the dataset is really, really small and the problem is a very easy one. But you get the idea, and it shows how easily you can do cool things with this tool!

Good luck!

-PVE-