I have been working on Machine Learning (ML) for some time now, or more precisely on Oracle Machine Learning. There is a long way to go, and I shall be adding my learnings here. This is an introductory post for anyone who is not yet familiar with ML, what it does, and how it is going to be available inside Oracle’s tech stack. As always, any corrections or additions are welcome. Let’s start.
What’s the need for ML?
Computers and computer programs, from the day they came into existence, were designed to do just one thing: achieve a predefined objective. How did we accomplish this in the past? First we decided what target we wanted to achieve. Once that was settled, we wrote a set of rules, i.e. many IF THEN ELSE IF statements, and put them in our program code. The computer reached our objective by executing those steps. Of course, this worked very well, but if we wanted to reach our objective by digging deeper into the data, we didn’t have many options. And even where we did, we couldn’t find the hidden nuggets in the data that could lead us to our objective more quickly and accurately. For example, if we were a bank and had to approve loan applications, we had to go through the applicants’ requests manually to decide on each of them.
With ML, you no longer explicitly program what the computer should do. Instead, you let the ML algorithm find out the best steps to reach the objective that you have set. As an example, imagine we want to sell a certain type of product and we want to select the top 20% of customers who are most likely to buy it. With traditional programming, how would you do that?
Well, of course, you would have your input list of customers. Then you would create a set of rules which you think will correctly identify the top 20% of the customers. As mentioned before, those rules are going to be in the form of IF THEN ELSE, based on e.g. the age or the gender of the customer, or maybe whether the customer bought a product in the last month. However, how can you know that the rules you write are really the right ones? Of course you, the human component, will look at the historical data, but identifying the right set of rules is going to be quite difficult. And even if you do get the rules right, how can you keep them right? That is an issue because customers’ behavior changes over time.
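To make the traditional approach concrete, here is a minimal sketch in Python. The customer records, the field names and the thresholds in the rules are all made up for illustration; the point is only that a human has to guess every rule.

```python
# Hypothetical customer records; fields and values are invented for illustration.
customers = [
    {"name": "Ann", "age": 34, "gender": "F", "bought_last_month": True},
    {"name": "Bob", "age": 61, "gender": "M", "bought_last_month": False},
    {"name": "Cid", "age": 22, "gender": "M", "bought_last_month": True},
    {"name": "Dee", "age": 45, "gender": "F", "bought_last_month": False},
    {"name": "Eve", "age": 52, "gender": "F", "bought_last_month": False},
]

def rule_score(customer):
    """Hand-written IF/THEN/ELSE rules; the thresholds are guesses, not learned."""
    score = 0
    if customer["bought_last_month"]:
        score += 2
    if 25 <= customer["age"] <= 40:
        score += 1
    elif customer["age"] > 60:
        score -= 1
    return score

def top_fraction(records, fraction=0.2):
    """Return the highest-scoring fraction of the customers (at least one)."""
    ranked = sorted(records, key=rule_score, reverse=True)
    keep = max(1, int(len(records) * fraction))
    return ranked[:keep]

selected = top_fraction(customers)
print([c["name"] for c in selected])
```

Every number in rule_score is a human decision, and each one has to be revisited by hand whenever customer behavior shifts.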
Now, with ML, we no longer explicitly create that set of rules; we let the computer find out how to go from input to output. We do that by training a model, and the model is basically an understanding of the relationship between the given inputs and the desired outputs. Thus, there are two main phases in ML: the training phase and the prediction phase. During the training phase we create the model; in this case, we feed in the historical sales transactions to let the model learn. Later, in the prediction phase, we use the model we have created to predict the future behavior of customers. There are different algorithms to solve different types of problems. Here we shall look at the two most important categories: Supervised Learning and Unsupervised Learning.
We call ML supervised when we have example data to teach the model. It’s really very simple: you show the algorithm examples of inputs together with the correct output for each input, and the algorithm infers the relationship between input and output. Going back to our example of selecting the customers who are most likely to buy a product, imagine we have historical data showing which of our customers were already offered this product in the past, and also their response to that offer: whether they bought, yes or no. With that, of course, we have excellent examples to train a model with, so you don’t have to program the rules explicitly anymore.
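As a rough illustration of training on labeled examples and then predicting, here is a small Python sketch. The history data is hypothetical, and the "model" is just a table of purchase rates per input combination, which is far simpler than what a real ML library or OML would use; it only shows the two phases.

```python
from collections import defaultdict

# Hypothetical history of past offers: ((age_band, bought_last_month), bought?)
history = [
    (("young", True), True),
    (("young", True), True),
    (("young", False), False),
    (("senior", True), True),
    (("senior", True), False),
    (("senior", True), False),
    (("senior", False), False),
]

def train(examples):
    """Training phase: learn the purchase rate for each combination of inputs."""
    counts = defaultdict(lambda: [0, 0])  # features -> [times bought, times offered]
    for features, bought in examples:
        counts[features][0] += int(bought)
        counts[features][1] += 1
    return {features: b / n for features, (b, n) in counts.items()}

def predict(model, features, default=0.0):
    """Prediction phase: estimated chance that a new customer buys."""
    return model.get(features, default)

model = train(history)
print(predict(model, ("young", True)))
print(predict(model, ("senior", True)))
```

Nobody wrote a rule saying that young recent buyers respond well; that relationship comes out of the labeled examples.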
What does that mean for you as a developer or as a business user? Well, instead of spending your time on programming, you are going to spend a lot more time on selecting the right data, i.e. the features with predictive value, because those features are what goes into the algorithm to learn from.
In supervised learning there are two different types: Classification and Regression. Classification is all about predicting distinct values, i.e. assigning each case to one of a set of explicitly defined categories. That is also what we are doing here, because we are predicting “buy” or “no buy”, two distinct categories. On the other hand, if we are trying to predict a numeric value, we call it Regression; for example, predicting customer lifetime value in local currency. There are various algorithms for both classification and regression, but in principle the steps are the same: train the model and then use the model to get the predictions.
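The difference between the two is easiest to see in what comes out of the model. The sketch below uses made-up training data, a least-squares line for the regression part and a 1-nearest-neighbour lookup for the classification part; both algorithms are my own picks for brevity, not anything specific to OML.

```python
# Hypothetical training data: months as a customer, lifetime value, and response.
months = [6, 12, 24, 36]
lifetime_value = [300.0, 600.0, 1200.0, 1800.0]
bought = ["no buy", "no buy", "buy", "buy"]

def fit_line(xs, ys):
    """Regression: ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x

def nearest_label(xs, labels, x_new):
    """Classification: copy the label of the closest training example."""
    closest = min(range(len(xs)), key=lambda i: abs(xs[i] - x_new))
    return labels[closest]

slope, intercept = fit_line(months, lifetime_value)
print(slope * 18 + intercept)             # regression gives a number
print(nearest_label(months, bought, 30))  # classification gives a category
```

In both cases the steps are the same: fit on the examples first, then ask for a prediction for a new customer.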
Unsupervised learning is different because we don’t have a label. We don’t have the desired example output data, so we can’t train the model in the same way as in supervised learning. Instead, we give the algorithm a set of input data and basically ask it: hey, find out what the hidden patterns in this data are. We don’t give many more instructions than that. One example is Clustering: we look at the customers and ask the algorithm to create a certain number of logical clusters. Classification and Clustering may appear similar, but unlike Classification, in Clustering we don’t assign our predictions to any explicitly created categories.
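To show what clustering without labels can look like, here is a minimal k-means sketch in Python on a single feature. The spend figures and the starting centers are hypothetical, and k-means is just one clustering algorithm that I picked for its simplicity.

```python
def kmeans_1d(values, centers, iterations=10):
    """Minimal k-means on one feature; `centers` holds the initial guesses."""
    centers = list(centers)
    clusters = []
    for _ in range(iterations):
        # Assignment step: each value joins the cluster with the nearest center.
        clusters = [[] for _center in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its members.
        centers = [
            sum(members) / len(members) if members else centers[i]
            for i, members in enumerate(clusters)
        ]
    return centers, clusters

# Hypothetical yearly spend per customer; note that no labels are supplied.
spend = [100, 120, 110, 900, 950, 1000]
centers, clusters = kmeans_1d(spend, centers=[0, 500])
print(centers)
print(clusters)
```

We only asked for two groups. What the groups mean (say, occasional versus heavy spenders) is something we decide afterwards by looking at them.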
Okay, but is ML already used somewhere?
Good that you asked. And the answer may surprise you: ML is already built into a variety of standard applications around us. Take customer experience or ERP applications. If you are using HCM Cloud, ML is already built in to match candidates with open positions by going through the text in their CVs. In the Oracle database, it’s used in tools such as Cluster Health Advisor to predict when we may have problems in our cluster nodes, e.g. node evictions. And it works completely out of the box: we don’t have to do or enable anything explicitly in the database to make ML work for us.
That said, another very different way of using ML is within BI. Here we are talking about things like dashboards and visualizations. We can use ML not just to plot historical revenue but also to plot predicted future revenue. Thus, if we want to work with self-service data visualization, a business end user can have ML tools like clustering and trending available to them.
And of course, ML can be used in Platform as a Service (PaaS), where different technology stacks can use ML for different types of desired outputs. On the platform we can have two types of data sources. First, ML can work on media sources: think about images, video and audio. Take the personal assistant with speech recognition on your mobile phone, or self-driving cars that interpret the video feed coming from the cameras on the car and train the self-driving software, or the medical world, where ML is used to diagnose images from MRI scans. These are the use cases where ML uses media as its input, as the data source.
On the other hand, we have the corporate data cases: data that’s stored in databases. In sales and marketing, for example, identifying the best potential targets for the next campaign is a very good fit for ML. Or, if you run a service that people can subscribe to, ML can predict which customers are likely to leave the service. You can also use it to find applicants who could become a problem for us, for example loan applicants who may not repay the loan.
So we have the Big Data platform, for media and such, and we have the database, suitable for corporate data. And the best thing is that with Oracle Machine Learning (OML) we can do both. What we shall be focusing on here, though, is using OML with the database, especially with Autonomous Database along with Oracle Analytics Cloud.
I hope this post gives you a basic idea of ML and how it fits into our technology-savvy world. I would highly encourage you to sign up for the Oracle Cloud Free Tier, if you haven’t done so already, and get your first Autonomous Database created. It will be the one we shall be using with OML in future posts.
Hope that helped.