Linear regression anyone can understand

The purpose of this article is to help anyone understand what linear regression is, regardless of profession. If, like me, you're making a career transition to data science or machine learning, you'll know how frustrating it is when your study is interrupted by technical jargon. It makes learning boring and tedious. I hope this article will awaken your curiosity and open your eyes to the interesting world of linear regression.

Linear regression sits at the gate to the adventurous world of machine learning. It is one of the first things you'll encounter on your quest for knowledge in the field of data. Linear regression is used to make sense of data by revealing the relationship between the independent values (the input features) and the dependent value (the target). You'll get a better understanding of what independent and dependent values are as we go along. It is this relationship that lets us observe patterns in data and make predictions from it.

A practical example:

Let's say you want to buy a fairly used car and you want to find out how much it should cost. How would you go about it using linear regression? First, you'd have to gather data on some factors, such as:

The mileage of the car you want to buy
The make of the car you want to buy
The model of the car you want to buy
Whether the car is imported or locally used
The features of the car
The general wear and tear of the car

After talking to a car dealer and some family members, you discover that the price of the car depends on three factors from the list above: make, model and mileage. (These three factors may not be accurate in real life, because I am not a car dealer or an expert in cars. In reality, more or fewer factors may determine the price. However, the example is enough to explain the principle of linear regression and get the intended points across.) With these three factors, we should therefore be able to predict the price of the fairly used car reasonably well.

The next step is to determine the relationship between these three factors and the target variable. The target variable is the price of the car. It is also called the dependent variable, because its value depends on the three factors mentioned above. This can be represented as:

Price = (? x make) + (? x model) + (? x mileage)

The three attributes that determine the price are called features in machine learning terminology. (For the multiplication to make sense, make and model would first have to be expressed as numbers.) We must work out how much each of these features contributes to the total price and give it a weight, represented by '?'. Once we have these weights, we can reasonably predict the price of any fairly used car from the three features in the equation above. At that point, we will have understood the underlying relationship between the features and the target value.
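To make the equation Price = (? x make) + (? x model) + (? x mileage) more concrete, here is a minimal Python sketch of how a prediction could be computed once the weights are known. Everything specific in it is an assumption made for illustration: the weight values are invented rather than learned from real data, the function name predict_price is hypothetical, and make and model are assumed to have already been turned into simple numeric scores.

```python
# Hypothetical weights, invented for illustration only.
# A real model would learn these values from data.
WEIGHTS = {
    "make": 4000.0,     # price added per point of the (assumed) make score
    "model": 2500.0,    # price added per point of the (assumed) model score
    "mileage": -0.125,  # price lost per unit of distance driven
}


def predict_price(features, weights=WEIGHTS):
    """Return the weighted sum: each feature value times its weight."""
    return sum(weights[name] * value for name, value in features.items())


# One example car. Make and model are assumed to be encoded as numeric scores.
car = {"make": 3, "model": 2, "mileage": 40000}

price = predict_price(car)
print(price)  # prints 12000.0
```

Here the make contributes 4000 x 3 = 12000, the model contributes 2500 x 2 = 5000, and the mileage takes away 0.125 x 40000 = 5000, giving a predicted price of 12000. The mileage weight is negative because, in this made-up example, a car that has been driven further is worth less.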

The explanation above is a simple account of the purpose of linear regression: to understand the underlying relationship between the independent values and the target (dependent) value. By now I'm sure you have a better understanding of what linear regression is. You may go on to ask: how do we determine the weight of a feature? In the next post we will examine how to do this using gradient descent.