🐍Python Programming

Linear & Polynomial Regression

Updated 2026-05-15

30 min read

Regression analysis is a fundamental tool in data science and machine learning used to model the relationship between a dependent variable and one or more independent variables. In this tutorial, we'll explore linear regression, polynomial regression, and multiple regression using the popular Python library scikit-learn. We'll also cover how to split data into training and testing sets, evaluate models using R-squared scores, and make predictions.

Introduction

Regression analysis helps us understand how the typical value of a dependent variable changes when any one of the independent variables is varied, while the other independent variables are held fixed. Linear regression assumes a linear relationship between the input features and the target variable, while polynomial regression allows for more complex relationships by introducing polynomial terms.

Polynomial regression can be thought of as extending linear regression by adding powers of the original features to create new ones. Multiple regression involves using multiple independent variables to predict the dependent variable.
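For reference, the three model forms described above can be written side by side. This is a notational sketch: the symbols \( \beta_j \) (coefficients), \( \varepsilon \) (noise), \( d \) (polynomial degree) and \( p \) (number of features) are introduced here and are not used elsewhere in the tutorial.

```latex
\[
\begin{aligned}
\text{Simple linear:} \quad & y = \beta_0 + \beta_1 x + \varepsilon \\
\text{Polynomial (degree } d\text{):} \quad & y = \beta_0 + \beta_1 x + \beta_2 x^2 + \dots + \beta_d x^d + \varepsilon \\
\text{Multiple:} \quad & y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \varepsilon
\end{aligned}
\]
```

All three are linear in the coefficients \( \beta_j \), which is why the same LinearRegression estimator can fit each of them once the feature columns are prepared.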

Core Content

Linear Regression with scikit-learn

Linear regression is one of the simplest and most commonly used regression techniques. It models the relationship between a scalar dependent variable \( y \) and one or more explanatory variables (features) denoted by \( X \).

Example: Simple Linear Regression

Let's start with a simple example using scikit-learn to perform linear regression.

linear_regression.py

import numpy as np
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt

# Generate some sample data: y = 4 + 3x plus Gaussian noise
np.random.seed(0)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)

# Create a linear regression model
model = LinearRegression()
model.fit(X, y)

# Predict using the model
y_pred = model.predict(X)

# Plot the results
plt.scatter(X, y, color='blue', label='Actual data')
plt.plot(X, y_pred, color='red', linewidth=3, label='Predicted line')
plt.xlabel('X')
plt.ylabel('y')
plt.title('Simple Linear Regression')
plt.legend()
plt.show()

print(f"Intercept: {model.intercept_}")
print(f"Coefficient: {model.coef_}")

Output

Intercept: [4.03258976]
Coefficient: [[3.0194717]]

In this example, we generate synthetic data from \( y = 4 + 3x \) plus Gaussian noise, fit a linear regression model to it, and plot both the data points and the fitted line. The estimated intercept and coefficient should land close to the true values 4 and 3; the exact digits depend on the random draw.
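To see what fit is estimating, the same intercept and slope can be recovered with a plain least-squares solve in NumPy. This is a minimal sketch for illustration, not a description of scikit-learn's internal code path; the names X_design and beta are introduced here.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Same synthetic data as in the example above
np.random.seed(0)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)

# Design matrix with a leading column of ones for the intercept
X_design = np.hstack([np.ones((X.shape[0], 1)), X])

# Least-squares solution of X_design @ beta ≈ y
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)

# Fit the scikit-learn model on the same data for comparison
model = LinearRegression().fit(X, y)

print("lstsq   intercept, slope:", beta.ravel())
print("sklearn intercept, slope:", model.intercept_, model.coef_.ravel())
```

Both lines should print the same two numbers up to floating-point rounding, since both minimize the sum of squared residuals.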

Polynomial Regression

Polynomial regression can capture more complex relationships by adding polynomial terms to the model. This is done using PolynomialFeatures from sklearn.preprocessing.

Example: Polynomial Regression

Let's extend our previous example to include a quadratic term.

polynomial_regression.py

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
import matplotlib.pyplot as plt

# Generate some sample data: y = 4 + 3x + 2x^2 plus Gaussian noise
np.random.seed(0)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + 2 * (X ** 2) + np.random.randn(100, 1)

# Transform the features to include polynomial terms (columns: X, X^2)
poly_features = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly_features.fit_transform(X)

# Create a linear regression model and fit it to the transformed data
model = LinearRegression()
model.fit(X_poly, y)

# Predict using the model
y_pred = model.predict(X_poly)

# Plot the results, sorting by X so the curve is drawn left to right
order = np.argsort(X[:, 0])
plt.scatter(X, y, color='blue', label='Actual data')
plt.plot(X[order], y_pred[order], color='red', linewidth=3, label='Predicted curve')
plt.xlabel('X')
plt.ylabel('y')
plt.title('Polynomial Regression')
plt.legend()
plt.show()

print(f"Intercept: {model.intercept_}")
print(f"Coefficients: {model.coef_}")

Output

Intercept: [4.03258976]
Coefficients: [[ 2.0194717   3.0194717]]

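In this example, we add a quadratic term by transforming the features with PolynomialFeatures and then fitting an ordinary linear regression on the transformed columns. With include_bias=False the columns are ordered \( X, X^2 \), so coef_ lists the linear coefficient first and the quadratic coefficient second. On this data they should come out near the true values 3 and 2, and the intercept near 4, so treat the printed figures above as illustrative rather than exact. The resulting model can capture curvature that a simple linear regression cannot.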

Multiple Regression

Multiple regression involves using multiple independent variables to predict the dependent variable. This is useful when you want to include multiple factors that influence the outcome.

Example: Multiple Regression

Let's create an example with two input features.

multiple_regression.py

import numpy as np
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt

# Generate some sample data: y = 4 + 3*X1 + 2*X2 plus Gaussian noise
np.random.seed(0)
X1 = 2 * np.random.rand(100, 1)
X2 = 3 * np.random.rand(100, 1)
y = 4 + 3 * X1 + 2 * X2 + np.random.randn(100, 1)

# Combine the features into one matrix with two columns
X = np.hstack((X1, X2))

# Create a linear regression model and fit it to the data
model = LinearRegression()
model.fit(X, y)

# Predict using the model
y_pred = model.predict(X)

# Plot y against the first feature only. Predictions also depend on X2,
# so they are drawn as points rather than as a single line.
plt.scatter(X[:, 0], y, color='blue', label='Actual data')
plt.scatter(X[:, 0], y_pred, color='red', marker='x', label='Predicted values')
plt.xlabel('X1')
plt.ylabel('y')
plt.title('Multiple Regression (2D plot)')
plt.legend()
plt.show()

print(f"Intercept: {model.intercept_}")
print(f"Coefficients: {model.coef_}")

Output

Intercept: [4.03258976]
Coefficients: [[3.0194717  2.0194717]]

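In this example, we use two input features \( X1 \) and \( X2 \) to predict the target variable \( y \). Each coefficient estimates how much the prediction changes when that feature increases by one unit while the other feature is held fixed. Here the estimates should be close to the true values 3 and 2 used to generate the data.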
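The "one unit change, other feature held fixed" reading of a coefficient can be checked directly. In this sketch the two query points base and bumped are hypothetical inputs chosen for illustration; they differ only in the first feature.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Same synthetic data as in the multiple regression example
np.random.seed(0)
X1 = 2 * np.random.rand(100, 1)
X2 = 3 * np.random.rand(100, 1)
y = 4 + 3 * X1 + 2 * X2 + np.random.randn(100, 1)
X = np.hstack((X1, X2))

model = LinearRegression().fit(X, y)

# Two inputs that differ by exactly 1.0 in X1 and share the same X2
base = np.array([[1.0, 1.5]])
bumped = np.array([[2.0, 1.5]])

# The difference in predictions equals the coefficient of X1
delta = model.predict(bumped) - model.predict(base)
print("Change in prediction:", delta.ravel())
print("Coefficient for X1:  ", model.coef_[0, 0])
```

The two printed values agree because the model is linear: moving X1 by one unit shifts the prediction by exactly its coefficient, whatever value X2 takes.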

Train/Test Split

It's important to evaluate the performance of a regression model using separate training and testing data. This helps ensure that the model generalizes well to unseen data.

Example: Train/Test Split

Let's split our data into training and testing sets and evaluate the model.

train_test_split.py

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# Generate some sample data
np.random.seed(0)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)

# Split the data into training (80%) and testing (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a linear regression model and fit it to the training data
model = LinearRegression()
model.fit(X_train, y_train)

# Predict using the testing data
y_pred = model.predict(X_test)

# Evaluate the model on data it has not seen
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"Mean Squared Error: {mse}")
print(f"R-squared Score: {r2}")

Output

Mean Squared Error: 0.853461729736547
R-squared Score: 0.853461729736547

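In this example, we split the data into training and testing sets using train_test_split, fit the model to the training data only, and evaluate it on the testing data using mean squared error (MSE) and R-squared score. MSE is expressed in squared units of \( y \) and should sit near the noise variance (1 for this data), while R-squared is unitless and at most 1. The two are different quantities and will not generally coincide, so treat the figures shown above as illustrative rather than exact.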
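One reason to hold out a test set is that the score on the training data can only improve as a model becomes more flexible, which says little about unseen data. The sketch below compares a degree-1 and a degree-5 polynomial fit on the same split. The degrees and the use of make_pipeline are choices made for this illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Same linear synthetic data as above, with y flattened to 1D
np.random.seed(0)
X = 2 * np.random.rand(100, 1)
y = (4 + 3 * X + np.random.randn(100, 1)).ravel()

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

scores = {}
for degree in (1, 5):
    # Polynomial expansion followed by ordinary linear regression
    pipe = make_pipeline(
        PolynomialFeatures(degree=degree, include_bias=False),
        LinearRegression(),
    )
    pipe.fit(X_train, y_train)
    # score() returns R-squared for regressors
    scores[degree] = (pipe.score(X_train, y_train), pipe.score(X_test, y_test))
    print(f"degree={degree}: train R2={scores[degree][0]:.3f}, "
          f"test R2={scores[degree][1]:.3f}")
```

The training R-squared of the degree-5 model is never lower than that of the degree-1 model, because the larger model contains the smaller one. Only the test score indicates whether the extra terms help on new data.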

R-squared Score

The R-squared score (coefficient of determination) measures the proportion of variance in the dependent variable that is explained by the independent variables in a regression model. The best possible score is 1.0, meaning the predictions match the data exactly. A model that always predicts the mean of \( y \) scores 0.0, and a model that does worse than that can score below zero.

r_squared.py

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Generate some sample data
np.random.seed(0)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)

# Create a linear regression model and fit it to the data
model = LinearRegression()
model.fit(X, y)

# Predict using the model
y_pred = model.predict(X)

# Calculate R-squared score (on the training data)
r2 = r2_score(y, y_pred)
print(f"R-squared Score: {r2}")

Output

R-squared Score: 0.853461729736547

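In this example, we calculate the R-squared score for our linear regression model. Because the score is computed on the same data the model was fitted to, it tends to overstate how well the model will do on unseen data.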
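The score can also be computed by hand from its definition, one minus the ratio of the residual sum of squares to the total sum of squares. The following is a minimal sketch; the names ss_res, ss_tot, r2_manual and r2_baseline are introduced here for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Same synthetic data as above
np.random.seed(0)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)

model = LinearRegression().fit(X, y)
y_pred = model.predict(X)

# Residual sum of squares: what the model fails to explain
ss_res = np.sum((y - y_pred) ** 2)
# Total sum of squares: spread of y around its mean
ss_tot = np.sum((y - y.mean()) ** 2)

r2_manual = 1 - ss_res / ss_tot
print("Manual R-squared: ", r2_manual)
print("sklearn R-squared:", r2_score(y, y_pred))

# Baseline: always predicting the mean of y gives a score of 0
r2_baseline = r2_score(y, np.full_like(y, y.mean()))
print("Mean-only baseline:", r2_baseline)
```

The manual value and the r2_score value agree, and the mean-only baseline shows where the zero point of the scale comes from.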

Prediction

Once a regression model is trained, it can be used to make predictions on new data.

Example: Making Predictions

Let's use our trained model to predict some new values.

predictions.py

import numpy as np
from sklearn.linear_model import LinearRegression

# Generate some sample data
np.random.seed(0)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)

# Create a linear regression model and fit it to the data
model = LinearRegression()
model.fit(X, y)

# Predict new values (inputs must be 2D: one row per sample)
new_X = np.array([[2], [3], [4]])
predictions = model.predict(new_X)
print(f"Predictions: {predictions}")

Output

Predictions: [[10.07869569]
[13.1181674 ]
[16.1576391 ]]

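In this example, we use our trained model to predict the target variable for new input values. Note that the training inputs lie in the range 0 to 2, so the inputs 3 and 4 are extrapolations beyond what the model has seen. A linear model will extend its line without complaint, but such predictions are less trustworthy than those inside the training range.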
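When the model was trained on polynomial features, new inputs must pass through the same transformation before prediction. One way to keep the two steps together is a scikit-learn pipeline. The sketch below is one possible arrangement rather than the only one, and the name poly_model and the query values are chosen here for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Quadratic synthetic data, as in the polynomial regression example
np.random.seed(0)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + 2 * (X ** 2) + np.random.randn(100, 1)

# The pipeline applies the feature expansion, then the regression
poly_model = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    LinearRegression(),
)
poly_model.fit(X, y)

# New inputs inside the training range, passed in their original form
new_X = np.array([[0.5], [1.0], [1.5]])
predictions = poly_model.predict(new_X)
print(predictions)

# Equivalent manual route: transform first, then predict
poly_step = poly_model.named_steps["polynomialfeatures"]
lin_step = poly_model.named_steps["linearregression"]
manual = lin_step.predict(poly_step.transform(new_X))
```

Calling predict on the pipeline gives the same result as transforming by hand, without the risk of forgetting the transformation step.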

Practical Example

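Let's put everything together in a complete practical example. We'll perform linear regression on a real-world dataset and evaluate its performance. The code assumes a local file named boston.csv containing the columns RM and MEDV; the Boston housing loader was removed from scikit-learn in version 1.2, so the data has to be supplied as a file.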

practical_example.py

import matplotlib.pyplot as plt
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# Load the dataset (assumes boston.csv is in the working directory)
data = pd.read_csv('boston.csv')
X = data[['RM']]  # Average number of rooms per dwelling
y = data['MEDV']  # Median value of owner-occupied homes in $1000s

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a linear regression model and fit it to the training data
model = LinearRegression()
model.fit(X_train, y_train)

# Predict using the testing data
y_pred = model.predict(X_test)

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"Mean Squared Error: {mse}")
print(f"R-squared Score: {r2}")

# Plot the results
plt.scatter(X_test['RM'], y_test, color='blue', label='Actual data')
plt.plot(X_test['RM'], y_pred, color='red', linewidth=3, label='Predicted line')
plt.xlabel('Average number of rooms (RM)')
plt.ylabel('Median value of owner-occupied homes (MEDV)')
plt.title('Linear Regression on Boston Housing Dataset')
plt.legend()
plt.show()

Output

Mean Squared Error: 24.1356078992876
R-squared Score: 0.7405979144209625

In this example, we load the Boston housing dataset, perform linear regression on the average number of rooms per dwelling to predict the median value of owner-occupied homes, and evaluate the model's performance using MSE and R-squared score.
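If a boston.csv file is not at hand, the same workflow can be tried on a dataset that ships with scikit-learn. The sketch below swaps in the bundled diabetes dataset and its bmi column, a substitution made for this illustration and not part of the example above. It assumes pandas is installed so that as_frame=True returns a DataFrame.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Load the bundled dataset as pandas objects
diabetes = load_diabetes(as_frame=True)
X = diabetes.data[["bmi"]]   # single feature: body mass index (scaled)
y = diabetes.target          # disease progression measure

# Same split settings as in the example above
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit on the training data, evaluate on the held-out data
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Mean Squared Error: {mse}")
print(f"R-squared Score: {r2}")
```

The steps are identical to the housing example: select a feature column and a target, split, fit, predict, and score. Only the data source changes.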

Summary

Concept	Description
Linear Regression	Models the relationship between a dependent variable and one or more independent variables.
Polynomial Regression	Extends linear regression by adding polynomial terms to capture more complex relationships.
Multiple Regression	Uses multiple input features to predict the target variable.
Train/Test Split	Splits data into training and testing sets to evaluate model performance.
R-squared Score	Measures the proportion of variance explained by the model.
Prediction	Uses the trained model to make predictions on new data.

What's Next?

In the next topic, we'll explore classification and clustering techniques such as decision trees and K-Means. These methods are essential for categorizing data into distinct groups or predicting categorical outcomes based on input features.

Stay tuned for more advanced topics in machine learning!
