🧠 Building a Logistic Regression Model with Polynomial Features — Real-World Medical Application
🔍 Introduction
In real-life scenarios like medical diagnosis, classifying patients as healthy or diseased often involves complex relationships between multiple biological indicators. These relationships are frequently non-linear, which means a straight line (a linear model) simply isn’t good enough. To tackle this, I designed a Logistic Regression model with Polynomial Features that can learn such non-linear decision boundaries with high accuracy.
In this blog post, I’ll walk you through:
- What I built and why
- How I created synthetic but realistic data
- The algorithms and techniques I used
- The results, visualization, and GitHub repository
🔗 GitHub Repository:
👉 Logistic Regression with Polynomial Features
🏥 Real-World Analogy: A Medical Diagnosis System
Imagine a simple system that predicts whether a patient is sick or healthy based on two lab test results (e.g., X1 = blood sugar, X2 = blood pressure). If the two classes form a circular or ring-shaped pattern, with healthy readings clustered near the center and diseased readings spread around them, we need something more powerful than a straight-line model.
That’s where Polynomial Logistic Regression comes in.
📘 Techniques and Algorithms Used
Here are all the techniques and components that make up this project:
🔹 1. Data Generation (Synthetic Medical Data)
- Two features per sample (X1 and X2)
- 100 total samples:
  - Class 0 (Healthy) → clustered near the center (inner circle)
  - Class 1 (Diseased) → spread in an outer ring

A sketch of this generation step is shown below.
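Here is a minimal sketch of how such ring-shaped data can be generated with NumPy. The radii ranges and the random seed are my own assumptions, not necessarily the exact values used in the repository:

```python
import numpy as np

rng = np.random.default_rng(42)  # seed chosen for reproducibility
n_per_class = 50

# Class 0 (Healthy): points inside an inner disc
r0 = rng.uniform(0.0, 1.0, n_per_class)
# Class 1 (Diseased): points in an outer ring
r1 = rng.uniform(1.5, 2.5, n_per_class)

theta = rng.uniform(0.0, 2 * np.pi, 2 * n_per_class)
r = np.concatenate([r0, r1])

X = np.column_stack([r * np.cos(theta), r * np.sin(theta)])  # shape (100, 2)
y = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])
```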
🔹 2. Logistic Regression Model
- Binary classification using the sigmoid function:

$$\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad \hat{y} = \sigma\!\left(w^\top x + b\right)$$

where $w$ is the weight vector, $b$ the bias, and $\hat{y}$ the predicted probability of class 1.
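A minimal sketch of the model’s forward pass (the helper names here are mine; the repo may differ):

```python
import numpy as np

def sigmoid(z):
    # Squash any real-valued score into a probability in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(X_feat, w, b):
    # Predicted probability of class 1 for each row of X_feat
    return sigmoid(X_feat @ w + b)
```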
🔹 3. Polynomial Feature Mapping
- Instead of using raw features [x1, x2], we map them to the degree-2 polynomial basis:

$$\phi(x) = \left[\, x_1,\; x_2,\; x_1^2,\; x_1 x_2,\; x_2^2 \,\right]$$

- This allows the model to learn curved (non-linear) decision boundaries (see the mapping sketch below).
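A sketch of the mapping (the helper name `map_polynomial` is mine; the constant term is omitted because the bias b is learned separately):

```python
import numpy as np

def map_polynomial(X, degree=2):
    # Expand [x1, x2] into all monomials x1^(t-i) * x2^i for 1 <= t <= degree
    x1, x2 = X[:, 0], X[:, 1]
    features = []
    for total in range(1, degree + 1):
        for i in range(total + 1):
            features.append((x1 ** (total - i)) * (x2 ** i))
    return np.column_stack(features)
```

With `degree=2` this produces exactly the five features listed above; passing `degree=3` or `degree=4` yields the richer bases mentioned under "What’s Next".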
🔹 4. Cost Function
- Standard logistic loss (binary cross-entropy):

$$J(w, b) = -\frac{1}{m} \sum_{i=1}^{m} \left[\, y^{(i)} \log \hat{y}^{(i)} + \left(1 - y^{(i)}\right) \log\!\left(1 - \hat{y}^{(i)}\right) \right]$$
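Translated directly into NumPy, reusing the `sigmoid` helper from above (the `eps` clipping is my own addition to guard against log(0)):

```python
import numpy as np

def compute_cost(X_feat, y, w, b, eps=1e-12):
    # Mean binary cross-entropy over the m training examples
    y_hat = sigmoid(X_feat @ w + b)          # predicted probabilities
    y_hat = np.clip(y_hat, eps, 1.0 - eps)   # avoid log(0)
    return -np.mean(y * np.log(y_hat) + (1.0 - y) * np.log(1.0 - y_hat))
```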
🔹 5. Gradient Descent Optimization
- I implemented gradient descent manually to minimize the cost (a training-loop sketch follows this list):
- Compute the gradients: ∂J/∂w and ∂J/∂b
- Update the weights iteratively using learning rate α = 0.01
- Total iterations: 10,000
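For the logistic loss these gradients have the familiar closed form $\partial J/\partial w = \frac{1}{m} X^\top (\hat{y} - y)$ and $\partial J/\partial b = \frac{1}{m} \sum_i \left(\hat{y}^{(i)} - y^{(i)}\right)$. A minimal sketch of the training loop, reusing `sigmoid` from above (the function name `fit` is mine):

```python
def fit(X_feat, y, alpha=0.01, iters=10_000):
    # Plain batch gradient descent on the logistic loss
    m, n = X_feat.shape
    w, b = np.zeros(n), 0.0
    for _ in range(iters):
        error = sigmoid(X_feat @ w + b) - y   # (y_hat - y), shape (m,)
        w -= alpha * (X_feat.T @ error) / m   # dJ/dw
        b -= alpha * error.mean()             # dJ/db
    return w, b
```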
🔹 6. Accuracy Evaluation
- The model achieved ~95% to 100% training accuracy, depending on the random seed (an evaluation sketch follows)
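Continuing the sketches above, the training accuracy can be computed by thresholding the predicted probabilities at 0.5:

```python
def accuracy(X_feat, y, w, b):
    # Classify as class 1 when the predicted probability is at least 0.5
    preds = (sigmoid(X_feat @ w + b) >= 0.5).astype(float)
    return (preds == y).mean()

# Tying the pieces together:
X_poly = map_polynomial(X)
w, b = fit(X_poly, y)
print(f"Training accuracy: {accuracy(X_poly, y, w, b):.1%}")
```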
🔹 7. Visualization
- Scatter plot of the original dataset
- Decision boundary plotted using a contour plot
- The plot clearly shows that the model has learned a curved boundary that separates the two classes effectively (see the plotting sketch below)
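One common way to draw such a boundary is to evaluate the model on a dense grid and trace the 0.5-probability contour. A Matplotlib sketch, continuing from the code above (the grid range and resolution are my own choices):

```python
import matplotlib.pyplot as plt

# Evaluate the trained model over a grid covering the data
xx, yy = np.meshgrid(np.linspace(-3, 3, 200), np.linspace(-3, 3, 200))
grid = np.column_stack([xx.ravel(), yy.ravel()])
zz = sigmoid(map_polynomial(grid) @ w + b).reshape(xx.shape)

plt.scatter(*X[y == 0].T, c="red", marker="x", label="Class 0 (Healthy)")
plt.scatter(*X[y == 1].T, c="blue", label="Class 1 (Diseased)")
plt.contour(xx, yy, zz, levels=[0.5], colors="yellow")  # decision boundary
plt.legend()
plt.show()
```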
🧪 Results
- ✅ Training accuracy: ~95%–100%
- 🟦 Class 1 (Diseased): blue circles
- 🟥 Class 0 (Healthy): red crosses
- 📈 Decision boundary: curved yellow line
- 🔍 The curved boundary is learned only because of the polynomial features; without them, the model could not separate the inner cluster from the outer ring
📁 Project Structure
💻 How to Run the Code
- Clone the repository (`git clone` the link above) and change into the project folder
- Install the required libraries, e.g. `pip install numpy matplotlib`
- Run the main script, e.g. `python main.py` (the exact filename may differ; check the repository)
You’ll see both the original dataset and the decision boundary plotted.
💬 Conclusion
This project demonstrates how basic algorithms, when combined with feature engineering (like polynomial expansion), can solve non-linear classification problems — just like a real-world medical diagnosis system might require. Everything was implemented from scratch, without relying on any machine learning libraries like Scikit-learn or TensorFlow.
💡 What’s Next?
- Add L2 regularization to prevent overfitting
- Try degree-3 or degree-4 polynomials
- Build a web interface (Flask/Streamlit) to test predictions
- Test on real-world healthcare datasets