This guide is designed so you can go from zero to a fully working project with deployment. Every step is written clearly so that if you follow along, you will end up with a project that runs, looks professional, and is ready for GitHub.
Project Overview
You will build a Student Score Predictor. The goal:
- Take student data (reading and writing scores).
- Predict the math score using a machine learning model.
Why this project works well for beginners:
- The data is simple.
- The workflow is realistic.
- The output is easy to understand.
Language and Tools
1. Language: Python
2. Libraries (with purpose):
- Pandas: Loads and processes the data.
- NumPy: Handles numerical operations.
- Matplotlib & Seaborn: Used for visualization.
- Scikit-learn: Provides the machine learning model, the train/test split, and the evaluation metrics.
- Joblib: Saves the trained model to a file.
- Streamlit: Builds the web app.
3. Tools:
- Jupyter Notebook (optional)
- VS Code / Command Prompt (CMD)
Step 1: Setup Project Structure
Create this structure:
student-score-predictor/
│
├── data/
├── model/
├── project.py
├── app.py
├── requirements.txt
└── README.md
This structure keeps your files organized and avoids path errors. It also makes your project look professional on GitHub.
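If you prefer to create the folders from Python instead of by hand, the structure above can be sketched with the standard library. This is a minimal sketch; the function name create_project_skeleton is an assumption made for illustration and is not part of the project code.

```python
from pathlib import Path


def create_project_skeleton(root):
    """Create the Step 1 folders and empty files under `root` (hypothetical helper)."""
    root = Path(root)
    # Folders for the dataset and the saved model
    for folder in ("data", "model"):
        (root / folder).mkdir(parents=True, exist_ok=True)
    # Empty placeholder files; files that already exist keep their content
    for name in ("project.py", "app.py", "requirements.txt", "README.md"):
        (root / name).touch(exist_ok=True)
    return root
```

Calling create_project_skeleton("student-score-predictor") from the folder where you keep your projects creates the layout shown above. It is safe to run more than once.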
Step 2: Download Dataset
Download dataset:
1. Search on Kaggle "Student Performance in Exams".
2. Save it as:
data/students.csv
You are working with real-world data. Do not change column names, or your code will break.
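Because the later code depends on the exact column names, it can help to check the CSV header before going further. The sketch below uses only the standard library. The helper name missing_columns is an assumption for illustration, and the three column names are the ones project.py selects.

```python
import csv

# Column names that project.py selects from the dataset
REQUIRED_COLUMNS = ["math score", "reading score", "writing score"]


def missing_columns(csv_path, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from the CSV header."""
    # utf-8-sig also strips a byte-order mark if the file has one
    with open(csv_path, newline="", encoding="utf-8-sig") as f:
        header = next(csv.reader(f), [])
    return [name for name in required if name not in header]
```

An empty list from missing_columns("data/students.csv") means the three columns are present. Any names it returns are the ones that were renamed or are missing.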
Step 3: Install Required Libraries (CMD)
Open Command Prompt (CMD) and run:
pip install pandas numpy matplotlib seaborn scikit-learn joblib streamlit
These libraries provide all required functionality. If even one is missing, your code will not run.
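To confirm the installation worked, you can ask Python which of the libraries it can find. This is a minimal sketch using the standard library; the helper name find_missing is an assumption for illustration. Note that scikit-learn is imported under the name sklearn.

```python
from importlib.util import find_spec

# Import names for the packages installed above (scikit-learn is imported as sklearn)
IMPORT_NAMES = ["pandas", "numpy", "matplotlib", "seaborn", "sklearn", "joblib", "streamlit"]


def find_missing(modules=IMPORT_NAMES):
    """Return the module names that Python cannot locate in this environment."""
    return [name for name in modules if find_spec(name) is None]
```

If find_missing() returns an empty list, every library is available. Otherwise, install the packages it names and check again.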
Step 4: Complete Model Code (project.py)
Create project.py and paste in this code:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import joblib
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
# Load dataset
df = pd.read_csv("data/students.csv")
# Select required columns
df = df[['math score', 'reading score', 'writing score']]
print(df.head())
# Visualization
sns.histplot(df['math score'], kde=True)
plt.show()
# Features and target
X = df[['reading score', 'writing score']]
y = df['math score']
# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
# Train model
model = LinearRegression()
model.fit(X_train, y_train)
# Predict
y_pred = model.predict(X_test)
# Evaluate
print("MSE:", mean_squared_error(y_test, y_pred))
print("R2 Score:", r2_score(y_test, y_pred))
# Save model (the model/ folder from Step 1 must already exist)
joblib.dump(model, "model/student_model.pkl")
# Plot
plt.scatter(y_test, y_pred)
plt.xlabel("Actual")
plt.ylabel("Predicted")
plt.show()
Explanation: This file performs the full pipeline:
- Loads the data
- Visualizes it
- Trains a model
- Evaluates performance
- Saves the model
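To see what the two evaluation numbers mean, here is a minimal sketch that computes MSE and R2 by hand. The function names and the three score pairs are made up for illustration only. project.py uses scikit-learn's mean_squared_error and r2_score, which follow the same formulas.

```python
def mse(actual, predicted):
    """Mean squared error: the average of the squared prediction errors."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r2(actual, predicted):
    """R2: 1 minus (squared error of the model / squared error of always guessing the mean)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Made-up example values, not taken from the dataset
actual = [60, 70, 80]
predicted = [62, 68, 83]
print("MSE:", mse(actual, predicted))      # (4 + 4 + 9) / 3, about 5.67
print("R2 Score:", r2(actual, predicted))  # 1 - 17 / 200, about 0.915
```

A lower MSE means the predictions are closer to the actual scores. An R2 close to 1 means the model explains most of the variation in math scores.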
Step 5: Run the Model (CMD)
Open Command Prompt and run:
python project.py
Explanation: Running this file will:
- Train the model
- Show graphs (close each graph window to let the script continue)
- Save the model file (student_model.pkl)
Step 6: Create Web App (app.py)
Create app.py:
import streamlit as st
import pandas as pd
import joblib
model = joblib.load("model/student_model.pkl")
st.title("Student Score Predictor")
reading = st.slider("Reading Score", 0, 100, 50)
writing = st.slider("Writing Score", 0, 100, 50)
if st.button("Predict"):
    input_data = pd.DataFrame({
        'reading score': [reading],
        'writing score': [writing]
    })
    prediction = model.predict(input_data)
    st.success(f"Predicted Math Score: {prediction[0]:.2f}")
Explanation: This creates a simple interface where users input values and get predictions.
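A linear model is not limited to the 0 to 100 range, so for some inputs the predicted score may in principle fall slightly below 0 or above 100. One optional way to handle that is to clamp the value before displaying it. The helper clamp_score below is a hypothetical addition, not part of the original app.

```python
def clamp_score(value, low=0.0, high=100.0):
    """Limit a predicted score to the valid range (hypothetical helper)."""
    return max(low, min(high, value))
```

In app.py you could, for example, display clamp_score(prediction[0]) instead of prediction[0].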
Step 7: Run App Locally (CMD)
Open Command Prompt and type:
streamlit run app.py
Explanation: Your browser will open automatically showing your app.
Step 8: Create requirements.txt
Add these lines to requirements.txt:
pandas
numpy
matplotlib
seaborn
scikit-learn
joblib
streamlit
Explanation: Deployment platforms use this file to install dependencies.
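The list above does not pin versions, so the deployment platform may install newer releases than the ones on your machine. If you want the deployed app to match your local setup, one option is to write the versions you have installed. This is a sketch under that assumption; the helper name pinned_lines is made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version

# pip package names from requirements.txt
PACKAGES = ["pandas", "numpy", "matplotlib", "seaborn", "scikit-learn", "joblib", "streamlit"]


def pinned_lines(packages=PACKAGES):
    """Return requirements lines such as 'name==1.2.3' for the installed packages."""
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            # Not installed in this environment, so leave it unpinned
            lines.append(name)
    return lines
```

Printing "\n".join(pinned_lines()) gives text you can paste into requirements.txt.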
Step 9: Upload to GitHub (CMD)
Open Command Prompt and type:
git init
git add .
git commit -m "Data Science Project"
git branch -M main
git remote add origin YOUR_LINK
git push -u origin main
Explanation: Your project must be on GitHub before deployment.
Step 10: Deploy on Streamlit Cloud
Follow the steps below to deploy on Streamlit Cloud:
- Go to Streamlit Cloud.
- Login with GitHub.
- Click New App.
- Select your repository.
- Choose app.py.
- Click Deploy.
This will generate a public link where anyone can use your app.
Common Mistakes
- Wrong File Path: If students.csv is not found, your folder structure is incorrect or the file is in the wrong location.
- Not Running Model Before App: If the model file is missing, run project.py first to generate it.
- Missing Libraries: If you see "ModuleNotFoundError", install the missing dependency using pip.
- Column Name Error: If you get a KeyError, check whether the dataset column names were modified.
- Feature Mismatch: If the app crashes on prediction, make sure the input column names and their order in app.py match the features the model was trained on.
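The first two mistakes above can be caught before you launch the app with a small check. This is a minimal sketch using the standard library. The function name preflight is an assumption for illustration, and the paths are the ones used in project.py and app.py.

```python
from pathlib import Path


def preflight(root="."):
    """Return a list of problems found in the project folder (hypothetical helper)."""
    root = Path(root)
    problems = []
    if not (root / "data" / "students.csv").is_file():
        problems.append("data/students.csv not found: check Step 2")
    if not (root / "model" / "student_model.pkl").is_file():
        problems.append("model/student_model.pkl not found: run project.py first")
    return problems
```

Run preflight() from inside the student-score-predictor folder. An empty list means both files are where the code expects them.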
Conclusion
This project takes you through the complete journey of a data science workflow. You started with raw data, built a machine learning model, and then deployed it as a working application.
What makes this valuable is not just the model, but the fact that you turned it into something usable. That is what recruiters look for. If you can explain each step clearly, this project alone is enough to demonstrate your fundamentals.
Build one or two more projects like this, and you will have a strong portfolio ready for internships and placements.
Frequently Asked Questions
1. Can I run everything using CMD?
Yes, all commands in this guide are designed for Command Prompt.
2. Do I need Jupyter Notebook?
No, it is optional. You can complete everything using Python files.
3. Can I use a different dataset?
Yes, the same process applies to any dataset, as long as you update the column names in project.py and app.py.
4. Why is my app not opening?
Make sure Streamlit is installed and the command is correct.
5. Is this enough for a resume project?
Yes, for beginners this is a strong and complete project.