8 Python Coding Patterns in AI/ML Fresher Interviews 2026
Eight Python coding patterns that repeat in AI/ML fresher interviews, from pandas groupby and numpy broadcasting to implementing logistic regression in 20 lines.
Eight Python coding patterns account for the bulk of what AI/ML interviewers actually test at product companies and AI-first startups.
That’s a narrower target than most freshers expect. The typical reaction to “Python for AI/ML interviews” is to open LeetCode and start grinding dynamic programming. That’s the wrong preparation for this round. AI/ML coding rounds care about two things: your ability to wrangle data with pandas and numpy, and your ability to implement a small ML algorithm from first principles. The data-structures-and-algorithms syllabus overlaps only at the edges.
FACE Prep has tracked placement debriefs from students at product companies across Bangalore, Hyderabad, and Pune, and the same eight patterns appear with remarkable regularity. Mastering them doesn’t take months. It takes a focused sprint on the right material. The top 30 AI/ML interview questions for freshers covers the theory layer; this article is the Python coding layer that runs alongside it.
Why AI/ML Python Rounds Test Differently Than DSA Rounds
A standard software engineering coding round asks you to traverse a binary tree or find the longest substring without repeating characters. An AI/ML coding round gives you a DataFrame with 200 rows and a prompt like:
- Example prompt: “Find the average score for each department, excluding nulls, and return only departments with a placement rate above 60%.”
The shift matters because the underlying skill is different. Binary tree traversal tests whether you can reason about recursive structure. The DataFrame question tests whether you know pandas well enough to express the logic in three lines rather than twenty-five. Interviewers at AI-adjacent roles care about the latter because the job will involve exactly that: reading messy data files, cleaning them, extracting signals, and feeding them into a model.
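As a rough illustration of the "three lines rather than twenty-five" point, the example prompt above can be sketched with one groupby and one filter. The DataFrame below is invented for illustration; the column names dept, score, and placed (1 = placed) are assumptions, not a dataset from any actual interview.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column names and values are assumptions for illustration.
students = pd.DataFrame({
    "dept": ["CS", "CS", "ECE", "ECE", "IT", "IT"],
    "score": [82.0, np.nan, 67.0, 71.0, 90.0, 74.0],
    "placed": [1, 1, 0, 1, 1, 0],
})

# mean() skips NaN by default, so null scores are excluded from the average.
per_dept = students.groupby("dept").agg(
    avg_score=("score", "mean"),
    placement_rate=("placed", "mean"),
)

# Keep only departments whose placement rate is above 60%.
answer = per_dept[per_dept["placement_rate"] > 0.6]
print(answer)
```

With this toy data only CS survives the filter, since ECE and IT each have a placement rate of exactly 50%.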
The ML algorithm implementation questions have a similar purpose. Asking you to code logistic regression in 20 lines is not about making you reinvent scikit-learn. It’s about checking that you understand what gradient descent is doing and can write the math as code without consulting the documentation on every line. If you can only call sklearn.linear_model.LogisticRegression().fit(X, y), you’re a user of the library, not someone who can debug why it’s not converging.
This distinction shows up in how interviewers score. The data-wrangling questions have clean right/wrong answers. The algorithm questions are graded on understanding. The interviewer will ask you to explain the weight-update line and trace through one step of the gradient.
Patterns 1 to 4 — Core pandas and numpy Operations
Pattern 1: Boolean Filtering with Compound Conditions
This is the most-tested single operation. Interviewers give you a DataFrame and ask for a filtered subset based on two or more conditions.
- Interview prompt: “From this student dataset, return rows where the department is ‘CS’ or ‘IT’, the score is above 75, and the student has not already been placed.”
import pandas as pd
df = pd.DataFrame({
    'dept': ['CS', 'ECE', 'CS', 'IT', 'ECE', 'CS'],
    'score': [82, 67, 91, 74, 88, 62],
    'placed': [1, 0, 1, 1, 1, 0]
})
result = df[
    (df['dept'].isin(['CS', 'IT'])) &
    (df['score'] > 75) &
    (df['placed'] == 0)
]
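The trip-up: freshers often write df[df['dept'] == 'CS' or df['dept'] == 'IT'], which raises a ValueError because Python’s or needs a single truth value and does not work element-wise on a Series. Use isin() or the bitwise | operator. Parentheses around each condition are not optional: & and | bind more tightly than comparison operators such as > and ==, so an unparenthesised expression is parsed in the wrong order.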
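To see the failure and the fix side by side, here is a minimal sketch using the same toy dataset. The variable names raised, with_or, and with_isin are introduced only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "dept": ["CS", "ECE", "CS", "IT", "ECE", "CS"],
    "score": [82, 67, 91, 74, 88, 62],
    "placed": [1, 0, 1, 1, 1, 0],
})

# Python's `or` asks for the truth value of a whole Series, which is ambiguous.
try:
    df[df["dept"] == "CS" or df["dept"] == "IT"]
    raised = False
except ValueError:
    raised = True
print("ValueError raised:", raised)

# Element-wise alternatives: bitwise | with parentheses, or isin().
with_or = df[(df["dept"] == "CS") | (df["dept"] == "IT")]
with_isin = df[df["dept"].isin(["CS", "IT"])]
print(with_or.equals(with_isin))
```

Both element-wise forms select the same four rows; isin() tends to read better once the list of allowed values grows past two.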
Pattern 2: groupby with Named Multi-Column Aggregation
The pandas groupby documentation shows several groupby patterns. Interviewers typically ask for the .agg() form that returns multiple summary statistics at once.
- Interview prompt: “Compute the average score and placement rate per department from the same dataset.”
summary = df.groupby('dept').agg(
    avg_score=('score', 'mean'),
    placement_rate=('placed', 'mean')
).reset_index()
The trip-up: forgetting reset_index() leaves dept as the index rather than a column, which breaks downstream operations. Interviewers notice. The named-aggregation syntax (keyword = (column, function)) was introduced in pandas 0.25 and is now the preferred form over the older dictionary syntax.
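The effect of reset_index() is easiest to see by comparing the column lists. The sketch below also shows as_index=False, an alternative way to keep dept as a column; which of the two an interviewer prefers is a matter of taste, not something the debriefs specify.

```python
import pandas as pd

df = pd.DataFrame({
    "dept": ["CS", "ECE", "CS", "IT", "ECE", "CS"],
    "score": [82, 67, 91, 74, 88, 62],
    "placed": [1, 0, 1, 1, 1, 0],
})

# Without reset_index(): 'dept' lives in the index, not in the columns.
indexed = df.groupby("dept").agg(avg_score=("score", "mean"))
print(list(indexed.columns))   # 'dept' is absent here

# With reset_index(): 'dept' becomes an ordinary column again.
flat = indexed.reset_index()
print(list(flat.columns))

# Alternative: ask groupby not to move the key into the index at all.
alt = df.groupby("dept", as_index=False).agg(avg_score=("score", "mean"))
print(list(alt.columns))
```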
Pattern 3: numpy Broadcasting for Column-wise Normalization
Feature normalization before model training is a one-liner in numpy, but only if you understand how broadcasting works across axes. The NumPy broadcasting documentation defines the rules precisely; the short version for interviews is that reducing along axis=0 returns one value per column, and a (3,) array of those values broadcasts across every row of a (4, 3) matrix.
- Interview prompt: “Normalize this feature matrix so each column has zero mean and unit variance.”
import numpy as np
X = np.array([
    [1.0, 200.0, 0.5],
    [2.0, 150.0, 0.8],
    [3.0, 300.0, 0.3],
    [4.0, 250.0, 0.6]
])
mean = X.mean(axis=0) # shape (3,), one mean per column
std = X.std(axis=0) # shape (3,)
X_norm = (X - mean) / std # (4,3) minus (3,) broadcasts correctly
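The trip-up: calling X.mean() without axis=0 returns a single scalar, the mean of the entire matrix, so every column gets shifted by the same value instead of its own mean. The from-scratch ML implementations in Patterns 5 and 6 both work better when this normalisation step runs before training: gradient descent converges more smoothly, and k-means distances are not dominated by the largest-scale feature.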
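A quick way to internalise the axis argument is to print the shapes. The sketch below reuses the same feature matrix; the row-wise line with keepdims=True is an extra illustration, not part of the interview prompt, and shows what is needed when the reduction runs along the other axis.

```python
import numpy as np

X = np.array([
    [1.0, 200.0, 0.5],
    [2.0, 150.0, 0.8],
    [3.0, 300.0, 0.3],
    [4.0, 250.0, 0.6],
])

print(X.mean().shape)         # (): one scalar for the whole matrix
print(X.mean(axis=0).shape)   # (3,): one mean per column
print(X.mean(axis=1).shape)   # (4,): one mean per row

# Column-wise standardisation: (4, 3) with (3,) broadcasts over rows.
X_norm = (X - X.mean(axis=0)) / X.std(axis=0)

# Row-wise centring needs keepdims=True so that (4, 1) broadcasts against (4, 3);
# a plain (4,) array would not line up with the last axis of length 3.
row_centered = X - X.mean(axis=1, keepdims=True)
```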
Pattern 4: Missing Value Diagnosis and Imputation
Interviewers give a dataset with intentional nulls and ask you to diagnose and handle them. The question is as much about your decision-making as your syntax.
- Interview prompt: “This DataFrame has missing scores and missing department labels. How do you handle each?”
print(df.isnull().sum()) # column-wise null count
# Numeric column: impute with median (more stable than mean for skewed data)
df['score'] = df['score'].fillna(df['score'].median())
# Categorical column: drop rows with missing labels
df = df.dropna(subset=['dept'])
The trip-up: imputing a categorical column with the mode is defensible, but dropping is usually safer if the proportion is small. Interviewers will ask “why median, not mean?” The answer is that median is not pulled by extreme values in a skewed score distribution.
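The "why median, not mean?" answer is easier to defend with numbers in hand. The sketch below uses an invented score column with one extreme low value, plus a small categorical column to show the mode-imputation option; all values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical scores: four typical values, one outlier, one missing.
scores = pd.Series([70.0, 72.0, 68.0, 71.0, 5.0, np.nan])

print(scores.mean())     # dragged down by the single outlier
print(scores.median())   # stays near the typical value

# Median imputation fills the gap with a typical score.
filled = scores.fillna(scores.median())

# Mode imputation for a categorical column, as the alternative to dropping rows.
dept = pd.Series(["CS", "CS", "IT", None])
dept_filled = dept.fillna(dept.mode()[0])
```

Here the mean is 57.2 while the median is 70.0, so a mean-imputed row would look like a weak student who does not exist in the data.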
For a deeper question set covering these four patterns, FACE Prep’s pandas and numpy question drill for AI/ML freshers has 25 additional prompts at varying difficulty.
Patterns 5 to 8 — ML Algorithms and numpy Mechanics
Pattern 5: Logistic Regression in 20 Lines
This is the canonical from-scratch question. Interviewers want to see sigmoid, the gradient of binary cross-entropy, and the weight-update step. You don’t need to implement early stopping or regularisation. A clean 20-line version with correct math is the bar.
- Interview prompt: “Write a logistic regression that fits on (X, y) and has a predict method. No sklearn.”
import numpy as np
def sigmoid(z):
    return 1 / (1 + np.exp(-z))
def fit(X, y, lr=0.01, epochs=500):
    w = np.zeros(X.shape[1])
    b = 0.0
    m = len(y)
    for _ in range(epochs):
        z = X @ w + b               # linear part, shape (m,)
        yhat = sigmoid(z)           # predicted probabilities
        dw = X.T @ (yhat - y) / m   # gradient of cross-entropy w.r.t. w
        db = (yhat - y).mean()      # gradient of cross-entropy w.r.t. b
        w -= lr * dw
        b -= lr * db
    return w, b
def predict(X, w, b):
    return (sigmoid(X @ w + b) >= 0.5).astype(int)
Line by line, what the interviewer checks: X @ w + b is the matrix-vector dot product (the linear part), sigmoid maps it to a probability, X.T @ (yhat - y) / m is the gradient with respect to w (derived from cross-entropy loss), and (yhat - y).mean() is the gradient with respect to the bias. If you can explain each line, you pass.
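Since interviewers ask candidates to trace one step of the gradient, it can help to rehearse on numbers small enough to check by hand. The sketch below runs a single update of the same equations on a two-sample toy dataset; the data and the learning rate of 0.1 are chosen only to keep the arithmetic simple.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Toy data: two samples, two features (values chosen for easy arithmetic).
X = np.array([[1.0, 2.0],
              [3.0, 4.0]])
y = np.array([0.0, 1.0])

w = np.zeros(2)
b = 0.0
lr = 0.1
m = len(y)

yhat = sigmoid(X @ w + b)      # zero weights -> every prediction is 0.5
residual = yhat - y            # [0.5, -0.5]
dw = X.T @ residual / m        # [(0.5 - 1.5) / 2, (1.0 - 2.0) / 2] = [-0.5, -0.5]
db = residual.mean()           # (0.5 - 0.5) / 2 = 0.0

w = w - lr * dw                # [0.05, 0.05]
b = b - lr * db                # 0.0
print(w, b)
```

Being able to narrate this trace (predictions start at 0.5, the residual carries the sign of each error, the gradient is the feature-weighted average of residuals) covers most of what the follow-up questions probe.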
Pattern 6: K-Means in 20 Lines
K-means tests whether you understand centroid initialisation, the assignment step, the update step, and when to stop.
- Interview prompt: “Implement k-means clustering. Show the loop that updates centroids and the convergence check.”
import numpy as np
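def kmeans(X, k=3, max_iter=100, seed=42):
    rng = np.random.default_rng(seed)
    # Initialise centroids as k distinct data points
    centroids = X[rng.choice(len(X), k, replace=False)]
    for _ in range(max_iter):
        # Assignment step: (n, k) matrix of point-to-centroid distances
        dists = np.linalg.norm(
            X[:, None] - centroids[None, :], axis=2
        )
        labels = np.argmin(dists, axis=1)
        # Update step: mean of the points assigned to each cluster
        # (an empty cluster would produce NaN here; this short version does not handle it)
        new_centroids = np.array([
            X[labels == i].mean(axis=0) for i in range(k)
        ])
        # Convergence check: stop once centroids no longer move
        if np.allclose(centroids, new_centroids, atol=1e-6):
            break
        centroids = new_centroids
    return labels, centroids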
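X[:, None] - centroids[None, :] is the broadcasting trick that computes every point-to-centroid difference at once, without a Python loop over data points. X[:, None] has shape (n, 1, d), centroids[None, :] has shape (1, k, d), and the result has shape (n, k, d); taking the norm over axis=2 reduces it to an (n, k) distance matrix. Interviewers will specifically ask you to trace through that subtraction and explain the shape of the resulting array.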
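To rehearse the shape trace, the subtraction can be run on a tiny example with the shape printed at each stage. The five points and two centroids below are invented so that the assignments are obvious by inspection.

```python
import numpy as np

# 5 points with 2 features each: [0,1], [2,3], [4,5], [6,7], [8,9]
X = np.arange(10, dtype=float).reshape(5, 2)
# k = 2 hypothetical centroids
centroids = np.array([[0.0, 1.0],
                      [6.0, 7.0]])

print(X[:, None].shape)          # (5, 1, 2)
print(centroids[None, :].shape)  # (1, 2, 2)

diff = X[:, None] - centroids[None, :]
print(diff.shape)                # (5, 2, 2): point x centroid x feature

dists = np.linalg.norm(diff, axis=2)
print(dists.shape)               # (5, 2): one distance per point-centroid pair

labels = np.argmin(dists, axis=1)
print(labels)                    # index of the nearest centroid for each point
```

The first two points land in cluster 0 and the last three in cluster 1, which matches what a manual distance check gives.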
Pattern 7: numpy Matrix Operations for the Prediction Pipeline
Most freshers know sklearn but struggle to express the same operations in raw numpy. Interviewers close that gap with questions about dot products, reshaping, and argmax.
- Interview prompt: “You have weight matrix W of shape (10, 4) and input X of shape (100, 4). Write the forward pass that returns the predicted class for each sample.”
import numpy as np
# W shape: (num_classes, num_features) = (10, 4)
# X shape: (num_samples, num_features) = (100, 4)
logits = X @ W.T # (100, 10), one score per class per sample
predictions = np.argmax(logits, axis=1) # (100,), class with highest score
The trip-up: forgetting the transpose W.T and getting a shape mismatch. Interviewers expect you to mentally track array shapes through each operation. They’ll ask “what’s the shape after this line?” as a check.
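The shape mismatch is worth triggering once on purpose so the error is familiar. The sketch below fills W and X with random values (an assumption, since the prompt only specifies shapes) and checks the shape after each line.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(10, 4))    # (num_classes, num_features)
X = rng.normal(size=(100, 4))   # (num_samples, num_features)

# Without the transpose: (100, 4) @ (10, 4) has inner dimensions 4 and 10.
try:
    X @ W
    mismatch = False
except ValueError:
    mismatch = True
print("shape mismatch raised:", mismatch)

# With the transpose: (100, 4) @ (4, 10) -> (100, 10)
logits = X @ W.T
predictions = np.argmax(logits, axis=1)
print(logits.shape, predictions.shape)
```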
Pattern 8: Vectorization vs. Python Loops
This is the “why is this slow?” pattern. Interviewers show you loop-based code and ask you to rewrite it, then explain the speed difference.
- Interview prompt: “This function computes the mean squared error across samples. Rewrite it without the for loop.”
# Slow version: Python loop
def mse_loop(y_true, y_pred):
    errors = []
    for i in range(len(y_true)):
        errors.append((y_true[i] - y_pred[i]) ** 2)
    return sum(errors) / len(errors)
# Fast version: vectorized
def mse_vec(y_true, y_pred):
    return ((y_true - y_pred) ** 2).mean()
The explanation interviewers want:
- The loop version executes in CPython, which carries interpreter overhead on each iteration.
- The vectorized version delegates the arithmetic to numpy’s C-compiled backend, running the same computation without interpreter overhead.
- For a dataset of 100,000 samples, the vectorized version typically runs 30 to 100 times faster than the equivalent Python loop.
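The speed claim is easy to check on your own machine. The sketch below times both versions with the standard-library timeit module on 100,000 random samples; the exact ratio depends on hardware and numpy version, so no particular number is promised here.

```python
import timeit
import numpy as np

def mse_loop(y_true, y_pred):
    errors = []
    for i in range(len(y_true)):
        errors.append((y_true[i] - y_pred[i]) ** 2)
    return sum(errors) / len(errors)

def mse_vec(y_true, y_pred):
    return ((y_true - y_pred) ** 2).mean()

rng = np.random.default_rng(0)
y_true = rng.normal(size=100_000)
y_pred = rng.normal(size=100_000)

# Both versions should agree up to floating-point rounding.
print(np.isclose(mse_loop(y_true, y_pred), mse_vec(y_true, y_pred)))

# Time a few runs of each; the ratio varies from machine to machine.
loop_time = timeit.timeit(lambda: mse_loop(y_true, y_pred), number=3)
vec_time = timeit.timeit(lambda: mse_vec(y_true, y_pred), number=3)
print(f"loop: {loop_time:.4f}s  vectorized: {vec_time:.4f}s")
```

Confirming that the two functions return the same value before timing them is a habit worth showing in the interview: a fast rewrite that changes the answer is not an optimisation.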
Building Interview-Ready Muscle Memory
Knowing these 8 patterns conceptually is half the work. The other half is building the muscle memory to write them without hesitation under interview conditions (45 minutes, screen-share, someone watching).
The preparation approach that works: pick one pattern per day, write it entirely from memory without any reference, then run it against a small test dataset to verify it produces the correct output. If you need to check documentation at any point, restart the pattern from scratch the next day. The from-scratch constraint is not punishing. It’s the test that tells you whether you actually know the pattern or just recognise it.
Two weeks at this pace covers all 8 patterns nearly twice, which is enough for most interviews. Connecting each pattern to a real dataset you’ve worked on before makes them easier to recall under pressure. The walk-me-through-your-ai-project interview answer guide explains how to frame that project narrative when the interviewer transitions from the coding round to the project-discussion round.
Where This Fits in Your 2026 AI Placement Path
These 8 patterns are the execution layer of a larger skill stack. Python coding fluency with pandas and numpy gets you through the screening round. What you build with those tools (a deployed model, a data pipeline, a recommendation system) is what differentiates you in the offer stage.
The 2026 AI roadmap for Indian engineering students maps the full arc from aptitude prep to AI-skill building to the kind of projects that actually show up on placement shortlists.
Practicing these 8 patterns in isolation is solid prep. Practicing them while building something real is better. TinkerLLM is where that second layer happens. At ₹299, it puts real LLM API calls in your hands with minimal setup. The pandas and numpy you’ve been drilling for the interview become the data-handling backbone of the project. The logistic regression you can now write in 20 lines becomes the baseline model you cite when the interviewer asks “what did you try before the transformer?”
Frequently asked questions
Do all AI/ML fresher interviews include a Python coding round?
Most product companies and AI-first startups include one. Expect a 30-to-60-minute Python section at firms with dedicated ML or data-engineering tracks. Service-tier IT companies vary, but the trend in 2026 is toward including at least one data-wrangling question.
Can I use sklearn in the coding round or must I code from scratch?
Interviewers usually ask for at least one from-scratch implementation to test conceptual understanding. Using sklearn for everything else is fine and expected. The point of from-scratch is to prove you understand gradient descent or centroid updates, not to ban libraries.
How many lines of code is a typical ML implementation question?
A working logistic regression or k-means in 15 to 25 lines is the expected range, excluding import statements. Clarity and correctness matter more than brevity. Comments that explain the math are a plus.
What pandas operations come up most in AI/ML rounds?
groupby with aggregation, boolean filtering with multiple conditions, handling missing values, and merge/join are the highest-frequency operations. FACE Prep sees these four recurring consistently across placement debrief reports from 2024-2025.
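Merge/join is the one operation in that list not covered by the eight patterns, so here is a minimal sketch. The students and offers tables, along with their column names, are hypothetical and exist only to show how the how argument changes the result.

```python
import pandas as pd

# Hypothetical tables; names and values are assumptions for illustration.
students = pd.DataFrame({
    "student_id": [1, 2, 3],
    "dept": ["CS", "IT", "ECE"],
})
offers = pd.DataFrame({
    "student_id": [1, 3, 4],
    "company": ["A", "B", "C"],
})

# Inner join: only student_ids present in both tables.
inner = students.merge(offers, on="student_id", how="inner")

# Left join: every student is kept; students without an offer get NaN.
left = students.merge(offers, on="student_id", how="left")

print(inner)
print(left)
```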
Is numpy knowledge separate from pandas knowledge in interviews?
No. Pandas DataFrames are backed by numpy arrays, and every ML algorithm question requires raw numpy matrix math. Interviewers expect you to move fluidly between both libraries in the same session.
How long should I spend preparing Python coding for an AI/ML interview?
Two to three weeks of 90-minute daily sessions on these 8 patterns is enough to cover what most interviewers test, provided each session ends with a from-scratch run without reference notes.