Overfitting and Regularisation: ML Interview Answer Guide 2026

Learn the standard interview answer for overfitting and regularisation: definitions, L1 vs L2, dropout, cross-validation, and a worked example for AI/ML freshers.

By FACE Prep Team 6 min read

Overfitting is when a model learns the training data so well that it fails on anything new, and it is one of the first questions asked in almost every AI/ML fresher interview round.

The answer interviewers want goes beyond a one-line definition. They expect a definition, at least two techniques to prevent it, and enough depth to discuss trade-offs. This guide gives you that structure, starting from first principles and ending with a worked example you can practise aloud.

What Overfitting Actually Means

The clearest definition for an interview: overfitting happens when a model memorises the training data instead of learning the underlying pattern. Training accuracy is high; test accuracy is low. The model performs well on examples it has seen and fails on examples it has not.

In practice, the signature is a gap between training and test accuracy that does not close, and often widens, as the model trains longer. The worked example later in this guide walks through this scenario step by step with exact numbers.

The opposite failure is underfitting: training accuracy is also low because the model is too simple to capture the pattern at all. Both failures are captured by the bias-variance tradeoff. High variance (overfitting) means predictions change dramatically with small changes in the training data. High bias (underfitting) means the model's assumptions are too rigid to follow the data, so it makes the same systematic errors whichever training set it sees. Regularisation reduces variance, usually at the cost of a small increase in bias.
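For interviewers who ask for the formal version, the tradeoff is usually written as a decomposition of expected squared error. The block below is the standard textbook form for regression, assuming the data follow y = f(x) + ε with noise variance σ²; the symbols f, f̂, and σ² are notation introduced here, not taken from this guide.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

The expectation is taken over different training sets. An underfit model is dominated by the first term, an overfit model by the second.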

Interviewers often ask for a quick framing of the three zones: underfitting (model too simple), the sweet spot (good generalisation), and overfitting (model too complex). A good answer names the zone, names the symptom (training-test gap), and then names the remedy.
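The three zones can be reproduced on a toy problem. The sketch below is an illustration under stated assumptions, not a benchmark: the sine-plus-noise data, the noise level, and the polynomial degrees 1, 3, and 15 are all hypothetical choices made for this example.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def noisy_sine(x):
    # Hypothetical ground truth: one sine period plus Gaussian noise.
    return np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.shape)

# Small training set, larger held-out test set drawn from the same range.
x_train = np.linspace(0.0, 1.0, 20)
y_train = noisy_sine(x_train)
x_test = rng.uniform(0.0, 1.0, 200)
y_test = noisy_sine(x_test)

def mse(model, x, y):
    return float(np.mean((model(x) - y) ** 2))

# degree 1: too simple, degree 3: reasonable, degree 15: too flexible.
results = {}
for degree in (1, 3, 15):
    model = Polynomial.fit(x_train, y_train, deg=degree)
    results[degree] = (mse(model, x_train, y_train), mse(model, x_test, y_test))

for degree, (train_err, test_err) in results.items():
    print(f"degree {degree:2d}: train MSE {train_err:.3f}, test MSE {test_err:.3f}")
```

Training error can only fall as the degree rises, because each higher-degree fit contains the lower-degree one as a special case. The test error is what separates the zones: it is the number to quote when naming the symptom.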

The Three Regularisation Techniques Interviewers Expect

When the interviewer asks “what would you do about overfitting?”, they are listening for at least two of these three techniques. Naming all three with a concise explanation of each puts you ahead of most freshers who stop at definitions.

L1 and L2 Regularisation

Both techniques work by adding a penalty term to the loss function. The penalty discourages large weight values, forcing the model to stay simpler.

The difference is in what they penalise:

Technique | Also called | Penalty term | Key effect
L1 | Lasso | Sum of absolute weights | Drives some weights to exactly zero; enables feature selection
L2 | Ridge | Sum of squared weights | Shrinks all weights proportionally; rarely drives any to zero
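Written out, the two penalties look like this. The notation is an assumption for illustration: L(w) stands for the unregularised loss, w_j for the individual weights, and λ for the regularisation strength.

```latex
J_{\mathrm{L1}}(w) = L(w) + \lambda \sum_{j} \lvert w_j \rvert
\qquad
J_{\mathrm{L2}}(w) = L(w) + \lambda \sum_{j} w_j^{2}
```

Some libraries add a constant factor such as 1/2 to the L2 term, which changes the scale of λ but not the idea.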

L1 is preferred when you suspect only a subset of features matters. Weights of irrelevant features collapse to zero, giving you built-in feature selection at no extra cost. L2 is the default for most regression and classification problems where all features are expected to contribute. The scikit-learn documentation on Ridge and Lasso covers the solvers and parameters for both. Note that Ridge has a closed-form solution, while Lasso does not and is fitted iteratively (scikit-learn uses coordinate descent).
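The sparsity difference is easy to see on synthetic data. This is a minimal sketch, assuming scikit-learn is installed; the dataset (ten features, only the first two relevant) and the penalty strengths are hypothetical values picked for the demonstration. scikit-learn names the penalty strength `alpha`.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)

# Hypothetical data: 10 features, but only the first two affect the target.
n_samples, n_features = 200, 10
X = rng.normal(size=(n_samples, n_features))
true_w = np.zeros(n_features)
true_w[:2] = [3.0, -2.0]
y = X @ true_w + rng.normal(scale=0.5, size=n_samples)

lasso = Lasso(alpha=0.2).fit(X, y)   # L1 penalty
ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty

# Count coefficients that are exactly zero under each penalty.
n_zero_lasso = int(np.sum(lasso.coef_ == 0.0))
n_zero_ridge = int(np.sum(ridge.coef_ == 0.0))

print("Lasso coefficients:", np.round(lasso.coef_, 3))
print("Ridge coefficients:", np.round(ridge.coef_, 3))
print("exact zeros -> Lasso:", n_zero_lasso, "Ridge:", n_zero_ridge)
```

Lasso keeps the two informative features and zeroes out most or all of the rest, while Ridge leaves every coefficient small but non-zero.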

In practice, both are controlled by a hyperparameter called lambda (written λ; scikit-learn's Ridge and Lasso call it alpha). A larger lambda applies more penalty and produces a simpler model. A smaller lambda applies less penalty and allows the model to fit more tightly to the training data. Tuning lambda is typically done through cross-validation.
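The effect of lambda can be checked directly with the closed-form ridge solution, w = (XᵀX + λI)⁻¹Xᵀy. The sketch below assumes random synthetic data and a hypothetical grid of lambda values; it only illustrates the direction of the effect.

```python
import numpy as np

def ridge_weights(X, y, lam):
    """Closed-form ridge solution: (X^T X + lam * I)^-1 X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 8))
y = X @ rng.normal(size=8) + rng.normal(scale=0.5, size=30)

lambdas = [0.0, 1.0, 10.0, 100.0]  # hypothetical grid
norms, train_mse = [], []
for lam in lambdas:
    w = ridge_weights(X, y, lam)
    norms.append(float(np.linalg.norm(w)))           # size of the weights
    train_mse.append(float(np.mean((X @ w - y) ** 2)))  # fit to training data

for lam, norm, err in zip(lambdas, norms, train_mse):
    print(f"lambda={lam:6.1f}  weight norm={norm:.3f}  train MSE={err:.3f}")
```

As lambda grows, the weight norm shrinks and the training error rises. That is the trade being made: a looser fit to the training set in exchange for a simpler model.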

Dropout

Dropout is specific to neural networks. During each training step, the technique randomly deactivates a fraction of neurons. Neurons that are dropped do not participate in the forward pass or backpropagation for that step.

The effect: no single neuron can rely on the presence of any specific neighbouring neuron across steps. This forces the network to learn more distributed, redundant representations that generalise better. The net behaviour is similar to averaging the predictions of many smaller, overlapping networks.

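Dropout is not applied at inference time: all neurons are active during prediction. To keep activation magnitudes consistent between training and prediction, the original formulation scales outputs by the keep probability at inference. Most modern frameworks instead use inverted dropout, which scales the surviving activations up by 1 / (1 - rate) during training so that no scaling is needed at inference. The TensorFlow Keras Dropout layer documentation describes this behaviour: Dropout(rate=0.5) sets 50% of inputs to zero at random during training and scales the rest up. A rate of 0.2 is typical for input layers; 0.5 is common for fully-connected hidden layers.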
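A few lines of NumPy are enough to show the mechanics. This is a sketch of the inverted-dropout variant (scaling at training time), not the Keras source code; the function name `dropout` and its signature are hypothetical.

```python
import numpy as np

def dropout(activations, rate, training, rng):
    """Inverted dropout: during training, zero a random fraction `rate` of
    units and scale the survivors by 1 / (1 - rate). At inference, return
    the activations unchanged."""
    if not training or rate == 0.0:
        return activations
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob  # True = unit is kept
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
x = np.ones(10_000)

train_out = dropout(x, rate=0.5, training=True, rng=rng)
infer_out = dropout(x, rate=0.5, training=False, rng=rng)

print("training values seen:", np.unique(train_out))
print("training mean:", train_out.mean())
print("inference mean:", infer_out.mean())
```

With a rate of 0.5, each unit is either zeroed or doubled during training, so the expected activation stays the same as at inference.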

Cross-Validation: The Diagnostic, Not Just a Technique

Cross-validation and regularisation solve different parts of the same problem. Regularisation reduces overfitting. Cross-validation tells you how much overfitting you currently have, so you know whether to intervene and how aggressively.

The standard method is k-fold cross-validation. The dataset is split into k equal folds. The model trains on k-1 folds and validates on the remaining fold, rotating k times until every fold has served as the validation set. The k validation scores are averaged to produce a single generalisation estimate.
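The rotation described above can be sketched in a few lines. The helper `k_fold_indices` is a hypothetical name written for this illustration; in practice a library splitter such as scikit-learn's KFold does the same job.

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs; each fold is the validation set once."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_samples)
    folds = np.array_split(shuffled, k)  # folds differ by at most one sample
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx

splits = list(k_fold_indices(n_samples=23, k=5))
for fold, (train_idx, val_idx) in enumerate(splits, start=1):
    print(f"fold {fold}: train on {len(train_idx)} samples, validate on {len(val_idx)}")
```

When the dataset size is not divisible by k, the folds are only approximately equal, which is why this sketch uses `np.array_split`.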

Why k-fold rather than a simple train-test split? A single split is noisy. Depending on which samples land in the test set, the accuracy estimate can vary by several percentage points in either direction. K-fold averages out that randomness. The scikit-learn cross-validation guide covers the standard implementations, including stratified k-fold for class-imbalanced datasets.

Common k values: 5 is a common default (scikit-learn uses it). 10 gives a more stable estimate but takes roughly twice the compute, because it trains ten models instead of five. Leave-one-out (k equals n) is thorough but computationally expensive on large datasets.

For an interview answer, the key point is: use cross-validation to confirm and measure the overfitting problem, then tune the regularisation strength (lambda) by comparing cross-validation scores across different values.
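The tuning loop can be sketched as follows, assuming scikit-learn is installed. The synthetic dataset and the grid of candidate values are hypothetical; `alpha` is scikit-learn's name for lambda, and the score is the default R² for regressors.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(0)

# Hypothetical data: few samples relative to features, so overfitting is likely.
X = rng.normal(size=(60, 40))
true_w = np.zeros(40)
true_w[:5] = rng.normal(size=5)
y = X @ true_w + rng.normal(scale=1.0, size=60)

alphas = [0.01, 0.1, 1.0, 10.0, 100.0]  # candidate regularisation strengths
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# Mean validation R^2 across the 5 folds for each candidate.
mean_scores = {
    alpha: float(cross_val_score(Ridge(alpha=alpha), X, y, cv=cv).mean())
    for alpha in alphas
}
best_alpha = max(mean_scores, key=mean_scores.get)

for alpha, score in mean_scores.items():
    print(f"alpha={alpha:7.2f}  mean CV R^2={score:.3f}")
print("selected alpha:", best_alpha)
```

Using the same folds for every candidate keeps the comparison fair, since each value is scored on identical validation sets.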

The Worked Example: Walking Through a Real Interview Scenario

Practise this scenario before your interview. It covers all three techniques in a natural flow and mirrors what interviewers at companies running structured AI assessments will ask you to walk through.

The setup:

  • You build a convolutional neural network to classify handwritten digits.
  • Training accuracy after 30 epochs: 99.4%.
  • Test accuracy: 72%.

The interview walk-through:

  • Step 1 (Identify): The roughly 27-point training-test gap (99.4% versus 72%) confirms overfitting. The model has memorised the training set.
  • Step 2 (Diagnose with cross-validation): Run 5-fold CV on the training data. If fold validation scores average around 71 to 74%, the 72% test score is consistent and the model genuinely fails to generalise. If one fold scores far higher than the rest, say 90%, there may also be a data-split issue worth investigating separately.
  • Step 3 (Apply L2 regularisation): Add a weight decay of 1e-4 to the fully-connected layers. Retrain and observe whether the training-test gap narrows without test accuracy dropping further.
  • Step 4 (Add Dropout): Insert Dropout(rate=0.5) after the fully-connected hidden layers. On over-parameterised networks, this typically closes several percentage points of the gap by preventing neurons from co-adapting.
  • Step 5 (Evaluate the result): If test accuracy improves to around 89% and training accuracy settles near 93%, the gap is now within a reasonable range for the dataset size and model complexity.
  • Step 6 (State the remaining lever): If the gap persists, consider data augmentation (small rotations, shifts, and scaling; flipping is unsuitable for digits because a mirrored digit is no longer a valid example of its class) or collecting more training examples. Regularisation reduces the model’s tendency to overfit on whatever data it has; more data changes the signal-to-noise ratio at the root.
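The arithmetic behind Steps 1 and 5 is worth having at your fingertips. The sketch below uses the accuracies from the scenario; the helper name and the 10-point flag threshold are hypothetical choices for illustration, not an industry rule.

```python
def generalisation_gap(train_acc, test_acc):
    """Training accuracy minus test accuracy, in percentage points."""
    return round(train_acc - test_acc, 1)

# Hypothetical rule of thumb for this example only: flag gaps above 10 points.
GAP_THRESHOLD = 10.0

before = generalisation_gap(99.4, 72.0)  # original CNN
after = generalisation_gap(93.0, 89.0)   # after weight decay and dropout

for label, gap in (("before", before), ("after", after)):
    status = "overfitting" if gap > GAP_THRESHOLD else "acceptable"
    print(f"{label}: gap = {gap} points -> {status}")
```

Note that training accuracy drops from 99.4% to 93% while test accuracy rises from 72% to 89%. Giving up some training accuracy to gain test accuracy is exactly what regularisation is meant to do.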

The worked example matters because it demonstrates a systematic diagnostic process, not just vocabulary recall. Interviewers in structured ML rounds are checking whether you can move from symptom to diagnosis to intervention. That sequence is what the job actually requires.

Once you can explain overfitting and its fixes, the next question the interviewer often asks is to walk through an end-to-end AI project. The walk-me-through-your-AI-project interview answer guide covers how to structure that narrative, including model selection, training choices, and what you iterated on.

How This Question Fits Into AI/ML Interview Rounds

Overfitting and regularisation appear in the opening questions of almost every AI/ML technical round for freshers. It is a filter question: the interviewer uses it to separate candidates with conceptual clarity from those who have surface familiarity from a course.

Companies running structured AI graduate-hire tracks, including TCS, Infosys, Cognizant, and Capgemini, incorporate ML theory questions into their fresher assessments. The framing here is illustrative: these tracks exist across the industry, and overfitting is a canonical topic in all of them. Specific eligibility criteria and selection cutoffs vary by programme and should be verified directly on each company’s careers page.

After the overfitting question, interviewers often follow up with a system-design version: “If you were building this model for production, how would you prevent it from degrading over time?” That question is about retraining schedules, data drift, and monitoring rather than regularisation. The ML system design interview guide for freshers covers that next layer in detail.

The top 30 AI/ML interview questions for freshers maps the full question cluster most interviewers draw from.

Building the AI foundation is the prior step. The 2026 AI roadmap for Indian engineering students lays out the sequence from Python basics through model deployment.

The gap between knowing what L2 regularisation does and being able to show it on a running model is where most interview follow-up questions separate candidates. TinkerLLM is where you close that gap: at ₹299, it gives you live model API calls so you can set lambda, watch the training loss change, and run the worked example above against real data rather than rehearsing it as a script.

Frequently asked questions

What is the difference between overfitting and underfitting?

Overfitting happens when a model performs well on training data but poorly on unseen test data, meaning it memorised noise rather than the underlying pattern. Underfitting is the opposite: the model is too simple to capture the pattern even on training data, showing high error on both sets.

What is L1 vs L2 regularisation in simple terms?

L1 (Lasso) adds the sum of absolute weight values to the loss function, pushing some weights to exactly zero and effectively removing features. L2 (Ridge) adds the sum of squared weights, which shrinks all weights towards zero but rarely to exactly zero. Use L1 when you want sparsity; use L2 for general weight decay.

When should I use dropout instead of L2 regularisation?

Dropout is most effective in large neural networks with many parameters, especially deep CNNs and fully-connected layers. L2 is the standard choice for simpler models including linear regression, logistic regression, and shallow networks. In practice, many deep learning models use both.

What is k-fold cross-validation and why does it help with overfitting?

K-fold cross-validation splits the dataset into k equal parts, trains on k-1 parts, and validates on the remaining part, rotating k times. Averaging the k validation scores gives a more reliable estimate of generalisation than a single train-test split, helping you detect overfitting before deploying the model.

What is the bias-variance tradeoff?

Bias measures how far the model's predictions are from the true values on average; high bias means the model is too simple (underfitting). Variance measures how much predictions change across different training sets; high variance means the model is too complex (overfitting). Regularisation reduces variance at the cost of a slight increase in bias.

How do I answer the overfitting question if I have never built an ML model?

Use the conceptual framing: define overfitting as high training accuracy plus low test accuracy, name at least two regularisation techniques with their mechanism, and describe how cross-validation measures the gap. You do not need a deployed model to give a correct answer; the interviewer is checking conceptual clarity.

Can a model overfit on a small dataset even with regularisation applied?

Yes. Regularisation reduces the risk of overfitting but does not eliminate it. On very small datasets, even a heavily regularised model can overfit if the model capacity exceeds what the data can support. Data augmentation, early stopping, and collecting more data address the root cause that regularisation alone cannot fix.
