
Top 30 AI and ML Interview Questions for Freshers in 2026

The 30 most-asked AI and ML interview questions for Indian engineering freshers in 2026, with clear 3-sentence answer patterns for each.

By FACE Prep Team, May 2026, 10 min read


AI and ML questions are now standard in fresher interview rounds at companies running dedicated AI tracks, not just at research labs.

That shift is measurable. In FY26, AI-skilled graduates made up 60% of TCS’s fresher hires, up from 10 to 15% three years earlier, according to TCS CHRO Sudeep Kunnumal at the AI Impact Summit in March 2026. The interview questions below are what those hiring rounds test. Each answer follows a three-sentence pattern: define the concept, say when it applies, give one concrete detail.

Why AI and ML Questions Now Show Up in Fresher Rounds

Service-tier hiring still runs on aptitude and basic coding. But AI-specific tracks (TCS Digital, Infosys DSE, Cognizant GenC Elevate, Amazon applied-scientist internship) added an explicit AI/ML theory round in 2024-2026 hiring cycles. The questions are not PhD-level. Recruiters use them as a signal that a candidate has gone beyond watching tutorials.

The Stack Overflow Developer Survey 2024 found that 76% of professional developers were using or planning to use AI tools in their work. Freshers who can explain the underlying concepts, not just use the tools, stand out in that context.

Two more things before the questions. Most of these rounds last 20 to 30 minutes and rarely cover more than four or five of the 30 questions below. The answers given here are also entry-level: if the interviewer probes further, say what you know and acknowledge the boundary cleanly.

Core Concepts: Foundations of AI and ML (Q1 to Q8)

These eight questions cover the vocabulary every AI/ML round starts with. If you blank on any of these, you are likely to struggle with the algorithm questions that follow.

Q1. What is the difference between AI, ML, and deep learning? Artificial intelligence is the broad goal of building machines that perform tasks requiring human reasoning. Machine learning is a subset that learns patterns from data without being explicitly programmed. Deep learning is a further subset that uses multi-layer neural networks to extract features automatically from raw data.
Q2. What are the three types of machine learning? The three types are supervised learning, unsupervised learning, and reinforcement learning. In supervised learning the model trains on labelled input-output pairs; in unsupervised learning it finds structure in unlabelled data. Reinforcement learning trains an agent by rewarding desirable actions in an environment.
Q3. What is supervised learning? Give an example. Supervised learning trains a model on examples where both the input and the correct output are known, so the model can predict outputs for new inputs. A spam classifier is a standard example: the model learns from emails already labelled “spam” or “not spam.” At inference time it predicts a label for emails it has never seen.
Q4. What is unsupervised learning, and how does it differ from supervised? Unsupervised learning finds hidden structure in data that has no provided labels. Customer segmentation — grouping buyers by purchase behaviour without predefined categories — is a typical use case. The core difference from supervised learning is the absence of a target variable to train against.
Q5. What is the bias-variance tradeoff? Bias is the error from oversimplified assumptions in the model; variance is the error from sensitivity to small fluctuations in training data. High bias causes underfitting; high variance causes overfitting. The tradeoff is finding a model complexity where both error sources are acceptably low.
Q6. What is overfitting, and how do you prevent it? Overfitting happens when a model memorises the training data instead of learning general patterns, performing well on training examples but poorly on new data. Common prevention techniques include regularization (L1 or L2), dropout in neural networks, and early stopping; cross-validation helps you detect overfitting rather than prevent it. Collecting more training data is also effective when available.
Q7. What is cross-validation? Cross-validation estimates model performance on unseen data by splitting the dataset into multiple folds and rotating which fold is used for testing. In k-fold cross-validation the model trains on k-1 folds and tests on the remaining fold, repeating k times. The average test score across all folds is a more reliable performance estimate than a single train-test split.
Q8. What is the difference between classification and regression? Classification predicts a discrete class label — for example, whether a loan application is approved or rejected. Regression predicts a continuous numerical value — for example, a predicted salary or a stock price. The choice of loss function and evaluation metric differs between the two.
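The k-fold procedure described in Q7 can be sketched in a few lines of plain Python. This is a minimal illustration, not a library implementation: the function names `k_fold_splits` and `cross_validate_mean_predictor` are hypothetical, the "model" is a toy that always predicts the training mean, and the data is not shuffled (in practice you would usually shuffle first or use a library routine).

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds.

    Hypothetical helper for illustration; folds are contiguous and unshuffled.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread any remainder so fold sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def cross_validate_mean_predictor(y, k):
    """Average test MSE of a toy model that always predicts the training mean."""
    fold_scores = []
    for train, test in k_fold_splits(len(y), k):
        prediction = sum(y[i] for i in train) / len(train)
        mse = sum((y[i] - prediction) ** 2 for i in test) / len(test)
        fold_scores.append(mse)
    # The cross-validation estimate is the mean of the per-fold test scores.
    return sum(fold_scores) / len(fold_scores)


if __name__ == "__main__":
    targets = [3.0, 5.0, 4.0, 6.0, 5.0, 7.0]
    print(cross_validate_mean_predictor(targets, k=3))
```

Each sample lands in the test fold exactly once, which is why the averaged score is less dependent on one lucky or unlucky split.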

Machine Learning Algorithms (Q9 to Q16)

Interviewers use algorithm questions to check whether you understand the mechanics, not just the names. For each algorithm below, be ready to say what it optimises and when you would choose it over alternatives.

Q9. How does linear regression work? Linear regression fits a line (or hyperplane in multiple dimensions) through data points by minimising the sum of squared differences between predicted and actual values. The model learns a weight for each input feature during training. It assumes a linear relationship between inputs and the target, is computationally efficient, and is interpretable.
Q10. What is logistic regression and when do you use it? Despite the name, logistic regression is a classification algorithm. It applies the sigmoid function to a linear combination of features, producing a probability between 0 and 1. Use it when the target is binary (yes/no, fraud/not fraud) and you need an interpretable, probabilistic output.
Q11. How does a decision tree split data? A decision tree chooses at each node the feature and threshold that maximises information gain (or minimises Gini impurity) and splits the data accordingly. It continues splitting recursively until it hits a stopping condition such as minimum leaf size or maximum depth. The result is a tree of if-else rules that any non-technical stakeholder can read.
Q12. What is a random forest, and how is it different from a single decision tree? A random forest is an ensemble of many decision trees, each trained on a random bootstrap sample of the data with a random subset of features at each split. Individual trees tend to overfit; averaging many decorrelated trees reduces variance and improves generalisation. A random forest usually outperforms a single tree on tabular data, at the cost of interpretability.
Q13. What is gradient descent? Gradient descent is an optimisation algorithm that updates model parameters iteratively in the direction that decreases the loss function. At each step it computes the gradient of the loss with respect to every parameter and subtracts a fraction (the learning rate) of that gradient. Stochastic gradient descent (SGD) and its mini-batch variant estimate the gradient from a small sample rather than the full dataset, which speeds up training on large data; optimisers such as Adam additionally adapt the step size for each parameter.
Q14. What is a support vector machine (SVM)? An SVM finds the hyperplane that separates two classes in feature space with the largest possible margin, where the margin is the distance to the nearest data points (the support vectors). The kernel trick lets an SVM behave as if the data had been mapped into a higher-dimensional space without computing that mapping explicitly, which allows it to handle cases that are not linearly separable. SVMs work well on small, high-dimensional datasets but become slow on large ones.
Q15. What is k-nearest neighbours (KNN)? KNN classifies a new point by finding the k closest training points by distance (usually Euclidean) and taking a majority vote of their labels. It stores the entire training set and computes distances at inference time, making prediction slow on large datasets. The value of k controls the tradeoff: small k increases variance, large k increases bias.
Q16. What is k-means clustering? K-means partitions n data points into k clusters by minimising the sum of squared distances from each point to its cluster centroid. The algorithm alternates between assigning each point to the nearest centroid and recomputing centroids until assignments stabilise. Choosing the right k is non-trivial; the elbow method plots inertia against k values to find a natural bend.
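To make Q9 and Q13 concrete, here is a minimal sketch of batch gradient descent fitting a one-feature linear regression by minimising mean squared error. The function name `fit_line_gradient_descent`, the default learning rate, and the step count are illustrative assumptions chosen for this toy data, not recommended settings.

```python
def fit_line_gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y ~ w * x + b by batch gradient descent on mean squared error.

    Hypothetical helper for illustration; hyperparameters are assumptions.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction errors under the current parameters.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of MSE = (1/n) * sum(error^2) with respect to w and b.
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        # Step against the gradient, scaled by the learning rate.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


if __name__ == "__main__":
    # Toy data generated from y = 2x + 1, so the fit should recover w=2, b=1.
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [1.0, 3.0, 5.0, 7.0, 9.0]
    w, b = fit_line_gradient_descent(xs, ys)
    print(round(w, 3), round(b, 3))
```

If the learning rate is set too high the updates overshoot and the loss grows instead of shrinking, which is a common follow-up question in interviews.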

Deep Learning and Model Evaluation (Q17 to Q30)

The questions in this section are more common in AI-specialist fresher tracks than in general service-tier hiring. Cover Q17 to Q23 thoroughly if you are targeting any role with “AI,” “ML,” or “data” in the job title.

Q17. What is a neural network? A neural network is a stack of layers, each containing units (neurons) that compute a weighted sum of their inputs and pass the result through an activation function. The input layer receives features; hidden layers learn increasingly abstract representations; the output layer produces the final prediction. The network learns its weights by minimising a loss function via gradient descent and backpropagation.
Q18. What is an activation function? Name common ones. An activation function introduces non-linearity into the network, allowing it to model complex patterns beyond a single linear transformation. Without it, stacking multiple layers is mathematically equivalent to a single linear transformation. Common activation functions include ReLU (Rectified Linear Unit), Sigmoid, Tanh, Leaky ReLU, and Softmax (for multi-class output layers).
Q19. What is backpropagation? Backpropagation computes the gradient of the loss with respect to every weight in the network by applying the chain rule of calculus from the output layer back to the input layer. These gradients tell gradient descent how to adjust each weight to reduce the loss. It is the core training algorithm for virtually all neural networks.
Q20. What is a convolutional neural network (CNN)? A CNN uses convolutional layers that apply learnable filters across input data (typically images) to detect local patterns such as edges, textures, and shapes. Pooling layers then down-sample the feature maps, reducing spatial dimensions while retaining the most important activations. CNNs remain a standard choice for image classification, object detection, and similar vision tasks.
Q21. What is a recurrent neural network (RNN)? An RNN processes sequential data by maintaining a hidden state that carries information across time steps, allowing it to model temporal dependencies in sequences. It is suited for tasks like language modelling, time-series forecasting, and speech recognition. Vanilla RNNs have largely been replaced in practice by LSTM (Long Short-Term Memory) networks and transformers, which handle long-range dependencies more reliably.
Q22. What is a transformer architecture? Transformers process sequences using self-attention: each token attends to every other token simultaneously rather than sequentially, enabling parallel computation and capturing long-range dependencies. Introduced in the 2017 paper “Attention Is All You Need” by Vaswani et al., transformers are the foundation of modern large language models. They have since been adapted for vision (Vision Transformers) and other modalities.
Q23. What is transfer learning? Transfer learning reuses a model pre-trained on a large dataset as the starting point for a new task that has limited labelled data. Instead of training from random weights, you fine-tune the pre-trained model on your specific dataset, which is faster and often more accurate than training from scratch. BERT fine-tuned for sentiment classification is a textbook example.
Q24. What is regularization, and what are L1 and L2? Regularization adds a penalty to the loss function to discourage the model from learning overly complex patterns, which reduces overfitting. L1 regularization (Lasso) adds the sum of absolute values of weights, which tends to push some weights to exactly zero and effectively performs feature selection. L2 regularization (Ridge) adds the sum of squared weights, which shrinks all weights proportionally but rarely zeroes them out.
Q25. What is a confusion matrix? A confusion matrix is a table that summarises classification model performance by showing counts of true positives, true negatives, false positives, and false negatives. It reveals where the model is making errors in a way that a single accuracy number hides. From the matrix you can derive precision, recall, F1 score, and other task-specific metrics.
Q26. What are precision, recall, and F1 score? Precision is the fraction of predicted positives that are truly positive; it answers “when the model says yes, how often is it correct?” Recall is the fraction of actual positives the model correctly identifies; it answers “of all the real positives, how many did the model catch?” F1 score is the harmonic mean of precision and recall; it is useful when you need a single number that balances the two, for example on imbalanced classes.
Q27. What is feature engineering? Feature engineering transforms raw data into input representations that improve model performance beyond what the raw features produce. Examples include creating interaction terms, encoding categorical variables (one-hot, target encoding), scaling numerical features, and extracting date-time components. Good feature engineering often improves a simple model more than switching to a complex one.
Q28. What is the curse of dimensionality? As the number of features grows, the volume of the input space grows exponentially, so data points become increasingly sparse. In high dimensions, the concept of a nearest neighbour loses meaning because all points become roughly equidistant. Dimensionality reduction techniques like PCA (Principal Component Analysis) or feature selection address this directly.
Q29. What is a large language model (LLM)? An LLM is a transformer-based neural network trained on very large text corpora (hundreds of billions to trillions of tokens) to predict the next token in a sequence. At scale, this training produces emergent capabilities: the model can answer questions, summarise text, translate languages, and write code. GPT-4, Gemini, and Claude are examples used in commercial and research applications.
Q30. What is the difference between a generative model and a discriminative model? A discriminative model learns the boundary between classes: it models P(y given x), the probability of a label given the input. A generative model learns the underlying data distribution P(x, y), which allows it to generate new samples as well as classify. LLMs and image generators are generative; most classifiers (logistic regression, SVM) are discriminative.
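The metrics in Q25 and Q26 follow directly from the four confusion-matrix counts. The sketch below computes them for a binary problem; the function names `confusion_counts` and `precision_recall_f1` are hypothetical, and returning 0.0 when a denominator is zero is one common convention rather than the only one.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification problem."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1  # predicted positive, actually positive
        elif predicted == positive:
            fp += 1  # predicted positive, actually negative
        elif actual == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # predicted negative, actually negative
    return tp, fp, fn, tn


def precision_recall_f1(y_true, y_pred, positive=1):
    """Derive precision, recall, and F1 from the confusion counts."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    # Assumed convention: a metric with a zero denominator is reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    y_pred = [1, 0, 0, 0, 1, 0, 0, 1]
    print(confusion_counts(y_true, y_pred))
    print(precision_recall_f1(y_true, y_pred))
```

In this toy example the model is right on 5 of 8 samples, yet it catches only half of the real positives, which is the kind of gap a single accuracy number hides.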

From Interview Prep to Shipped Projects

Knowing these 30 answers fluently gets you through the theory round. It does not, on its own, get you an offer on an AI-specialist track. Recruiters running those rounds have noted that candidates who can recite definitions but cannot explain a concrete project quickly expose the gap.

The AI roadmap for Indian engineering students covers what to build after you have the theory. The short version: two end-to-end projects on a public GitHub count for more than any combination of certificates. One project using a classical ML algorithm (start with Q9 to Q16 above) and one using an LLM or fine-tuned model (Q22 to Q23) covers the range most AI-track interviewers want to see.

If you’re clear on what AI and ML actually are but have not yet written code that calls a real model API, the gap between knowing Q29 (LLMs) and having shipped something with one is exactly the gap TinkerLLM closes. For ₹299 you get access to real LLM API calls in a guided environment. The micro-project you build is what you show the next time a recruiter asks what you have actually shipped.


Frequently asked questions

Do freshers actually get deep learning questions in campus interviews?

Yes, but selectively. Service-tier roles (TCS Ninja, Infosys System Engineer) rarely go beyond supervised/unsupervised basics. AI-specific tracks like TCS Digital or Infosys DSE test neural networks, transformers, and evaluation metrics in depth.

How long should my answers be in an AI/ML fresher interview?

Aim for 60 to 90 seconds per question. Three clear sentences work well: definition, when it applies, one concrete example. Longer answers without a clear structure tend to lose the interviewer.

Is Python required for AI/ML fresher interviews in India?

Python is the dominant language for AI/ML roles. Expect at least one coding exercise in Python, even in theory-heavy interviews. Familiarity with NumPy, Pandas, and scikit-learn is expected at the fresher level.

Are there coding questions in AI/ML fresher rounds, or only theory?

Both. Theory questions test whether you understand concepts; a short coding task (implement a cost function, write a basic gradient descent loop, use scikit-learn to train a classifier) tests whether you can apply them.

What tools or projects strengthen an AI/ML fresher profile?

A public GitHub profile with two to three complete notebooks showing end-to-end work (data cleaning, model training, evaluation) is more convincing than certificates. Even a small project using a real dataset and a documented write-up signals readiness.

How should I prepare for the AI round in TCS Digital or Infosys DSE?

Cover the 30 questions in this article first. Then practise coding a logistic regression and a neural network from scratch in Python. Finally, read the official company careers page for the role description, which often lists the skills tested.
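As a starting point for the from-scratch practice suggested above, here is one possible sketch of logistic regression trained with batch gradient descent on log loss, using only the standard library. The function names, the default learning rate and epoch count, and the toy "hours studied" data are all assumptions made for illustration.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability between 0 and 1."""
    # Numerically stable form that avoids overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic_regression(X, y, learning_rate=0.1, epochs=5000):
    """Fit weights and bias by batch gradient descent on mean log loss.

    Hypothetical helper for illustration; hyperparameters are assumptions.
    """
    n_samples, n_features = len(X), len(X[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for features, label in zip(X, y):
            score = sum(w * x for w, x in zip(weights, features)) + bias
            # For log loss, the gradient with respect to the score is (p - y).
            error = sigmoid(score) - label
            for j, x in enumerate(features):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - learning_rate * g / n_samples
                   for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n_samples
    return weights, bias


def predict(X, weights, bias, threshold=0.5):
    """Return 0/1 labels by thresholding the predicted probability."""
    return [
        1 if sigmoid(sum(w * x for w, x in zip(weights, f)) + bias) >= threshold
        else 0
        for f in X
    ]


if __name__ == "__main__":
    # Toy data: one feature (hours studied), label 1 means "passed".
    X = [[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]]
    y = [0, 0, 0, 1, 1, 1]
    weights, bias = train_logistic_regression(X, y)
    print(predict(X, weights, bias))
```

Once this version works, a useful next exercise is to vectorise it with NumPy and compare the learned weights against scikit-learn's `LogisticRegression` on the same data.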

