Tokens in C: Keywords, Identifiers, Constants, and Operators Explained
Every C program breaks down into six token types. Here is what each one does, the rules that govern each, and the mistakes that trip up students in placement tests.
A token is the smallest meaningful unit of a C program, the atomic piece the compiler works with before it understands anything else about your code.
Six categories cover every token in C: keywords, identifiers, constants, strings, special symbols, and operators. Get these wrong and the compiler never reaches your logic. Get them right and you can catch an entire class of placement-test traps before they trip you.
What Is a Token in C?
When a C compiler reads source code, the first step is lexical analysis: breaking the raw text stream into tokens. Before the compiler checks whether your logic makes sense, it checks whether each token is valid. This is why a misspelled keyword produces a compile error before any runtime logic runs.
Consider this one-liner:
int sum = a + b;
The lexer sees seven tokens: int (keyword), sum (identifier), = (assignment operator), a (identifier), + (arithmetic operator), b (identifier), and ; (special symbol). Each belongs to exactly one token type. The compiler applies different parsing rules to each type, which is why int and sum look identical to a human but behave very differently to the compiler.
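The splitting step described above can be sketched as a tiny scanner. This is a simplified illustration, not how a production compiler is written: the names TokenKind, is_keyword, and tokenize are hypothetical, the keyword list is deliberately partial, and operators and special symbols are lumped into one category on the assumption that each is a single character.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Simplified token categories for this sketch. */
typedef enum { TOK_KEYWORD, TOK_IDENTIFIER, TOK_CONSTANT, TOK_SYMBOL } TokenKind;

/* Partial keyword list; a real lexer would hold the full set. */
static const char *const KEYWORDS[] = { "int", "char", "float", "return", "if", "else" };

/* Returns 1 if the len characters starting at text spell a listed keyword. */
static int is_keyword(const char *text, size_t len)
{
    for (size_t i = 0; i < sizeof KEYWORDS / sizeof KEYWORDS[0]; i++) {
        if (strlen(KEYWORDS[i]) == len && strncmp(KEYWORDS[i], text, len) == 0)
            return 1;
    }
    return 0;
}

/* Scans src, stores the kind of each token in kinds[], and returns the
   number of tokens found (never more than max). */
size_t tokenize(const char *src, TokenKind kinds[], size_t max)
{
    size_t n = 0;
    while (*src != '\0' && n < max) {
        if (isspace((unsigned char)*src)) {
            src++;                                   /* whitespace only separates tokens */
        } else if (isalpha((unsigned char)*src) || *src == '_') {
            const char *start = src;                 /* a word: keyword or identifier */
            while (isalnum((unsigned char)*src) || *src == '_')
                src++;
            kinds[n++] = is_keyword(start, (size_t)(src - start))
                             ? TOK_KEYWORD : TOK_IDENTIFIER;
        } else if (isdigit((unsigned char)*src)) {
            while (isdigit((unsigned char)*src))     /* decimal integer constant */
                src++;
            kinds[n++] = TOK_CONSTANT;
        } else {
            src++;                                   /* one-character operator or symbol */
            kinds[n++] = TOK_SYMBOL;
        }
    }
    return n;
}
```

Passing the string "int sum = a + b;" to tokenize yields seven tokens, with the first classified as a keyword and the second as an identifier.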
Keywords: C’s 32 Reserved Words
Keywords are words the C language has claimed for itself. The compiler assigns each a fixed meaning, and no program may reassign that meaning by using a keyword as a variable or function name. All 32 ANSI C keywords are lowercase:
| Category | Keywords |
|---|---|
| Data types | char, double, float, int, long, short, signed, unsigned, void |
| Storage class | auto, extern, register, static |
| Control flow | break, case, continue, default, do, else, for, goto, if, return, switch, while |
| Type qualifiers | const, volatile |
| Type definition | enum, struct, typedef, union |
| Size operator | sizeof |
That accounts for all 32. C99 added restrict, inline, _Bool, _Complex, and _Imaginary. C11 added a further seven (_Alignas, _Alignof, _Atomic, _Generic, _Noreturn, _Static_assert, _Thread_local). Placement MCQs almost always reference the C89 count of 32.
The Case-Sensitivity Trap
Keywords in C are strictly lowercase. int is a keyword. Int is a valid identifier. INT is also a valid identifier. This single rule generates at least two or three questions on every C-heavy aptitude test:
- Q: Which of these is NOT a keyword in C? Options: int, FLOAT, break, void.
- Answer: FLOAT is not a keyword. float (lowercase) is. FLOAT is a valid user-defined identifier.
If you see an option that looks like a keyword but has any uppercase letter, it is an identifier, not a keyword.
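A short sketch of the rule in action. The function name case_demo is made up for illustration; the point is only that the compiler accepts Int and INT as variable names.

```c
/* Only the all-lowercase spelling int is reserved. */
int case_demo(void)
{
    int Int = 1;   /* valid identifier: differs from the keyword by case */
    int INT = 2;   /* valid identifier: distinct from both int and Int */
    return Int + INT;
}
```

Naming variables this way is legal but confusing, so treat it as a test-day fact rather than a style to copy.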
Identifiers: The Naming Rules You Cannot Skip
Identifiers are programmer-chosen names for variables, functions, arrays, and labels. The compiler accepts an identifier only if it satisfies four rules. Per the C identifier specification:
- Must begin with a letter (a-z, A-Z) or _
- After the first character, may contain only letters, digits (0-9), or _
- Must not match any keyword exactly (case-sensitive comparison)
- Must not contain whitespace or any other special character
C89 guarantees that only the first 31 characters of an internal identifier are significant (external names may be limited to as few as 6). Two names that match in every significant character but differ later may be treated as the same identifier, so do not rely on the difference. C99 raises the minimums to 63 characters for internal identifiers and 31 for external ones.
| Identifier | Valid? | Reason |
|---|---|---|
| count | Valid | Starts with letter, all alphanumeric |
| _total | Valid | Leading _ is allowed |
| student_2026 | Valid | _ and digits after the first character are fine |
| 2fast | Invalid | Starts with a digit |
| my-var | Invalid | Hyphen is not a permitted character |
| float | Invalid | Matches the keyword float exactly |
| MAX_SIZE | Valid | Uppercase letters and _ allowed |
Identifiers are case-sensitive. count, Count, and COUNT are three distinct identifiers. This is a common source of bugs at test time. For a systematic look at where naming and scoping mistakes surface in submitted code, the common C programming errors guide covers the patterns that appear most in campus assessments.
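The four rules can be turned into a checker. The sketch below is one possible way to write it, assuming the 32 C89 keywords and plain ASCII input; is_valid_identifier and C89_KEYWORDS are hypothetical names, and the length limits are not checked.

```c
#include <ctype.h>
#include <string.h>

/* The 32 C89 keywords. */
static const char *const C89_KEYWORDS[] = {
    "auto", "break", "case", "char", "const", "continue", "default", "do",
    "double", "else", "enum", "extern", "float", "for", "goto", "if",
    "int", "long", "register", "return", "short", "signed", "sizeof", "static",
    "struct", "switch", "typedef", "union", "unsigned", "void", "volatile", "while"
};

/* Returns 1 if name satisfies the identifier rules, 0 otherwise. */
int is_valid_identifier(const char *name)
{
    /* Rule 1: the first character must be a letter or underscore. */
    if (!(isalpha((unsigned char)name[0]) || name[0] == '_'))
        return 0;

    /* Rules 2 and 4: every later character is a letter, digit, or underscore. */
    for (size_t i = 1; name[i] != '\0'; i++) {
        if (!(isalnum((unsigned char)name[i]) || name[i] == '_'))
            return 0;
    }

    /* Rule 3: the name must not match a keyword (case-sensitive). */
    for (size_t i = 0; i < sizeof C89_KEYWORDS / sizeof C89_KEYWORDS[0]; i++) {
        if (strcmp(name, C89_KEYWORDS[i]) == 0)
            return 0;
    }
    return 1;
}
```

Run against the table above, the function accepts count, _total, student_2026, and MAX_SIZE, and rejects 2fast, my-var, and float.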
Constants and Strings
Constants
A constant holds a fixed value that cannot change during execution. C supports four constant types:
- Integer constants: Whole-number values in decimal (42), octal (prefix 0, e.g., 052 equals 42), or hexadecimal (prefix 0x, e.g., 0x2A equals 42).
- Floating-point constants: Written as 3.14 (decimal) or 3.14e2 (exponential notation for 314.0).
- Character constants: A single character in single quotes ('A'). Stored as its ASCII integer value: 'A' equals 65, '0' equals 48.
- Enumeration constants: Named integer values declared with enum, such as enum Color { RED, GREEN, BLUE };.
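The equivalences in the list can be checked directly. The helper names below are made up for this sketch, and the character values assume an ASCII execution character set, which holds on virtually every current platform.

```c
enum Color { RED, GREEN, BLUE };   /* values default to 0, 1, 2 */

/* 42 written in decimal, octal, and hexadecimal is the same value. */
int integer_forms_match(void)
{
    return 42 == 052 && 42 == 0x2A;
}

/* Character constants are integers (ASCII assumed). */
int char_values_match_ascii(void)
{
    return 'A' == 65 && '0' == 48;
}

/* The third enumerator gets the value 2 when none is assigned explicitly. */
int enum_value_of_blue(void)
{
    return BLUE;
}
```

The octal case is the one to watch in tests: a leading 0 silently changes the base, so 052 is 42, not 52.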
The const keyword and #define directive are the two mechanisms for enforcing constants in practice:
#define PI 3.14159
const int MAX = 100;
#define performs text substitution before compilation, so the macro itself has no type and no storage. const creates a typed object that the compiler protects from modification through that name. The distinction matters for type checking and for pointers: you can take the address of a const int and store it in a const int * (pointer to a constant int), whereas a #define macro that expands to a literal has no address to take.
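A minimal sketch of that difference, reusing PI and MAX from the lines above. The function name read_max_through_pointer is hypothetical.

```c
#define PI 3.14159       /* replaced textually before compilation */
const int MAX = 100;     /* a typed object with an address */

int read_max_through_pointer(void)
{
    const int *p = &MAX; /* allowed: MAX is an object in memory */

    /* *p = 5;  would be rejected: p points to a const int      */
    /* &PI      would not compile: 3.14159 is a literal, so     */
    /*          there is no object whose address can be taken   */

    return *p;           /* reads 100 through the pointer */
}
```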
Strings
A string in C is a sequence of characters stored in a char array, terminated by a null character ('\0'). String literals use double quotes:
char name[] = "FACE";
The array name holds five characters: 'F', 'A', 'C', 'E', '\0'. The null terminator is added automatically for string literals. Functions such as strlen() and printf() with %s stop processing when they encounter '\0'.
The distinction to know for tests:
- 'A' is a character constant (type int, value 65)
- "A" is a string literal (a char array: {'A', '\0'}, two bytes in memory)
These are not interchangeable. Assigning a string literal to a char variable instead of a char[] is a common error that the compiler will warn about.
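The size difference is easy to confirm with sizeof. The function names here are illustrative only; the comparison against sizeof(int) reflects the C rule that a character constant has type int.

```c
#include <string.h>

/* In C, 'A' has type int, so this equals sizeof(int). */
size_t size_of_char_constant(void)
{
    return sizeof 'A';
}

/* "A" is an array of two chars: 'A' and the terminating '\0'. */
size_t size_of_string_literal(void)
{
    return sizeof "A";
}

/* strlen counts characters before '\0'; sizeof counts the whole array. */
size_t name_length(void)
{
    char name[] = "FACE";
    return strlen(name);     /* 4 */
}

size_t name_storage(void)
{
    char name[] = "FACE";
    return sizeof name;      /* 5: four letters plus '\0' */
}
```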
Special Symbols
Special symbols are non-alphanumeric characters assigned fixed syntactic roles by the C language. They are not operators (they do not compute a result), but each one changes how the compiler parses what surrounds it.
| Symbol | Role |
|---|---|
| [] | Array subscript: arr[i] accesses element at index i |
| () | Function call and grouping: printf(...), (a + b) * c |
| {} | Block delimiter: marks the start and end of a compound statement |
| , | Separator: separates function arguments and multiple declarations |
| ; | Statement terminator: ends every executable statement |
| * | Pointer declaration and dereference (context-dependent) |
| = | Assignment: copies the right-hand value into the left-hand variable |
| # | Preprocessor directive marker: #include, #define, #ifdef |
The * symbol deserves a note: in a declaration it marks a pointer type (int *p), and in an expression it dereferences a pointer (*p = 10). Same character, two distinct roles depending on syntactic context. This ambiguity is a classic multiple-choice setup. The pointers and arrays in C guide covers the full set of pointer contexts in which * appears.
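Both roles appear within three lines in the sketch below (star_roles is a made-up name used only for this illustration).

```c
int star_roles(void)
{
    int value = 3;
    int *p = &value;   /* declaration: * marks p as a pointer to int */
    *p = 10;           /* expression: * dereferences p and writes to value */
    return value;      /* value was changed through the pointer */
}
```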
Operators
An operator is a symbol that triggers a computation on one or more values. Those values are called operands. C classifies operators by how many operands they require.
Unary Operators (one operand)
| Operator | Name | Example |
|---|---|---|
| ++ | Increment | i++ (post-increment), ++i (pre-increment) |
| -- | Decrement | i-- (post-decrement), --i (pre-decrement) |
| - | Unary minus | -x negates x |
| ! | Logical NOT | !flag evaluates to 1 if flag is 0 |
| ~ | Bitwise NOT | ~mask flips all bits |
| * | Dereference | *ptr reads the value at the address in ptr |
| & | Address-of | &var returns the memory address of var |
| sizeof | Size in bytes | sizeof(int) is 4 on most current 32-bit and 64-bit platforms |
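The post- versus pre-increment distinction in the first row is the one most often tested. A small sketch, with hypothetical function names, of what each form hands back to the surrounding expression:

```c
/* Post-increment yields the old value; i is 6 afterwards. */
int post_increment_seen(void)
{
    int i = 5;
    int seen = i++;
    return seen;       /* 5 */
}

/* Pre-increment yields the new value; i is 6 afterwards as well. */
int pre_increment_seen(void)
{
    int i = 5;
    int seen = ++i;
    return seen;       /* 6 */
}

/* Logical NOT maps 0 to 1 and any nonzero value to 0. */
int logical_not(int flag)
{
    return !flag;
}
```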
Binary Operators (two operands)
Binary operators take two operands. They divide into sub-categories:
- Arithmetic: +, -, *, /, % (modulo remainder)
- Relational: ==, !=, <, >, <=, >=. Return 0 (false) or 1 (true).
- Logical: && (AND), || (OR). Short-circuit: the right operand is not evaluated if the left determines the result.
- Bitwise: &, |, ^ (XOR), << (left shift), >> (right shift). Operate on individual bits.
- Assignment: = (simple), plus compound forms +=, -=, *=, /=, %=, &=, |=, ^=, <<=, >>=.
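Short-circuit evaluation is easiest to see with a function that records whether it was called. In this sketch the counter calls and the helpers bump and short_circuit_calls are illustrative names, not part of any library.

```c
static int calls = 0;

/* Counts how often it is invoked and always returns true. */
static int bump(void)
{
    calls++;
    return 1;
}

/* Returns how many times bump() actually ran. */
int short_circuit_calls(void)
{
    calls = 0;
    int a = 0 && bump();   /* left is false: bump() is skipped */
    int b = 1 || bump();   /* left is true: bump() is skipped  */
    int c = 1 && bump();   /* left is true: bump() must run    */
    (void)a; (void)b; (void)c;
    return calls;
}
```

Only the third expression needs its right operand, so bump() runs exactly once.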
Ternary Operator (three operands)
C has exactly one ternary operator: the conditional ?:.
int max = (a > b) ? a : b;
The condition (a > b) is evaluated. If true, the expression returns a; if false, it returns b. The ternary operator produces a value, which is what distinguishes it from an if-else statement.
Operator Precedence Traps
Precedence determines how operators group with their operands when several appear in one expression, and therefore which operation applies to which value. Three traps appear regularly in placement tests:
- *p++ increments the pointer (moves to the next memory address), not the value at p. To increment the value, write (*p)++.
- Bitwise & has lower precedence than ==. The expression x & mask == 0 parses as x & (mask == 0), almost never what the writer intended. Use parentheses: (x & mask) == 0.
- In C, && has higher precedence than ||. Unlike some other languages where the two are equal, a || b && c in C parses as a || (b && c).
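Each of the three traps can be reproduced in a few lines. The function names below are invented for this sketch, and the unparenthesized expressions are written that way on purpose, so a compiler with warnings enabled may flag them.

```c
/* Trap 1: *p++ reads the current element, then advances the pointer. */
int pointer_trap(void)
{
    int data[2] = { 10, 20 };
    int *p = data;
    int first = *p++;      /* first is data[0]; p now points at data[1] */
    return first + *p;     /* 10 + 20 */
}

/* (*p)++ increments the value and leaves the pointer alone. */
int value_increment(void)
{
    int data[2] = { 10, 20 };
    int *p = data;
    (*p)++;                /* data[0] becomes 11 */
    return data[0];
}

/* Trap 2: parses as x & (mask == 0). */
int mask_unparenthesized(int x, int mask)
{
    return x & mask == 0;
}

/* The intended test: are the masked bits all zero? */
int mask_parenthesized(int x, int mask)
{
    return (x & mask) == 0;
}

/* Trap 3: parses as a || (b && c). */
int or_and(int a, int b, int c)
{
    return a || b && c;
}
```

With x = 4 and mask = 1, the unparenthesized form gives 0 while the parenthesized form gives 1. With a = 1, b = 0, c = 0, or_and returns 1, whereas (a || b) && c would be 0.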
For worked examples that use these operator rules in placement test format, the C programming interview questions list includes operator-precedence problems with full derivations.
Tokens and the Compiler Pipeline
Understanding token types converts MCQ elimination from guesswork into a rule check. Consider these three lines:
int register = 5; /* register is a keyword: compile error */
int 2count = 0; /* starts with digit: compile error */
int max = 'A' + 1; /* valid: 'A' is 65, max becomes 66 */
In each case, classify every word and symbol by token type. If any token violates the rules for its type, the line fails before the program runs. Scanning options this way rules out one or two MCQ distractors in under ten seconds.
Knowing token categories also clarifies why certain C behaviours feel surprising. When output differs from what the code seems to say, the reason is often operator precedence or a constant type mismatch, both of which reduce to token classification at their root.
Where C Tokens and AI Tokens Meet
The token concept did not stay inside compilers. Large language models also process text as tokens, though the definition shifts: an LLM token is a sub-word chunk, and the tokenizer maps raw text to vocabulary IDs much as a C lexer maps source characters to typed tokens. Most hosted LLM APIs price calls by token count for this reason, and inference cost scales with sequence length in tokens, not characters.
TinkerLLM lets you run live LLM API calls and inspect the tokenizer output directly. At ₹299, it puts actual token-count data in your hands: type a C code snippet into a prompt and watch how the tokenizer splits keywords, identifiers, and operators into its own vocabulary units. It is a concrete way to see that lexical analysis, the same idea C compilers have relied on since the 1970s, still shapes how modern AI models read text.
Frequently asked questions
How many keywords are in ANSI C?
ANSI C (C89) defines 32 reserved keywords. C99 added 5 more: restrict, inline, _Bool, _Complex, and _Imaginary. Most placement test MCQs reference the 32 ANSI C keywords, so that count is the one to memorise.
Can a C identifier start with an underscore?
Yes, identifiers may start with an underscore. However, identifiers beginning with an underscore followed by an uppercase letter or another underscore are reserved for the implementation (the compiler and standard library), and any other name with a leading underscore is reserved at file scope. Avoid leading underscores in your own global names to prevent conflicts.
What is the difference between a constant and a variable in C?
A variable holds a value that can change during execution. A constant defined with const or #define holds a fixed value; the compiler prevents any code from modifying it after definition.
What is the difference between a string literal and a character constant in C?
A character constant is a single character in single quotes, like 'A', stored as its ASCII integer value. A string literal is a sequence of characters in double quotes, like "FACE", stored as a null-terminated char array with one extra byte for the '\0' terminator.
What are the types of C operators by arity?
Unary operators act on one operand (++, --, !, ~, sizeof). Binary operators act on two operands and include arithmetic, relational, logical, bitwise, and assignment sub-types. The ternary operator ?: acts on three operands and is the only one of its kind in C.
Is 'int' a keyword or an identifier in C?
int is a keyword, one of the 32 reserved words in ANSI C. It cannot be used as a variable name, function name, or any other identifier. Note that INT and Int are valid identifiers because C keywords are all lowercase.