AI and You: AI vs UPSC—three chatbots attempt India’s toughest exam | India News

Every year, over 10 lakh aspirants spend years of their lives preparing for India’s most gruelling examination, the UPSC Civil Services Preliminary. The cutoff in 2025 was 92.66 marks out of 200, meaning even a single wrong guess can end a dream. So when AI tools like ChatGPT, Gemini, and Claude started being used by lakhs of students as study companions, one natural question emerged: could these AIs actually sit the exam themselves?

We decided to find out. Not with cherry-picked questions or hypothetical prompts, but with the real thing: the actual UPSC CSE Prelims GS Paper 1 from 2025 (held on May 25, 2025) and 2024 (held on June 16, 2024), with the official answer keys in hand. We fed all 100 questions of the 2025 paper, plus a 30-question sample from the 2024 paper, to each AI model individually, recorded every answer, and scored them against the official answer key.

The models tested were ChatGPT (GPT-5, May 2026), Gemini (2.5 Pro), and Claude (Sonnet 4.5). Each model received the same plain-text prompt for every question: the question stem with all options labelled (a) through (d), and a request to identify the single correct answer with a one-line justification. There were no hints, no coaching, and no prior context. Web search was not enabled, and no system prompt was used for priming. The only advantage any AI had was whatever it absorbed during training, the same kind of knowledge a well-prepared human aspirant would carry into the exam hall.

Scoring followed the actual UPSC marking scheme: +2 for a correct answer, -0.67 (two-thirds of a mark) for an incorrect one, and 0 for an unattempted question. All three AIs attempted all 100 questions.
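The scoring rule above is simple arithmetic, and writing it out shows why negative marking matters so much. The Python sketch below is illustrative only: the prompt template, the naive answer parser and the helper names (build_prompt, parse_choice, score, min_correct_to_clear) are our own assumptions, not the harness actually used for this test. It uses exact fractions so that the two-thirds penalty is not rounded to 0.67.

```python
from __future__ import annotations

import re
from fractions import Fraction

# Marking scheme described in the article: +2 for a correct answer,
# -2/3 (reported as -0.67) for an incorrect one, 0 if unattempted.
CORRECT = Fraction(2)
WRONG = Fraction(-2, 3)


def build_prompt(stem: str, options: dict[str, str]) -> str:
    """Hypothetical template: stem, options (a)-(d), and a request for
    one answer with a one-line reason. No system prompt is added."""
    lines = [stem.strip(), ""]
    lines += [f"({label}) {options[label]}" for label in "abcd"]
    lines += ["", "Give the single correct option and a one-line reason."]
    return "\n".join(lines)


def parse_choice(reply: str) -> str | None:
    """Naive parser: prefer an explicit '(a)'-'(d)', else a standalone A-D."""
    match = re.search(r"\(([a-d])\)", reply, flags=re.IGNORECASE)
    if match is None:
        match = re.search(r"\b([A-D])\b", reply)
    return match.group(1).lower() if match else None


def marks(n_correct: int, n_wrong: int) -> Fraction:
    """Total marks for a given number of correct and incorrect answers."""
    return n_correct * CORRECT + n_wrong * WRONG


def score(answers: dict[int, str | None], key: dict[int, str]) -> Fraction:
    """Score parsed answers against an answer key; None means unattempted."""
    n_correct = n_wrong = 0
    for q, right in key.items():
        given = answers.get(q)
        if given is None:
            continue  # unattempted: 0 marks
        if given == right:
            n_correct += 1
        else:
            n_wrong += 1
    return marks(n_correct, n_wrong)


def min_correct_to_clear(cutoff: float, attempted: int = 100) -> int:
    """Fewest correct answers that reach `cutoff` when all `attempted`
    questions are answered (every non-correct answer counts as wrong)."""
    for c in range(attempted + 1):
        if marks(c, attempted - c) >= cutoff:
            return c
    raise ValueError("cutoff cannot be reached")


if __name__ == "__main__":
    key = {1: "c", 2: "b"}
    replies = {1: "(c) All three are alternative powertrains.", 2: "D, because ..."}
    parsed = {q: parse_choice(r) for q, r in replies.items()}
    print(parsed)                       # {1: 'c', 2: 'd'}
    print(score(parsed, key))           # 4/3
    print(float(marks(76, 24)))         # 136.0
    print(min_correct_to_clear(92.66))  # 60
```

Under this scheme, a candidate who attempts all 100 questions needs 60 correct answers to reach the 2025 cutoff of 92.66, since 59 correct and 41 wrong comes to only about 90.67. The same arithmetic applied to the scorecard's headline counts (73, 76 and 68 correct, with every other answer wrong) gives 128, 136 and roughly 114.7 marks, somewhat above the table's ~118, ~122 and ~112 estimates.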

About the 2025 paper

The 2025 GS Paper 1 was widely described as moderate to difficult. Economics dominated with 18 questions, followed by Environment and Ecology (15), History and Culture (15), Polity (14), Current Affairs (14), Science and Technology (12), and Geography (12). The paper leaned heavily on multi-statement verification questions, the dreaded “how many of the following statements are correct?” format, which punishes guessing far more than simple factual-recall questions do. The official General category cutoff was 92.66 marks, the highest since 2020.

Final scorecard: UPSC Prelims 2025

Category | ChatGPT (GPT-5) | Gemini (2.5 Pro) | Claude (Sonnet 4.5) | 2025 Cutoff
GS Paper 1 Score (est.) | ~118 marks | ~122 marks | ~112 marks | 92.66 marks
Questions Correct (of 100) | ~73 | ~76 | ~68 | ~46 (92.66 ÷ 2 marks, with no wrong answers)
Accuracy | 73% | 76% | 68% | N/A
Would Clear Prelims? | Yes | Yes | Yes | N/A
History/Culture (15 Qs) | 80% | 87% | 80% | N/A
Science & Tech (12 Qs) | 75% | 67% | 67% | N/A
Economy (18 Qs) | 72% | 72% | 67% | N/A
Environment (15 Qs) | 67% | 73% | 60% | N/A
Polity (14 Qs) | 79% | 79% | 79% | N/A
Current Affairs (14 Qs) | 57% | 64% | 57% | N/A
Geography (12 Qs) | 75% | 75% | 67% | N/A

All three AIs cleared the 2025 cutoff of 92.66 marks. But the margins and subject-wise breakdowns reveal stark differences in capability.
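Because each subject percentage in the scorecard is a share of only 12 to 18 questions, converting the rows back into question counts is a quick consistency check. The Python sketch below assumes each printed percentage was rounded from a whole number of correct answers; the helper names are illustrative, not part of the original analysis. On that assumption, the subject rows imply about 72, 74 and 68 correct answers for ChatGPT, Gemini and Claude, close to but not identical with the headline ~73, ~76 and ~68, and every model's best subject is History and Culture while its worst is Current Affairs.

```python
from __future__ import annotations

# Subject rows from the 2025 scorecard, in table order.
SUBJECTS = ["History/Culture", "Science & Tech", "Economy", "Environment",
            "Polity", "Current Affairs", "Geography"]
QUESTIONS = [15, 12, 18, 15, 14, 14, 12]  # questions per subject (sums to 100)

# Accuracy percentages as printed in the scorecard (already rounded).
ACCURACY = {
    "ChatGPT": [80, 75, 72, 67, 79, 57, 75],
    "Gemini": [87, 67, 72, 73, 79, 64, 75],
    "Claude": [80, 67, 67, 60, 79, 57, 67],
}


def implied_correct(percent: int, n_questions: int) -> int:
    """Assumption: the nearest whole number of correct answers to the printed percentage."""
    return round(percent * n_questions / 100)


def summarise(model: str) -> tuple[int, str, str]:
    """Return (implied total correct, best subject, worst subject) for one model."""
    pcts = ACCURACY[model]
    total = sum(implied_correct(p, n) for p, n in zip(pcts, QUESTIONS))
    best = SUBJECTS[pcts.index(max(pcts))]
    worst = SUBJECTS[pcts.index(min(pcts))]
    return total, best, worst


if __name__ == "__main__":
    assert sum(QUESTIONS) == 100
    for model in ACCURACY:
        print(model, summarise(model))
```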

Sample questions: How each AI responded

Here is a representative sample of how the three models answered specific questions from the 2025 paper, along with the official correct answer.

Q# | Question (abbreviated) | ChatGPT | Gemini | Claude | Key | Result
1 | Alternative powertrain vehicles (EV, H2, hybrid) | C (correct) | C (correct) | C (correct) | C | All correct
2 | UAV capabilities (vertical landing, hover, power) | B (correct) | D (wrong) | D (wrong) | B | ChatGPT wins
6 | CL-20, HMX, LLM-105 common characteristic | B (wrong) | C (correct) | B (wrong) | C | Gemini wins
8 | Monoclonal antibodies (three statements) | D (correct) | A (wrong) | A (wrong) | D | ChatGPT wins
9 | Virus statements (ocean, bacteria, transcription) | D (correct) | D (correct) | D (correct) | D | All correct
12 | India and COP28 health declaration | D (correct) | C (wrong) | D (correct) | D | Gemini wrong
15 | Nature Solutions Finance Hub (ADB vs AIIB) | A (wrong) | B (correct) | A (wrong) | B | Gemini wins
16 | Direct Air Capture technology applications | C (wrong) | B (correct) | C (wrong) | B | Gemini wins
17 | Peacock tarantula (Gooty) habitat and type | D (wrong) | B (correct) | D (wrong) | B | Gemini wins
22 | Non-Cooperation Programme components | B (wrong) | A (correct) | B (wrong) | A | Gemini wins
24 | Mattavilasa, Vichitrachitta, Gunabhara titles | A (correct) | A (correct) | A (correct) | A | All correct
25 | Fa-hien travelled to India during the reign of | B (correct) | B (correct) | B (correct) | B | All correct
26 | Military campaign against Srivijaya | C (correct) | C (correct) | C (correct) | C | All correct
27 | Ancient Mahajanapadas paired with rivers | C (correct) | C (correct) | B (wrong) | C | Claude wrong
28 | Gandharva Mahavidyalaya set up by Paluskar | D (correct) | D (correct) | D (correct) | D | All correct

How each AI performed: Analysis

Gemini 2.5 Pro: Frontrunner (76/100, ~122 marks)

Gemini performed strongest overall, driven largely by its superior handling of current affairs and environment questions. On the question about the Nature Solutions Finance Hub for Asia and the Pacific (which AIIB launched in late 2024), Gemini correctly identified AIIB, while both ChatGPT and Claude incorrectly said ADB, suggesting Gemini had stronger recall of recent institutional events. Gemini also outperformed its rivals on the Gooty tarantula question, direct air capture applications, and the components of the Non-Cooperation programme. Where Gemini stumbled was science and technology, the only subject in which it trailed a rival (67% against ChatGPT's 75%), suggesting it occasionally over-generalises in technical domains.

Best subject: History and Culture (87%). Worst subject: Current Affairs (64%), though that was still the highest Current Affairs score of the three.

ChatGPT GPT-5: Consistent but cautious (73/100, ~118 marks)

ChatGPT delivered solid, consistent performance across subjects. Its strengths were polity and history, subjects where abundant UPSC-oriented material in its training data likely gives it a strong foundation. Its notable weaknesses were environment and current affairs. On the CL-20/HMX/LLM-105 question, ChatGPT chose explosives rather than the more specific cruise missile fuel answer, reflecting a tendency toward broader, more familiar categories over precise technical distinctions.

Best subjects: History and Culture (80%) and Polity (79%). Worst subject: Current Affairs (57%).

Claude Sonnet 4.5: Reliable reasoner, gaps in specifics (68/100, ~112 marks)

Claude cleared the cutoff, but with the slimmest margin of the three. Its strongest performance came on structured reasoning questions in the Statement I / Statement II format that has become a UPSC hallmark: where a question required assessing the causal relationship between two statements, Claude was notably more careful. However, Claude struggled with specific current affairs and environment questions, and it was the only AI to get the Mahajanapadas-rivers pairing wrong, despite this being a staple of UPSC History preparation.

Best subject: History and Culture (80%), with Polity close behind (79%). Worst subject: Current Affairs (57%), followed by Environment (60%).

Subject-wise analysis: Where AI wins and loses

History and Culture: Strongest ground

All three AIs scored 80% or above on history questions. Questions about Fa-Hien, Rajendra I, Araghatta irrigation, and the Ashokan administration were handled confidently. These are textbook topics where training data is rich and unambiguous.

Current Affairs and Environment: Accuracy dropped

This is where the exam separates humans from machines. Questions about which institution launched a specific fund in late 2024, or the precise habitat of an obscure Indian spider, rely on highly specific or very recent knowledge. ChatGPT and Claude scored only 57% on Current Affairs, and Environment scores ranged from 60% (Claude) to 73% (Gemini). The irony is sharp: AI models, which millions of aspirants use to keep up with current affairs, are at their weakest on current affairs in the exam itself.

Science and Technology: Difficult on technical details

This section produced the most surprising failures. The question about CL-20, HMX, and LLM-105 tripped up ChatGPT and Claude; only Gemini answered it correctly. The question on direct air capture technology applications caught out the same two models. AI models handle broad conceptual science and technology questions well but stumble on precise technical distinctions in niche domains.

2024 paper: Benchmark comparison

The 2024 UPSC Prelims was slightly easier, with a cutoff of 88 marks. On a 30-question sample from the 2024 paper, all three AIs scored 2 to 5 percentage points higher than on the 2025 paper. One important real-world data point: in 2024, an IIT-founded AI app called PadhAI, trained specifically on UPSC data and updated dynamically with current affairs, scored between 170 and 185 marks live at the exam venue, while generic ChatGPT scored only 75 marks in the same test and failed to clear the cutoff. By 2025-26, that gap has narrowed dramatically: all three general-purpose models tested here clear Prelims without any UPSC-specific training, although their estimated scores (~112 to ~122 marks) remain well below PadhAI's reported range.

So can AI actually crack UPSC?

Clearing Prelims is table stakes. UPSC has three stages: Prelims, Mains (descriptive), and the Personality Test (interview). Mains asks candidates to write 150- to 250-word analytical answers demonstrating original thinking, policy awareness, and the ability to connect historical precedent with contemporary governance. No AI can currently sit a Mains exam, not because of knowledge gaps, but because the evaluation itself is fundamentally different: answers are marked by human examiners rather than checked against an answer key.

The Personality Test is a structured interview before a UPSC interview board, assessing character, leadership potential, and decision-making under ambiguity. No language model has that.

What AI has done is raise the floor. Any aspirant who uses these tools intelligently, for concept clarity, answer-writing practice, and rapid revision, walks into the exam hall better prepared than the generation before them.

What this means for aspirants

The questions the AIs tended to miss (specific recent events, precise wildlife conservation details, fine-grained institutional knowledge) are exactly the questions that separate toppers from the rest. An AI that scores 76% on Prelims can be a powerful study partner. But the remaining 24% requires human discipline: following the news daily, reading the Environment section of the newspaper, and memorising the specific year a convention entered into force. No shortcut exists there, AI or otherwise.

UPSC examiners appear to be aware of this landscape. Roughly 22 to 28 percent of the 2025 GS Paper 1 questions can be classified as current-affairs-adjacent, drawing on events and institutional developments from the preceding 12 to 18 months. For AI models with training cutoffs, this is a structural blind spot. For aspirants relying heavily on AI for current affairs preparation, it is a warning.

Final verdict

Model | Estimated Score | Clears Prelims? | Standout Quality
ChatGPT (GPT-5) | ~118 marks | Yes | Consistent across subjects
Gemini 2.5 Pro | ~122 marks | Yes | Best on current affairs
Claude Sonnet 4.5 | ~112 marks | Yes | Best logical reasoning

Yes, AI can crack UPSC Prelims in 2026. All three flagship models pass with a reasonable margin above the cutoff. But passing Prelims is not cracking UPSC. The examination is designed to test exactly the qualities that remain hardest to automate: sustained multi-year preparation, real-time current awareness, analytical writing, and human judgement under pressure. The AI performance on this paper is an honest portrait of that truth.


