EPSO's AD5 Dilemma:
Victim of Their Own Success?
More than 174,000 candidates applied for the EPSO AD5 Graduate (Generalist) exam — several times the originally projected number. This creates an extremely difficult situation for the AD5 Selection Board, EPSO, and their test provider (OAT), touching on technical, legal, fairness, and operational questions that do not have easy answers.
With thanks to Rita Revy, László Zlatarov, Ben Williams, Adam Idrissi, John Harper, Luc Gillis, and others for their helpful contributions and review.
EPSO must decide how to run a competition designed for 50,000 candidates — with more than 174,000 registered.
Under the current Notice of Competition, all candidates sit all tests — including the EUFTE, which is scored by human assessors. Running this at 120,000+ likely participants (174,000+ registered, but accounting for expected no-shows), spread across many testing days, creates severe operational, legal, and fairness challenges. Any form of staged progression requires a NoC amendment. Without one, the current rules apply — but they were never designed for this volume. Two amendment paths exist: staged thresholds (Path A) or a ranking-based funnel (Path B). Both carry risks. Neither is without political cost. Every week without a decision narrows the options further.
- Scale: The tests best suited to selecting the right candidates are the hardest to run at this volume. Reducing the load means changing what gets measured, and when.
- Feasibility: EPSO is required to treat all candidates equally, which, under the current rules, means having everyone sit all tests. But running all tests for 120,000+ people creates fairness risks of its own, across test days, IT capacity, and human assessor load.
- Operational reality: Staying within the current rules minimises legal risk but may be operationally undeliverable. Amending the rules opens operational options, but creates new legal and political exposure.
Section 01
EPSO's Dilemma
EPSO's core mission is to act as a trusted matchmaker between EU institutions and talented professionals and graduates — selecting the right people to build the current and future European civil service. The AD5 Graduate (Generalist) competition is the main gateway for that mission: open to anyone holding a university degree in any field, it has historically been EPSO's flagship generalist competition. This time around, however, EPSO is dealing with a "scale" problem it was not designed for. More than 174,000 candidates registered — several times the originally projected number of around 50,000 (with 80,000 considered an absolute upper-end outlier) — and actual participants are likely to remain well above 120,000 even with no-shows. (Some of the reasons behind the surge are analysed here.) This is not just "more candidates": the problems it creates grow disproportionately with size, with consequences across every dimension of the competition.
Five Tensions EPSO is Caught Between
Section 02
Description of the Dilemma
If you strip it down, EPSO is trying to achieve five things simultaneously: fulfil its mission as matchmaker by selecting the most suitable and capable future EU officials from a very large pool; channel 174,000+ candidates into a manageable process; do so fairly across multiple test days; avoid system failure and operational overload; and minimise legal risk. These objectives pull in different directions.
The core problem is straightforward: under the current rules, all 174,000+ candidates would sit all tests — including the EUFTE. Running that at this scale, across many testing days, is the central operational challenge EPSO faces.
The Scale Challenge: Current Model vs. a Funnel Approach
(Infographic, summarised: 174,000+ registered candidates → ~120,000 likely actual participants, vs. ~50,000 originally projected; reserve list target: 1,490. Current model: all ~120,000 candidates sit all tests in batches over many testing days, with verbal + EU Knowledge + Digital Skills ranked and numerical + abstract threshold-only; calibrating thresholds to hit the target of 1,490 is very hard. Funnel approach: Stage 1 cuts the field to 5,000–9,000 candidates, which is manageable, lower legal risk, and practical.)
Under the current rules, all candidates sit all tests. However, not all tests carry equal weight in determining outcomes: numerical and abstract reasoning are threshold-only tests (pass or fail), while verbal reasoning, EU Knowledge, and Digital Skills feed into a preliminary combined score. Candidates must first pass all per-test thresholds — then they are ranked in descending order of that combined score. The EUFTE scripts are then evaluated and eligibility checked in parallel, for a group that in principle does not exceed 1.5× the number of successful candidates sought (roughly 2,235) — though the Selection Board may increase this number where candidates are tied on equal scores. At 174,000+ candidates spread across many testing days, this creates several serious problems:
- The EUFTE (Free-Text Essay on EU Matters, which assesses written communication skills and is, for now, scored by human assessors): under the current model, all candidates sit and submit the EUFTE — but only a limited number of responses are actually evaluated. The current NoC aims to limit the number of EUFTEs to be scored to around 1.5 times the number of reserve list places — roughly 2,235 candidates. That target makes sense operationally. The problem is achieving it: under a threshold-based model, it is very hard to calibrate how many candidates will end up ranked high enough to have their EUFTE response evaluated. Thresholds set too low mean too many candidates qualify for evaluation; thresholds set too high risk too few qualifying, which can invite legal challenges or criticism that the questions were unreasonably difficult
- The EU Knowledge Test: at this volume, running it across 10–12 days creates real fairness risks. Even if the question pool is large enough to avoid repeating specific questions across sessions (30 questions × 12 days = 360 unique questions is achievable), candidates sitting later sessions could still benefit from what earlier test-takers share — not necessarily the exact questions, but the topics covered, the policy areas emphasised, the style of questions, and the level of difficulty. That kind of intelligence, even without verbatim content, is genuinely useful preparation and creates an uneven playing field. It also risks penalising well-suited generalist candidates through niche questions on specific policy areas
- IT and operational load: all candidates must be processed across all tests, maintaining fairness across many testing days, with all the associated IT, helpdesk, and proctoring (the online supervision system used to ensure exam integrity during remote testing) demands
| Test | Language | Questions | Pass score | Preliminary ranking | Final score |
|---|---|---|---|---|---|
| Verbal reasoning | Language 1 | 20 | 10/20 | 40% | 35% |
| Numerical + Abstract reasoning | Language 1 | 10 + 10 | 10/20 combined | Not included | Not included |
| EU Knowledge Test | Language 2 | 30 | 15/30 | 30% | 25% |
| Digital Skills Test | Language 2 | 40 | 20/40 | 30% | 25% |
| EUFTE | Language 2 | — | 5/10 | Not included | 15% |
Under the current NoC, the ranking that determines whose EUFTE gets marked is based on a composite preliminary score: verbal reasoning (40%) + EU Knowledge Test (30%) + Digital Skills Test (30%). This composite produces a far wider range of possible score values than any single test would — so the tie problem that would afflict a single-test ranking is much less acute here.
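The mechanics of that composite ranking can be sketched as follows. Candidate scores below are randomly generated purely for illustration; only the weights, per-test pass marks, reserve list target, and 1.5× cap are taken from the figures above:

```python
import random

# Sketch of the current NoC's preliminary ranking. Candidate scores are
# randomly generated stand-ins, not real data.
random.seed(42)

RESERVE_LIST_TARGET = 1_490
EUFTE_CAP = round(1.5 * RESERVE_LIST_TARGET)   # ~2,235 scripts to mark

def passes_thresholds(c):
    # Per-test pass marks: verbal 10/20, numerical+abstract 10/20 combined,
    # EU Knowledge 15/30, Digital Skills 20/40.
    return (c["verbal"] >= 10 and c["num_abs"] >= 10
            and c["eu"] >= 15 and c["digital"] >= 20)

def preliminary_score(c):
    # Composite: verbal 40% + EU Knowledge 30% + Digital Skills 30%,
    # each normalised to its maximum. N+A is threshold-only.
    return (0.40 * c["verbal"] / 20
            + 0.30 * c["eu"] / 30
            + 0.30 * c["digital"] / 40)

candidates = [{"verbal": random.randint(0, 20), "num_abs": random.randint(0, 20),
               "eu": random.randint(0, 30), "digital": random.randint(0, 40)}
              for _ in range(120_000)]

ranked = sorted(filter(passes_thresholds, candidates),
                key=preliminary_score, reverse=True)
evaluated = ranked[:EUFTE_CAP]   # only these EUFTE scripts are evaluated
print(f"{len(ranked):,} pass all thresholds; {len(evaluated):,} EUFTEs marked")
```

Note that the cap only bites at the end of the pipeline: how many candidates enter `ranked` in the first place depends entirely on the score distributions, which is the calibration problem discussed next.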
The real problem is simpler and more fundamental: under a threshold-based model, EPSO cannot fix, before the tests are sat, how many candidates will pass all the thresholds. All candidates sit and submit the EUFTE, but only those ranked high enough in the composite scoring have their response actually evaluated and their eligibility checked, in parallel. The NoC sets this group at in principle no more than 1.5× the reserve list target (~2,235), with the Selection Board able to extend it where candidates are tied on equal scores. Thresholds on verbal, EU Knowledge, and Digital Skills determine who enters the composite ranking, but small shifts in average scores, or unexpected changes in candidate volume, can move the number of threshold-passers by tens of thousands. If too few pass the thresholds, the EUFTE evaluation target is undershot; if far too many pass, the operational burden expands unpredictably.
An illustrative scenario: Imagine 120,000 candidates sit all tests and, say, 40,000 pass all three thresholds and enter the preliminary composite ranking. The composite score then ranks those 40,000 and cuts to the top ~2,235 — this works well in principle. But EPSO cannot know in advance that 40,000 will pass. If thresholds are set too low, many more pass and the workload expands; if set too high, too few pass and the process is exposed to legal challenge. The volume reaching the ranking stage is unpredictable — and that unpredictability is the core problem.
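The sensitivity can be made concrete with a toy simulation. The score distribution below is invented, since the real one is unknown before the tests are sat, which is exactly the problem:

```python
import random

# Toy illustration of threshold unpredictability on a single 0-20 test.
# The distribution (mean 11, sd 3) is an arbitrary assumption.
random.seed(1)
N = 120_000
scores = [min(20, max(0, round(random.gauss(11, 3)))) for _ in range(N)]

for pass_mark in (10, 11, 12, 13):
    passed = sum(s >= pass_mark for s in scores)
    print(f"pass mark {pass_mark}/20 -> {passed:,} candidates clear it")

# A ranking rule, by contrast, fixes the number in advance:
TOP_N = 9_000
advancing = sorted(scores, reverse=True)[:TOP_N]
print(f"top-{TOP_N:,} ranking -> exactly {len(advancing):,} proceed")
```

Moving the pass mark by a single point shifts the passing population by tens of thousands under this (hypothetical) distribution, while the ranking rule delivers the same headcount whatever the distribution turns out to be.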
This is one of the strongest arguments for a ranking-based funnel at Stage 1: instead of relying on thresholds to control how many reach the composite ranking, a hard cap fixes the number in advance regardless of score distributions.
The root issue is that the current rules were not designed for this volume. At the end of the process, EPSO's goal, and its institutional mission, is to place the most suitable candidates on the reserve list: a pool of pre-approved candidates (target: 1,490) from which EU institutions can recruit over the following years, building the European civil service one competition at a time.
Section 03
Three Possible Approaches
The current model was designed for a much smaller candidate pool and has no staged elimination — everyone sits everything. Historically, all tests took place on the same day for all candidates. Running them in batches over multiple days is an approach EPSO has indicated is possible without a NoC amendment, but no details on how it would work in practice have been published. What does require a NoC amendment is any form of staged progression — where results from one test determine who sits the next. There are two broad directions such an amendment could take.
A key distinction runs through both paths: thresholds and ranking are not interchangeable. A threshold tells you the minimum a candidate must achieve — but it cannot tell you in advance how many will clear it. A 1–2 percentage point shift in average scores can mean tens of thousands more or fewer candidates proceeding. Ranking, by contrast, fixes the number in advance: the top N proceed, regardless of score distribution. This difference matters enormously at this scale, and it is why Path B offers fundamentally better volume control than Path A.
Three Possible Approaches: Current Model, Staged Thresholds, or Ranking Funnel
Path A (staged thresholds) would split the competition into two parts: all candidates sit Part 1 (reasoning tests); only those who pass the Part 1 thresholds are then invited to sit Part 2 (EU Knowledge, Digital Skills, and EUFTE). There is no ranking at Part 1 — elimination is purely threshold-based. This reduces the Part 2 load substantially. However, it faces the same calibration problem as the current model — only more acutely:
- The NoC aims to limit EUFTE evaluation to around 1.5 times the reserve list places (roughly 2,235 candidates). Under Path A, only candidates who pass Part 1 sit Part 2 at all — so they are the only ones who sit and submit the EUFTE. Among those, only the ones ranked high enough in the Part 2 composite scoring have their EUFTE response actually evaluated. Under a threshold-based approach, hitting the ~2,235 evaluation target is still very hard: thresholds set too low mean too many qualify; too high risks too few qualifying, opening the door to legal challenge or criticism of overly difficult tests
- Volume proceeding to later tests would remain hard to predict upfront, since thresholds cannot fix exact numbers in advance
- Raising thresholds significantly could bring the numbers closer to the target, but creates its own problems: the higher the threshold, the more candidates cluster near the cut-off, and the greater the risk of borderline candidates passing or failing on random factors rather than ability — adding legal and fairness risks rather than reducing them
Path B (ranking funnel) would involve a more significant design change:
- A targeted NoC amendment introducing ranking at Stage 1, a hard cap on how many proceed, and flexibility in how scores are combined
- The legal changes would be relatively modest; the operational benefit could be substantial
- It carries lower legal risk from a testing design perspective than either a threshold-only approach or the current model at this scale
Section 04
The Case for Ranking on Reasoning Skills
Both Path A and Path B require a NoC amendment and both involve applying reasoning test scores to filter who sits later tests. The question this section explores is more specific: under the current NoC, verbal reasoning already contributes 40% to the preliminary ranking. Should numerical and abstract reasoning also contribute to ranking, rather than remaining purely pass/fail? Bringing N+A into a composite alongside verbal would produce a more granular reasoning score — and that composite is the natural candidate for driving Stage 1 selection in a revised model.
Reasoning skills tests (verbal, numerical, and abstract reasoning) look like the strongest candidate for a strengthened Stage 1 ranking role:
- Professionally designed question banks — they draw on question banks that are designed and checked to measure reasoning ability consistently; critically, these banks are large enough to run reliably across 10–12 days of testing at 10,000–12,000 candidates per session without the question security risks that affect the EU Knowledge Test at this scale
- Standardised and scalable — fully standardised, deliverable remotely, and proven at very large candidate numbers; at 120,000+ participants across many test days, any instrument that cannot be delivered consistently and remotely is not a realistic option
- Generally equitable — more so than less structured methods such as unstructured interviews. One caveat worth noting: some psychometric literature raises concerns about whether verbal reasoning tests in a second or weaker language may understate a candidate's actual reasoning ability. Under the current NoC, verbal reasoning is sat in Language 1, so the concern relates specifically to candidates for whom their Language 1 is not their most fluent language. This is not an established finding specific to EPSO's tests, but it is a consideration worth keeping in mind when setting test weights.
What about the EU Knowledge Test? Under Path A (staged thresholds), the current preliminary ranking formula would be preserved: verbal reasoning (40%), EU Knowledge (30%), and Digital Skills (30%) combined — with EU Knowledge contributing one component of that ranking, not acting as the sole instrument. Under Path B (ranking funnel), EU Knowledge would move to Stage 2 with a much smaller candidate pool. Using it for ranking at Stage 1, across the full candidate pool, would be problematic:
- Running it across 10–12 days creates real fairness risks even if specific questions are not repeated — the topics covered, policy areas emphasised, and style of questions are themselves useful intelligence for candidates sitting later sessions, creating an uneven playing field that is hard to prevent at this scale
- Niche questions risk penalising well-suited generalist candidates who have not worked in a specific policy area
- EU knowledge at the point of recruitment is generally considered a weaker predictor of long-term job performance than reasoning ability, particularly for generalist roles where on-the-job learning is expected
Situational Judgment Tests (SJTs) and e-tray exercises are also worth a mention. They tend to add meaningful predictive validity for job fit — contributing incremental value over cognitive tests alone in predicting how someone will actually perform in the role — and could in principle play a role at Stage 1. However, reintroducing them would require significant changes to the current Notice of Competition framework, and would probably not be politically viable in the near term.
The real question, therefore, is not which tests to use at Stage 1, but how to score the reasoning tests (verbal, numerical, and abstract) within a revised model. For context, the current NoC sets the verbal reasoning pass score at 10/20 (50%) and numerical + abstract combined at 10/20. Under Path A (staged thresholds), EPSO might consider raising these — for example, verbal to 14/20 or 15/20 — to reduce the number proceeding to later tests. The challenge, as discussed above, is that small changes in pass score can produce large and unpredictable swings in candidate numbers at this scale. Four broad options for how the reasoning tests at Stage 1 could be scored:
Scoring Options for the Reasoning Tests (Verbal + Numerical + Abstract) at Stage 1: A Comparison
| Option | Mechanism | Volume control | Legal change? | Verdict |
|---|---|---|---|---|
| Option 1: Raise thresholds only | Increase the minimum score per test individually. Small increases let most candidates pass; very high ones could filter significantly | Partial: very high thresholds reduce volume but cannot fix it in advance | Minimal | Insufficient alone: volume unpredictable; at high cut scores, borderline candidates may pass or fail on random factors |
| Option 2: Raise composite cutoff (individual thresholds unchanged) | Keep per-test pass marks as they are, but require a higher combined score across verbal + N+A to proceed. Unlike Option 1, a weaker score on one test can be offset by a stronger score on another; the bar on the reasoning composite rises without changing any individual threshold, but exact numbers still cannot be fixed in advance | Smoother, but still cannot cap volume | Minimal | Not ideal: allows extreme imbalance across tests (a candidate could score very high on verbal and near-zero on N+A and still clear the composite bar); volume remains open-ended |
| Option 3: Ranking only | Rank all candidates by score; only the top X proceed; no minimum threshold | Precise: exact volume control | Required | Viable but sensitive: politically sensitive; no minimum bar for competence |
| Option 4 ★ Potential compromise: Hybrid (threshold + ranking + cap) | Pass mark required per test individually (each must be cleared on its own) → combined score → ranking → hard cap on advancement | Full control: defined volume from the outset | Required | Best overall: balances fairness and control, and minimises legal risks |
- Options 1 and 2 cannot fix volume in advance. Clustering near a high cut score also increases the risk of borderline candidates passing or failing on random factors rather than ability.
- Option 3 gives precise volume control but sets no minimum competence bar — which could invite legal challenge and raises the question of whether the process is genuinely selecting for ability or just for relative ranking.
- Option 4 combines a per-test pass mark (competence floor) with composite scoring, ranking, and a hard cap. This gives predictable volume, lower legal risk, and aligns with good assessment practice: a floor ensures baseline ability, ranking handles the rest.
- A note on the threshold in Option 4: at this scale and with a well-calibrated hard cap, the per-test pass mark is unlikely to filter many candidates in practice — the ranking and cap will do the real work. Its value is not primarily operational; it is legal and psychometric. It establishes a documented minimum competence standard, provides a defensible basis for the process design, and ensures the ranking operates above a meaningful baseline rather than simply selecting the top N from all comers regardless of score.
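A minimal sketch of Option 4's pipeline, assuming illustrative floors, weights, and cap; none of these values comes from the NoC or any amendment:

```python
import random

def stage1_hybrid(candidates, cap, floor_verbal=10, floor_num_abs=10,
                  w_verbal=0.5, w_num_abs=0.5):
    """Option 4 sketch: per-test floor -> composite -> ranking -> hard cap.

    Floors, weights, and the cap are illustrative placeholders; real values
    would be set in an amended NoC.
    candidates: list of (verbal /20, numerical+abstract /20) tuples.
    """
    eligible = [c for c in candidates
                if c[0] >= floor_verbal and c[1] >= floor_num_abs]
    composite = lambda c: w_verbal * c[0] / 20 + w_num_abs * c[1] / 20
    ranked = sorted(eligible, key=composite, reverse=True)
    return ranked[:cap]   # the cap, not the floor, controls volume

# However scores turn out, no more than `cap` candidates can proceed,
# and every one of them has cleared both per-test floors.
random.seed(0)
pool = [(random.randint(0, 20), random.randint(0, 20)) for _ in range(50_000)]
advancing = stage1_hybrid(pool, cap=9_000)
print(f"{len(advancing):,} candidates proceed to Stage 2")
```

The design point is visible in the last line of the function: the floor guarantees a documented competence baseline, while the cap alone determines the headcount.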
A practical note on scoring weights: some psychometric literature suggests that reasoning tests administered in a language that is not the test-taker's strongest may understate actual ability, a concern sometimes referred to as adverse impact. Under EPSO's current NoC, candidates choose their Language 1 freely — in practice, this is usually their native or most fluent language, so the concern applies primarily to candidates who do not have a strong native-language option among the EU's 24 official languages, or who choose a Language 1 in which they are less comfortable than expected. This is not an established finding specific to EPSO's tests, but it is a consideration worth keeping in mind when setting the relative weight of verbal reasoning in a revised scoring model.
A key parameter to set upfront is how many candidates proceed to Stage 2. Working back from the reserve list target of 1,490 and applying realistic success rates across later stages, inviting roughly 3 to 6 times the reserve list size, so approximately 4,500 to 9,000 candidates, seems like the right order of magnitude. Not tens of thousands.
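The working-back arithmetic is simple; the 3–6× multipliers are the planning heuristic described above, not figures from the NoC:

```python
# Working back from the reserve list target. The 3x-6x range reflects an
# assumption about realistic success rates in later stages.
RESERVE_LIST = 1_490
low, high = 3 * RESERVE_LIST, 6 * RESERVE_LIST
print(f"Stage 2 invitations: roughly {low:,} to {high:,}")   # 4,470 to 8,940

# The NoC's own cap on EUFTE evaluation follows the same logic:
EUFTE_CAP = round(1.5 * RESERVE_LIST)
print(f"EUFTE scripts to mark (1.5x target): about {EUFTE_CAP:,}")   # 2,235
```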
Keeping that cap in place would also directly reduce the EUFTE volume. The EUFTE (Free-Text Essay on EU Matters) is currently scored by human assessors and is the main bottleneck for getting results out on time, so keeping the population reaching Stage 3 manageable matters a great deal.
Section 05
How a Funnel Model Could Work in Practice
If EPSO were to go down the Path B route — introducing ranking at Stage 1 and a hard cap on advancement — the result would be a narrowing pyramid, with each stage placing fewer and more manageable demands on infrastructure, staff, and assessment capacity. This is what the operational design could look like:
What a Three-Stage Funnel Model Could Look Like
(Infographic, summarised: fairness requirements apply at all stages, alongside IT risk mitigation and reasonable accommodations; candidate support is tiered across four levels.)

- Self-Service: FAQs, guides, setup checks
- Basic Support: login, scheduling, account issues
- Technical: IT problems, browser issues, proctoring
- EPSO Decisions: incidents, exceptions, formal complaints
Equal treatment across test days would be an important fairness consideration. For reasoning tests, the standard tool is well-established: calibrated question banks with equivalent test forms (different versions of the test, matched for difficulty, so no session has an easier or harder draw).
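One simple way to assemble difficulty-matched forms from a calibrated bank is a snake deal: sort items by difficulty and distribute them to forms in alternating order, so no form systematically draws the harder items. A sketch with invented item difficulties (real banks would use calibrated parameters, e.g. from IRT analysis):

```python
import random
import statistics

# Invented calibrated bank: 360 items with a difficulty parameter each,
# split into 12 forms x 30 questions, as in the EU Knowledge example above.
random.seed(7)
bank = [random.uniform(-2.0, 2.0) for _ in range(360)]
N_FORMS, FORM_LEN = 12, 30

items = sorted(bank)                      # easiest -> hardest
forms = [[] for _ in range(N_FORMS)]
for rnd in range(FORM_LEN):
    # Alternate the dealing direction each round (snake order) so the
    # hardest remaining items are spread evenly across forms.
    order = range(N_FORMS) if rnd % 2 == 0 else reversed(range(N_FORMS))
    for f in order:
        forms[f].append(items.pop())      # hand out hardest remaining item

means = [statistics.mean(f) for f in forms]
print(f"mean-difficulty spread across forms: {max(means) - min(means):.3f}")
```

This only balances average difficulty; operational equating would also check score distributions between sessions, but the principle of matched forms is the same.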
IT failure risk goes beyond server capacity and covers several failure modes:
- Login overload, browser incompatibility, and proctoring failures
- Helpdesk saturation if multiple issues occur simultaneously
Running sessions below maximum capacity, staggering logins, and keeping contingency rounds available would all reduce these risks. A ranking approach also lowers IT exposure across the board by sharply cutting the number of candidates at each subsequent stage.
Section 06
Limitations & Political Risks
The analysis above points toward a clear direction. But a workable direction on paper and political viability are not the same thing, and two significant risks are worth looking at honestly: how ranking on reasoning tests is likely to be received politically, and the structural challenge created by a highly uneven national distribution of applicants.
The pre-selection objection
It is worth being precise here, because the current AD5 NoC already contains a significant pre-selection dynamic. Under the current model, numerical and abstract reasoning are threshold-only tests, but verbal reasoning, EU Knowledge, and Digital Skills are combined into a preliminary score. Candidates must first pass all per-test thresholds — then they are ranked in descending order of that combined score, and only those ranked high enough have their EUFTE response evaluated and their eligibility checked. This is not a clean "everyone is assessed equally" model; it is already a model in which a preliminary ranking determines who effectively competes for reserve list places.
What Paths A and B would change, therefore, is not whether pre-selection exists, but how it works and which tests drive the filtering. Path B is the most significant departure: it would shift the ranking weight away from a composite that includes EU Knowledge and Digital Skills, towards a reasoning-only composite at Stage 1 — with EU Knowledge moving to a later stage for a smaller group. That is a genuine change in what drives the selection, and it is this that critics will focus on — not pre-selection itself, which already exists.
The Political Risk Landscape: What Opposition Could Look Like
The national distribution challenge
A separate structural issue adds to the political difficulty. Approximately 45% of registered candidates are Italian nationals, a concentration that, regardless of which selection model is used, creates a difficult downstream dynamic that no design choice can fully neutralise. The challenge runs in both directions simultaneously:
The Nationality Imbalance: A Structural Problem in Any Scenario
This imbalance would pose challenges in every scenario, not just under a ranking-based model. Whatever model EPSO were to adopt, a large national concentration at the top of the score distribution would likely generate complaints from at least one direction. The question is not how to eliminate this risk, but how to make the process transparent and legally robust enough that any outcome can be defended.
Section 07
Wider Consequences
The AD5 Graduate dilemma does not exist in isolation. Whatever resolution EPSO chooses, the knock-on effects on the broader competition calendar and on candidate expectations are real and, in several cases, already unavoidable.
Knock-on Effects: Beyond the AD5 Competition
The AD5 competition may come to be seen as a turning point — a stress test that raises deeper questions about EPSO's capacity, mandate, and relationship with the institutions it serves. How those questions are handled in 2026 will shape the competition landscape well beyond this single exam.
How did this situation arise?
Understanding the contributing factors matters, not to assign blame, but because it affects the political feasibility of any proposed solution:
- Sustained institutional interference in EPSO's design decisions produced a succession of exam redesigns that were difficult to implement cleanly
- Court of Justice rulings added legal constraints that narrowed EPSO's room for manoeuvre
- EPSO's own decision to introduce significant changes to the competition format, changes it could not deliver on schedule, resulted in approximately seven years without a major graduate competition, creating an enormous backlog of frustrated candidates
- When the competition was finally announced, steps were taken to maximise participation at a moment when the accumulated pool of frustrated candidates who had been waiting years for a competition should already have signalled caution about volume
Each factor narrowed the options available to the next decision-maker, a sequence that ultimately produced 174,000+ registrations for a competition originally planned for around a third of that number. It is also worth noting that any stakeholder who contributed to creating this situation may find it harder to object credibly to the operational consequences of resolving it.
Section 08
Open Questions
The analysis in this article points toward a reasonably clear direction: a ranking-based funnel, with ranking introduced at Stage 1 and a hard cap on how many candidates proceed. But a workable direction on paper and political viability are not the same thing. Given the limitations and risks outlined above, it seems more honest to close with open questions than with a confident prescription.
It is quite possible that there is no solution that is simultaneously workable, politically acceptable, legally safe, and fair to all candidates. But the absence of a perfect option is not a reason for inaction — it is a reason to choose deliberately rather than by default. How EPSO and its Management Board navigate this competition will be watched closely: by candidates, by member states, by the institutions it serves, and by anyone who cares about the long-term credibility of EU recruitment. The decisions made in the coming weeks will shape not just this competition, but EPSO's authority to run future ones.
What do you think? Are there design choices or political paths we haven't considered? Let us know.
A note on timing: At the time of writing (24 March 2026), we consider it next to impossible that the necessary decisions on how to proceed, followed by any required legal amendments to the Notice of Competition and the operational preparedness of the technical provider, can be completed in time to run the competition before September 2026. Our best estimate is that mid-September 2026 is the earliest realistic date. It is therefore vital that EPSO or its Management Board settles on a course of action as quickly as possible, because every week of indecision narrows the options further.
Disclaimer: This analysis is based on EU Training's understanding of the EPSO competition system and the information available to us at the time of writing (24 March 2026). It is not official communication from EPSO or any EU institution, and should not be treated as such. EU Training is a private company, fully independent from the EU institutions. Views expressed are our own and are provided strictly for information purposes.
© 2026 András Baneth · EU Training · Analysis for informational purposes only
