Research Scientist interview prep.
A research scientist at a frontier AI lab or platform firm produces ideas + experiments + papers + safety analysis, the work that shapes what models become and how they're deployed.
What interviewers look for
- Can the candidate defend their own research end-to-end - design, methodology, results, limitations, what they'd do differently - under 30-60 min of probing?
- Do they read + critique recent papers with technical specificity (what's the contribution, what's missing, what experiments would strengthen) - not just summarise the abstract?
- Can they design a clean experiment to test a hypothesis - controls, ablations, evaluation choices, statistical rigor, honest about confounds?
- Do they reason about alignment + safety honestly - know the current literature, identify eval gaps, surface failure modes - not platitudes or dismissal?
- Are they intellectually humble + rigorous - state limitations clearly, engage with critique, don't overclaim, debate the substance?
- Are they research-community-aware - the paper they're proudest of, who they cite, what conferences they engage with, recent papers they've found important?
Behavioural questions to expect
Walk me through your CV.
What it tests: Story coherence + genuine fit for the research scientist seat. Teams want evidence of research thinking (paper output, deep technical work) and intellectual curiosity / humility - not a pure engineering IC track record without research depth.
Tell me about the paper or project you're proudest of.
What it tests: Depth + ownership + scientific rigor. Tests whether the candidate frames the research as question → hypothesis → method → results → limitations → contribution to field, not just 'we got SOTA on X'.
Tell me about a weakness, a failure, or feedback you've received and worked on.
What it tests: Self-awareness + scientific honesty. A standard question across roles. Fake weaknesses are marked down immediately. Research mistakes (over-claimed a result, missed a confound, missed a citation, missed a safety concern) shape your research reputation.
Why research scientist - and why this lab vs academia or eng IC?
What it tests: Authentic fit for the lab-research seat: idea-generation + experiment-design + paper-writing + community-engagement, with industry-scale compute + collaborators + alignment-with-deployment that academia + eng IC can't match.
Which research area would you want to focus on, and why?
What it tests: Genuine fit + grasp of how research areas differ (capabilities training / alignment / interpretability / RL / multimodal / evals / agents). Tests whether the candidate has a reasoned preference grounded in their own research thread.
Why this firm?
What it tests: Whether the candidate has done the homework. Bar: firm-specific evidence from the published papers, research culture, safety posture, recent work, and researchers - not generic 'great AI lab'.
How would you describe this firm's research program + safety posture in your own words?
What it tests: Whether the candidate has internalized the firm's research priorities + safety posture - not just that it 'does AI'. Tests whether they've read the recent papers + safety publications.
How does research actually create value at a frontier AI firm?
What it tests: Whether the candidate understands frontier-lab research economics: capability research drives competitive moat + product capability; safety + alignment research supports brand, regulatory standing, and the ability to deploy; community engagement + papers + open-source releases build the talent pipeline + brand.
Technical concepts to master
ML fundamentals - transformer + scaling + generalization
- Transformer + attention
- Architecture from Vaswani et al. 2017: self-attention (each token attends to all others), positional encoding, multi-head, feed-forward; the foundation of modern LLMs.
- Scaling laws + compute-optimal
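- Empirical power laws: loss falls roughly as X^-α in each of compute, data, and parameters (X) when the other two are not the bottleneck. Kaplan et al. 2020 established the initial scaling laws; Chinchilla (Hoffmann et al. 2022) revised the compute-optimal recipe: for a given compute budget, train a smaller model on more data (roughly 20 training tokens per parameter).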
- Generalization + emergent capability
- Generalization: model performance on held-out / OOD data. In LLMs, certain capabilities (in-context learning, chain-of-thought) appear to emerge at scale rather than improve smoothly, though how abrupt this looks can depend on the metric used.
- Optimization + training stability
- Adam / AdamW + learning-rate scheduling + gradient clipping + mixed precision; training-stability tricks (warmup, qk-norm) that make large-model training feasible.
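The self-attention step in the first entry above can be sketched in a few lines. This is a minimal single-head illustration in plain Python, assuming the queries, keys, and values are already given as lists of vectors; learned projections, multiple heads, masking, and positional encoding are omitted, and the function names are illustrative rather than taken from any library.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # Each output is a weighted average of the value vectors.
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs


# Self-attention: queries, keys, and values all come from the same tokens.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contextual = attention(tokens, tokens, tokens)
```

Each row of `contextual` is a convex combination of the input rows, which is the sense in which every token "attends to all others".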
Experiment + evaluation design
- Hypothesis → measurable prediction
- Operationalise a research hypothesis into a measurable prediction; specify what positive / null / negative result would mean before running.
- Controls + ablations
- Comparison condition that isolates the mechanism (control); systematic removal / variation of components (ablations) to attribute the effect.
- Confounds + bias
- Factors that can make a result look real when it is not: selection bias in eval data, evaluation-set contamination, proxy-metric gaming, seed variance.
- Eval design - capability + safety
- Capability evals (MMLU, GPQA, HumanEval, MATH) measure what the model can do; safety evals measure what it should not do or must handle correctly (refusal behavior, jailbreak robustness, sycophancy, broader alignment properties).
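One common way to bring statistical rigor to a two-model comparison on the same eval set is a paired bootstrap over items. The following is a minimal sketch, assuming per-item 0/1 correctness lists for both models; the function name, the resample count, and the 95% percentile interval are illustrative choices, not a prescribed method.

```python
import random


def paired_bootstrap_diff(correct_a, correct_b, n_resamples=2000, seed=0):
    """Return (observed, ci_low, ci_high) for accuracy(A) - accuracy(B).

    Items are resampled with replacement, keeping each item's A and B
    outcomes paired, so shared item difficulty cancels out.
    """
    if len(correct_a) != len(correct_b) or not correct_a:
        raise ValueError("need two non-empty lists of equal length")
    n = len(correct_a)
    rng = random.Random(seed)
    per_item = [a - b for a, b in zip(correct_a, correct_b)]
    observed = sum(per_item) / n

    resampled = []
    for _ in range(n_resamples):
        sample = [per_item[rng.randrange(n)] for _ in range(n)]
        resampled.append(sum(sample) / n)
    resampled.sort()

    # Percentile interval covering the central 95% of resampled differences.
    low_idx = int(0.025 * n_resamples)
    high_idx = max(low_idx, int(0.975 * n_resamples) - 1)
    return observed, resampled[low_idx], resampled[high_idx]


# Hypothetical outcomes: model A solves 80/100 items, model B solves 60/100.
model_a = [1] * 80 + [0] * 20
model_b = [1] * 60 + [0] * 40
diff, ci_low, ci_high = paired_bootstrap_diff(model_a, model_b)
```

An interval that excludes zero suggests the gap is not just item-sampling noise. It says nothing about variance across training seeds, which needs separate runs.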
Alignment + safety + interpretability
- Alignment training (RLHF + DPO + Constitutional AI)
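- RLHF: human preference data → reward model → RL fine-tune. DPO: direct preference optimisation, which trains the policy directly on preference pairs with no separate reward model. Constitutional AI: the model critiques and revises its own outputs against a written constitution, which reduces the human-feedback bottleneck.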
- Alignment failure modes
- Sycophancy (agreeing with the user when the user is wrong), specification gaming (optimising the proxy, not the goal), reward hacking, capability outpacing safety (capabilities improve faster than safety measures), deceptive alignment (hypothetical).
- Mechanistic interpretability
- Reverse-engineering what individual neurons / circuits do mechanically; aims for human-understandable explanations of model behavior.
- Responsible scaling + frontier eval
- Frontier-lab policies committing to evaluate capabilities + safety before scaling further; pre-deployment evals, with safety + security measures that escalate as capabilities grow.
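The DPO entry in the list above can be made concrete by writing the loss for a single preference pair. This is a minimal sketch, assuming the summed log-probabilities of the chosen and rejected responses under the policy and under a frozen reference model are already computed; the function and argument names are illustrative, and beta=0.1 is only an example value.

```python
import math


def softplus(x):
    """Stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one pair: -log sigmoid(beta * (chosen - rejected log-ratio))."""
    chosen_logratio = policy_chosen - ref_chosen        # log pi(y_w|x) - log pi_ref(y_w|x)
    rejected_logratio = policy_rejected - ref_rejected  # log pi(y_l|x) - log pi_ref(y_l|x)
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(m)) equals softplus(-m).
    return softplus(-margin)


# Policy identical to the reference: the loss is log(2), the starting point.
baseline = dpo_loss(-5.0, -5.0, -5.0, -5.0)
# Policy has shifted probability toward the chosen response: loss drops.
improved = dpo_loss(-4.0, -6.0, -5.0, -5.0)
```

The reference model terms are what keep the policy from drifting arbitrarily far: only the change in log-probability relative to the reference is rewarded.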
Research process + paper-writing + community
- Paper structure + writing
- Abstract / introduction / related work / method / experiments + results / discussion / limitations / conclusion. Strong writing makes contributions land; weak writing buries them.
- Peer review + venues
- Major ML venues: NeurIPS, ICML, ICLR, ACL, EMNLP, FAccT, AAAI. Review process: 3-4 reviewers + meta-reviewer; acceptance ~20-30% at top venues.
- Reproducibility + open-source
- Strong papers publish code + data + checkpoints; reproducibility statement; open-source releases that the community can build on.
- Research collaboration + advisorship
- Most research is collaborative; senior researchers advise junior researchers + maintain external academic relationships.
Practical drills
- Walk me through your strongest research paper or project end-to-end - I'll probe deeply on method, results, ablations, limitations.
- Pick a paper from the last 6 months that's relevant to this firm's research areas. Tell me about it + critique it.
- Design an experiment to test whether 'Constitutional AI training reduces sycophancy in a way that's not captured by standard helpfulness evals'.
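For the last drill, one possible outcome measure is how often a model abandons a correct answer once the user asserts a wrong one. The sketch below is a hypothetical metric for illustration, not an established benchmark: it assumes each question is asked twice (neutral, then with the user pushing an incorrect claim), and the record field names are assumptions.

```python
def sycophancy_flip_rate(records):
    """Fraction of initially-correct answers that flip to the user's wrong claim.

    Each record is a dict with hypothetical keys:
      gold             - reference answer
      neutral_answer   - model answer with no user opinion in the prompt
      user_claim       - incorrect answer asserted by the user
      pressured_answer - model answer after the user asserts user_claim
    """
    # Only items the model gets right without pressure can show a flip.
    eligible = [r for r in records if r["neutral_answer"] == r["gold"]]
    if not eligible:
        return None
    flips = sum(1 for r in eligible if r["pressured_answer"] == r["user_claim"])
    return flips / len(eligible)


records = [
    {"gold": "B", "neutral_answer": "B", "user_claim": "C", "pressured_answer": "C"},
    {"gold": "A", "neutral_answer": "A", "user_claim": "D", "pressured_answer": "A"},
    {"gold": "D", "neutral_answer": "C", "user_claim": "A", "pressured_answer": "A"},
]
rate = sycophancy_flip_rate(records)
```

Comparing this rate between a Constitutional AI-trained model and a matched baseline, alongside their standard helpfulness scores, is one way to check whether the two measures come apart.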
Smart-question anchors
- Research areas + investments - the firm's research priorities, recent papers, future directions
- Safety + responsible scaling - the firm's safety posture, frontier evals, release policy
- Researcher autonomy - directed vs bottom-up research, publication freedom, conference attendance
- Compute + infrastructure - resources, scale, partnership with eng / infra
- Collaboration + community - academic collaborations, internal cross-team work, publication openness
Sourced from
- Interview Query. Anthropic Research Scientist Interview Guide
- Interview Query. OpenAI Research Scientist Interview Guide
- Interview Query. Meta Research Scientist Interview Guide
- Scaling laws + transformer architecture literature (canonical ML)
- Alignment + safety literature (AI safety community)
- ML research process + paper-writing community (NeurIPS / ICML reviewer guidelines)