How scoring works
Every role gets three numbers between 0 and 100. Those numbers come from a deterministic function of the listing and its company. No language model is involved.
src/ file paths cited below point to a repo that isn’t public yet. Open-sourcing the index, scoring, and ingestion pipeline is on the roadmap — we want the API and data shape to settle first so the public repo doesn’t break under everyone the day after we publish.

1 · AI relevance
ai_relevance_score answers: how much does this role look like AI work? Same shape as the OSS score below — a hand-curated company prior blended with a text signal read off the posting. Same promise: deterministic, no LLM, no learned weights.
The company prior
Every company in the registry has an ai_focus_score between 0 and 100, set by hand from what the company actually ships, researches, and talks about publicly. Some real values from the seed registry:
- Hugging Face, NVIDIA — 98 / 95 · core AI companies.
- Mistral, Groq, Cerebras — 95 / 92 / 88 · model labs and AI accelerator makers.
- Anyscale, LangChain, LlamaIndex — 88 / 90 / 88 · AI tooling whose whole product is AI-shaped.
- Databricks, Pinecone, Qdrant — 72 / 80 / 78 · data + retrieval, heavily AI-adjacent.
- Cloudflare, Elastic, Fly.io — 62 / 58 / 58 · platforms where AI is real but not the whole product.
- Docker, Grafana, GitLab — 42 / 45 / 48 · AI-touching infra, but mostly classical DevOps work.
A high AI prior doesn’t mean every role at the company counts — it just means even a vaguely-worded JD will inherit some AI signal. A low prior means the description has to do the work.
The text signal
Title, description, employment type, location, and remote field are concatenated and scanned case-insensitively. Two kinds of hits are counted:
- Word patterns matched as whole tokens (regex word boundaries), worth 12 points each: ai, ml, llm, llms, nlp, gpu, gpus, cuda, mlp.
- Phrases matched as substrings, worth 10 points each: machine learning, deep learning, large language, transformer(s), inference, model training, fine-tuning, pytorch, tensorflow, jax, neural network, embedding(s), vector database, retrieval augmented, generative, diffusion, reinforcement learning, rlhf, dataset, feature store, mlops, model serving, computer vision, speech recognition, agents / agentic, evals / eval suite, evaluation harness, rag.
Both lists count distinct hits and the total is capped at 100. The full lists live in _AI_WORD_PATTERNS and _AI_PHRASES in src/wr/services/job_scoring.py.
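A minimal sketch of how such a two-tier scan could work. The word list is the one above; the phrase list here is abbreviated, and the real implementation in job_scoring.py may differ in detail:

```python
import re

# Sketch: the canonical lists are _AI_WORD_PATTERNS and _AI_PHRASES in
# src/wr/services/job_scoring.py; the phrase list here is abbreviated.
WORD_PATTERNS = ["ai", "ml", "llm", "llms", "nlp", "gpu", "gpus", "cuda", "mlp"]
PHRASES = ["machine learning", "deep learning", "pytorch", "tensorflow", "rag"]

def ai_text_signal(blob: str) -> int:
    blob = blob.lower()
    score = 0
    for word in WORD_PATTERNS:        # whole tokens: 12 points per distinct hit
        if re.search(rf"\b{re.escape(word)}\b", blob):
            score += 12
    for phrase in PHRASES:            # substrings: 10 points per distinct hit
        if phrase in blob:
            score += 10
    return min(score, 100)            # capped at 100
```

A blob mentioning ml, cuda, and pytorch would score 12 + 12 + 10 = 34; repeating a term changes nothing, since each list entry counts at most once.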
How they blend
```python
def ai_relevance(job, company):
    company_ai = company.ai_focus_score   # 0..100, hand-curated
    post_ai = ai_text_signal(job)         # 0..100, capped hits
    w = 0.50                              # default company weight
    if is_technical_role(job):
        w += 0.08                         # engineer / researcher / SRE / etc.
    return round(w * company_ai + (1 - w) * post_ai)
```
At the default weight that’s a 50 / 50 split. The technical-role bump pushes it to 58 / 42 — the same philosophy as the OSS score: for an actual engineering hire, the company’s real-world AI posture is a stronger signal than how many keywords ended up in the JD.
2 · Open source
open_source_score answers a single question: how much does this role look like open-source AI work? It blends two ingredients — a hand-curated company prior and a text signal read off the posting itself. No repo crawling, no language model, no learned weights.
The company prior
Every company in the registry has an open_source_score between 0 and 100, set by hand from public evidence: which models and repos they’ve released, what license those repos ship under, how much of their actual stack is in the open. The justification is stored right next to the number in an open_source_evidence field — a maintainer is on the hook when a value looks wrong.
Some real values from the seed registry:
- Hugging Face — 98 · Transformers, datasets hub, Apache-2 across the stack.
- Qdrant — 95 · the vector engine itself is Apache-2.
- Anyscale — 92 · Ray, Apache 2.0.
- Mistral — 72 · open-weight Mistral and Mixtral releases.
- NVIDIA — 68 · CUDA ecosystem; large OSS-adjacent surface, but the core platform isn’t open.
- CoreWeave — 30 · closed proprietary cloud.
The text signal
The posting’s title, description, employment type, location, and remote field are concatenated and scanned case-insensitively for a small, fixed list of OSS phrases. Each distinct phrase that appears anywhere in that blob is worth 18 points; the total is capped at 100. The full list is in src/wr/services/job_scoring.py:
```python
# OSS_PHRASES — the entire list, no surprises
"open source", "open-source", "oss", "github", "gitlab",
"permissive license", "apache 2", "apache-2", "apache license",
"mit license", "mit", "gpl", "gplv", "upstream", "contributor"
```
One match: 18. Three matches: 54. Six or more: still 100. Mentioning “GitHub” ten times in a description is the same as mentioning it once — what counts is the number of distinct OSS concepts the posting actually names.
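Under those rules the whole signal reduces to a distinct-hit count; a sketch (phrase list copied from above, function name assumed):

```python
# Sketch: the canonical list is OSS_PHRASES in src/wr/services/job_scoring.py.
OSS_PHRASES = [
    "open source", "open-source", "oss", "github", "gitlab",
    "permissive license", "apache 2", "apache-2", "apache license",
    "mit license", "mit", "gpl", "gplv", "upstream", "contributor",
]

def oss_text_signal(blob: str) -> int:
    blob = blob.lower()
    # Each distinct phrase present anywhere counts once, 18 points apiece.
    hits = sum(1 for phrase in OSS_PHRASES if phrase in blob)
    return min(18 * hits, 100)
```

One side effect of substring matching, at least in this sketch: a posting that says "MIT license" fires both the "mit license" and the bare "mit" entries, so the pair is worth 36.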
How they blend
```python
def open_source(job, company):
    company_oss = company.open_source_score  # 0..100, hand-curated
    post_oss = oss_text_signal(job)          # 0..100, capped phrase hits
    w = 0.46                                 # default company weight
    if is_technical_role(job):
        w += 0.06                            # engineer / researcher / SRE / etc.
    return round(w * company_oss + (1 - w) * post_oss)
```
At the default weight that’s a 46 / 54 split leaning slightly toward the posting. A handful of worked examples, with the company prior on the left and a description that names three OSS concepts (post_oss = 54) on the right:
- Hugging Face (98) → round(0.46·98 + 0.54·54) = 74
- Anyscale (92) → 71
- Mistral (72) → 62
- CoreWeave (30) → 43
- A 30-rated company with no OSS phrases at all (post_oss = 0) → 14
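Those numbers fall straight out of the blend formula; a quick check, assuming a standalone helper with the default 0.46 company weight:

```python
def blend(company_oss: int, post_oss: int, w: float = 0.46) -> int:
    # Default split: 46% company prior, 54% posting text signal.
    return round(w * company_oss + (1 - w) * post_oss)

for name, prior in [("Hugging Face", 98), ("Anyscale", 92),
                    ("Mistral", 72), ("CoreWeave", 30)]:
    print(f"{name}: {blend(prior, 54)}")
print(f"30-rated, no OSS phrases: {blend(30, 0)}")
```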
The technical-role bump (+0.06) tilts the formula a little further toward the company prior for engineer, researcher, and infra titles — the assumption being that for a real engineering hire, the company’s actual OSS posture matters more than however many keywords made it into the JD.
Strongly excluded roles (recruiter, HR, payroll, account executive, corporate counsel, …) score zero on every axis, no matter what the description says or how high the company’s prior is. The full exclusion list is BOARD_STRONG_EXCLUSION_FRAGMENTS in job_scoring.py.
3 · Overall
The overall score is a weighted blend of the two above plus a small bonus for seniority and infrastructure signals. Coefficients sum to 1, so the result is bounded by 0 and 100 by construction:
```python
def overall(job):
    seniority_pts = min(6 * count_seniority_hits(title), 18)  # cap at 18
    infra_pts = min(6 * count_infra_hits(blob), 18)           # cap at 18
    signal = seniority_pts + infra_pts                        # 0..36
    return round(
        0.52 * job.ai_relevance_score
        + 0.33 * job.open_source_score
        + 0.15 * signal
    )
```
The two phrase lists driving signal:
- Seniority (title only, 6 pts each, cap 18): staff, principal, senior, lead, director, head of, distinguished.
- Infrastructure (full blob, 6 pts each, cap 18): kubernetes / k8s, devops, site reliability / sre, platform engineer, infrastructure, cloud engineer, distributed systems, gpu, container(s), orchestration, runtime, docker, terraform, nomad.
Worked example for a “Senior Infrastructure Engineer (Kubernetes, GPU)” with ai_relevance_score = 85 and open_source_score = 70:
- Seniority hits: senior → 6 pts.
- Infra hits: infrastructure, kubernetes, gpu → 18 pts (capped).
So signal = 24, and round(0.52·85 + 0.33·70 + 0.15·24) = round(44.2 + 23.1 + 3.6) = 71.
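The hit counting above can be sketched and checked against this worked example. Substring matching is an assumption here, as is doing the capping inside the helper:

```python
# Sketch: the real lists drive count_seniority_hits / count_infra_hits
# in src/wr/services/job_scoring.py.
SENIORITY = ["staff", "principal", "senior", "lead", "director",
             "head of", "distinguished"]
INFRA = ["kubernetes", "k8s", "devops", "site reliability", "sre",
         "platform engineer", "infrastructure", "cloud engineer",
         "distributed systems", "gpu", "container", "orchestration",
         "runtime", "docker", "terraform", "nomad"]

def bonus_signal(title: str, blob: str) -> int:
    # 6 points per distinct hit, each bucket capped at 18, so 0..36 total.
    seniority_pts = min(18, 6 * sum(t in title.lower() for t in SENIORITY))
    infra_pts = min(18, 6 * sum(t in blob.lower() for t in INFRA))
    return seniority_pts + infra_pts
```

For the title above, this gives 6 (senior) plus 18 (infrastructure, kubernetes, gpu, capped), i.e. 24.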
The bonus contributes at most 0.15 · 36 = 5.4 points — enough to break ties between two otherwise-similar roles, not enough to drag a poorly-matched listing onto the top of the board.
4 · Eligibility
Not every role at every company makes the board. jobs.board_eligible is a precomputed boolean, refreshed on every ingest, and a job has to pass all four of these gates to flip it to true:
- Company is publicly visible. Active, not REJECTED, in-scope (the company’s ai_focus_score meets PUBLIC_BOARD_MIN_AI_FOCUS, default 35), and either verified or likely with effective confidence ≥ PUBLIC_BOARD_MIN_CONFIDENCE (default 72). Anything else sits in the admin review queue.
- Job is open. Postings are flipped to CLOSED when they disappear from a successful ATS fetch. Network errors don’t close listings — only an actual successful re-pull that no longer contains them.
- The posting clears the AI bar. Either ai_relevance_score ≥ JOB_BOARD_MIN_AI_RELEVANCE (default 18), or the company’s ai_focus_score ≥ JOB_BOARD_COMPANY_AI_PRIOR_MIN (default 75) and the title looks technical (engineer, researcher, SDK, DevRel, …). The second branch lets a vaguely-worded JD at, say, NVIDIA still surface, but only for engineering-shaped titles.
- Title and body don’t match strong exclusions. Recruiter, HR, payroll, account executive, corporate counsel, office manager, and the rest of BOARD_STRONG_EXCLUSION_FRAGMENTS are zeroed out unconditionally — no bypass via company prior, no rescue from a high text score.
Freshness (the “seen in the last N days” rule) is not a board gate — it’s an opt-in query parameter, ?max_stale_days=N on GET /jobs. The default board view shows every eligible posting we’ve confirmed is still open at its source ATS.
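Putting the four gates together, the recompute could look roughly like this. Attribute names such as verification, title_is_technical, and matches_strong_exclusion are placeholders for illustration, not the real schema; the constants use the defaults quoted above:

```python
# Defaults quoted in the gate descriptions above.
PUBLIC_BOARD_MIN_AI_FOCUS = 35
PUBLIC_BOARD_MIN_CONFIDENCE = 72
JOB_BOARD_MIN_AI_RELEVANCE = 18
JOB_BOARD_COMPANY_AI_PRIOR_MIN = 75

def board_eligible(job, company) -> bool:
    # Gate 1: company is publicly visible.
    visible = (
        company.active
        and company.status != "REJECTED"
        and company.ai_focus_score >= PUBLIC_BOARD_MIN_AI_FOCUS
        and (company.verification == "verified"
             or (company.verification == "likely"
                 and company.confidence >= PUBLIC_BOARD_MIN_CONFIDENCE))
    )
    # Gate 3: the posting clears the AI bar via either branch.
    clears_ai_bar = (
        job.ai_relevance_score >= JOB_BOARD_MIN_AI_RELEVANCE
        or (company.ai_focus_score >= JOB_BOARD_COMPANY_AI_PRIOR_MIN
            and job.title_is_technical)
    )
    # Gates 2 and 4: job still open at its source ATS, no strong exclusion.
    return (visible
            and job.status == "OPEN"
            and clears_ai_bar
            and not job.matches_strong_exclusion)
```

Note the exclusion gate sits outside every other branch: a strong-exclusion match loses regardless of company prior or text score, matching the "no bypass" rule above.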