Case Study

Research Radar: explainable discovery for MIR and audio ML papers

Next.jsFastAPIPostgres + pgvectorRanking prototype

Research Radar is a working prototype. The goal is straightforward: ranked lists of MIR and audio ML papers where you can see why each result landed where it did. You can move from the main feed into a paper, check the evaluation against simpler sorts, and follow trends in the current dataset without losing the reasoning behind the order. The live app runs separately from this case study; the bridge feature is visible as a separate experimental view and is kept out of the main recommendation flow for now.

View source See walkthrough

What the project does

Research Radar centers on one workflow: inspect ranked recommendations, open a paper dossier, compare against baselines, and read trends in corpus context. The core value is making the ranking signals visible.

Ranked recommendations

Two main feeds: emerging papers and undercited papers. Each list is precomputed and every card includes enough context to show why that paper ranked where it did.

Paper detail with similar papers

Each paper page shows metadata plus a related-papers section, so someone can move from one useful paper to the next without starting over from search every time.

Trends in the current dataset

The trends page shows which topics are gaining traction inside the papers the ranker is working with. Useful context for recurring topics in the feeds.

Evaluation and comparison

The evaluation page compares the ranking against simpler baselines like citation count and recency so you can sanity-check behavior against scores everyone already understands.

Engineering choices

Recommendations should answer "why is this here?"

Each ranked list stores the signal mix that produced it alongside the results. When the ranking code changes and something moves, you can check what actually shifted instead of guessing. It also gives anyone reviewing the work something concrete to push back on.

Splitting the site from the prototype is intentional

This page is the case study, while the prototype runs as its own app. Under the hood there is a Next.js frontend, a FastAPI backend, Postgres with pgvector for storage and similarity search, and Python jobs for ingest, ranking, and clustering. Keeping those pieces separate makes it easier to update the ranking workflow without turning every change into a full-site deploy.

Experimental ideas stay visible and clearly labelled

Bridge is the main example: the signal is measured and shown in the UI, but it is kept out of the final score until the feature is strong enough to earn that role. I'd rather show the work in progress than hide it.

Scope and evaluation boundaries

Current corpus scope is intentionally narrow. The deployed corpus is curated around the currently wired bootstrap sources, and it should not be read as a comprehensive index of audio-ML literature.

Evaluation here is proxy-only: citation/date baselines, topic-mix and recency checks, plus distribution-level comparisons. There is not yet a human-labeled relevance benchmark.

Visual walkthrough

Most paper tools give you a sorted list and no explanation of why anything landed where it did. I built this partly to fix that and partly to understand what it actually takes to build something like this end to end. The way I use it: open the emerging or undercited feed, read the signal note on each card before trusting the order, click into a paper if it looks worth following, then check the evaluation page to see whether the ranking makes sense against simpler sorts like citation count or date.

Research Radar recommended emerging page showing ranked paper cards with visible signal breakdowns — Step 1: The ranked feed
/recommended?family=emerging
This is the main list. Each card shows a paper and a short breakdown of what pushed it up in the ranking for this snapshot. You don't have to take the order on faith.
What to look at
Pick any card and read the signal block below the title. That's the system's reasoning for that result.
The feed is either emerging (newer papers gaining traction in this set) or undercited (papers that look underappreciated relative to their signals).
What you can take from this: The signal mix is visible enough to explain each result and compare runs when the code changes.
Boundary: The corpus is curated and narrower than the field. Objective best-paper claims are out of scope.

Research Radar paper detail page for GiantMIDI Piano showing metadata and similar papers — Step 2: Paper detail and similar papers
/papers/https%3A%2F%2Fopenalex.org%2FW3093121331
Click any card from the feed and you land here. You get the paper's full details, where it sat in the ranked list, and a set of similar papers so you can keep exploring without starting over.
What to look at
Venue, year, citation count, and topic tags are all in one place.
The similar papers section finds nearby results by content. Useful when one paper looks relevant and you want to see what else is around it.
What you can take from this: You can move from a ranked result to a full paper view and keep moving through related work without losing your place.
Boundary: Similar papers are matched by content similarity. Treat them as navigation aids, not quality judgments.

Research Radar evaluation page comparing ranked output against citation and date baselines — Step 3: Evaluation against simpler sorts
/evaluation?family=emerging
This is where I check whether the ranking is actually doing something useful. It puts the system's output next to the most obvious alternatives: sort by citation count, sort by date. If the ranking is adding value, you should be able to see it here.
What to look at
Compare which papers appear in the ranked list versus what you'd get from a simple sort by citations or recency.
Look at the overall shape, not just individual rows. Does the ranked list feel meaningfully different from the simple sorts?
What you can take from this: A quick sanity check that the ranking is doing something beyond what a spreadsheet sort would give you.
Boundary: This compares against simple baselines. It is a useful starting point before a formal relevance benchmark exists.

Research Radar trends page showing topic momentum inside the curated corpus — Step 4: Trends in this dataset
/trends
The trends page shows which topics are picking up momentum inside the papers the ranker actually sees.
What to look at
Topic clusters gaining momentum within this snapshot. Useful for understanding why certain papers are surfacing in the emerging feed.
What you can take from this: Gives context for why a topic might be heating up in the ranked lists right now.
Boundary: This reflects momentum inside the curated set only. It won't match broader field trends.

Optional: Bridge experiment

Route: /recommended?family=bridge

Bridge signal is measured and shown.
In the referenced stable run, bridge weight is zero, so final_score does not use the bridge signal.
Bridge is still experimental and separate from the main recommender.

Continue to live prototype

Public project notes

What you can verify without special access: repository history, linked tests, roadmap notes, and the screenshot baseline shown in the walkthrough above. Internal hosting details are intentionally omitted; the live prototype uses the stable radar.mmaitland.dev subdomain.

Stable now

The two main feeds (emerging and undercited) are working and show you why each paper ranked where it did. Bridge is visible as a separate experimental view.
Paper detail, trends, and evaluation are all working parts of the prototype.
The most useful thing it does right now: you can see the ranking logic, compare runs, and check it against simpler sorts.

What is still experimental

In the referenced stable run, bridge weight is zero, so final_score does not use the bridge signal.
I tested weighted variations; with the current signals the deck order barely moved, so the next lever is the bridge features themselves.
The next step is likely better bridge signals and tighter filtering before touching the scalar weights again.

Boundaries

General semantic relevance is not treated as a default quality score. Some pinned emerging runs use embedding fit as one ranking feature, and the live UI labels when that feature is used.
Bridge is currently presented as an experiment, separate from the main recommendation path.
The corpus is still curated and narrower than the long-term vision.

Tech stack

The interface is built in Next.js, the API is built in FastAPI, and the data lives in Postgres with pgvector. A separate Python pipeline handles ingest, cleanup, embeddings, ranking, and clustering experiments behind the scenes.

Next.jsTypeScriptFastAPIPythonPostgrespgvectorOpenAlexRanking runsClustering

Lessons learned

Building the explanation layer early turned out to matter more than I expected. Once each run stored why it ranked the way it did, comparing versions became straightforward instead of guesswork. Keeping bridge visible as a separate experimental view also helped: the core flow stayed focused, and I could work on the experimental side without it polluting the main results. The evaluation page earned its place too. I kept expecting a single summary score to be enough, and it never was. Seeing the full comparison grid against simple sorts was always more honest.

Supporting links

Research Radar source repo Technical brief and evaluation notes Ranked recommendation tests Evaluation compare tests

Where it stands

Status: Live prototype

The strongest stable claim today is that the prototype makes its ranking behavior visible and understandable over a curated set of MIR and audio ML papers.

Known limits: Bridge is an experimental view separate from the main recommender; semantic similarity only appears in runs where the UI labels it; the corpus is still narrower than the long-term plan.

Research Radar: explainable discovery for MIR and audio ML papers

Next.jsFastAPIPostgres + pgvectorRanking prototype

Scope and evaluation boundaries

Current corpus scope is intentionally narrow. The deployed corpus is curated around the currently wired bootstrap sources, and it should not be read as a comprehensive index of audio-ML literature.

Evaluation here is proxy-only: citation/date baselines, topic-mix and recency checks, plus distribution-level comparisons. There is not yet a human-labeled relevance benchmark.

Tech stack

Next.jsTypeScriptFastAPIPythonPostgrespgvectorOpenAlexRanking runsClustering

Lessons learned

What the project does

Ranked recommendations

Paper detail with similar papers

Trends in the current dataset

Evaluation and comparison

Engineering choices

Recommendations should answer "why is this here?"

Splitting the site from the prototype is intentional

Experimental ideas stay visible and clearly labelled

Scope and evaluation boundaries

Visual walkthrough

Step 1: The ranked feed

Step 2: Paper detail and similar papers

Step 3: Evaluation against simpler sorts

Step 4: Trends in this dataset

Optional: Bridge experiment

Public project notes

Stable now

What is still experimental

Boundaries

Tech stack

Lessons learned

Supporting links

Where it stands

What the project does

Ranked recommendations

Paper detail with similar papers

Trends in the current dataset

Evaluation and comparison

Engineering choices

Recommendations should answer "why is this here?"

Splitting the site from the prototype is intentional

Experimental ideas stay visible and clearly labelled

Scope and evaluation boundaries

Visual walkthrough

Step 1: The ranked feed

Step 2: Paper detail and similar papers

Step 3: Evaluation against simpler sorts

Step 4: Trends in this dataset

Optional: Bridge experiment

Public project notes

Stable now

What is still experimental

Boundaries

Tech stack

Lessons learned

Supporting links

Where it stands

Leave a question or comment

Leave a question or comment