Automated literature monitoring for researchers.
Sieve fetches new papers daily from bioRxiv, arXiv, and journal RSS feeds, scores them against your research interests using Claude, and presents the results as a local web interface. You can also walk the citation graph of any paper to catch up on related literature.
- Fetch — pulls new papers from configured sources (bioRxiv, arXiv, RSS feeds)
- Score — runs a two-stage Claude pipeline (Haiku triage → Sonnet scoring) against your interests.md profile
- Ingest — stores papers and scores in a local SQLite database
- Serve — generates a static site you can browse and annotate locally
Papers are scored 1–10. You configure thresholds for what gets stored and what appears in the UI.
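The Score step can be pictured with a minimal sketch like the one below. It is not Sieve's actual code: the `triage` and `score` callables stand in for the Haiku and Sonnet calls, the `Paper` dict shape is hypothetical, and the default batch sizes simply mirror `batch_size` and `sonnet_batch_size` from the example settings.

```python
from typing import Callable, Iterator

Paper = dict  # hypothetical shape: a dict with at least an "id" key


def batched(items: list, size: int) -> Iterator[list]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def two_stage_score(
    papers: list[Paper],
    triage: Callable[[list[Paper]], list[bool]],
    score: Callable[[list[Paper]], list[int]],
    batch_size: int = 30,
    sonnet_batch_size: int = 40,
) -> dict[str, int]:
    """Run cheap triage first, then score only the survivors on a 1-10 scale."""
    survivors = []
    for batch in batched(papers, batch_size):
        keep_flags = triage(batch)  # stand-in for the Haiku triage call
        survivors.extend(p for p, keep in zip(batch, keep_flags) if keep)

    scores = {}
    for batch in batched(survivors, sonnet_batch_size):
        # stand-in for the Sonnet scoring call
        for paper, value in zip(batch, score(batch)):
            scores[paper["id"]] = max(1, min(10, value))  # clamp to 1-10
    return scores
```

The point of the two stages is cost: the more expensive scorer only sees papers that survive triage, so it needs far fewer batches than the fetch produces.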
The scorer is driven by a natural-language interests.md — you explain why you care about certain work, not a keyword list. As you use Sieve, sieve learn turns that usage back into the profile: the papers you add to your reading list ("more like this") and the ones you flag as "less like this" become proposed edits to interests.md, so the profile sharpens as your tastes become clear.
- uv — Python package manager
- Claude Code CLI — must be installed and authenticated (claude -p must work)
git clone https://github.com/yourname/sieve.git
cd sieve
uv sync
cp config/settings.yaml.example config/settings.yaml
cp config/interests.md.example config/interests.md
Edit config/settings.yaml — set your sources, thresholds, and email:
lookback_days: 2
store_threshold: 5
display_threshold: 7
site_threshold: 4
batch_size: 30 # papers per Haiku triage batch
sonnet_batch_size: 40 # papers per Sonnet scoring batch
biorxiv_category: neuroscience
arxiv_categories:
  - q-bio.NC
mailto: "you@example.com" # used for OpenAlex polite pool
max_papers_per_source: 500
feeds:
  - name: "Nature Neuroscience"
    url: "https://www.nature.com/neuro.rss"
  - name: "Nature"
    url: "https://www.nature.com/nature.rss"
Edit config/interests.md — describe what you want to read. This is the prompt fed to Claude for scoring. Be specific: name topics, methods, labs, and things you explicitly don't want. The quality of scoring depends directly on this file.
Example interests.md
## Research Interests
### Core topics
- Visual cortex circuits in mice — response properties, layer and cell-type specificity, feedforward and feedback pathways
- How interneuron subtypes (PV, SST, VIP) shape cortical gain, selectivity, and network dynamics
- Population coding and dimensionality in visual cortex during perception and behavior
- Top-down modulation of sensory cortex — attention, locomotion, arousal state effects on V1/HVA responses
### Methods I follow
- Two-photon calcium imaging and large-scale Neuropixels recordings in behaving mice
- Optogenetic dissection of specific cell types or pathways during visual tasks
- Computational models of visual cortical circuits
### Explicitly NOT interested in
- Human or primate visual neuroscience unless methods are directly transferable to mice
- MRI or EEG-based studies
- Clinical or disease contexts
- Pure psychophysics without a neural circuit component
sieve run # fetch → score → ingest → generate site
sieve serve # open the site in your browser
| Command | Description |
|---|---|
| sieve run [--site-threshold N] | Fetch new papers, score, and update the site |
| sieve serve | Start local server and open browser |
| sieve seed --doi <DOI> [--pdf PATH] | Bootstrap your interests from a single paper (cold start) |
| sieve learn [--recent K] [--older-sample M] [--all] | Tune interests.md from your reading list + "less like this" flags |
| sieve cite --doi <DOI> [--forward] [--recommend] [--site-threshold N] | Score the citation graph of a paper |
| sieve clean | Prune low-score papers outside the fetch window |
| sieve export --from FILE [--output PATH] [--title TEXT] [--interests PATH] | Generate a standalone annotated bibliography HTML |
--site-threshold N overrides the site_threshold from settings for that run (useful for one-off exploration without changing your config).
Use this to catch up on a paper's references — useful when you encounter a key paper and want to know which of its citations you should read.
# References of a paper (backward citations)
sieve cite --doi 10.1101/2022.09.29.510081
# Also fetch papers that cite this paper (forward citations)
sieve cite --doi 10.1101/2022.09.29.510081 --forward
# Include S2-computed related papers
sieve cite --doi 10.1101/2022.09.29.510081 --recommend
Accepts DOIs, bioRxiv DOIs, Semantic Scholar paper IDs, Corpus IDs, or full Semantic Scholar URLs:
sieve cite --doi 252528267
sieve cite --doi "https://www.semanticscholar.org/paper/Title/f583cb7b6e6aa669..."
Note: Many journal papers are indexed in Semantic Scholar under their preprint DOI. If a journal DOI fails, try the bioRxiv DOI. Sieve falls back to OpenAlex automatically when Semantic Scholar blocks reference access (common for Elsevier papers).
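To make the accepted identifier forms concrete, here is a rough sketch of how a value passed to --doi could be told apart. `classify_identifier` is a hypothetical helper written for illustration, not Sieve's code. The patterns are assumptions: Semantic Scholar paper IDs are treated as 40-character hex strings, Corpus IDs as plain integers, and DOIs as starting with `10.`.

```python
import re


def classify_identifier(value: str) -> str:
    """Guess which kind of paper identifier `value` is (illustrative only)."""
    value = value.strip()
    if re.match(r"^https?://(www\.)?semanticscholar\.org/", value):
        return "s2_url"
    if re.fullmatch(r"[0-9a-fA-F]{40}", value):
        return "s2_paper_id"  # assumed: 40-character hex string
    if re.fullmatch(r"\d+", value):
        return "corpus_id"  # assumed: plain integer
    if re.match(r"^10\.\d{4,9}/\S+$", value):
        return "doi"  # bioRxiv DOIs use the 10.1101/ prefix
    return "unknown"
```

The URL check runs first because a full URL is the least ambiguous form; the all-digits check must come before the DOI check only in the sense that neither can match the other.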
interests.md is the heart of the system, and it's meant to evolve. There are two ways to refine it — one for cold start, one for steady state.
Before you have any reading history, bootstrap the profile one paper at a time. When you read a paper that makes you realise your profile is missing something:
sieve seed --doi 10.1038/s41593-022-01107-4
Claude evaluates whether the paper represents a gap in your interests.md and suggests an addition. You confirm before anything is written. (--pdf PATH supplies a local PDF when the DOI fetch doesn't yield a usable abstract.)
Once you've been using Sieve, your behaviour is the signal — no need to feed papers in by hand. Two actions in the web UI accumulate in the database:
- Reading list — papers you save are treated as "more like this".
- Less like this — the button on each paper records an explicit negative example (a false positive you want fewer of). These persist even after the paper is pruned.
Then run:
sieve learn
Claude reviews those saved and rejected papers against your current profile and proposes edits in three forms, which you review and confirm per group:
- Add — new interest lines (liberal, to avoid missing relevant work)
- Revise — rewrite a vague or overly broad line to be sharper, shown as a before→after diff
- Remove — delete a stale or redundant line (conservative — it prefers a revision when in doubt)
Before any revision or removal, interests.md is backed up to data/backups/ and the restore command is printed, so edits are always reversible.
To keep the prompt focused on current interests, learn samples your reading list: it always includes the most recent --recent K saves (default 50) plus a random --older-sample M (default 25) drawn from older ones, so long-standing themes still surface without overweighting whatever was published lately. Use --all to consider the entire reading list (e.g. a periodic full rebuild). Explicit "less like this" flags are always included, newest first.
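The sampling rule above can be sketched as follows. This is an illustration under assumptions, not Sieve's implementation: `sample_reading_list` is a hypothetical function, `saves` is assumed to be ordered newest first, and the `seed` parameter exists only to make the example reproducible.

```python
import random


def sample_reading_list(saves, recent=50, older_sample=25, use_all=False, seed=None):
    """Pick which saved papers to include in the prompt.

    `saves` is assumed to be ordered newest first.
    """
    if use_all:  # corresponds to --all
        return list(saves)
    newest = list(saves[:recent])  # always keep the most recent K
    older = list(saves[recent:])
    rng = random.Random(seed)
    # draw M older saves without replacement, or all of them if fewer exist
    picked = rng.sample(older, min(older_sample, len(older)))
    return newest + picked
```

With the defaults and a reading list of 200 saves, this returns 75 papers: the 50 newest plus 25 drawn at random from the remaining 150.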
Generate a standalone HTML bibliography from a list of DOIs — useful for sharing a curated reading list or preparing a literature review.
# From a plain-text file (one DOI per line, # comments ignored)
sieve export --from dois.txt
# From a BibTeX file
sieve export --from refs.bib --output report/bibliography.html --title "My Reading List"
# Re-annotate with a custom interests profile (re-scores via Sonnet; does not modify DB)
sieve export --from refs.bib --interests config/interests_project.md
Papers must already be in your local database (run sieve cite or sieve run first to ingest them). DOIs missing from the DB are reported but don't abort the export.
For BibTeX input, sieve export requires every entry to have a doi field. If any are missing it will prompt before continuing — pass --ignore-errors to skip the prompt and proceed anyway.
Output defaults to site/bibliography.html.
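As a rough illustration of the plain-text input format and the "report but don't abort" behaviour for missing DOIs, consider the sketch below. Both functions are hypothetical, and it assumes that only whole lines starting with `#` count as comments; whether Sieve also strips trailing comments is not stated here.

```python
def parse_doi_lines(text: str) -> list[str]:
    """Return DOIs from plain text: one per line, '#' lines are comments."""
    dois = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and whole-line comments
        dois.append(line)
    return dois


def split_known(dois: list[str], known: set[str]) -> tuple[list[str], list[str]]:
    """Separate DOIs present in the database from those that are missing."""
    found = [d for d in dois if d in known]
    missing = [d for d in dois if d not in known]
    return found, missing
```

A caller would export the `found` list and print the `missing` list as a warning rather than raising an error.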
Scoring uses a two-stage Claude pipeline: Haiku for fast triage, then Sonnet for final scores on papers that pass triage.
The following settings are configured in settings.yaml:
| Setting | Default | Meaning |
|---|---|---|
| store_threshold | 5 | Minimum score to save to DB |
| display_threshold | 7 | Minimum score shown highlighted in UI |
| site_threshold | 4 | Minimum score shown in site at all |
| lookback_days | 2 | Days of papers shown in site |
| batch_size | 30 | Papers per Haiku triage batch |
| sonnet_batch_size | 40 | Papers per Sonnet scoring batch |
| max_papers_per_source | 500 | Max papers fetched per source per run |
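One way to read the three thresholds is as independent checks on a paper's score. The sketch below is an interpretation, not Sieve's code: `classify_score` is a hypothetical function and the defaults are copied from the table. Note that with these defaults a score of 4 passes site_threshold but not store_threshold; how Sieve resolves that combination is not described here, so the sketch simply reports each check separately.

```python
def classify_score(score, store_threshold=5, display_threshold=7, site_threshold=4):
    """Report which thresholds a 1-10 score meets (illustrative only)."""
    return {
        "stored": score >= store_threshold,          # saved to the database
        "on_site": score >= site_threshold,          # shown in the site at all
        "highlighted": score >= display_threshold,   # highlighted in the UI
    }
```

Passing a different `site_threshold` mimics the effect of the --site-threshold N override for a single run.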
setup_launchd.sh installs a launchd job that runs sieve run daily:
bash setup_launchd.sh # runs at 6:00 AM daily
bash setup_launchd.sh --hour 8 --minute 30
launchctl start com.sieve.run # trigger immediately to test
bash setup_launchd.sh --uninstall
Logs are written to data/logs/launchd.log.
The public S2 API works without a key at low request rates. If you hit rate limits, get a free key at semanticscholar.org/product/api and set:
export S2_API_KEY=your_key_here
MIT — see LICENSE file for details.