Sieve

Automated literature monitoring for researchers.

Sieve fetches new papers daily from bioRxiv, arXiv, and journal RSS feeds, scores them against your research interests using Claude, and presents the results in a local web interface. You can also walk the citation graph of any paper to catch up on related literature.

How it works

1. Fetch: pulls new papers from configured sources (bioRxiv, arXiv, RSS feeds)
2. Score: runs a two-stage Claude pipeline (Haiku triage → Sonnet scoring) against your interests.md profile
3. Ingest: stores papers and scores in a local SQLite database
4. Serve: generates a static site you can browse and annotate locally

Papers are scored 1–10. You configure thresholds for what gets stored and what appears in the UI.
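
As a rough illustration of the two-stage flow described above, the sketch below triages papers in batches, scores only the survivors, and keeps those at or above the store threshold. The function names, the `Paper` shape, and the `triage` and `score` callables are hypothetical stand-ins for the Haiku and Sonnet calls; this is not Sieve's actual implementation. The default batch sizes and threshold mirror the example settings.

```python
from typing import Callable, Iterator

Paper = dict  # hypothetical shape, e.g. {"title": ..., "abstract": ...}


def batched(items: list, size: int) -> Iterator[list]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def run_pipeline(
    papers: list[Paper],
    triage: Callable[[list[Paper]], list[bool]],  # stand-in for the Haiku stage
    score: Callable[[list[Paper]], list[int]],    # stand-in for the Sonnet stage
    batch_size: int = 30,
    sonnet_batch_size: int = 40,
    store_threshold: int = 5,
) -> list[tuple[Paper, int]]:
    """Triage in batches, score the survivors, keep scores >= store_threshold."""
    survivors: list[Paper] = []
    for batch in batched(papers, batch_size):
        keep_flags = triage(batch)
        survivors.extend(p for p, keep in zip(batch, keep_flags) if keep)

    stored: list[tuple[Paper, int]] = []
    for batch in batched(survivors, sonnet_batch_size):
        for paper, value in zip(batch, score(batch)):
            if value >= store_threshold:
                stored.append((paper, value))
    return stored
```

The point of the cheap first stage is that the more expensive scorer only sees papers that passed triage.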

The scorer is driven by a natural-language interests.md — you explain why you care about certain work, not a keyword list. As you use Sieve, sieve learn turns that usage back into the profile: the papers you add to your reading list ("more like this") and the ones you flag as "less like this" become proposed edits to interests.md, so the profile sharpens as your tastes become clear.

Requirements

uv — Python package manager
Claude Code CLI — must be installed and authenticated (claude -p must work)

Setup

1. Clone and install

git clone https://github.com/yourname/sieve.git
cd sieve
uv sync

2. Configure

cp config/settings.yaml.example config/settings.yaml
cp config/interests.md.example config/interests.md

Edit config/settings.yaml — set your sources, thresholds, and email:

lookback_days: 2

store_threshold: 5
display_threshold: 7
site_threshold: 4

batch_size: 30          # papers per Haiku triage batch
sonnet_batch_size: 40   # papers per Sonnet scoring batch

biorxiv_category: neuroscience

arxiv_categories:
  - q-bio.NC

mailto: "you@example.com"  # used for OpenAlex polite pool

max_papers_per_source: 500

feeds:
  - name: "Nature Neuroscience"
    url: "https://www.nature.com/neuro.rss"
  - name: "Nature"
    url: "https://www.nature.com/nature.rss"

Edit config/interests.md — describe what you want to read. This is the prompt fed to Claude for scoring. Be specific: name topics, methods, labs, and things you explicitly don't want. The quality of scoring depends directly on this file.

Example interests.md

## Research Interests

### Core topics

- Visual cortex circuits in mice — response properties, layer and cell-type specificity, feedforward and feedback pathways
- How interneuron subtypes (PV, SST, VIP) shape cortical gain, selectivity, and network dynamics
- Population coding and dimensionality in visual cortex during perception and behavior
- Top-down modulation of sensory cortex — attention, locomotion, arousal state effects on V1/HVA responses

### Methods I follow

- Two-photon calcium imaging and large-scale Neuropixels recordings in behaving mice
- Optogenetic dissection of specific cell types or pathways during visual tasks
- Computational models of visual cortical circuits

### Explicitly NOT interested in

- Human or primate visual neuroscience unless methods are directly transferable to mice
- MRI or EEG-based studies
- Clinical or disease contexts
- Pure psychophysics without a neural circuit component

3. Run

sieve run       # fetch → score → ingest → generate site
sieve serve     # open the site in your browser

Daily use

| Command | Description |
| --- | --- |
| `sieve run [--site-threshold N]` | Fetch new papers, score, and update the site |
| `sieve serve` | Start local server and open browser |
| `sieve seed --doi <DOI> [--pdf PATH]` | Bootstrap your interests from a single paper (cold start) |
| `sieve learn [--recent K] [--older-sample M] [--all]` | Tune `interests.md` from your reading list + "less like this" flags |
| `sieve cite --doi <DOI> [--forward] [--recommend] [--site-threshold N]` | Score the citation graph of a paper |
| `sieve clean` | Prune low-score papers outside the fetch window |
| `sieve export --from FILE [--output PATH] [--title TEXT] [--interests PATH]` | Generate a standalone annotated bibliography HTML |

--site-threshold N overrides the site_threshold from settings for that run (useful for one-off exploration without changing your config).

Citation graph (`sieve cite`)

Use this to catch up on a paper's references — useful when you encounter a key paper and want to know which of its citations you should read.

# References of a paper (backward citations)
sieve cite --doi 10.1101/2022.09.29.510081

# Also fetch papers that cite this paper (forward citations)
sieve cite --doi 10.1101/2022.09.29.510081 --forward

# Include related papers recommended by Semantic Scholar (S2)
sieve cite --doi 10.1101/2022.09.29.510081 --recommend

The --doi option accepts DOIs (including bioRxiv DOIs), Semantic Scholar paper IDs, Corpus IDs, or full Semantic Scholar URLs:

sieve cite --doi 252528267
sieve cite --doi "https://www.semanticscholar.org/paper/Title/f583cb7b6e6aa669..."

Note: Many journal papers are indexed in Semantic Scholar under their preprint DOI. If a journal DOI fails, try the bioRxiv DOI. Sieve falls back to OpenAlex automatically when Semantic Scholar blocks reference access (common for Elsevier papers).
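
One way to tell the accepted identifier forms apart is sketched below. The function name `to_s2_id` and the detection rules are assumptions for illustration; Sieve's own parsing is not shown here. The `DOI:`, `CorpusId:`, and `URL:` prefixes follow the paper ID conventions of the Semantic Scholar Graph API.

```python
import re


def to_s2_id(raw: str) -> str:
    """Map a user-supplied identifier to a Semantic Scholar style paper ID.

    Hypothetical helper: the rules below are a guess at a reasonable
    classification, not Sieve's actual logic.
    """
    value = raw.strip()
    if value.startswith(("http://", "https://")):
        return f"URL:{value}"                 # full Semantic Scholar URL
    if re.fullmatch(r"[0-9a-fA-F]{40}", value):
        return value.lower()                  # 40-character hex paper ID
    if re.fullmatch(r"[0-9]+", value):
        return f"CorpusId:{value}"            # purely numeric Corpus ID
    if re.fullmatch(r"10\.\d{4,9}/\S+", value):
        return f"DOI:{value}"                 # journal or bioRxiv DOI
    raise ValueError(f"Unrecognised paper identifier: {raw!r}")
```

For example, `to_s2_id("252528267")` yields `CorpusId:252528267`, while a DOI such as `10.1101/2022.09.29.510081` gains a `DOI:` prefix.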

Tuning your interests over time

interests.md is the heart of the system, and it's meant to evolve. There are two ways to refine it — one for cold start, one for steady state.

Cold start: `sieve seed`

Before you have any reading history, bootstrap the profile one paper at a time. When you read a paper that makes you realise your profile is missing something:

sieve seed --doi 10.1038/s41593-022-01107-4

Claude evaluates whether the paper represents a gap in your interests.md and suggests an addition. You confirm before anything is written. (--pdf PATH supplies a local PDF when the DOI fetch doesn't yield a usable abstract.)

Steady state: `sieve learn`

Once you've been using Sieve, your behaviour is the signal — no need to feed papers in by hand. Two actions in the web UI accumulate in the database:

Reading list — papers you save are treated as "more like this".
Less like this — the button on each paper records an explicit negative example (a false positive you want fewer of). These persist even after the paper is pruned.

Then run:

sieve learn

Claude reviews those saved and rejected papers against your current profile and proposes edits in three forms, which you review and confirm per group:

Add — new interest lines (liberal, to avoid missing relevant work)
Revise — rewrite a vague or overly broad line to be sharper, shown as a before→after diff
Remove — delete a stale or redundant line (conservative — it prefers a revision when in doubt)

Before any revision or removal, interests.md is backed up to data/backups/ and the restore command is printed, so edits are always reversible.
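
The backup step described above might look roughly like the following. This is a minimal sketch: the function name, the timestamped file naming, and the wording of the printed restore hint are all assumptions, not Sieve's actual behaviour.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_profile(profile: Path, backup_dir: Path) -> Path:
    """Copy the profile into backup_dir under a timestamped name (hypothetical scheme)."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = backup_dir / f"{profile.stem}-{stamp}{profile.suffix}"
    shutil.copy2(profile, target)  # copy2 also preserves file timestamps
    # Print a restore hint so the edit can be undone by hand.
    print(f"To restore: cp {target} {profile}")
    return target
```

Copying before writing, rather than after, means a failed or unwanted edit never leaves you without the previous version.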

To keep the prompt focused on current interests, learn samples your reading list: it always includes the most recent --recent K saves (default 50) plus a random --older-sample M (default 25) drawn from older ones, so long-standing themes still surface without overweighting whatever was published lately. Use --all to consider the entire reading list (e.g. a periodic full rebuild). Explicit "less like this" flags are always included, newest first.
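
The sampling rule above can be sketched as follows, assuming the reading list is ordered newest first. The function and parameter names are hypothetical; only the defaults (50 recent, 25 older) come from the description.

```python
import random
from typing import Optional, Sequence


def sample_reading_list(
    saves: Sequence,               # assumed to be ordered newest first
    recent: int = 50,
    older_sample: int = 25,
    use_all: bool = False,
    rng: Optional[random.Random] = None,
) -> list:
    """Return the newest `recent` saves plus a random draw from the older ones."""
    if use_all:
        return list(saves)
    rng = rng or random.Random()
    head = list(saves[:recent])
    older = list(saves[recent:])
    # Never ask for more items than exist in the older pool.
    tail = rng.sample(older, min(older_sample, len(older)))
    return head + tail
```

With 100 saved papers and the defaults, this returns 75 items: the 50 newest, then 25 chosen at random from the remaining 50.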

Annotated bibliography (`sieve export`)

Generate a standalone HTML bibliography from a list of DOIs — useful for sharing a curated reading list or preparing a literature review.

# From a plain-text file (one DOI per line, # comments ignored)
sieve export --from dois.txt

# From a BibTeX file
sieve export --from refs.bib --output report/bibliography.html --title "My Reading List"

# Re-annotate with a custom interests profile (re-scores via Sonnet; does not modify DB)
sieve export --from refs.bib --interests config/interests_project.md

Papers must already be in your local database (run sieve cite or sieve run first to ingest them). DOIs missing from the DB are reported but don't abort the export.

For BibTeX input, sieve export requires every entry to have a doi field. If any are missing it will prompt before continuing — pass --ignore-errors to skip the prompt and proceed anyway.

Output defaults to site/bibliography.html.
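
For the plain-text input format (one DOI per line, `#` comments ignored), parsing and the "report but don't abort" behaviour could be sketched like this. Both function names are hypothetical, and treating only whole lines starting with `#` as comments is an assumption.

```python
def parse_doi_lines(text: str) -> list[str]:
    """Return DOIs from plain text, skipping blank lines and '#' comment lines."""
    dois = []
    for line in text.splitlines():
        entry = line.strip()
        if not entry or entry.startswith("#"):
            continue
        dois.append(entry)
    return dois


def partition_by_db(dois: list[str], known: set[str]) -> tuple[list[str], list[str]]:
    """Split DOIs into (found, missing) against a set of lowercase DOIs in the DB."""
    found = [d for d in dois if d.lower() in known]
    missing = [d for d in dois if d.lower() not in known]
    return found, missing
```

The missing list can then be printed as a warning while the export proceeds with the found entries.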

Scoring thresholds

Scoring uses a two-stage Claude pipeline: Haiku for fast triage, then Sonnet for final scores on papers that pass triage.

Thresholds and batch sizes are configured in settings.yaml:

| Setting | Default | Meaning |
| --- | --- | --- |
| `store_threshold` | 5 | Minimum score to save to DB |
| `display_threshold` | 7 | Minimum score shown highlighted in UI |
| `site_threshold` | 4 | Minimum score shown in site at all |
| `lookback_days` | 2 | Days of papers shown in site |
| `batch_size` | 30 | Papers per Haiku triage batch |
| `sonnet_batch_size` | 40 | Papers per Sonnet scoring batch |
| `max_papers_per_source` | 500 | Max papers fetched per source per run |
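
To make the three score thresholds concrete, the sketch below evaluates each one independently for a given score, reading "minimum score" as an inclusive lower bound. The `Thresholds` class and `visibility` function are hypothetical, and how Sieve combines the three checks internally is not specified here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    store: int = 5    # store_threshold
    display: int = 7  # display_threshold
    site: int = 4     # site_threshold


def visibility(score: int, t: Thresholds = Thresholds()) -> dict[str, bool]:
    """Report which thresholds a 1-10 score meets (each check is inclusive)."""
    return {
        "stored": score >= t.store,
        "on_site": score >= t.site,
        "highlighted": score >= t.display,
    }
```

With the defaults, a score of 8 passes all three checks, while a score of 5 is stored and shown but not highlighted.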

Scheduled runs (macOS)

setup_launchd.sh installs a launchd job that runs sieve run daily:

bash setup_launchd.sh                      # runs at 6:00 AM daily
bash setup_launchd.sh --hour 8 --minute 30
launchctl start com.sieve.run              # trigger immediately to test
bash setup_launchd.sh --uninstall

Logs are written to data/logs/launchd.log.

Optional: Semantic Scholar API key

The public S2 API works without a key at low request rates. If you hit rate limits, get a free key at semanticscholar.org/product/api and set:

export S2_API_KEY=your_key_here
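
A client can pick the key up from the environment and attach it only when present, as in the sketch below. The helper name `s2_headers` is hypothetical and Sieve's own handling may differ; `x-api-key` is the header name the Semantic Scholar API uses for keys.

```python
import os
from typing import Mapping, Optional


def s2_headers(env: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Build request headers, adding the API key only if S2_API_KEY is set."""
    env = os.environ if env is None else env
    key = env.get("S2_API_KEY", "").strip()
    return {"x-api-key": key} if key else {}
```

Returning an empty dict when no key is set keeps unauthenticated requests working at the lower public rate.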

License

MIT — see LICENSE file for details.
