Self-hosted, reproducible bibliometric data preparation for Web of Science & Scopus.
bibexpy.com · Documentation · Paper (SoftwareX)
BibexPy v2 ("Helium"), the successor to v1 ("Hydrogen"), turns the original command-line BibexPy into a local web platform, shipped as a single pip-installable package. It merges, filters, harmonizes, enriches and exports Web of Science + Scopus records with full provenance, and never sends your licensed exports off your machine.
| | What it is | Best for |
|---|---|---|
| Full app (this repo) | pip install bibexpy: a local web platform (merge → filter → harmonize → enrich → export → report). | The complete, reproducible workflow on your own machine. |
| BibexPy-Lite | A lightweight notebook/terminal tool that runs the same Smart Merge algorithm, with no web UI and no enrichment. | A quick WoS + Scopus merge in Google Colab or a terminal. |
Both share one merge algorithm, so results are identical.
pip install bibexpy   # macOS / Linux: pip3 install bibexpy
python -m bibexpy     # macOS / Linux: python3 -m bibexpy (browser opens automatically)
macOS / Linux: on most systems the commands are python3/pip3; plain python/pip may not exist (or may point to an old Python 2). If pip3 itself is missing, install it first: python3 -m ensurepip --upgrade (Debian/Ubuntu: sudo apt install python3-pip). On Windows it is usually python/pip.
python -m bibexpy is the recommended way to start the app: it works on every setup
out of the box, with no PATH configuration. The short bibexpy command works too once your
Python Scripts folder is on PATH; see
Add bibexpy to PATH (Windows) below.
Requires only Python 3.10+; no Node.js/npm needed (the Next.js UI ships precompiled inside the wheel). Works on Windows, macOS and Linux.
python -m bibexpy --port 8080 # custom port
python -m bibexpy --no-browser # server only
python -m bibexpy --storage ./data # custom storage folder
python -m bibexpy --version # → BibexPy 2.0.x (Helium)
(The short bibexpy command accepts exactly the same options.)
Projects/data live under ~/.bibexpy/storage; settings and API keys under ~/.bibexpy/.env
(managed from the in-app Settings page).
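For scripted or repeated launches, the documented options can be assembled programmatically. The following is a minimal sketch, not part of BibexPy itself: the helper name build_launch_command and the two path constants are hypothetical, and only the flags listed above (--port, --no-browser, --storage) are used.

```python
import sys
from pathlib import Path

# Default locations described in the text above (not created by this sketch).
DEFAULT_STORAGE = Path.home() / ".bibexpy" / "storage"
DEFAULT_ENV_FILE = Path.home() / ".bibexpy" / ".env"


def build_launch_command(port=None, storage=None, open_browser=True):
    """Return an argv list that starts BibexPy with the documented options.

    Hypothetical helper: it only assembles the command; it does not run it.
    """
    # sys.executable avoids the python vs. python3 naming question.
    cmd = [sys.executable, "-m", "bibexpy"]
    if port is not None:
        cmd += ["--port", str(port)]
    if not open_browser:
        cmd.append("--no-browser")
    if storage is not None:
        cmd += ["--storage", str(storage)]
    return cmd


if __name__ == "__main__":
    cmd = build_launch_command(port=8080, storage="./data", open_browser=False)
    print("Would run:", " ".join(cmd))
    # To actually start the server:
    #   import subprocess; subprocess.run(cmd, check=True)
```

Using sys.executable means the same interpreter that runs the script also runs BibexPy, which sidesteps the PATH issues discussed below.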
pip installs a bibexpy.exe launcher into your Python Scripts folder. With Microsoft
Store Python or pip install --user, that folder is usually not on PATH, so PowerShell
replies bibexpy : The term 'bibexpy' is not recognized…. Nothing is broken:
python -m bibexpy always works. To enable the short command as well:
- Easiest: start the app once with python -m bibexpy. It detects the situation and offers to add itself to PATH; answer Y, open a new terminal, done. (In non-interactive shells it prints a personalized copy-paste command instead; you can also force it with python -m bibexpy --add-path.)
- Manual: paste this into PowerShell, then open a new terminal:
  $s = python -c "import sysconfig, os; c=[sysconfig.get_path('scripts','nt_user'), sysconfig.get_path('scripts')]; print(next((p for p in c if 'WindowsApps' not in p and os.path.exists(os.path.join(p,'bibexpy.exe'))), c[0]))"
  [Environment]::SetEnvironmentVariable("Path", [Environment]::GetEnvironmentVariable("Path","User") + ";$s", "User")
- Or use pipx: pipx install bibexpy manages PATH for you.
On macOS/Linux the bibexpy command is normally on PATH right after pip install.
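The Python one-liner embedded in the manual PowerShell step is dense, so here it is expanded into a readable sketch with the same selection logic: prefer a Scripts folder that actually contains bibexpy.exe and is not a WindowsApps stub folder, otherwise fall back to the first candidate. The function names are illustrative, and the per-user nt_user scheme is only queried on Windows here so the snippet also runs elsewhere.

```python
import os
import sysconfig


def pick_scripts_dir(candidates, launcher="bibexpy.exe", exists=os.path.exists):
    """Return the first candidate folder that holds the launcher.

    Folders under WindowsApps are skipped; if nothing qualifies,
    the first candidate is returned as a fallback.
    """
    for path in candidates:
        if "WindowsApps" not in path and exists(os.path.join(path, launcher)):
            return path
    return candidates[0]


def candidate_scripts_dirs():
    """Per-user Scripts folder first (Windows only), then the default one."""
    dirs = []
    if os.name == "nt":
        dirs.append(sysconfig.get_path("scripts", "nt_user"))
    dirs.append(sysconfig.get_path("scripts"))
    return dirs


# Prints the folder that would be appended to the user PATH.
print(pick_scripts_dir(candidate_scripts_dirs()))
```

The exists parameter is only there to make the selection logic testable without touching the file system.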
- Built-in sample dataset: the first launch creates a ready-to-explore Simple Project (real Web of Science + Scopus exports), so you can try the whole pipeline before uploading your own data.
- One-click Smart Merge: staged record linkage with a DOI-determinative rule (records whose normalized DOIs differ are never merged), identifier matching, and Jaro-Winkler title similarity with confidence scoring, plus field-level merging. Pairs it cannot resolve with certainty are kept separate and offered for an optional review right in the merge step. The result includes a copy-ready methodology paragraph.
- ORCID-first author disambiguation: ORCID identifiers as deterministic evidence, with a constrained field-similarity fallback only when coverage is incomplete.
- Address harmonization: organization roll-up to a canonical parent institution + country standardization.
- Multi-source enrichment: fetch-once-fill-all across CrossRef, OpenAlex, Scopus, DataCite, Unpaywall, Europe PMC and Semantic Scholar; reverse-DOI recovery; identity fields (ORCID/ROR). Verifiable sources only, no ML-inferred metadata.
- Reproducible filtering: multi-facet inclusion/exclusion criteria, saved as reusable presets.
- Quality dashboard: a bibliometrically weighted health score + an exportable General Overview table (CSV / XLSX / PNG).
- Provenance: append-only audit log, pre-operation snapshots, isolated analyses, and an auto-generated methodology narrative for your paper's data-preparation section.
- Structured export: WoS plain text, VOSviewer TSV, BibTeX, RIS, CSV, TSV, XLSX (interoperable with VOSviewer & Biblioshiny).
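To make the Smart Merge description more concrete, here is a rough sketch of staged linkage for one pair of records: a DOI stage that decides outright when both DOIs are present, followed by a Jaro-Winkler title stage with a confidence score. This is an illustration under stated assumptions, not BibexPy's actual implementation: the record layout (dicts with doi and title keys), the function names, and both thresholds are hypothetical, and the identifier-matching and field-level merging stages are omitted.

```python
import re

MERGE_AT = 0.97   # hypothetical threshold: merge automatically
REVIEW_AT = 0.90  # hypothetical threshold: keep separate, offer for review

_DOI_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)")


def jaro(s1, s2):
    """Plain Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    hit1, hit2 = [False] * len1, [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not hit2[j] and s2[j] == ch:
                hit1[i] = hit2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = out_of_order = 0
    for i in range(len1):
        if hit1[i]:
            while not hit2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1, s2, p=0.1):
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * p * (1 - base)


def normalize_doi(doi):
    """Lower-case a DOI and strip URL or 'doi:' prefixes; None if empty."""
    if not doi:
        return None
    return _DOI_PREFIX.sub("", doi.strip().lower()) or None


def normalize_title(title):
    """Lower-case a title and collapse punctuation and whitespace."""
    return re.sub(r"[^a-z0-9]+", " ", (title or "").lower()).strip()


def link_pair(a, b):
    """Return (decision, confidence) for two record dicts."""
    doi_a, doi_b = normalize_doi(a.get("doi")), normalize_doi(b.get("doi"))
    if doi_a and doi_b:
        # DOI-determinative stage: differing DOIs are never merged.
        return ("merge", 1.0) if doi_a == doi_b else ("separate", 1.0)
    title_a, title_b = normalize_title(a.get("title")), normalize_title(b.get("title"))
    if not title_a or not title_b:
        return ("separate", 0.0)
    score = jaro_winkler(title_a, title_b)
    if score >= MERGE_AT:
        return ("merge", score)
    if score >= REVIEW_AT:
        return ("review", score)  # kept separate until a human decides
    return ("separate", score)
```

The DOI stage comes first because it is the only stage that yields a certain answer; the title stage is reached only when at least one record lacks a DOI.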
A guided five-step pipeline:
Data & Merge → Records & Filtering → Harmonization → Export → Report
Raw Scopus (.csv) and Web of Science (.txt) exports are uploaded and merged in a single
click (file preparation runs implicitly), and each merge is stored as an isolated,
reproducible analysis.
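For readers unfamiliar with the Web of Science plain-text export, the sketch below shows how its tagged layout can be read: two-letter field tags, continuation lines indented by three spaces, ER closing each record and EF closing the file. It is an independent illustration, not the converter shipped in bibex_core, and the sample record is invented for the example.

```python
def parse_wos_plaintext(text):
    """Parse WoS tagged plain text into a list of {tag: [values]} dicts."""
    records, current, tag = [], {}, None
    for raw in text.lstrip("\ufeff").splitlines():
        line = raw.rstrip()
        if not line:
            continue
        if line == "EF":              # end of file
            break
        if line == "ER":              # end of record
            if current:
                records.append(current)
            current, tag = {}, None
            continue
        head, value = line[:2], line[3:].strip()
        if head in ("FN", "VR"):      # file header lines, not record fields
            continue
        if head == "  ":              # continuation of the previous tag
            if tag is not None:
                current[tag].append(value)
            continue
        tag = head
        current.setdefault(tag, []).append(value)
    return records


# Invented sample record, for illustration only.
SAMPLE = """FN Clarivate Analytics Web of Science
VR 1.0
PT J
AU Smith, J
   Doe, A
TI An example title that wraps
   onto a second line
DI 10.1000/example
ER

EF
"""

for record in parse_wos_plaintext(SAMPLE):
    print(record["AU"], " ".join(record["TI"]), record["DI"][0])
```

Keeping each field as a list of lines preserves multi-valued fields such as AU, while wrapped fields such as TI can be joined with a space.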
apps/
  web/                 # Next.js 14 frontend (static-exported into the wheel)
  api/                 # FastAPI backend (documented HTTP API under /api)
packages/
  bibex_core/          # core bibliometric library (converters, merge, C1 utils, …)
python_pkg/            # PyPI packaging: builds the single pip wheel
scripts/               # build_wheel.sh / build_wheel.ps1
.github/workflows/     # release.yml: tag → wheel → PyPI (Trusted Publishing)
The published wheel bundles the prebuilt frontend (_web) + backend (_server) +
vendored core, so end users need no Node.js. Those build-time copies are git-ignored: the
source of truth is apps/ and packages/, and the wheel is regenerated by CI.
# Backend (port 8001)
cd apps/api
pip install -r requirements.txt
pip install -e ../../packages/bibex_core
uvicorn main:app --reload --port 8001
# Frontend (port 3000), in a separate terminal
cd apps/web
npm install
npm run dev
Then open http://localhost:3000. Run tests with cd apps/api && python -m pytest -q.
bash scripts/build_wheel.sh # macOS / Linux
pwsh scripts/build_wheel.ps1 # Windows
→ python_pkg/dist/bibexpy-<version>-py3-none-any.whl, a pure-Python py3-none-any wheel
that installs on Windows / macOS / Linux with no compiler.
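The py3-none-any suffix is what makes the wheel installable everywhere: it encodes the Python, ABI and platform tags of the standard wheel filename layout. As a quick sanity check after a build, a sketch like the following (standard library only; function names are illustrative, and the version number in the usage line is just an example) can confirm that a file carries those tags.

```python
def split_wheel_filename(filename):
    """Split '<name>-<version>[-<build>]-<python>-<abi>-<platform>.whl'."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename layout: {filename!r}")
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": parts[0],
        "version": parts[1],
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


def is_pure_python_any_platform(filename):
    """True when the wheel has no ABI requirement and no platform restriction."""
    tags = split_wheel_filename(filename)
    return tags["abi"] == "none" and tags["platform"] == "any"


print(is_pure_python_any_platform("bibexpy-2.0.0-py3-none-any.whl"))
```

A wheel with compiled extensions would instead carry interpreter-specific ABI and platform tags, for example cp312-cp312-win_amd64, and would fail this check.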
Tagging a version triggers GitHub Actions, which builds the wheel (with Node) and publishes it to PyPI via Trusted Publishing (OIDC), with no API tokens stored:
git tag v2.0.0 && git push origin v2.0.0
Website · Docs · YouTube · X / Twitter · Instagram · Paper (SoftwareX)
If you use BibexPy in your research, please cite:
Kara, B. C., Şahin, A., & Dirsehan, T. (2025). BibexPy: Harmonizing the bibliometric symphony of Scopus and Web of Science. SoftwareX, 30, 102098. https://doi.org/10.1016/j.softx.2025.102098
@article{bibexpy2025,
title = {BibexPy: Harmonizing the bibliometric symphony of {Scopus} and {Web of Science}},
author = {Kara, Burak Can and {\c{S}}ahin, Alperen and Dirsehan, Ta{\c{s}}k{\i}n},
journal = {SoftwareX},
volume = {30},
pages = {102098},
year = {2025},
doi = {10.1016/j.softx.2025.102098}
}
