
bcankara/BibexPy


BibexPy v2 Helium: Bibliometrics Experience with Python

Self-hosted, reproducible bibliometric data preparation for Web of Science & Scopus.


bibexpy.com · Documentation · Paper (SoftwareX) · YouTube


BibexPy v2 ("Helium"), the successor to v1 ("Hydrogen"), turns the original command-line BibexPy into a local web platform, shipped as a single pip-installable package. It merges, filters, harmonizes, enriches and exports Web of Science + Scopus records with full provenance, and never sends your licensed exports off your machine.

[Screenshot: BibexPy v2 home screen]

Two ways to use BibexPy

| | What it is | Best for |
|---|---|---|
| Full app (this repo) | pip install bibexpy → a local web platform: merge → filter → harmonize → enrich → export → report. | The complete, reproducible workflow on your own machine. |
| BibexPy-Lite | A lightweight notebook/terminal tool that runs the same Smart Merge algorithm, with no web UI and no enrichment. | A quick WoS + Scopus merge in Google Colab or a terminal. |

Both share one merge algorithm, so results are identical.

BibexPy-Lite
BibexPy-Lite repo · Open in Colab

Install

pip install bibexpy    # macOS / Linux: pip3 install bibexpy
python -m bibexpy      # macOS / Linux: python3 -m bibexpy   (browser opens automatically)

macOS / Linux: on most systems the commands are python3 / pip3; plain python/pip may not exist (or may point to an old Python 2). If pip3 itself is missing, install it first: python3 -m ensurepip --upgrade (Debian/Ubuntu: sudo apt install python3-pip). On Windows it is usually python / pip.

python -m bibexpy is the recommended way to start the app: it works on every setup out of the box, with no PATH configuration. The short bibexpy command works too once your Python Scripts folder is on PATH; see Add bibexpy to PATH (Windows) below.

Requires only Python 3.10+. No Node.js/npm is needed (the Next.js UI ships precompiled inside the wheel). Works on Windows, macOS and Linux.

python -m bibexpy --port 8080        # custom port
python -m bibexpy --no-browser       # server only
python -m bibexpy --storage ./data   # custom storage folder
python -m bibexpy --version          # → BibexPy 2.0.x (Helium)

(The short bibexpy command accepts exactly the same options.)

Projects/data live under ~/.bibexpy/storage; settings and API keys under ~/.bibexpy/.env (managed from the in-app Settings page).
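
The default locations above can be resolved from Python as in the sketch below. The helper name default_paths is hypothetical and not part of BibexPy; it only mirrors the two paths stated above and does not account for a custom --storage folder.

```python
from pathlib import Path


def default_paths(home=None):
    """Return the documented default storage folder and settings file.

    Hypothetical helper (not a BibexPy API): it only rebuilds the paths
    ~/.bibexpy/storage and ~/.bibexpy/.env from a home directory.
    """
    base = (Path(home) if home is not None else Path.home()) / ".bibexpy"
    return {"storage": base / "storage", "env": base / ".env"}


if __name__ == "__main__":
    for name, path in default_paths().items():
        # The paths may not exist yet if the app has never been started.
        print(f"{name}: {path} (exists: {path.exists()})")
```

This can be handy for backing up projects or checking which folder a fresh install will use.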

Add bibexpy to PATH (Windows)

pip installs a bibexpy.exe launcher into your Python Scripts folder. With Microsoft Store Python or pip install --user, that folder is usually not on PATH, so PowerShell replies bibexpy : The term 'bibexpy' is not recognized…. Nothing is broken: python -m bibexpy always works. To enable the short command as well:

  • Easiest: start the app once with python -m bibexpy. It detects the situation and offers to add itself to PATH; answer Y, open a new terminal, done. (In non-interactive shells it prints a personalized copy-paste command instead; you can also force it with python -m bibexpy --add-path.)

  • Manual: paste this into PowerShell, then open a new terminal:

    $s = python -c "import sysconfig, os; c=[sysconfig.get_path('scripts','nt_user'), sysconfig.get_path('scripts')]; print(next((p for p in c if 'WindowsApps' not in p and os.path.exists(os.path.join(p,'bibexpy.exe'))), c[0]))"
    [Environment]::SetEnvironmentVariable("Path", [Environment]::GetEnvironmentVariable("Path","User") + ";$s", "User")
  • Or use pipx: pipx install bibexpy manages PATH for you.

On macOS/Linux the bibexpy command is normally on PATH right after pip install.
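
To check whether the short command should work on a given machine, a small diagnostic like the following may help. It is a sketch, not something BibexPy ships: scripts_dir_on_path is a hypothetical helper, and it only inspects the default scripts folder reported by sysconfig (a --user install may use a different one).

```python
import os
import shutil
import sysconfig


def scripts_dir_on_path(scripts_dir, path_value, sep=os.pathsep):
    """Return True if scripts_dir is one of the entries of a PATH string."""
    def norm(p):
        # Normalize case (on Windows) and trailing slashes before comparing.
        return os.path.normcase(os.path.normpath(p))

    wanted = norm(scripts_dir)
    return any(norm(entry) == wanted for entry in path_value.split(sep) if entry)


if __name__ == "__main__":
    scripts = sysconfig.get_path("scripts")
    print("scripts folder:", scripts)
    print("on PATH:", scripts_dir_on_path(scripts, os.environ.get("PATH", "")))
    # shutil.which returns the launcher's full path, or None if not found.
    print("bibexpy launcher:", shutil.which("bibexpy"))
```

If the launcher is reported as None while python -m bibexpy works, the PATH steps above are the likely fix.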

What's new in v2

  • Built-in sample dataset: the first launch creates a ready-to-explore Simple Project (real Web of Science + Scopus exports), so you can try the whole pipeline before uploading your own data.
  • One-click Smart Merge: staged record linkage with a DOI-determinative rule (records whose normalized DOIs differ are never merged), identifier matching, and Jaro–Winkler title similarity with confidence scoring, plus field-level merging. Pairs it cannot resolve with certainty are kept separate and offered for an optional review right in the merge step. The result includes a copy-ready methodology paragraph.
  • ORCID-first author disambiguation: ORCID identifiers as deterministic evidence, with a constrained field-similarity fallback only when coverage is incomplete.
  • Address harmonization: organization roll-up to a canonical parent institution, plus country standardization.
  • Multi-source enrichment: fetch-once-fill-all across CrossRef, OpenAlex, Scopus, DataCite, Unpaywall, Europe PMC and Semantic Scholar; reverse-DOI recovery; identity fields (ORCID/ROR). Verifiable sources only, with no ML-inferred metadata.
  • Reproducible filtering: multi-facet inclusion/exclusion criteria, saved as reusable presets.
  • Quality dashboard: a bibliometrically weighted health score plus an exportable General Overview table (CSV / XLSX / PNG).
  • Provenance: append-only audit log, pre-operation snapshots, isolated analyses, and an auto-generated methodology narrative for your paper's data-preparation section.
  • Structured export: WoS plain text, VOSviewer TSV, BibTeX, RIS, CSV, TSV, XLSX (interoperable with VOSviewer & Biblioshiny).
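
The Smart Merge idea described in the list above (DOIs decide first, title similarity only as a fallback) can be illustrated with a minimal sketch. This is not BibexPy's implementation: the function names, the record layout (dicts with "doi" and "title"), the rule that equal DOIs merge, and both thresholds are assumptions made for illustration. Only the rule that differing normalized DOIs never merge comes from the description above.

```python
import re


def normalize_doi(doi):
    """Lower-case a DOI and strip common resolver/scheme prefixes."""
    d = (doi or "").strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/",
                   "https://dx.doi.org/", "http://dx.doi.org/", "doi:"):
        if d.startswith(prefix):
            d = d[len(prefix):]
            break
    return d.strip()


def normalize_title(title):
    """Lower-case a title and collapse anything non-alphanumeric to spaces."""
    return re.sub(r"[^a-z0-9]+", " ", (title or "").lower()).strip()


def jaro(s1, s2):
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if not len1 or not len2:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    hit1, hit2 = [False] * len1, [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not hit2[j] and s2[j] == ch:
                hit1[i] = hit2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = out_of_order = 0
    for i in range(len1):
        if hit1[i]:
            while not hit2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1, s2, p=0.1):
    """Jaro-Winkler: Jaro boosted by a shared prefix of up to 4 characters."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def link_decision(rec_a, rec_b, merge_at=0.97, review_at=0.90):
    """Return (decision, confidence); thresholds are illustrative only."""
    doi_a, doi_b = normalize_doi(rec_a.get("doi")), normalize_doi(rec_b.get("doi"))
    if doi_a and doi_b:
        # DOI-determinative: differing DOIs never merge (equal -> merge, assumed).
        return ("merge", 1.0) if doi_a == doi_b else ("keep_separate", 0.0)
    t_a, t_b = normalize_title(rec_a.get("title")), normalize_title(rec_b.get("title"))
    if not t_a or not t_b:
        return ("keep_separate", 0.0)
    score = jaro_winkler(t_a, t_b)
    if score >= merge_at:
        return ("merge", score)
    return ("review", score) if score >= review_at else ("keep_separate", score)
```

For example, two records with DOIs 10.1/abc and 10.1/xyz stay separate even if their titles are identical, while two records without DOIs fall back to the title score.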

Workflow

A guided five-step pipeline:

Data & Merge → Records & Filtering → Harmonization → Export → Report

Raw Scopus (.csv) and Web of Science (.txt) exports are uploaded and merged in a single click (file preparation runs implicitly), and each merge is stored as an isolated, reproducible analysis.
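
For readers unfamiliar with the Web of Science plain-text export, the sketch below shows the general shape of the file: two-letter field tags, indented continuation lines, ER closing each record and EF closing the file. It is a simplified reader written for illustration, not BibexPy's converter, and the sample record is made up.

```python
SAMPLE = """FN Clarivate Analytics Web of Science
VR 1.0
PT J
AU Doe, J
   Roe, R
TI An example title that wraps onto a
   second line
DI 10.1234/example
ER

EF
"""


def parse_wos_plaintext(text):
    """Parse WoS tagged plain text into a list of {tag: [values]} dicts."""
    records, current, tag = [], {}, None
    for raw in text.lstrip("\ufeff").splitlines():
        line = raw.rstrip()
        if not line:
            continue
        head, value = line[:2], line[3:]
        if head in ("FN", "VR"):      # file header lines, not record fields
            tag = None
        elif head == "ER":            # end of the current record
            records.append(current)
            current, tag = {}, None
        elif head == "EF":            # end of file
            break
        elif head.strip():            # a new two-letter field tag
            tag = head
            current.setdefault(tag, []).append(value)
        elif tag is not None:         # indented continuation of the last tag
            current[tag].append(line.strip())
    return records


if __name__ == "__main__":
    for record in parse_wos_plaintext(SAMPLE):
        print(record["AU"], " ".join(record["TI"]), record["DI"][0])
```

Multi-valued fields such as AU keep one list entry per line, while wrapped fields such as TI can be rejoined with a space.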

Repository layout

apps/
  web/              # Next.js 14 frontend (static-exported into the wheel)
  api/              # FastAPI backend (documented HTTP API under /api)
packages/
  bibex_core/       # core bibliometric library (converters, merge, C1 utils, …)
python_pkg/         # PyPI packaging; builds the single pip wheel
scripts/            # build_wheel.sh / build_wheel.ps1
.github/workflows/  # release.yml: tag → wheel → PyPI (Trusted Publishing)

The published wheel bundles the prebuilt frontend (_web) + backend (_server) + vendored core, so end users need no Node.js. Those build-time copies are git-ignored: the source of truth is apps/ and packages/, and the wheel is regenerated by CI.

Development

# Backend (port 8001)
cd apps/api
pip install -r requirements.txt
pip install -e ../../packages/bibex_core
uvicorn main:app --reload --port 8001

# Frontend (port 3000), in a separate terminal
cd apps/web
npm install
npm run dev

Then open http://localhost:3000. Run tests with cd apps/api && python -m pytest -q.

Build the wheel (maintainers)

bash scripts/build_wheel.sh      # macOS / Linux
pwsh scripts/build_wheel.ps1     # Windows

→ python_pkg/dist/bibexpy-<version>-py3-none-any.whl, a pure-Python py3-none-any wheel that installs on Windows / macOS / Linux with no compiler.
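
The wheel name follows the standard wheel filename convention: distribution, version, optional build tag, Python tag, ABI tag, platform tag. A minimal sketch of reading those tags is shown below; parse_wheel_filename is a hypothetical helper, not part of BibexPy or its build scripts, and the version in the usage lines is only an example.

```python
def parse_wheel_filename(filename):
    """Split a wheel filename into its dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        name, version, build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected wheel filename: {filename!r}")
    return {
        "name": name,
        "version": version,
        "build": build,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


def is_pure_python(tags):
    """True when the wheel has no ABI or platform restriction."""
    return tags["abi"] == "none" and tags["platform"] == "any"


if __name__ == "__main__":
    tags = parse_wheel_filename("bibexpy-2.0.0-py3-none-any.whl")
    print(tags, is_pure_python(tags))
```

An abi tag of none together with a platform tag of any is what lets the same file install on all three operating systems.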

Release

Tagging a version triggers GitHub Actions, which builds the wheel (with Node) and publishes it to PyPI via Trusted Publishing (OIDC), with no API tokens stored:

git tag v2.0.0 && git push origin v2.0.0

Links

Website · Docs · YouTube · X / Twitter · Instagram · Paper (SoftwareX)

License

GPL-3.0-or-later.

Citation

If you use BibexPy in your research, please cite:

Kara, B. C., Şahin, A., & Dirsehan, T. (2025). BibexPy: Harmonizing the bibliometric symphony of Scopus and Web of Science. SoftwareX, 30, 102098. https://doi.org/10.1016/j.softx.2025.102098

@article{bibexpy2025,
  title   = {BibexPy: Harmonizing the bibliometric symphony of {Scopus} and {Web of Science}},
  author  = {Kara, Burak Can and {\c{S}}ahin, Alperen and Dirsehan, Ta{\c{s}}k{\i}n},
  journal = {SoftwareX},
  volume  = {30},
  pages   = {102098},
  year    = {2025},
  doi     = {10.1016/j.softx.2025.102098}
}

About

BibexPy is Python-based software that streamlines bibliometric data integration, deduplication, metadata enrichment, and format conversion. It simplifies the preparation of high-quality datasets for advanced analyses by automating traditionally manual and error-prone tasks.
