feat(extraction): Elixir language support (.ex/.exs) #871

Open
allenwoods wants to merge 1 commit into colbymchenry:main from allenwoods:feat/elixir-support

@allenwoods

What

Adds Elixir (.ex / .exs) to CodeGraph's tree-sitter extraction. The tree-sitter-elixir.wasm grammar already ships in tree-sitter-wasms, so this wires it into WASM_GRAMMAR_FILES / EXTENSION_MAP and adds an extractor.

Why a custom approach

tree-sitter-elixir is metaprogramming-first — there are almost no dedicated declaration node types. defmodule, def, defp, alias, import, if, case … all parse as the same shape:

(call target: (identifier "<macro>") (arguments …) (do_block …)?)

So extraction runs through the visitNode hook and dispatches on the macro identifier instead of node types (similar in spirit to Ruby's module handling and Pascal's custom visitor). callTypes is empty because the core's extractCall keys off a function field Elixir lacks — the hook records calls itself.
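
To illustrate the dispatch described above, here is a minimal sketch. It is not the extractor from this PR: `CallNode`, `ExtractedSymbol`, and `visitCall` are hypothetical names, the node shape is a simplified stand-in for a tree-sitter node rather than the real API, and joining function names to their module with `.` is an assumption.

```typescript
// Hypothetical, simplified stand-in for a tree-sitter `call` node.
interface CallNode {
  target: string;     // text of the target identifier, e.g. "defmodule"
  args: string[];     // simplified argument text; args[0] is the name
  body?: CallNode[];  // calls found inside the do_block, if present
}

// Hypothetical output record; the real extractor's types may differ.
interface ExtractedSymbol {
  kind: "module" | "function" | "import";
  qualifiedName: string;
  visibility?: "public" | "private";
}

// Dispatch on the macro identifier, since every construct is a `call` node.
function visitCall(node: CallNode, scope: string[], out: ExtractedSymbol[]): void {
  const name = node.args[0] ?? "";
  switch (node.target) {
    case "defmodule": {
      // Nested modules are qualified by the enclosing module path.
      const nested = [...scope, name];
      out.push({ kind: "module", qualifiedName: nested.join(".") });
      for (const child of node.body ?? []) visitCall(child, nested, out);
      return;
    }
    case "def":
    case "defp":
      out.push({
        kind: "function",
        qualifiedName: [...scope, name].join("."), // separator is an assumption
        visibility: node.target === "def" ? "public" : "private",
      });
      return;
    case "alias":
    case "import":
    case "require":
    case "use":
      out.push({ kind: "import", qualifiedName: name });
      return;
    default:
      // Anything else (if, case, ordinary calls): record nothing here,
      // but keep descending so nested definitions are still found.
      for (const child of node.body ?? []) visitCall(child, scope, out);
  }
}
```

The `default` branch mirrors the idea of descending through control-flow forms without treating them as declarations; call-edge recording is left out of this sketch.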

Extracted

  • Modules (defmodule, nested) and protocols (defprotocol → interface)
  • Functions: def / defp / defmacro / defmacrop / defguard / defguardp / defdelegate, with public/private visibility. Multi-clause definitions fold into a single symbol (same qualifiedName) so the resolver isn't left with duplicate nodes.
  • Dependencies: alias / import / require / use, including multi-alias alias A.{B, C} expansion into one import each
  • defimpl with an implements edge to the protocol
  • defstruct / defexception as struct nodes
  • Call edges — qualified Mod.fun and local calls, descending through control-flow special forms (if / case / with / for / pipes) while not recording those forms themselves as calls. Module attributes (@doc / @spec / …) are skipped in this pass.
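
Two of the behaviors listed above, multi-clause folding and multi-alias expansion, can be sketched as follows. This is an illustration under assumptions, not the PR's code: `foldClauses` and `expandMultiAlias` are hypothetical helpers, the alias expansion works on source text rather than on the syntax tree, and the description does not say whether arity is part of `qualifiedName`, so the sketch keys on the name alone.

```typescript
// Hypothetical record for one function clause as seen by the visitor.
interface FunctionClause {
  qualifiedName: string;
  visibility: "public" | "private";
}

// Hypothetical folded symbol: one entry per qualifiedName.
interface FunctionSymbol extends FunctionClause {
  clauses: number; // how many clauses were folded into this symbol
}

// Fold clauses sharing a qualifiedName into a single symbol,
// preserving first-seen order.
function foldClauses(defs: FunctionClause[]): FunctionSymbol[] {
  const byName = new Map<string, FunctionSymbol>();
  for (const d of defs) {
    const existing = byName.get(d.qualifiedName);
    if (existing) {
      existing.clauses += 1;
    } else {
      byName.set(d.qualifiedName, { ...d, clauses: 1 });
    }
  }
  return Array.from(byName.values());
}

// Expand `A.{B, C}` into ["A.B", "A.C"]; a plain alias is returned as-is.
// Text-based for brevity; an AST-based extractor would walk child nodes.
function expandMultiAlias(spec: string): string[] {
  const trimmed = spec.trim();
  const match = /^([A-Z][\w.]*)\.\{([^}]*)\}$/.exec(trimmed);
  if (!match) return [trimmed];
  const prefix = match[1] ?? "";
  const inner = match[2] ?? "";
  return inner
    .split(",")
    .map((part) => part.trim())
    .filter((part) => part.length > 0)
    .map((part) => `${prefix}.${part}`);
}
```

For example, `expandMultiAlias("MyApp.{Repo, User}")` yields one entry per aliased module, which matches the "one import each" behavior described in the list.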

Files

  • src/extraction/languages/elixir.ts — new extractor
  • src/extraction/grammars.ts — grammar + extension registration
  • src/extraction/languages/index.ts — register in EXTRACTORS
  • src/types.ts — add elixir to Language
  • __tests__/extraction.test.ts — 16 tests
  • README.md / CHANGELOG.md

Testing

  • 16 new unit tests: detection, modules (nested), functions + visibility, multi-clause folding, guards, do: shorthand, imports (incl. multi-alias), protocol/impl/struct/delegate, call edges
  • tsc --noEmit clean
  • Full suite: 1506 passed / 2 skipped, no regressions
  • Smoke-tested against real Phoenix/OTP source (1900+ LOC modules): correct module names, public/private visibility, and hundreds of call edges

Notes

  • The grammar binary is reused from tree-sitter-wasms — no vendored .wasm added (so it's not in the __dirname/wasm special-case list in loadGrammarsForLanguages).
  • I did not add an eval-corpus benchmark entry; happy to add one if you'd like the measured-coverage table updated.

🤖 Generated with Claude Code

Adds Elixir to CodeGraph's tree-sitter extraction. The grammar
(tree-sitter-elixir.wasm) already ships in tree-sitter-wasms; this wires it
up and adds the extractor.

tree-sitter-elixir is metaprogramming-first: defmodule, def, defp, alias,
import, if, case — everything parses as the same `(call target:(identifier)
(arguments) (do_block)?)` shape. So extraction runs through the visitNode
hook and dispatches on the macro identifier rather than node types.

Extracted:
- modules (defmodule, nested) and protocols (defprotocol -> interface)
- functions (def/defp/defmacro/defmacrop/defguard/defguardp/defdelegate) with
  public/private visibility; multi-clause defs fold into one symbol (same
  qualifiedName) so the resolver isn't left with duplicates
- dependencies (alias/import/require/use), including multi-alias
  `alias A.{B, C}` expansion
- defimpl with an `implements` edge to the protocol
- defstruct/defexception as struct nodes
- call edges for qualified `Mod.fun` and local calls, descending through
  control-flow special forms (if/case/with/for/...) while not recording them
  as calls; module attributes (@doc/@spec/...) are skipped

Tested against real Phoenix/OTP source (1900+ LOC modules) — correct module
names, visibility, and hundreds of call edges. 16 new unit tests; full suite
(1506 tests) green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@kotsutsumi

+1 👌
