pip install eqhunt
Symbolic regression by genetic programming. C++ engine, Python bindings via nanobind.
Give it a table of (inputs, target) pairs; it returns a human-readable formula
that approximates the relationship. No neural network, no black box — just an
algebraic expression you can read, paste into a calculator, or hand-tune.
import eqhunt
X = [[1, 1], [2, 3], [4, 5], [7, 2], [9, 9]]
y = [2, 5, 9, 9, 18]
model = eqhunt.fit(X, y)
print(model.formula) # e.g. f(x,y) = (x+y)
print(model.error) # e.g 0.0
print(model.predict([6, 7])) # -> 13.0pip install eqhuntPrebuilt wheels are published for Linux, macOS and Windows on common Python versions. If pip falls back to building from source you'll need a C++17 compiler.
import eqhunt
model = eqhunt.fit(X, y, generations=5000)
print(model.formula)
model.predict([1, 2]) # single row
model.predict([[1, 2], [3, 4]]) # batchfit() accepts any Config field as a keyword argument:
eqhunt.fit(X, y, pop=800, trig_penalty=2.0, bloat_penalty=0.3)import eqhunt
cfg = eqhunt.Config()
cfg.pop = 800
cfg.gen = 50000
cfg.tournament_size = 5
cfg.initial_depth = 5
cfg.bloat_penalty = 0.3
cfg.trig_penalty = 1.5
cfg.accepted_error = 0.01
# Re-weight individual operators (higher = more likely to appear)
cfg.op_weights.sin = 1.0 # boost sine
cfg.op_weights.cos = 1.0
cfg.op_weights.exp = 0.0 # disable exp entirely
cfg.pi_prob = 0.10 # 'pi' more frequent in terminals
model = eqhunt.Model(cfg).fit(X, y)
print(model.formula)You can also train from a CSV file (one row per sample, last column = target,
lines starting with # are comments):
eqhunt.Model().fit_csv("nivel_embase.csv")| Category | Operators |
|---|---|
| Arithmetic | + - * / -x |
| Powers | sqrt ** |
| Conditional | if(cond, then, else) (cond > 0) |
| Trig | sin cos tan |
| Exp / log | exp log |
| Constants | numeric literals, pi |
Trigonometric, log and exp nodes have low default weights so they only
appear after enough mutation pressure — useful for cyclic / physical data,
ignored otherwise. Adjust via Config.op_weights.
- Per-sample error is
|prediction - target|; total error is the sum. - Invalid evaluations (
/0,sqrt(<0),log(<=0),exp(huge)) get a soft per-sample penalty rather than killing the whole formula — a single out-of-domain sample no longer disqualifies an otherwise good candidate. If more than 25% of samples fail, the formula is rejected.
Config.accepted_error stops the search as soon as total error drops below
the threshold. You can also call model.stop() from another thread (or a
signal handler) to ask the loop to wrap up after the current generation.
A trained model is just a string — you can persist it, ship it, paste it,
diff it. To reuse a formula in a new process without retraining, parse it
back into a Model:
import eqhunt
# train and save
m = eqhunt.fit(X, y)
print(m.formula) # e.g. f(x,y) = ((x*x) - (y*y))
m.save("model.txt") # one-liner persisted
# later, in a fresh process — no training needed
m2 = eqhunt.Model.load_file("model.txt")
m2.predict([6, 7]) # -13.0
m2.predict([[1, 2], [3, 4]])You can also go through strings directly:
formula_str = m.formula # or any equivalent expression
m3 = eqhunt.Model.from_formula(formula_str)
m3.predict([12, 5])Or mutate an existing model in place:
m.load_formula("(x*x + y*y)") # replaces the current treeAccepted syntax: anything the engine itself emits via get_formula() —
arithmetic (+ - * / **), unary minus, sqrt sin cos tan log exp if,
variables x y z w v u x6 x7 …, numeric literals (int / float / 1e5),
and pi. Both the bare expression ("(x+y)") and the full prefixed form
("f(x,y) = (x+y)") are accepted; the parser strips everything up to and
including the first =. Parse errors raise RuntimeError.
The number of input variables is inferred from the highest variable index
in the formula, so m2.num_vars is set correctly without needing to know
it in advance.
| Field | Default | Meaning |
|---|---|---|
pop |
400 | Population size |
gen |
15000 | Max generations |
tournament_size |
4 | Tournament selection pool |
crossover_prob |
0.7 | Crossover probability per pair |
mutation_prob |
0.25 | Mutation probability per offspring |
initial_depth |
4 | Depth used to seed the initial population |
mutation_depth |
3 | Depth for mutation-generated subtrees |
const_min/max |
-9, 9 | Range for random numeric terminals |
pi_prob |
0.01 | Probability a terminal is pi |
bloat_penalty |
0.1 | Per-node penalty (favours smaller trees) |
trig_penalty |
0.5 | Extra penalty per sin/cos/tan/log/exp node |
immigrant_rate |
0.05 | Fraction of population replaced by random each gen |
weak_parent_rate |
0.2 | Prob. 2nd parent is random (not tournament) |
accepted_error |
0.5 | Stop training once total error < this value |
verbose |
False* | Print best-so-far per improvement |
simplify |
True | Run algebraic simplification on the final tree |
simplify_interval |
500 | Periodically simplify top-N members during training |
simplify_top_n |
10 | How many to simplify periodically |
*C++ default is True; the Python fit() helper defaults to False.
git clone https://github.com/sha0coder/eqhunt
cd eqhunt
pip install -e .
pytestRequires Python 3.8+, a C++17 compiler, CMake 3.15+.
MIT.