FORMULA-X

Design of Experiments + Machine Learning for formulation optimization.

Why FORMULA-X exists

Classical DoE software (Design-Expert, Minitab, JMP) gives you a quadratic response-surface model and a Derringer-Suich desirability score. That is useful, but it is also what most formulation papers have reported for thirty years. FORMULA-X is built around six analyses that those tools either lack or support only partially:

1. Bayesian optimization

Suggests the next experiment to run, using a Gaussian Process surrogate and an Expected Improvement acquisition on desirability, so that fewer lab runs are needed to locate the optimum. Often described as "AI-guided formulation".
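
As an illustration of how such a suggestion can be scored, here is a minimal sketch of the Expected Improvement acquisition for a maximization problem. It assumes the surrogate has already produced a predictive mean and standard deviation for each candidate; the candidate names, their numbers, and the function names are hypothetical and are not taken from the FORMULA-X codebase.

```python
import math

def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_improvement(mu, sigma, best, xi=0.0):
    """Expected Improvement for maximization at one candidate.

    mu, sigma: surrogate predictive mean and standard deviation.
    best: best value observed so far. xi: optional exploration margin.
    """
    if sigma <= 0.0:
        # No predictive uncertainty: improvement is deterministic.
        return max(mu - best - xi, 0.0)
    z = (mu - best - xi) / sigma
    return (mu - best - xi) * normal_cdf(z) + sigma * normal_pdf(z)

# Hypothetical surrogate predictions of desirability: (mean, std) per candidate.
candidates = {
    "A": (0.80, 0.01),  # slightly better than the best so far, very certain
    "B": (0.78, 0.10),  # slightly worse on average, but highly uncertain
    "C": (0.60, 0.05),  # clearly worse
}
best_observed = 0.79

scores = {name: expected_improvement(mu, sd, best_observed)
          for name, (mu, sd) in candidates.items()}
next_run = max(scores, key=scores.get)
print(next_run, scores)
```

In this toy case the uncertain candidate B scores highest, which shows how the acquisition trades a small expected loss against the chance of a large gain.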

2. Probabilistic design space

Estimates the probability of meeting all specifications, Pr(meeting specs), at every factor combination via Monte Carlo simulation. Aligns with ICH Q8 / FDA Quality by Design (QbD).
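
The idea can be sketched at a single factor combination as follows. This is a simplified illustration, not the FORMULA-X implementation: it assumes each response is predicted with a Gaussian uncertainty and that responses are independent, and the response names, spec limits, and predictions are hypothetical.

```python
import random

def prob_in_spec(pred_mean, pred_std, specs, n_draws=5000, seed=0):
    """Monte Carlo estimate of Pr(all responses fall within their spec limits).

    pred_mean / pred_std: dicts of predicted mean and std per response.
    specs: dict mapping response name to (lower, upper) limits.
    Assumes independent Gaussian prediction errors (a simplification).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        in_spec = True
        for name, (lower, upper) in specs.items():
            draw = rng.gauss(pred_mean[name], pred_std[name])
            if not (lower <= draw <= upper):
                in_spec = False
                break
        hits += in_spec
    return hits / n_draws

# Hypothetical specifications and model predictions at two factor settings.
specs = {"size_nm": (50.0, 200.0), "pdi": (0.0, 0.3)}
std = {"size_nm": 15.0, "pdi": 0.03}

p_center = prob_in_spec({"size_nm": 120.0, "pdi": 0.18}, std, specs)
p_edge = prob_in_spec({"size_nm": 195.0, "pdi": 0.29}, std, specs)
print(p_center, p_edge)
```

Repeating this over a grid of factor combinations gives a probability map; a point whose predicted means sit just inside the limits (the second setting here) can still have well under a 50% chance of meeting every spec.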

3. Multi-objective Pareto

NSGA-II returns the full Pareto front across particle size, PDI, zeta potential, cost, etc. - no arbitrary weights forced on the formulator.
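
The notion of a Pareto front can be illustrated with a plain non-dominated filter. This is only the dominance test at the core of such methods, not NSGA-II itself (which adds ranking, crowding distance, and an evolutionary search); the formulations and objective values below are hypothetical, with every objective minimized.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on one.

    All objectives are assumed to be minimized.
    """
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(points):
    """Return the points that no other point dominates."""
    return [p for p in points
            if not any(dominates(q, p) for q in points)]

# Hypothetical formulations: (particle size in nm, PDI, cost), all minimized.
formulations = [
    (120.0, 0.20, 5.0),
    (150.0, 0.15, 4.0),
    (160.0, 0.25, 6.0),  # worse than the first on every objective
    (110.0, 0.30, 7.0),
]

front = pareto_front(formulations)
print(front)
```

Three of the four candidates survive because each is best on at least one trade-off; choosing among them is left to the formulator rather than to a preset weighting.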

4. RSM vs ML, honestly

Quadratic RSM, Gaussian Process, and gradient-boosted trees are compared with nested k-fold cross-validation. Whether RSM or ML wins, the result is publishable.
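
What makes the cross-validation "nested" is the split structure, which can be sketched as below. The function names and fold counts are illustrative assumptions; the point is that model selection only ever sees the inner splits, while the outer test runs are held back for the final score.

```python
def k_fold(indices, k):
    """Yield (train, test) index lists using k interleaved folds."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

def nested_k_fold(n_runs, outer_k, inner_k):
    """Yield (outer_train, outer_test, inner_splits) for nested CV.

    inner_splits are built from outer_train only, so the outer test
    runs never influence model or hyperparameter selection.
    """
    for outer_train, outer_test in k_fold(list(range(n_runs)), outer_k):
        inner_splits = list(k_fold(outer_train, inner_k))
        yield outer_train, outer_test, inner_splits

# Hypothetical design with 15 runs, 5 outer folds, 3 inner folds.
splits = list(nested_k_fold(n_runs=15, outer_k=5, inner_k=3))
for outer_train, outer_test, inner_splits in splits:
    print("held out:", outer_test, "| inner validation sets:",
          [val for _, val in inner_splits])
```

In each outer iteration the candidate models would be compared on the inner splits, the winner refit on the outer training runs, and its error recorded on the held-out runs.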

5. Robust optimization

Finds optima least sensitive to small perturbations in factors - closer to manufacturing reality than a single point optimum.
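
A toy one-factor example shows why the robust optimum can differ from the nominal one. The sketch below uses a coarse worst-case criterion over a fixed perturbation of the factor; this is one possible formulation chosen for illustration and is not necessarily the criterion FORMULA-X uses. The response function and the perturbation size are hypothetical.

```python
import math

def response(x):
    """Hypothetical response: a tall narrow peak at 0.2 and a lower broad peak at 0.7."""
    sharp = 1.0 * math.exp(-((x - 0.2) / 0.02) ** 2)
    broad = 0.8 * math.exp(-((x - 0.7) / 0.2) ** 2)
    return sharp + broad

def worst_case(x, delta):
    """Lowest response among x - delta, x, and x + delta (clipped to [0, 1])."""
    neighbours = [min(max(x + d, 0.0), 1.0) for d in (-delta, 0.0, delta)]
    return min(response(v) for v in neighbours)

grid = [i / 100 for i in range(101)]

nominal_opt = max(grid, key=response)
robust_opt = max(grid, key=lambda x: worst_case(x, delta=0.05))

print("nominal optimum:", nominal_opt, "response", round(response(nominal_opt), 3))
print("robust optimum: ", robust_opt, "worst case", round(worst_case(robust_opt, 0.05), 3))
```

The nominal search picks the narrow peak, where a shift of 0.05 in the factor wipes out almost all of the response; the worst-case search settles on the broad peak, which gives up some nominal performance for stability.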

6. Constraint-aware

Linear, non-linear, mixture-sum, and cost constraints handled natively - factorial designs encode these poorly.

How FORMULA-X compares to existing tools

An honest comparison with four widely used DoE tools in pharma and process development. ✓ = built-in, partial = available but limited, ✗ = not supported, manual = possible only by writing custom code outside the tool.

Capability	FORMULA-X	Design-Expert (Stat-Ease)	Minitab	JMP / JMP Pro (SAS)	pyDOE3 / Python scripts
Design generation
Box-Behnken, CCD, full / fractional factorial, Plackett-Burman
D-optimal (coordinate exchange)					partial (libraries only)
Mixture / simplex-lattice designs		✓ (industry-leading)			partial
Latin-hypercube space-filling		partial	partial
Modeling
Quadratic RSM (OLS) with full ANOVA + p-values					✓ (statsmodels)
Honest k-fold cross-validation, predictive Q²		partial (PRESS)	partial (PRESS)	partial	manual
Gaussian-Process surrogate (with predictive std)				✓ (JMP Pro)	manual (sklearn / GPyTorch)
Gradient-boosted-trees surrogate			partial (XGBoost add-on)	✓ (JMP Pro)	manual (XGBoost / LightGBM)
RSM-vs-ML ensemble comparison with honest nested CV winner selection				partial (manual)	manual
Optimization
Derringer-Suich desirability					manual
Multi-objective Pareto front (NSGA-II) with crowding		✗ (weighted-sum desirability only)	✗ (weighted only)	✗ (weighted only)	manual (pymoo)
Probabilistic design space, ICH Q8 / FDA QbD		partial (deterministic contour)		partial	manual
Bayesian Optimization (lab-in-the-loop, Expected Improvement on desirability)				✓ (JMP Pro 17+)	manual (BoTorch / scikit-optimize)
Robust optimization under input noise		partial (Propagation of Error)	partial		manual
Constraint-aware (linear, non-linear, sympy expressions)		partial (linear / mixture only)	partial (linear only)		manual
Diagnostics & explainability
Replicate variance auto-flagged on upload (with pooled pure-error std)		partial (post-fit)	partial	partial	manual
Duplicate-column / data-hygiene warnings on upload					manual
Residual diagnostics: vs predicted, Q-Q, histogram, vs run order					manual (matplotlib)
Permutation importance (model-agnostic factor ranking)				partial	manual (sklearn)
Partial-dependence plot (true PDP, not midpoint shortcut)				partial	manual
GP predictive-uncertainty heatmap				✓ (JMP Pro)	manual
Visualisation
3-D surface family: smooth, wireframe, 3-D contours, filled contours, surface + data overlay, 3-D scatter	✓ (6 styles)	partial (smooth + contour)	partial	✓ (smooth + wireframe)	manual
2-D contour map (filled + iso-lines)					manual
Bayesian-optimization Expected-Improvement landscape				partial (JMP Pro)	manual
Pareto parallel-coordinates + scatter matrix (n_responses ≥ 3)				partial	manual
26 server-rendered figures (Agg backend, no display required)					manual
Platform & integration
Web UI, multi-user, browser-based		✗ (desktop)	✗ (desktop)	partial (JMP Live offers a web add-on)
REST API + Celery queue for automation				partial (JSL scripting)	manual
PDF report + ZIP CSV bundle exports		partial (Word / PDF)			manual
Integrated with the wider InsilicoΣ stack (QSAR-X, ADMET-X, RNA-Σ, Clinical ML, etc.)
Open-source / transparent codebase
Pricing	Free (academic / member access)	Commercial (~$2-5K / seat)	Commercial (~$1.5K / yr)	Commercial (~$15K / seat for JMP Pro)	Free

Pick FORMULA-X when you want

An ICH Q8 / QbD probabilistic design space, not just a deterministic contour.
Bayesian-optimization suggestions to reduce the number of lab runs.
An honest RSM-vs-ML comparison instead of trusting the quadratic by default.
A multi-objective Pareto front without forcing arbitrary desirability weights.
A web-based, scriptable, automation-friendly stack alongside QSAR-X / ADMET-X.
Open-source code you can audit, reproduce, and cite.

Pick another tool when

You need GMP / 21 CFR Part 11 audit trails out of the box - Design-Expert and JMP have decades of regulatory validation; FORMULA-X is research-grade.
Your work depends on a specialised mixture-design feature only Stat-Ease maintains (e.g. process-mixture combined designs).
You're embedded in a SAS / JMP shop and switching costs outweigh feature gains.
You only need design generation and you are happy in raw Python (pyDOE3 is enough).

What goes in

A Box-Behnken / CCD / factorial / D-optimal / Latin-hypercube / mixture design - generated by FORMULA-X or uploaded as CSV.
Factor and response definitions (units, bounds, optimization direction, ICH-style specs).
Optional constraints expressed in plain symbolic form, e.g. lecithin + cosurfactant <= 320.
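
To show what such constraints do to a set of candidates, here is a minimal feasibility filter. It is a sketch under stated assumptions: FORMULA-X accepts constraints as symbolic expressions, whereas this example writes them as plain Python callables, and the candidate amounts, the cost limit, and the function name `violated` are hypothetical.

```python
# Hypothetical candidate formulations (amounts in arbitrary units) with a cost.
candidates = [
    {"lecithin": 200.0, "cosurfactant": 100.0, "cost": 12.0},
    {"lecithin": 250.0, "cosurfactant": 90.0, "cost": 10.0},
    {"lecithin": 150.0, "cosurfactant": 170.0, "cost": 18.0},
]

# Each constraint: a readable label mapped to a check on one formulation.
constraints = {
    "lecithin + cosurfactant <= 320":
        lambda f: f["lecithin"] + f["cosurfactant"] <= 320.0,
    "cost <= 15":
        lambda f: f["cost"] <= 15.0,
}

def violated(formulation):
    """Return the labels of all constraints the formulation breaks."""
    return [label for label, check in constraints.items()
            if not check(formulation)]

feasible = [f for f in candidates if not violated(f)]

for f in candidates:
    print(f, "->", violated(f) or "feasible")
```

Reporting which constraint failed, rather than only a pass/fail flag, makes it easier to see whether a rejected formulation is close to feasible.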

What comes out

Trained surrogate model(s) per response with honest CV metrics.
Pareto front of non-dominated formulations.
Probabilistic design-space heatmap (aligned with ICH Q8).
Bayesian-optimization suggestions for the next experiments.
PDF report, CSV bundle, and PNG/SVG figures for publication.

FORMULA-X is part of the InsilicoΣ platform. To request access, see the FAQ or contact the maintainer.