InsilicoΣ
Drug Discovery, Cheminformatics & Bioinformatics

FORMULA-X

Design of Experiments + Machine Learning for formulation optimization.

Why FORMULA-X exists

Classical DoE software (Design-Expert, Minitab, JMP) gives you a quadratic response-surface model and a Derringer-Suich desirability score. That is useful, but it is also exactly what every formulation paper has done for thirty years. FORMULA-X is built around six analyses that those tools either lack or support only partially (see the comparison matrix below):

1. Bayesian optimization

Suggests the next experiment that most reduces uncertainty about the optimum, using a Gaussian Process surrogate. This cuts the number of experiments needed and is what is usually marketed as "AI-guided formulation".
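The comparison matrix names Expected Improvement on desirability as the acquisition function. As a rough illustration only, the sketch below computes Expected Improvement from a surrogate's predictive mean and standard deviation. The candidate names, the numbers, and the `xi` exploration parameter are assumptions made for this example, not FORMULA-X's actual code or output.

```python
import math

def expected_improvement(mean, std, best_so_far, xi=0.0):
    """Expected Improvement for maximisation at a single candidate point."""
    gain = mean - best_so_far - xi
    if std <= 0.0:
        # No predictive uncertainty: improvement is known exactly.
        return max(gain, 0.0)
    z = gain / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return gain * cdf + std * pdf

# Hypothetical GP predictions (mean, std) of overall desirability
# at three untested formulations.
candidates = {
    "A": (0.70, 0.02),  # close to the best run, very certain
    "B": (0.68, 0.15),  # slightly lower mean, very uncertain
    "C": (0.50, 0.05),  # clearly worse
}
best_observed = 0.72  # best desirability measured so far (assumed)

scores = {
    name: expected_improvement(m, s, best_observed)
    for name, (m, s) in candidates.items()
}
next_run = max(scores, key=scores.get)
print(next_run, scores)
```

Here candidate B is selected even though its predicted mean is lower than A's, because its larger uncertainty leaves more room for improvement over the best run so far.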

2. Probabilistic design space

Estimates Pr(meeting specs) at every factor combination via Monte Carlo simulation. Aligns with ICH Q8 / FDA Quality by Design (QbD).
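A minimal sketch of the idea, under stated assumptions: the `predict` function, the predictive standard deviations, the spec limits, and the 0.90 acceptance threshold are all hypothetical, and the responses are sampled as independent normal variables, which a real implementation may not assume.

```python
import random

def predict(lipid, surfactant):
    """Hypothetical fitted model: predicted mean of each response."""
    return {
        "size_nm": 250.0 - 0.4 * lipid - 1.5 * surfactant,
        "pdi": 0.45 - 0.002 * surfactant - 0.0005 * lipid,
    }

PRED_STD = {"size_nm": 10.0, "pdi": 0.03}                # assumed predictive std
SPECS = {"size_nm": (50.0, 150.0), "pdi": (0.0, 0.30)}   # assumed (lower, upper)

def prob_in_spec(lipid, surfactant, rng, n_draws=2000):
    """Fraction of simulated outcomes in which every response meets its spec."""
    means = predict(lipid, surfactant)
    hits = 0
    for _ in range(n_draws):
        if all(lo <= rng.gauss(means[r], PRED_STD[r]) <= hi
               for r, (lo, hi) in SPECS.items()):
            hits += 1
    return hits / n_draws

rng = random.Random(42)
grid = {
    (lipid, surf): prob_in_spec(lipid, surf, rng)
    for lipid in (50, 100, 150)
    for surf in (20, 40, 60)
}
# Factor combinations whose estimated probability clears the threshold.
design_space = [point for point, p in grid.items() if p >= 0.90]
print(design_space)
```

A deterministic contour would mark a point as acceptable whenever the predicted mean is inside the spec; the probabilistic version also penalises points whose mean sits close to a limit.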

3. Multi-objective Pareto

NSGA-II returns the full Pareto front across particle size, PDI, zeta potential, cost, etc. - no arbitrary weights forced on the formulator.
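To make "Pareto front" concrete, the sketch below shows only the non-dominated filtering step, not NSGA-II itself (there is no evolutionary search and no crowding distance here). The candidate values are invented, and every objective is assumed to be minimised; an objective to be maximised would be negated first.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Keep the points that no other point dominates."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q is not p)]

# Hypothetical candidates: (particle size in nm, PDI, cost per batch).
candidates = [
    (120.0, 0.20, 5.0),
    (100.0, 0.25, 6.0),
    (130.0, 0.22, 5.5),  # worse than the first on all three objectives
    (90.0, 0.30, 9.0),
    (100.0, 0.25, 7.0),  # same as the second but more expensive
]

front = pareto_front(candidates)
print(front)
```

The three surviving formulations are all defensible trade-offs; choosing among them is left to the formulator rather than to a fixed set of weights.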

4. RSM vs ML, honestly

Quadratic RSM, Gaussian Process, and gradient-boosted trees are compared with nested k-fold cross-validation. Whichever model wins, the result is publishable.
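One way such a comparison could be set up with scikit-learn is sketched below. This is an assumption about the general shape of the analysis, not FORMULA-X's implementation: the data are synthetic, the hyper-parameter grid is arbitrary, and the mean out-of-fold R² is used as a simple stand-in for a predictive Q². Only the gradient-boosted model needs an explicit inner loop; the Gaussian Process tunes its kernel on each training fold by marginal likelihood.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in for a 3-factor design with one response (not real data).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(40, 3))
y = (5.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] ** 2 + X[:, 0] * X[:, 2]
     + rng.normal(0.0, 0.2, size=40))

inner = KFold(n_splits=3, shuffle=True, random_state=1)  # tunes hyper-parameters
outer = KFold(n_splits=5, shuffle=True, random_state=2)  # estimates performance

models = {
    "quadratic_rsm": make_pipeline(PolynomialFeatures(degree=2), LinearRegression()),
    "gaussian_process": GaussianProcessRegressor(
        kernel=RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1),
        normalize_y=True,
    ),
    # The grid search runs inside every outer training fold, so the outer
    # test fold never influences hyper-parameter selection.
    "gradient_boosting": GridSearchCV(
        GradientBoostingRegressor(random_state=0),
        param_grid={"n_estimators": [50, 150], "max_depth": [2, 3]},
        cv=inner,
        scoring="r2",
    ),
}

cv_r2 = {
    name: float(cross_val_score(model, X, y, cv=outer, scoring="r2").mean())
    for name, model in models.items()
}
winner = max(cv_r2, key=cv_r2.get)
print(cv_r2, winner)
```

Because the synthetic response is itself quadratic, the RSM model is expected to do well here; on real formulation data the ranking can go either way, which is the point of running the comparison.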

5. Robust optimization

Finds optima least sensitive to small perturbations in factors - closer to manufacturing reality than a single point optimum.
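The source does not state which robustness criterion is used, so the sketch below assumes one common choice: maximise the worst-case response over a tolerance band around each setpoint. The response function, the ±0.5 tolerance, and the search grid are all hypothetical.

```python
import math

def response(x):
    """Hypothetical fitted response (higher is better):
    a tall, narrow peak at x = 2 plus a lower, broad plateau around x = 7."""
    sharp = 1.0 * math.exp(-((x - 2.0) ** 2) / (2 * 0.2 ** 2))
    broad = 0.8 * math.exp(-((x - 7.0) ** 2) / (2 * 2.0 ** 2))
    return sharp + broad

def worst_case(x, tol=0.5, steps=11):
    """Lowest response on an evenly spaced grid over [x - tol, x + tol]."""
    return min(response(x - tol + 2 * tol * i / (steps - 1)) for i in range(steps))

grid = [i * 0.1 for i in range(101)]       # candidate setpoints 0.0 .. 10.0

nominal_opt = max(grid, key=response)      # classical single-point optimum
robust_opt = max(grid, key=worst_case)     # best guaranteed response under drift

print(nominal_opt, response(nominal_opt), worst_case(nominal_opt))
print(robust_opt, response(robust_opt), worst_case(robust_opt))
```

The nominal optimum sits on the narrow peak and collapses if the factor drifts by half a unit, while the robust optimum gives up some nominal performance for a response that barely changes across the tolerance band.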

6. Constraint-aware

Linear, non-linear, mixture-sum, and cost constraints handled natively - factorial designs encode these poorly.
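As an illustration of the four constraint types, the sketch below checks candidate formulations against one constraint of each kind. It uses plain Python callables rather than symbolic expressions; apart from `lecithin + cosurfactant <= 320`, which appears elsewhere on this page, every factor name, limit, and unit cost is a made-up assumption.

```python
UNIT_COST = {"lecithin": 0.05, "cosurfactant": 0.10}  # hypothetical cost per mg

# One predicate per constraint type; each takes a formulation dict.
CONSTRAINTS = {
    "linear": lambda f: f["lecithin"] + f["cosurfactant"] <= 320,
    "non-linear": lambda f: f["lecithin"] * f["cosurfactant"] <= 25000,
    "mixture-sum": lambda f: abs(f["oil"] + f["water"] + f["surfactant"] - 1.0) < 1e-9,
    "cost": lambda f: sum(UNIT_COST[k] * f[k] for k in UNIT_COST) <= 25.0,
}

def violated(formulation):
    """Names of the constraints this formulation breaks (empty if feasible)."""
    return [name for name, ok in CONSTRAINTS.items() if not ok(formulation)]

candidates = [
    {"lecithin": 200, "cosurfactant": 100, "oil": 0.2, "water": 0.7, "surfactant": 0.1},
    {"lecithin": 250, "cosurfactant": 100, "oil": 0.2, "water": 0.7, "surfactant": 0.1},
    {"lecithin": 160, "cosurfactant": 160, "oil": 0.3, "water": 0.6, "surfactant": 0.2},
]

report = [violated(c) for c in candidates]
feasible = [c for c, broken in zip(candidates, report) if not broken]
print(report)
```

An optimizer that is constraint-aware would apply checks like these while searching, so that infeasible points are never proposed, instead of filtering a factorial design after the fact.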

How FORMULA-X compares to existing tools

An honest matrix against the four most-used DoE tools in pharma and process development. ✓ = built-in; partial = available but limited; ✗ = not supported; manual = possible only by writing custom code outside the tool.

Capability | FORMULA-X | Design-Expert (Stat-Ease) | Minitab | JMP / JMP Pro (SAS) | pyDOE3 / Python scripts
Design generation
Box-Behnken, CCD, full / fractional factorial, Plackett-Burman
D-optimal (coordinate exchange) partial (libraries only)
Mixture / simplex-lattice designs ✓ (industry-leading) partial
Latin-hypercube space-filling partial partial
Modeling
Quadratic RSM (OLS) with full ANOVA + p-values ✓ (statsmodels)
Honest k-fold cross-validation, predictive Q² partial (PRESS) partial (PRESS) partial manual
Gaussian-Process surrogate (with predictive std) ✓ (JMP Pro) manual (sklearn / GPyTorch)
Gradient-boosted-trees surrogate partial (XGBoost add-on) ✓ (JMP Pro) manual (XGBoost / LightGBM)
RSM-vs-ML ensemble comparison with honest nested CV winner selection partial (manual) manual
Optimization
Derringer-Suich desirability manual
Multi-objective Pareto front (NSGA-II) with crowding ✗ (weighted-sum desirability only) ✗ (weighted only) ✗ (weighted only) manual (pymoo)
Probabilistic design space, ICH Q8 / FDA QbD partial (deterministic contour) partial manual
Bayesian Optimization (lab-in-the-loop, Expected Improvement on desirability) ✓ (JMP Pro 17+) manual (BoTorch / scikit-optimize)
Robust optimization under input noise partial (Propagation of Error) partial manual
Constraint-aware (linear, non-linear, sympy expressions) partial (linear / mixture only) partial (linear only) manual
Diagnostics & explainability
Replicate variance auto-flagged on upload (with pooled pure-error std) partial (post-fit) partial partial manual
Duplicate-column / data-hygiene warnings on upload manual
Residual diagnostics: vs predicted, Q-Q, histogram, vs run order manual (matplotlib)
Permutation importance (model-agnostic factor ranking) partial manual (sklearn)
Partial-dependence plot (true PDP, not midpoint shortcut) partial manual
GP predictive-uncertainty heatmap ✓ (JMP Pro) manual
Visualisation
3-D surface family: smooth, wireframe, 3-D contours, filled contours, surface + data overlay, 3-D scatter ✓ (6 styles) partial (smooth + contour) partial ✓ (smooth + wireframe) manual
2-D contour map (filled + iso-lines) manual
Bayesian-optimization Expected-Improvement landscape partial (JMP Pro) manual
Pareto parallel-coordinates + scatter matrix (n_responses ≥ 3) partial manual
26 server-rendered figures (Agg backend, no display required) manual
Platform & integration
Web UI, multi-user, browser-based ✗ (desktop) ✗ (desktop) partial (Live offers a web add-on)
REST API + Celery queue for automation partial (JSL scripting) manual
PDF report + ZIP CSV bundle exports partial (Word / PDF) manual
Integrated with the wider InsilicoΣ stack (QSAR-X, ADMET-X, RNA-Σ, Clinical ML, etc.)
Open-source / transparent codebase
Pricing | Free (academic / member access) | Commercial (~$2-5K / seat) | Commercial (~$1.5K / yr) | Commercial (~$15K / seat for JMP Pro) | Free

Pick FORMULA-X when you want
  • An ICH Q8 / QbD probabilistic design space, not just a deterministic contour.
  • Bayesian-optimization suggestions to reduce the number of lab runs.
  • An honest RSM-vs-ML comparison instead of trusting the quadratic by default.
  • A multi-objective Pareto front without forcing arbitrary desirability weights.
  • A web-based, scriptable, automation-friendly stack alongside QSAR-X / ADMET-X.
  • Open-source code you can audit, reproduce, and cite.

Pick another tool when
  • You need GMP / 21 CFR Part 11 audit trails out of the box - Design-Expert and JMP have decades of regulatory validation; FORMULA-X is research-grade.
  • Your work depends on a specialised mixture-design feature only Stat-Ease maintains (e.g. process-mixture combined designs).
  • You're embedded in a SAS / JMP shop and switching costs outweigh feature gains.
  • You only need design generation and you are happy in raw Python (pyDOE3 is enough).

What goes in

  • A Box-Behnken / CCD / factorial / D-optimal / Latin-hypercube / mixture design - generated by FORMULA-X or uploaded as CSV.
  • Factor and response definitions (units, bounds, optimization direction, ICH-style specs).
  • Optional constraints expressed in plain symbolic form, e.g. lecithin + cosurfactant <= 320.
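For readers who want to see what a space-filling input looks like, here is a small Latin-hypercube generator in plain Python, followed by the example constraint from the list above applied as a filter. The factor bounds, run count, and function name are assumptions for illustration; FORMULA-X's own generator may differ.

```python
import random

def latin_hypercube(bounds, n_runs, seed=0):
    """bounds: {factor: (low, high)}.
    Each factor's range is cut into n_runs equal strata, and every
    stratum is sampled exactly once, in a random order per factor."""
    rng = random.Random(seed)
    columns = {}
    for name, (low, high) in bounds.items():
        strata = list(range(n_runs))
        rng.shuffle(strata)
        width = (high - low) / n_runs
        columns[name] = [low + (s + rng.random()) * width for s in strata]
    return [{name: columns[name][i] for name in bounds} for i in range(n_runs)]

# Hypothetical factor bounds (amounts in mg).
bounds = {"lecithin": (50.0, 250.0), "cosurfactant": (20.0, 120.0)}
design = latin_hypercube(bounds, n_runs=8)

# Drop runs that break the example constraint before sending them to the lab.
feasible = [r for r in design if r["lecithin"] + r["cosurfactant"] <= 320]

for run in feasible:
    print({k: round(v, 1) for k, v in run.items()})
```

Filtering after generation can leave fewer runs than requested; a constraint-aware generator would instead sample inside the feasible region from the start.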

What comes out

  • Trained surrogate model(s) per response with honest CV metrics.
  • Pareto front of non-dominated formulations.
  • Probabilistic design-space heatmap (ICH Q8 ready).
  • Bayesian-optimization suggestions for the next experiments.
  • PDF report, CSV bundle, and PNG/SVG figures for publication.

FORMULA-X is part of the InsilicoΣ platform. To request access, see the FAQ or contact the maintainer.
