API Reference¶
Complete reference for all NeuroScope modules, classes, and functions.
Core Module¶
neuroscope¶
NeuroScope: A comprehensive neural network framework for learning and prototyping.
NeuroScope provides a clean, education-oriented interface for building and analyzing multi-layer perceptrons with advanced diagnostic capabilities. Designed for rapid experimentation with comprehensive monitoring and visualization tools.
- Core Components:
MLP: Modern multi-layer perceptron implementation
Diagnostics: Pre-training, training, and post-training analysis tools
Visualization: Publication-quality plotting and analysis
Example
>>> from neuroscope.mlp import MLP, mse, accuracy_binary, relu
>>> from neuroscope.diagnostics import PreTrainingAnalyzer, TrainingMonitor
>>> from neuroscope.viz import Visualizer
>>>
>>> # Create and train model
>>> model = MLP([784, 128, 10], hidden_activation="relu", out_activation="softmax")
>>> model.compile(optimizer="adam", lr=1e-3)
>>>
>>> # Analyze before training
>>> analyzer = PreTrainingAnalyzer(model)
>>> analyzer.analyze(X_train, y_train)
>>>
>>> # Ultra-fast training for production
>>> history = model.fit_fast(X_train, y_train, X_val, y_val, epochs=100, batch_size=256)
>>>
>>> # Or train with full diagnostics for research
>>> monitor = TrainingMonitor()
>>> history = model.fit(X_train, y_train, monitor=monitor, epochs=100)
>>>
>>> # Use functions directly
>>> loss = mse(y_true, y_pred)
>>> acc = accuracy_binary(y_true, y_pred)
>>>
>>> # Visualize results
>>> viz = Visualizer(history)
>>> viz.plot_learning_curves()
- class neuroscope.MLP(layer_dims, hidden_activation='leaky_relu', out_activation=None, init_method='smart', init_seed=42, dropout_rate=0.0, dropout_type='normal')[source]¶
Bases: object
Multi-layer perceptron for quick prototyping and experimentation.
This MLP supports arbitrary layer sizes, multiple activation functions, and modern optimization techniques. Use compile to set hyperparameters and fit to train the model. Includes comprehensive training monitoring and diagnostic capabilities.
- Parameters:
layer_dims (Sequence[int]) – Sizes of layers including input & output, e.g. [784, 128, 10].
hidden_activation (str, optional) – Activation function name for hidden layers. Options: "relu", "leaky_relu", "tanh", "sigmoid", "selu". Defaults to "leaky_relu".
out_activation (str, optional) – Output activation function. Options: "sigmoid" (binary), "softmax" (multiclass), None (regression). Defaults to None.
init_method (str, optional) – Weight initialization strategy. Options: "smart", "he", "xavier", "random", "selu_init". Defaults to "smart".
init_seed (int, optional) – Random seed for reproducible weight initialization. Defaults to 42.
dropout_rate (float, optional) – Dropout probability for hidden layers (0.0-1.0). Defaults to 0.0.
dropout_type (str, optional) – Dropout variant ("normal", "alpha"). Defaults to "normal".
- weights¶
Internal weight matrices for each layer.
- Type:
list[NDArray[np.float64]]
- biases¶
Internal bias vectors for each layer.
- Type:
list[NDArray[np.float64]]
Example
>>> from neuroscope.mlp import MLP
>>> model = MLP([784, 128, 64, 10], hidden_activation="relu", out_activation="softmax")
>>> model.compile(optimizer="adam", lr=1e-3)
>>> history = model.fit(X_train, y_train, epochs=100)
>>> predictions = model.predict(X_test)
- __init__(layer_dims, hidden_activation='leaky_relu', out_activation=None, init_method='smart', init_seed=42, dropout_rate=0.0, dropout_type='normal')[source]¶
- compile(optimizer='adam', lr=0.001, reg=None, lamda=0.01, gradient_clip=None)[source]¶
Configure the model for training.
Sets up the optimizer, learning rate, regularization, and other training hyperparameters. Must be called before training the model.
- Parameters:
optimizer (str, optional) – Optimization algorithm. Options: "sgd", "sgdm" (SGD with momentum), "sgdnm" (SGD with Nesterov momentum), "rmsprop", "adam". Defaults to "adam".
lr (float, optional) – Learning rate for parameter updates. Defaults to 0.001.
reg (str, optional) – Regularization type ("l2", None). Defaults to None.
lamda (float, optional) – Regularization strength (lambda parameter). Defaults to 0.01.
gradient_clip (float, optional) – Maximum gradient norm for clipping. Defaults to None.
- Raises:
ValueError – If invalid optimizer is specified.
Example
>>> model.compile(optimizer="adam", lr=1e-3, reg="l2", lamda=0.01)
>>> model.compile(optimizer="sgdm", lr=0.01)      # SGD with momentum
>>> model.compile(optimizer="sgdnm", lr=0.01)     # SGD with Nesterov momentum
>>> model.compile(optimizer="rmsprop", lr=0.001)  # RMSprop
- evaluate(X, y, metric='smart', binary_thresh=0.5)[source]¶
Evaluate model performance on given data.
Computes loss and evaluation metric on the provided dataset. Automatically selects appropriate loss function based on output activation.
- Parameters:
X (NDArray[np.float64]) – Input data of shape (N, input_dim).
y (NDArray[np.float64]) – Target values of shape (N,) or (N, output_dim).
metric (str, optional) – Evaluation metric ("smart", "accuracy", "mse", "rmse", "mae", "r2", "f1", "precision", "recall"). Defaults to "smart".
binary_thresh (float, optional) – Threshold for binary classification. Defaults to 0.5.
- Returns:
(loss, metric_score) where metric_score depends on the metric type.
- Return type:
tuple
Example
>>> loss, accuracy = model.evaluate(X_test, y_test, metric="accuracy")
>>> print(f"Test Loss: {loss:.4f}, Accuracy: {accuracy:.2%}")
- fit(X_train, y_train, X_val=None, y_val=None, epochs=10, batch_size=32, verbose=True, log_every=5, early_stopping_patience=50, lr_decay=None, numerical_check_freq=100, metric='smart', reset_before_training=True, monitor=None, monitor_freq=1)[source]¶
Train the neural network on provided data.
Implements full training loop with support for validation, early stopping, learning rate decay, and comprehensive monitoring. Returns detailed training history and statistics for analysis.
- Parameters:
X_train (NDArray[np.float64]) – Training input data of shape (N, input_dim).
y_train (NDArray[np.float64]) – Training targets of shape (N,) or (N, output_dim).
X_val (NDArray[np.float64], optional) – Validation input data. Defaults to None.
y_val (NDArray[np.float64], optional) – Validation targets. Defaults to None.
epochs (int, optional) – Number of training epochs. Defaults to 10.
batch_size (int, optional) – Mini-batch size. If None, uses full batch. Defaults to 32.
verbose (bool, optional) – Whether to print training progress. Defaults to True.
log_every (int, optional) – Frequency of progress logging in epochs. Defaults to 5.
early_stopping_patience (int, optional) – Epochs to wait for improvement before stopping. Defaults to 50.
lr_decay (float, optional) – Learning rate decay factor per epoch. Defaults to None.
numerical_check_freq (int, optional) – Frequency of numerical stability checks. Defaults to 100.
metric (str, optional) – Evaluation metric for monitoring. Defaults to "smart".
reset_before_training (bool, optional) – Whether to reset weights before training. Defaults to True.
monitor (TrainingMonitor, optional) – Real-time training monitor. Defaults to None.
monitor_freq (int, optional) – Monitoring frequency in epochs. Defaults to 1.
- Returns:
- Comprehensive training results containing:
weights: Final trained weight matrices
biases: Final trained bias vectors
history: Training/validation loss and metrics per epoch
activations: Sample activations from middle epoch
gradients: Sample gradients from middle epoch
weight_stats_over_epochs: Weight statistics evolution
activation_stats_over_epochs: Activation statistics evolution
gradient_stats_over_epochs: Gradient statistics evolution
- Return type:
dict
- Raises:
ValueError – If model is not compiled or if input dimensions are incompatible.
Example
>>> history = model.fit(X_train, y_train, X_val, y_val,
...                     epochs=100, batch_size=32,
...                     early_stopping_patience=10)
>>> print(f"Final training loss: {history['history']['train_loss'][-1]:.4f}")
- fit_batch(X_batch, y_batch, epochs=10, verbose=True, metric='smart')[source]¶
Train on a single batch for a specified number of epochs. Uses 2-8 samples of the given batch.
Note
The 2-8 sample range is based on the PyTorch implementation and the literature, e.g. Karpathy's blog post "A Recipe for Training Neural Networks", the Universal Approximation Theorem (Hornik et al., 1989), and Empirical Risk Minimization (Vapnik, 1998), among others.
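The example below is an illustrative sketch of the usual overfit-a-tiny-batch sanity check (X_train and y_train are the arrays assumed elsewhere on this page): a healthy, compiled model should drive the loss close to zero on a handful of samples.
>>> # Sanity check: overfit a tiny slice of the training data
>>> X_tiny, y_tiny = X_train[:8], y_train[:8]
>>> model.fit_batch(X_tiny, y_tiny, epochs=200, verbose=True)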
- fit_fast(X_train, y_train, X_val=None, y_val=None, epochs=10, batch_size=32, verbose=True, log_every=1, early_stopping_patience=50, lr_decay=None, numerical_check_freq=100, metric='smart', reset_before_training=True, eval_freq=5)[source]¶
High-performance training method optimized for fast training.
Ultra-fast training loop that eliminates statistics collection overhead and monitoring bottlenecks. Provides ~5-10× speedup over standard fit() while maintaining identical API and training quality.
Key Performance Optimizations:
- Eliminates expensive statistics collection (the main bottleneck)
- Uses optimized batch processing with array views
- Streamlined training loop with only essential operations
- Configurable evaluation frequency to reduce overhead
Expected Performance:
- ~5-10× faster than the fit() method
- 60-80% less memory usage
- Parameters:
X_train (NDArray[np.float64]) – Training input data of shape (N, input_dim).
y_train (NDArray[np.float64]) – Training targets of shape (N,) or (N, output_dim).
X_val (NDArray[np.float64], optional) – Validation input data. Defaults to None.
y_val (NDArray[np.float64], optional) – Validation targets. Defaults to None.
epochs (int, optional) – Number of training epochs. Defaults to 10.
batch_size (int, optional) – Mini-batch size. If None, uses full batch. Defaults to 32.
verbose (bool, optional) – Whether to print training progress. Defaults to True.
log_every (int, optional) – Frequency of progress logging in epochs. Defaults to 1.
early_stopping_patience (int, optional) – Epochs to wait for improvement before stopping. Defaults to 50.
lr_decay (float, optional) – Learning rate decay factor per epoch. Defaults to None.
numerical_check_freq (int, optional) – Frequency of numerical stability checks. Defaults to 100.
metric (str, optional) – Evaluation metric for monitoring. Defaults to "smart".
reset_before_training (bool, optional) – Whether to reset weights before training. Defaults to True.
eval_freq (int, optional) – Evaluation frequency in epochs for performance. Defaults to 5.
- Returns:
- Streamlined training results containing:
weights: Final trained weight matrices
biases: Final trained bias vectors
history: Training/validation loss and metrics per epoch
performance_stats: Training time and speed metrics
- Return type:
dict
- Raises:
ValueError – If model is not compiled or if input dimensions are incompatible.
Example
>>> # Ultra-fast training
>>> history = model.fit_fast(X_train, y_train, X_val, y_val,
...                          epochs=100, batch_size=256, eval_freq=5)
Note
For research and debugging with full diagnostics, use the standard fit() method. This method prioritizes speed over detailed monitoring capabilities.
- classmethod load(filepath: str, load_optimizer: bool = False)[source]¶
Load a model from disk (.ns format).
- Parameters:
filepath – Path to saved model file (.ns)
load_optimizer – If True, loads optimizer state for training resume
- Returns:
(model, info) where:
- model: Loaded MLP instance
- info: ModelInfo dict with a nice __repr__ for printing
- Return type:
tuple
Examples
>>> # Load and inspect metadata
>>> model, info = MLP.load('checkpoint.ns')
>>> print(info)  # Shows a nicely formatted summary
>>> predictions = model.predict(X_test)
>>> # Load for continued training
>>> model, info = MLP.load('checkpoint.ns', load_optimizer=True)
>>> print(f"Resuming from epoch {info['custom_metadata']['epoch']}")
>>> model.fit(X, y, epochs=50)
>>> # Access metadata programmatically
>>> model, info = MLP.load('model.ns')
>>> print(f"Architecture: {info['model_config']['layer_dims']}")
>>> print(f"Saved at: {info['timestamp']}")
- Raises:
FileNotFoundError – If file doesn’t exist
ValueError – If file format is invalid
- predict(X)[source]¶
Generate predictions for input samples.
Performs forward propagation through the network without dropout to generate predictions on new data.
- Parameters:
X (NDArray[np.float64]) – Input data of shape (N, input_dim).
- Returns:
- Model predictions of shape (N, output_dim).
For regression: continuous values. For binary classification: probabilities (0-1). For multiclass: class probabilities.
- Return type:
NDArray[np.float64]
Example
>>> predictions = model.predict(X_test)
>>> binary_preds = (predictions > 0.5).astype(int)  # For binary classification
- save(filepath: str, save_optimizer: bool = False, **metadata) None[source]¶
Save model to disk in NeuroScope format (.ns).
Saves model architecture, weights, and optionally optimizer state for resuming training. Uses pickle for efficient serialization.
- Parameters:
filepath – Path to save file (will add .ns extension if missing)
save_optimizer – If True, saves optimizer state for training resume
**metadata – Additional metadata to save (e.g., epoch, accuracy)
Examples
>>> # Basic save
>>> model.save('my_model.ns')
>>> # Save with metadata
>>> model.save('checkpoint.ns', epoch=50, accuracy=0.95)
>>> # Save without optimizer (inference only)
>>> model.save('model.ns', save_optimizer=False)
Notes
File format: Python pickle (protocol 4)
Extension: .ns (NeuroScope)
Compatible with NumPy arrays
- class neuroscope.PreTrainingAnalyzer(model)[source]¶
Bases: object
Comprehensive pre-training diagnostic tool for neural networks.
Analyzes model architecture, weight initialization, and data compatibility before training begins. Implements research-validated checks to identify potential training issues early, based on established deep learning principles from Glorot & Bengio (2010), He et al. (2015), and others.
- Parameters:
model – Compiled MLP model instance with initialized weights.
- model¶
Reference to the neural network model.
Example
>>> from neuroscope.diagnostics import PreTrainingAnalyzer
>>> model = MLP([784, 128, 10])
>>> model.compile(lr=1e-3)
>>> analyzer = PreTrainingAnalyzer(model)
>>> analyzer.analyze(X_train, y_train)
Initialize analyzer with a compiled model.
- analyze(X: ndarray, y: ndarray) None[source]¶
Comprehensive pre-training analysis with clean tabular output.
- analyze_architecture_sanity() Dict[str, Any][source]¶
Perform comprehensive architecture validation.
Validates network architecture against established deep learning principles and best practices. Checks for common architectural pitfalls such as incompatible activation functions, inappropriate depth, and problematic layer configurations based on research findings.
- Returns:
- Analysis results containing:
issues: List of critical architectural problems
warnings: List of potential concerns
status: Overall architecture quality (“PASS”, “WARN”, “FAIL”)
note: Summary diagnostic message
- Return type:
Dict[str, Any]
Note
Based on research from Bengio et al. (2009) on vanishing gradients, modern best practices for deep architectures, and activation function compatibility studies.
Example
>>> results = analyzer.analyze_architecture_sanity()
>>> if results['issues']:
...     print("Critical issues found:", results['issues'])
- analyze_capacity_data_ratio(X: ndarray, y: ndarray) Dict[str, Any][source]¶
Analyze parameter count relative to training data size.
- analyze_convergence_feasibility(X: ndarray, y: ndarray) Dict[str, Any][source]¶
Assess whether the current setup can theoretically converge.
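Both checks take the training arrays and return a result dictionary, like the other analyze_* methods; the sketch below assumes they expose a 'status' key as the other analyzers on this page do.
>>> capacity = analyzer.analyze_capacity_data_ratio(X_train, y_train)
>>> feasibility = analyzer.analyze_convergence_feasibility(X_train, y_train)
>>> print(capacity['status'], feasibility['status'])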
- analyze_initial_loss(X: ndarray, y: ndarray) Dict[str, Any][source]¶
Validate initial loss against theoretical expectations.
Compares the model’s initial loss (before training) with theoretical baselines for different task types. For classification, expects loss near -log(1/num_classes). For regression, compares against variance-based baseline as described in Goodfellow et al. (2016).
- Parameters:
X (NDArray[np.float64]) – Input data of shape (N, input_dim).
y (NDArray[np.float64]) – Target values of shape (N,) or (N, output_dim).
- Returns:
- Analysis results containing:
initial_loss: Computed initial loss value
expected_loss: Theoretical expected loss
ratio: initial_loss / expected_loss
task_type: Detected task type (regression/classification)
status: “PASS”, “WARN”, or “FAIL”
note: Diagnostic message
- Return type:
Dict[str, Any]
Example
>>> results = analyzer.analyze_initial_loss(X_train, y_train)
>>> print(f"Initial loss check: {results['status']}")
- analyze_layer_capacity() Dict[str, Any][source]¶
Analyze information bottlenecks and layer capacity issues.
- analyze_weight_init() Dict[str, Any][source]¶
Validate weight initialization against theoretical optima.
Analyzes weight initialization quality by comparing actual weight standard deviations against theoretically optimal values for different activation functions. Based on He initialization (He et al. 2015) for ReLU variants and Xavier initialization (Glorot & Bengio 2010) for sigmoid/tanh.
- Returns:
- Analysis results containing:
layers: List of per-layer initialization analysis
status: Overall initialization quality (“PASS”, “WARN”, “FAIL”)
note: Summary diagnostic message
- Return type:
Dict[str, Any]
Example
>>> results = analyzer.analyze_weight_init()
>>> for layer in results['layers']:
...     print(f"Layer {layer['layer']}: {layer['status']}")
- class neuroscope.TrainingMonitor(model=None, history_size=50)[source]¶
Bases: object
Comprehensive real-time training monitoring system for neural networks.
Monitors 10 key training health indicators:
- Dead ReLU neuron detection
- Vanishing Gradient Problem (VGP) detection
- Exploding Gradient Problem (EGP) detection
- Weight health analysis
- Learning progress
- Overfitting detection
- Gradient signal-to-noise ratio
- Activation saturation detection (tanh/sigmoid)
- Training plateau detection
- Weight update vs. magnitude ratios
Initialize comprehensive training monitor.
Sets up monitoring infrastructure for tracking 10 key training health indicators during neural network training. Uses research-validated thresholds and emoji-based status visualization.
- Parameters:
model – Optional MLP model instance (can be set later).
history_size (int, optional) – Number of epochs to keep in rolling history for trend analysis. Defaults to 50.
Example
>>> monitor = TrainingMonitor(history_size=100)
>>> results = model.fit(X, y, monitor=monitor)
- __init__(model=None, history_size=50)[source]¶
Initialize comprehensive training monitor.
Sets up monitoring infrastructure for tracking 10 key training health indicators during neural network training. Uses research-validated thresholds and emoji-based status visualization.
- Parameters:
model – Optional MLP model instance (can be set later).
history_size (int, optional) – Number of epochs to keep in rolling history for trend analysis. Defaults to 50.
Example
>>> monitor = TrainingMonitor(history_size=100)
>>> results = model.fit(X, y, monitor=monitor)
- monitor_activation_saturation(activations: List[ndarray], activation_functions: List[str] = None) Tuple[float, str][source]¶
Research-accurate activation saturation detection. Based on Glorot & Bengio (2010), Hochreiter (1991), and He et al. (2015).
Key insights:
- Saturation = extreme activation values + poor gradient flow + skewed distributions
- Uses function-specific thresholds and statistical distribution analysis
- Tracks saturation propagation through network layers
- Parameters:
activations – List of activation arrays from each layer
activation_functions – List of activation function names for each layer
- Returns:
Tuple of (saturation_score, emoji_status)
- monitor_exploding_gradients(gradients: List[ndarray]) Tuple[float, str][source]¶
Detect exploding gradient problem using gradient norm analysis.
Monitors gradient magnitudes to detect exploding gradients based on research by Pascanu et al. (2013). Uses both global norm and per-layer analysis to identify unstable training dynamics.
- Parameters:
gradients (list[NDArray[np.float64]]) – Gradient arrays for each layer.
- Returns:
- (egp_severity, status_emoji) where:
egp_severity: Float in [0,1] indicating severity
status: 🟢 (stable), 🟡 (elevated), 🔴 (exploding)
- Return type:
Tuple[float, str]
Note
Based on “On the difficulty of training recurrent neural networks” (Pascanu et al. 2013) gradient clipping and norm analysis.
- monitor_gradient_snr(gradients: List[ndarray]) Tuple[float, str][source]¶
Calculate the Gradient Signal-to-Noise Ratio (GSNR) for optimization health.
- Signal: RMS gradient magnitude (update strength)
- Noise: Coefficient of variation (relative inconsistency)
- GSNR = RMS_magnitude / (std_magnitude + ε)
This measures gradient update consistency.
- Parameters:
gradients – List of gradient arrays from each layer
- Returns:
Tuple of (gsnr_score, emoji_status)
- monitor_learning_progress(current_loss: float, val_loss: float | None = None) Tuple[float, str][source]¶
Research-accurate learning progress monitor. Based on optimization literature: Bottou (2010), Goodfellow et al. (2016), Smith (2017).
Key insights:
- Progress = consistent loss reduction + convergence stability + generalization health
- Uses exponential moving averages and plateau detection from the literature
- Parameters:
current_loss – Current training loss
val_loss – Optional validation loss
- Returns:
Tuple of (progress_score, emoji_status)
- monitor_overfitting(train_loss: float, val_loss: float | None = None) Tuple[float, str][source]¶
Research-accurate overfitting detection. Based on Prechelt (1998), Goodfellow et al. (2016), and Caruana et al. (2001).
Key insight:
- Overfitting = increasing generalization gap + validation curve deterioration
- Parameters:
train_loss – Training loss
val_loss – Validation loss
- Returns:
Tuple of (overfitting_score, emoji_status)
- monitor_plateau(current_loss: float, val_loss: float | None = None, current_gradients: List[ndarray] | None = None) Tuple[float, str][source]¶
Research-accurate training plateau detection. Based on Prechelt (1998), Bengio (2012), and Smith (2017).
Key insights:
- Plateau = statistical stagnation + loss of learning momentum + gradient analysis
- Uses multi-scale analysis and statistical significance testing
- Integrates validation correlation and gradient magnitude trends
- Parameters:
current_loss – Current training loss
val_loss – Optional validation loss for correlation analysis
current_gradients – Optional gradient arrays for gradient-based detection
- Returns:
Tuple of (plateau_score, emoji_status)
- monitor_relu_dead_neurons(activations: List[ndarray], activation_functions: List[str] | None = None) Tuple[float, str][source]¶
Monitor for dead ReLU neurons during training.
Detects neurons that have become inactive (always output zero) which indicates the “dying ReLU” problem. Uses activation-function-aware thresholds based on research by Glorot et al. (2011) and He et al. (2015).
Natural sparsity in ReLU networks is expected (~50%), but excessive sparsity (>90%) indicates dead neurons that cannot learn.
- Parameters:
activations (list[NDArray[np.float64]]) – Layer activation outputs.
activation_functions (list[str], optional) – Activation function names per layer.
- Returns:
- (dead_percentage, status_emoji) where status is:
🟢: Healthy sparsity (<10% dead)
🟡: Moderate concern (10-30% dead)
🔴: Critical issue (>30% dead)
- Return type:
Tuple[float, str]
Note
Based on “Deep Sparse Rectifier Neural Networks” (Glorot et al. 2011) and “Delving Deep into Rectifiers” (He et al. 2015).
- monitor_step(epoch: int, train_loss: float, val_loss: float | None = None, activations: List[ndarray] | None = None, gradients: List[ndarray] | None = None, weights: List[ndarray] | None = None, weight_updates: List[ndarray] | None = None, activation_functions: List[str] | None = None) Dict[str, Any][source]¶
Perform one monitoring step and return all metrics.
- Parameters:
epoch – Current epoch number
train_loss – Training loss
val_loss – Validation loss (optional)
activations – Layer activations (optional)
gradients – Layer gradients (optional)
weights – Layer weights (optional)
weight_updates – Weight updates (optional)
activation_functions – List of activation function names (optional)
- Returns:
Dictionary containing all monitoring results
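Typically the monitor is passed to model.fit() (see the class example above), but monitor_step can also be called manually. The sketch below is illustrative only; layer_activations and layer_gradients are assumed to be collected by your own training loop, not returned by a documented NeuroScope call.
>>> monitor = TrainingMonitor(model)
>>> metrics = monitor.monitor_step(epoch=5,
...                                train_loss=0.42,
...                                val_loss=0.47,
...                                activations=layer_activations,  # collected by your loop
...                                gradients=layer_gradients)      # collected by your loop
>>> print(metrics)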
- monitor_vanishing_gradients(gradients: List[ndarray]) Tuple[float, str][source]¶
Detect vanishing gradient problem using research-validated metrics.
Monitors gradient flow through the network to detect vanishing gradients based on variance analysis from Glorot & Bengio (2010). Healthy networks maintain similar gradient variance across layers.
- Parameters:
gradients (list[NDArray[np.float64]]) – Gradient arrays for each layer.
- Returns:
- (vgp_severity, status_emoji) where:
vgp_severity: Float in [0,1] indicating severity
status: 🟢 (healthy), 🟡 (warning), 🔴 (critical)
- Return type:
Tuple[float, str]
Note
Implementation based on “Understanding the difficulty of training deep feedforward neural networks” (Glorot & Bengio 2010).
- monitor_weight_health(weights: List[ndarray]) Tuple[float, str][source]¶
Simple, research-backed weight health monitor. Based on Glorot & Bengio (2010) and He et al. (2015) initialization theory.
- Parameters:
weights – List of weight matrices
- Returns:
Tuple of (health_score, status)
- monitor_weight_update_ratio(weights: List[ndarray], weight_updates: List[ndarray]) Tuple[float, str][source]¶
Monitor Weight Update to Weight magnitude Ratios (WUR) for learning rate validation.
Research-based implementation using:
- Smith (2015): Learning rate should produce WUR ~10^-3 to 10^-2 for stable training
- Zeiler (2012): Update magnitude should be proportional to weight magnitude
Formula: WUR = ||weight_update|| / ||weight|| per layer
- Parameters:
weights – Current weight matrices
weight_updates – Weight update matrices (gradients * learning_rate)
- Returns:
Tuple of (median_wur, status)
- class neuroscope.PostTrainingEvaluator(model)[source]¶
Bases: object
Comprehensive post-training evaluation system for neural networks.
Provides thorough analysis of trained model performance including robustness testing, performance metrics evaluation, and diagnostic assessments. Designed to validate model quality and identify potential deployment issues after training completion.
- Parameters:
model – Trained and compiled MLP model instance with initialized weights.
- model¶
Reference to the trained neural network model.
Example
>>> from neuroscope.diagnostics import PostTrainingEvaluator
>>> model = MLP([784, 128, 10])
>>> model.compile(lr=1e-3)
>>> history = model.fit(X_train, y_train, epochs=100)
>>> evaluator = PostTrainingEvaluator(model)
>>> evaluator.evaluate(X_test, y_test)
>>> # Access detailed results
>>> robustness = evaluator.evaluate_robustness(X_test, y_test)
>>> performance = evaluator.evaluate_performance(X_test, y_test)
Initialize evaluator with a trained model.
- evaluate(X_test: ndarray, y_test: ndarray, X_train: ndarray | None = None, y_train: ndarray | None = None)[source]¶
Run comprehensive model evaluation and generate summary report.
- evaluate_performance(X: ndarray, y: ndarray) Dict[str, Any][source]¶
Evaluate model performance metrics.
- class neuroscope.Visualizer(hist)[source]¶
Bases: object
High-quality visualization tool for neural network training analysis.
Provides comprehensive plotting capabilities for analyzing training dynamics, network behavior, and diagnostic information. Creates professional-grade figures suitable for research publications and presentations.
- Parameters:
hist (dict) – Complete training history from model.fit() containing:
- history: Training/validation metrics per epoch
- weights/biases: Final network parameters
- activations/gradients: Sample network internals
- *_stats_over_epochs: Statistical evolution during training
- weights/biases
Final network parameters.
- activations/gradients
Representative network internals.
Example
>>> from neuroscope.viz import Visualizer
>>> history = model.fit(X_train, y_train, epochs=100)
>>> viz = Visualizer(history)
>>> viz.plot_learning_curves()
>>> viz.plot_activation_distribution()
>>> viz.plot_gradient_flow()
Initialize visualizer with comprehensive training history.
Sets up visualization infrastructure and applies publication-quality styling to all plots. Automatically extracts relevant data components for different types of analysis.
- Parameters:
hist (dict) – Training history from model.fit() containing all training statistics, network states, and diagnostic information.
- __init__(hist)[source]¶
Initialize visualizer with comprehensive training history.
Sets up visualization infrastructure and applies publication-quality styling to all plots. Automatically extracts relevant data components for different types of analysis.
- Parameters:
hist (dict) – Training history from model.fit() containing all training statistics, network states, and diagnostic information.
- plot_activation_hist(epoch=None, figsize=(9, 4), kde=False, last_layer=False, save_path=None)[source]¶
Plot activation value distributions across network layers.
Visualizes the distribution of activation values for each layer at a specific epoch, aggregated from all mini-batches. Useful for detecting activation saturation, dead neurons, and distribution shifts during training.
- Parameters:
epoch (int, optional) – Specific epoch to plot. If None, uses the last epoch. Defaults to None.
figsize (tuple[int, int], optional) – Figure dimensions. Defaults to (9, 4).
kde (bool, optional) – Whether to use KDE-style smoothing for smoother curves. Defaults to False.
last_layer (bool, optional) – Whether to include the output layer. Defaults to False.
save_path (str, optional) – Path to save the figure. Defaults to None.
Example
>>> viz.plot_activation_hist(epoch=50, kde=True, save_path='activations.png')
- plot_activation_stats(activation_stats=None, figsize=(12, 4), save_path=None, reference_lines=False)[source]¶
Plot activation statistics over time with both mean and std.
- Parameters:
activation_stats – Dict of layer activation stats, or None to use the data collected by the class. Format: {'layer_0': {'mean': [...], 'std': [...]}, ...}
figsize – Figure size tuple
save_path – Path to save figure
- plot_curves_fast(figsize=(10, 4), markers=True, save_path=None)[source]¶
Plot learning curves for fit_fast() results.
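A minimal illustrative pairing with fit_fast() (the argument values below are examples, not defaults):
>>> history = model.fit_fast(X_train, y_train, X_val, y_val, epochs=100)
>>> viz = Visualizer(history)
>>> viz.plot_curves_fast(figsize=(10, 4), save_path='fast_curves.png')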
- plot_gradient_hist(epoch=None, figsize=(9, 4), kde=False, last_layer=False, save_path=None)[source]¶
Plot gradient value distributions across network layers.
Visualizes gradient distributions to detect vanishing/exploding gradient problems, gradient flow issues, and training stability. Shows zero-line reference for assessing gradient symmetry and magnitude.
- Parameters:
epoch (int, optional) – Specific epoch to plot. If None, uses the last epoch. Defaults to None.
figsize (tuple[int, int], optional) – Figure dimensions. Defaults to (9, 4).
kde (bool, optional) – Whether to use KDE-style smoothing. Defaults to False.
last_layer (bool, optional) – Whether to include output layer gradients. Defaults to False.
save_path (str, optional) – Path to save the figure. Defaults to None.
Note
Gradient distributions should be roughly symmetric around zero for healthy training. Very narrow distributions may indicate vanishing gradients, while very wide distributions may indicate exploding gradients.
Example
>>> viz.plot_gradient_hist(epoch=25, kde=True, save_path='gradients.png')
- plot_gradient_norms(gradient_norms=None, figsize=(12, 4), save_path=None, reference_lines=False)[source]¶
Plot gradient norms per layer over epochs.
- plot_gradient_stats(figsize=(12, 4), save_path=None, reference_lines=False)[source]¶
Plot gradient statistics over time with both mean and std.
- plot_learning_curves(figsize=(9, 4), ci=False, markers=True, save_path=None, metric='accuracy')[source]¶
Plot training and validation learning curves for regular fit() results.
Creates highest quality subplot showing loss and metric evolution during training. Automatically detects available data and applies consistent styling with optional confidence intervals.
Note: For fit_fast() results, use plot_curves_fast() instead.
- Parameters:
figsize (tuple[int, int], optional) – Figure dimensions (width, height). Defaults to (9, 4).
ci (bool, optional) – Whether to add confidence intervals using moving window statistics. Only available for regular fit() results. Defaults to False.
markers (bool, optional) – Whether to show markers on line plots. Defaults to True.
save_path (str, optional) – Path to save the figure. Defaults to None.
metric (str, optional) – Name of the metric for the y-axis label. Defaults to 'accuracy'.
Example
>>> viz.plot_learning_curves(figsize=(10, 5), ci=True, save_path='curves.png')
- plot_training_animation(bg='dark', save_path=None)[source]¶
Creates a comprehensive 4-panel GIF animation showing:
1. Loss curves over time
2. Accuracy curves over time
3. Current metrics bar chart
4. Gradient flow analysis
Animation speed automatically adjusts to the epoch count for a smooth motion feel.
- Parameters:
bg – Theme ('dark' or 'light')
save_path – Path to save the GIF (defaults to 'mlp_training_animation.gif')
- Returns:
Path to saved GIF file
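Illustrative call (the returned value is the path to the saved GIF, as documented above):
>>> gif_path = viz.plot_training_animation(bg='dark', save_path='training.gif')
>>> print(f"Animation written to {gif_path}")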
- plot_update_ratios(update_ratios=None, figsize=(12, 4), save_path=None, reference_lines=False)[source]¶
Plot weight update ratios per layer across epochs.
- Parameters:
update_ratios – Dict of layer update ratios (optional; uses collected data if None). Format: {'layer_0': [ratio_epoch_0, ratio_epoch_1, ...], ...}
figsize – Figure size tuple
save_path – Path to save figure
- plot_weight_hist(epoch=None, figsize=(9, 4), kde=False, last_layer=False, save_path=None)[source]¶
Uses aggregated samples from all mini-batches within an epoch to create representative distributions. Shows weight evolution patterns.
- Parameters:
epoch – Specific epoch to plot (default: last epoch)
figsize – Figure size tuple
kde – Whether to use KDE-style smoothing
last_layer – Whether to include output layer (default: False, hidden layers only)
save_path – Path to save figure
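An illustrative sketch combining the weight-oriented plots (the epoch, flag, and path values are arbitrary examples):
>>> viz.plot_weight_hist(epoch=50, kde=True, save_path='weights.png')
>>> viz.plot_update_ratios(reference_lines=True, save_path='update_ratios.png')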
- neuroscope.PTA¶
alias of
PreTrainingAnalyzer
- neuroscope.TM¶
alias of
TrainingMonitor
- neuroscope.PTE¶
alias of
PostTrainingEvaluator
- neuroscope.VIZ¶
alias of
Visualizer
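The short aliases are interchangeable with the full class names, for example:
>>> from neuroscope import PTA, TM, PTE, VIZ
>>> analyzer = PTA(model)            # PreTrainingAnalyzer
>>> monitor = TM(history_size=100)   # TrainingMonitor
>>> evaluator = PTE(model)           # PostTrainingEvaluator
>>> viz = VIZ(history)               # Visualizer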
- neuroscope.mse(y_true, y_pred)¶
Compute mean squared error loss.
Standard regression loss function that penalizes squared differences between predictions and targets. Suitable for continuous target values.
- Parameters:
y_true (NDArray[np.float64]) – Ground truth values of shape (N,) or (N, 1).
y_pred (NDArray[np.float64]) – Predicted values of shape (N,) or (N, 1).
- Returns:
Mean squared error loss (scalar).
- Return type:
float
Example
>>> loss = LossFunctions.mse(y_true, y_pred)
>>> print(f"MSE Loss: {loss:.4f}")
- neuroscope.bce(y_true, y_pred, eps=1e-12)¶
Compute binary cross-entropy loss.
Standard loss function for binary classification problems. Applies numerical clipping to prevent log(0) errors and ensure stability.
- Parameters:
y_true (NDArray[np.float64]) – Binary labels (0/1) of shape (N,).
y_pred (NDArray[np.float64]) – Predicted probabilities of shape (N,).
eps (float, optional) – Small value for numerical stability. Defaults to 1e-12.
- Returns:
Binary cross-entropy loss (scalar).
- Return type:
float
Example
>>> loss = LossFunctions.bce(y_true, y_pred)
>>> print(f"BCE Loss: {loss:.4f}")
- neuroscope.cce(y_true, y_pred, eps=1e-12)¶
Compute categorical cross-entropy loss.
Standard loss function for multi-class classification. Handles both sparse labels (class indices) and one-hot encoded targets.
- Parameters:
y_true (NDArray[np.float64]) – Class labels of shape (N,) for sparse labels or (N, C) for one-hot encoded targets.
y_pred (NDArray[np.float64]) – Predicted class probabilities of shape (N, C).
eps (float, optional) – Small value for numerical stability. Defaults to 1e-12.
- Returns:
Categorical cross-entropy loss (scalar).
- Return type:
float
Example
>>> loss = LossFunctions.cce(y_true, y_pred)
>>> print(f"CCE Loss: {loss:.4f}")
- neuroscope.mse_with_reg(y_true, y_pred, weights, lamda=0.01)¶
- neuroscope.bce_with_reg(y_true, y_pred, weights, lamda=0.01, eps=1e-12)¶
- neuroscope.cce_with_reg(y_true, y_pred, weights, lamda=0.01, eps=1e-12)¶
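The *_with_reg variants take the same arguments as the base losses plus the network's weight matrices and a regularization strength lamda; they appear to add an L2 penalty over the weights to the base loss (this interpretation is inferred from the compile() documentation above, not stated here). An illustrative sketch:
>>> from neuroscope import mse, mse_with_reg
>>> plain = mse(y_true, y_pred)
>>> penalized = mse_with_reg(y_true, y_pred, model.weights, lamda=0.01)
>>> print(f"MSE: {plain:.4f}, MSE + L2 penalty: {penalized:.4f}")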
- neuroscope.accuracy_binary(y_true, y_pred, thresh=0.5)¶
Compute binary classification accuracy.
Calculates the fraction of correctly predicted samples for binary classification by applying a threshold to predicted probabilities.
- Parameters:
y_true (NDArray[np.float64]) – Binary labels (0/1) of shape (N,) or (N, 1).
y_pred (NDArray[np.float64]) – Predicted probabilities of shape (N,) or (N, 1).
thresh (float, optional) – Classification threshold. Defaults to 0.5.
- Returns:
Binary classification accuracy as a fraction (0.0 to 1.0).
- Return type:
float
Example
>>> accuracy = Metrics.accuracy_binary(y_true, y_pred, thresh=0.5)
>>> print(f"Binary Accuracy: {accuracy:.2%}")
- neuroscope.accuracy_multiclass(y_true, y_pred)¶
Compute multi-class classification accuracy.
Calculates the fraction of correctly predicted samples for multi-class classification problems. Handles both sparse labels and one-hot encoded inputs.
- Parameters:
y_true (NDArray[np.float64]) – True class labels of shape (N,) for sparse labels or (N, C) for one-hot encoded.
y_pred (NDArray[np.float64]) – Predicted class probabilities of shape (N, C).
- Returns:
Classification accuracy as a fraction (0.0 to 1.0).
- Return type:
float
Example
>>> accuracy = Metrics.accuracy_multiclass(y_true, y_pred)
>>> print(f"Accuracy: {accuracy:.2%}")
- neuroscope.rmse(y_true, y_pred)¶
- neuroscope.mae(y_true, y_pred)¶
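rmse and mae follow the same calling convention as mse, for example:
>>> from neuroscope import rmse, mae
>>> print(f"RMSE: {rmse(y_true, y_pred):.4f}, MAE: {mae(y_true, y_pred):.4f}")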
- neuroscope.r2_score(y_true, y_pred)¶
Compute coefficient of determination (R² score).
Measures the proportion of variance in the dependent variable that is predictable from the independent variables. R² = 1 indicates perfect fit, R² = 0 indicates the model performs as well as predicting the mean.
- Parameters:
y_true (NDArray[np.float64]) – Ground truth values of shape (N,) or (N, 1).
y_pred (NDArray[np.float64]) – Predicted values of shape (N,) or (N, 1).
- Returns:
R² score (can be negative for very poor fits).
- Return type:
float
Example
>>> r2 = Metrics.r2_score(y_true, y_pred)
>>> print(f"R² Score: {r2:.3f}")
- neuroscope.f1_score(y_true, y_pred, average='weighted', threshold=0.5)¶
Compute F1 score: 2 * (Precision * Recall) / (Precision + Recall)
- Parameters:
y_true – True labels
y_pred – Predicted probabilities or labels
average – ‘macro’, ‘weighted’, or None for per-class scores
threshold – Decision threshold for binary classification
- neuroscope.precision(y_true, y_pred, average='weighted', threshold=0.5)¶
Compute precision score: TP / (TP + FP)
- Parameters:
y_true – True labels
y_pred – Predicted probabilities or labels
average – ‘macro’, ‘weighted’, or None for per-class scores
threshold – Decision threshold for binary classification
- neuroscope.recall(y_true, y_pred, average='weighted', threshold=0.5)¶
Compute recall score: TP / (TP + FN)
- Parameters:
y_true – True labels
y_pred – Predicted probabilities or labels
average – ‘macro’, ‘weighted’, or None for per-class scores
threshold – Decision threshold for binary classification
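All three metrics share the same signature; a minimal sketch (with average=None each returns per-class scores rather than a single value, as noted above):
>>> from neuroscope import f1_score, precision, recall
>>> f1 = f1_score(y_true, y_pred, average='weighted')
>>> prec = precision(y_true, y_pred, average='weighted')
>>> rec = recall(y_true, y_pred, average='weighted')
>>> print(f"F1: {f1:.3f}, Precision: {prec:.3f}, Recall: {rec:.3f}")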
- neuroscope.relu(x)¶
Compute ReLU (Rectified Linear Unit) activation.
Applies the rectified linear activation function that outputs the input for positive values and zero for negative values. Most popular activation for hidden layers in modern neural networks.
- Parameters:
x (NDArray[np.float64]) – Input array of any shape.
- Returns:
ReLU-activated values (non-negative).
- Return type:
NDArray[np.float64]
Example
>>> activated = ActivationFunctions.relu(z)
>>> # Negative values become 0, positive values unchanged
- neuroscope.leaky_relu(x, negative_slope=0.01)¶
Compute Leaky ReLU activation function.
Variant of ReLU that allows small negative values to flow through, helping to mitigate the “dying ReLU” problem where neurons can become permanently inactive.
- Parameters:
x (NDArray[np.float64]) – Input array of any shape.
negative_slope (float, optional) – Slope for negative values. Defaults to 0.01.
- Returns:
Leaky ReLU-activated values.
- Return type:
NDArray[np.float64]
Example
>>> activated = ActivationFunctions.leaky_relu(z, negative_slope=0.01)
>>> # Positive values unchanged, negative values scaled by 0.01
- neuroscope.sigmoid(x)¶
Compute sigmoid activation function.
Applies the logistic sigmoid function that maps input to (0, 1) range. Includes numerical clipping to prevent overflow in exponential computation.
- Parameters:
x (NDArray[np.float64]) – Input array of any shape.
- Returns:
Sigmoid-activated values in range (0, 1).
- Return type:
NDArray[np.float64]
Example
>>> activated = ActivationFunctions.sigmoid(z)
>>> # Values are now between 0 and 1
- neuroscope.tanh(x)¶
- neuroscope.selu(x)¶
- neuroscope.softmax(z)¶
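tanh, selu, and softmax take a single array argument like the other activations; an illustrative sketch (the input array z below is made up for demonstration):
>>> import numpy as np
>>> from neuroscope import tanh, selu, softmax
>>> z = np.array([[1.0, -2.0, 0.5]])
>>> tanh(z)      # values squashed into (-1, 1)
>>> selu(z)      # self-normalizing activation (scaled ELU)
>>> softmax(z)   # each row becomes a probability distribution summing to 1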
- neuroscope.he_init(layer_dims: list, seed=42)¶
He initialization for ReLU and ReLU-variant activations.
Optimal for ReLU-based networks as derived in He et al. (2015). Uses standard deviation of sqrt(2/fan_in) to maintain proper variance propagation through ReLU activations.
- Parameters:
layer_dims (list[int]) – Layer dimensions [input_dim, hidden_dim, …, output_dim].
seed (int, optional) – Random seed for reproducibility. Defaults to 42.
- Returns:
- (weights, biases) where weights are initialized
according to He initialization and biases are zero-initialized.
- Return type:
tuple
Note
Based on “Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification” (He et al. 2015).
Example
>>> weights, biases = WeightInits.he_init([784, 128, 10])
- neuroscope.xavier_init(layer_dims: list, seed=42)¶
Xavier/Glorot initialization for sigmoid and tanh activations.
Optimal for symmetric activations like tanh and sigmoid. Uses standard deviation of sqrt(2/(fan_in + fan_out)) to maintain constant variance across layers.
- Parameters:
layer_dims (list[int]) – Layer dimensions [input_dim, hidden_dim, …, output_dim].
seed (int, optional) – Random seed for reproducibility. Defaults to 42.
- Returns:
(weights, biases) with Xavier-initialized weights and zero biases.
- Return type:
tuple
Note
Based on “Understanding the difficulty of training deep feedforward neural networks” (Glorot & Bengio 2010).
Example
>>> weights, biases = WeightInits.xavier_init([784, 128, 10])
MLP Module¶
Multi-Layer Perceptron¶
MLP Neural Network Main neural network class integrating all framework components.
- class neuroscope.mlp.mlp.MLP(layer_dims, hidden_activation='leaky_relu', out_activation=None, init_method='smart', init_seed=42, dropout_rate=0.0, dropout_type='normal')[source]¶
Bases: object
Multi-layer perceptron for quick prototyping and experimentation.
This MLP supports arbitrary layer sizes, multiple activation functions, and modern optimization techniques. Use compile to set hyperparameters and fit to train the model. Includes comprehensive training monitoring and diagnostic capabilities.
- Parameters:
layer_dims (Sequence[int]) – Sizes of layers including input & output, e.g. [784, 128, 10].
hidden_activation (str, optional) – Activation function name for hidden layers. Options: "relu", "leaky_relu", "tanh", "sigmoid", "selu". Defaults to "leaky_relu".
out_activation (str, optional) – Output activation function. Options: "sigmoid" (binary), "softmax" (multiclass), None (regression). Defaults to None.
init_method (str, optional) – Weight initialization strategy. Options: "smart", "he", "xavier", "random", "selu_init". Defaults to "smart".
init_seed (int, optional) – Random seed for reproducible weight initialization. Defaults to 42.
dropout_rate (float, optional) – Dropout probability for hidden layers (0.0-1.0). Defaults to 0.0.
dropout_type (str, optional) – Dropout variant ("normal", "alpha"). Defaults to "normal".
- weights¶
Internal weight matrices for each layer.
- Type:
list[NDArray[np.float64]]
- biases¶
Internal bias vectors for each layer.
- Type:
list[NDArray[np.float64]]
Example
>>> from neuroscope.mlp import MLP
>>> model = MLP([784, 128, 64, 10], hidden_activation="relu", out_activation="softmax")
>>> model.compile(optimizer="adam", lr=1e-3)
>>> history = model.fit(X_train, y_train, epochs=100)
>>> predictions = model.predict(X_test)
- __init__(layer_dims, hidden_activation='leaky_relu', out_activation=None, init_method='smart', init_seed=42, dropout_rate=0.0, dropout_type='normal')[source]¶
- compile(optimizer='adam', lr=0.001, reg=None, lamda=0.01, gradient_clip=None)[source]¶
Configure the model for training.
Sets up the optimizer, learning rate, regularization, and other training hyperparameters. Must be called before training the model.
- Parameters:
optimizer (str, optional) – Optimization algorithm. Options: "sgd", "sgdm" (SGD with momentum), "sgdnm" (SGD with Nesterov momentum), "rmsprop", "adam". Defaults to "adam".
lr (float, optional) – Learning rate for parameter updates. Defaults to 0.001.
reg (str, optional) – Regularization type ("l2", None). Defaults to None.
lamda (float, optional) – Regularization strength (lambda parameter). Defaults to 0.01.
gradient_clip (float, optional) – Maximum gradient norm for clipping. Defaults to None.
- Raises:
ValueError – If invalid optimizer is specified.
Example
>>> model.compile(optimizer="adam", lr=1e-3, reg="l2", lamda=0.01)
>>> model.compile(optimizer="sgdm", lr=0.01)      # SGD with momentum
>>> model.compile(optimizer="sgdnm", lr=0.01)     # SGD with Nesterov momentum
>>> model.compile(optimizer="rmsprop", lr=0.001)  # RMSprop
- predict(X)[source]¶
Generate predictions for input samples.
Performs forward propagation through the network without dropout to generate predictions on new data.
- Parameters:
X (NDArray[np.float64]) – Input data of shape (N, input_dim).
- Returns:
- Model predictions of shape (N, output_dim).
For regression: continuous values. For binary classification: probabilities (0-1). For multiclass: class probabilities.
- Return type:
NDArray[np.float64]
Example
>>> predictions = model.predict(X_test)
>>> binary_preds = (predictions > 0.5).astype(int)  # For binary classification
- evaluate(X, y, metric='smart', binary_thresh=0.5)[source]¶
Evaluate model performance on given data.
Computes loss and evaluation metric on the provided dataset. Automatically selects appropriate loss function based on output activation.
- Parameters:
X (NDArray[np.float64]) – Input data of shape (N, input_dim).
y (NDArray[np.float64]) – Target values of shape (N,) or (N, output_dim).
metric (str, optional) – Evaluation metric ("smart", "accuracy", "mse", "rmse", "mae", "r2", "f1", "precision", "recall"). Defaults to "smart".
binary_thresh (float, optional) – Threshold for binary classification. Defaults to 0.5.
- Returns:
(loss, metric_score) where metric_score depends on the metric type.
- Return type:
tuple
Example
>>> loss, accuracy = model.evaluate(X_test, y_test, metric="accuracy")
>>> print(f"Test Loss: {loss:.4f}, Accuracy: {accuracy:.2%}")
- fit(X_train, y_train, X_val=None, y_val=None, epochs=10, batch_size=32, verbose=True, log_every=5, early_stopping_patience=50, lr_decay=None, numerical_check_freq=100, metric='smart', reset_before_training=True, monitor=None, monitor_freq=1)[source]¶
Train the neural network on provided data.
Implements full training loop with support for validation, early stopping, learning rate decay, and comprehensive monitoring. Returns detailed training history and statistics for analysis.
- Parameters:
X_train (NDArray[np.float64]) – Training input data of shape (N, input_dim).
y_train (NDArray[np.float64]) – Training targets of shape (N,) or (N, output_dim).
X_val (NDArray[np.float64], optional) – Validation input data. Defaults to None.
y_val (NDArray[np.float64], optional) – Validation targets. Defaults to None.
epochs (int, optional) – Number of training epochs. Defaults to 10.
batch_size (int, optional) – Mini-batch size. If None, uses full batch. Defaults to 32.
verbose (bool, optional) – Whether to print training progress. Defaults to True.
log_every (int, optional) – Frequency of progress logging in epochs. Defaults to 5.
early_stopping_patience (int, optional) – Epochs to wait for improvement before stopping. Defaults to 50.
lr_decay (float, optional) – Learning rate decay factor per epoch. Defaults to None.
numerical_check_freq (int, optional) – Frequency of numerical stability checks. Defaults to 100.
metric (str, optional) – Evaluation metric for monitoring. Defaults to "smart".
reset_before_training (bool, optional) – Whether to reset weights before training. Defaults to True.
monitor (TrainingMonitor, optional) – Real-time training monitor. Defaults to None.
monitor_freq (int, optional) – Monitoring frequency in epochs. Defaults to 1.
- Returns:
- Comprehensive training results containing:
weights: Final trained weight matrices
biases: Final trained bias vectors
history: Training/validation loss and metrics per epoch
activations: Sample activations from middle epoch
gradients: Sample gradients from middle epoch
weight_stats_over_epochs: Weight statistics evolution
activation_stats_over_epochs: Activation statistics evolution
gradient_stats_over_epochs: Gradient statistics evolution
- Return type:
dict
- Raises:
ValueError – If model is not compiled or if input dimensions are incompatible.
Example
>>> history = model.fit(X_train, y_train, X_val, y_val,
...                     epochs=100, batch_size=32,
...                     early_stopping_patience=10)
>>> print(f"Final training loss: {history['history']['train_loss'][-1]:.4f}")
- fit_fast(X_train, y_train, X_val=None, y_val=None, epochs=10, batch_size=32, verbose=True, log_every=1, early_stopping_patience=50, lr_decay=None, numerical_check_freq=100, metric='smart', reset_before_training=True, eval_freq=5)[source]¶
High-performance training method optimized for fast training.
Ultra-fast training loop that eliminates statistics collection overhead and monitoring bottlenecks. Provides ~5-10× speedup over standard fit() while maintaining identical API and training quality.
Key Performance Optimizations:
- Eliminates expensive statistics collection (the main bottleneck)
- Uses optimized batch processing with array views
- Streamlined training loop with only essential operations
- Configurable evaluation frequency to reduce overhead
Expected Performance:
- ~5-10× faster than the fit() method
- 60-80% less memory usage
- Parameters:
X_train (NDArray[np.float64]) – Training input data of shape (N, input_dim).
y_train (NDArray[np.float64]) – Training targets of shape (N,) or (N, output_dim).
X_val (NDArray[np.float64], optional) – Validation input data. Defaults to None.
y_val (NDArray[np.float64], optional) – Validation targets. Defaults to None.
epochs (int, optional) – Number of training epochs. Defaults to 10.
batch_size (int, optional) – Mini-batch size. If None, uses full batch. Defaults to 32.
verbose (bool, optional) – Whether to print training progress. Defaults to True.
log_every (int, optional) – Frequency of progress logging in epochs. Defaults to 1.
early_stopping_patience (int, optional) – Epochs to wait for improvement before stopping. Defaults to 50.
lr_decay (float, optional) – Learning rate decay factor per epoch. Defaults to None.
numerical_check_freq (int, optional) – Frequency of numerical stability checks. Defaults to 100.
metric (str, optional) – Evaluation metric for monitoring. Defaults to "smart".
reset_before_training (bool, optional) – Whether to reset weights before training. Defaults to True.
eval_freq (int, optional) – Evaluation frequency in epochs for performance. Defaults to 5.
- Returns:
- Streamlined training results containing:
weights: Final trained weight matrices
biases: Final trained bias vectors
history: Training/validation loss and metrics per epoch
performance_stats: Training time and speed metrics
- Return type:
dict
- Raises:
ValueError – If model is not compiled or if input dimensions are incompatible.
Example
>>> # Ultra-fast training
>>> history = model.fit_fast(X_train, y_train, X_val, y_val,
...                          epochs=100, batch_size=256, eval_freq=5)
Note
For research and debugging with full diagnostics, use the standard fit() method. This method prioritizes speed over detailed monitoring capabilities.
- fit_batch(X_batch, y_batch, epochs=10, verbose=True, metric='smart')[source]¶
Train on a single batch for a specified number of epochs. Uses 2-8 samples of the given batch.
Note
The 2-8 sample range is based on the PyTorch implementation and the literature, e.g. Karpathy's blog post "A Recipe for Training Neural Networks", the Universal Approximation Theorem (Hornik et al., 1989), and Empirical Risk Minimization (Vapnik, 1998), among others.
- save(filepath: str, save_optimizer: bool = False, **metadata) None[source]¶
Save model to disk in NeuroScope format (.ns).
Saves model architecture, weights, and optionally optimizer state for resuming training. Uses pickle for efficient serialization.
- Parameters:
filepath – Path to save file (will add .ns extension if missing)
save_optimizer – If True, saves optimizer state for training resume
**metadata – Additional metadata to save (e.g., epoch, accuracy)
Examples
>>> # Basic save
>>> model.save('my_model.ns')
>>> # Save with metadata
>>> model.save('checkpoint.ns', epoch=50, accuracy=0.95)
>>> # Save without optimizer (inference only)
>>> model.save('model.ns', save_optimizer=False)
Notes
File format: Python pickle (protocol 4)
Extension: .ns (NeuroScope)
Compatible with NumPy arrays
- classmethod load(filepath: str, load_optimizer: bool = False)[source]¶
Load a model from disk (.ns format).
- Parameters:
filepath – Path to saved model file (.ns)
load_optimizer – If True, loads optimizer state for training resume
- Returns:
(model, info) where:
- model: Loaded MLP instance
- info: ModelInfo dict with a nice __repr__ for printing
- Return type:
tuple
Examples
>>> # Load and inspect metadata
>>> model, info = MLP.load('checkpoint.ns')
>>> print(info)  # Shows a nicely formatted summary
>>> predictions = model.predict(X_test)
>>> # Load for continued training
>>> model, info = MLP.load('checkpoint.ns', load_optimizer=True)
>>> print(f"Resuming from epoch {info['custom_metadata']['epoch']}")
>>> model.fit(X, y, epochs=50)
>>> # Access metadata programmatically
>>> model, info = MLP.load('model.ns')
>>> print(f"Architecture: {info['model_config']['layer_dims']}")
>>> print(f"Saved at: {info['timestamp']}")
- Raises:
FileNotFoundError – If file doesn’t exist
ValueError – If file format is invalid
Activation Functions¶
Activation Functions Module A comprehensive collection of activation functions and their derivatives for neural networks.
- class neuroscope.mlp.activations.ActivationFunctions[source]¶
Bases: object
Comprehensive collection of activation functions and their derivatives.
Provides implementations of popular activation functions used in neural networks, including their derivatives for backpropagation. All functions are numerically stable and handle edge cases appropriately.
- static sigmoid(x)[source]¶
Compute sigmoid activation function.
Applies the logistic sigmoid function that maps input to (0, 1) range. Includes numerical clipping to prevent overflow in exponential computation.
- Parameters:
x (NDArray[np.float64]) – Input array of any shape.
- Returns:
Sigmoid-activated values in range (0, 1).
- Return type:
NDArray[np.float64]
Example
>>> activated = ActivationFunctions.sigmoid(z)
>>> # Values are now between 0 and 1
- static relu(x)[source]¶
Compute ReLU (Rectified Linear Unit) activation.
Applies the rectified linear activation function that outputs the input for positive values and zero for negative values. Most popular activation for hidden layers in modern neural networks.
- Parameters:
x (NDArray[np.float64]) – Input array of any shape.
- Returns:
ReLU-activated values (non-negative).
- Return type:
NDArray[np.float64]
Example
>>> activated = ActivationFunctions.relu(z)
>>> # Negative values become 0, positive values unchanged
- static leaky_relu(x, negative_slope=0.01)[source]¶
Compute Leaky ReLU activation function.
Variant of ReLU that allows small negative values to flow through, helping to mitigate the “dying ReLU” problem where neurons can become permanently inactive.
- Parameters:
x (NDArray[np.float64]) – Input array of any shape.
negative_slope (float, optional) – Slope for negative values. Defaults to 0.01.
- Returns:
Leaky ReLU-activated values.
- Return type:
NDArray[np.float64]
Example
>>> activated = ActivationFunctions.leaky_relu(z, negative_slope=0.01)
>>> # Positive values unchanged, negative values scaled by 0.01
- static inverted_dropout_with_mask(x, rate=0.5, training=True)[source]¶
Inverted dropout that returns both output and mask for backpropagation.
- Parameters:
x – Input array
rate – Dropout probability (fraction of units to drop)
training – Whether in training mode
- Returns:
(output, mask) where mask includes the 1/(1-p) scaling
- Return type:
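A minimal NumPy sketch of the inverted-dropout rule described above (illustrative only, not the library's internal implementation):
>>> import numpy as np
>>> def inverted_dropout_sketch(x, rate=0.5, training=True):
...     if not training or rate == 0.0:
...         return x, np.ones_like(x)                    # inference: identity, all-ones mask
...     keep = 1.0 - rate
...     mask = (np.random.rand(*x.shape) < keep) / keep  # mask already carries the 1/(1-p) scaling
...     return x * mask, mask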
- static alpha_dropout_with_mask(x, rate=0.5, training=True)[source]¶
Alpha dropout that returns both output and mask for backpropagation.
Based on “Self-Normalizing Neural Networks” (Klambauer et al., 2017). Alpha dropout maintains the self-normalizing property of SELU activations.
- Parameters:
x – Input array
rate – Dropout probability (p in the paper)
training – Whether in training mode
- Returns:
- (output, mask_dict) where mask_dict contains the binary dropout
mask and affine transform parameters for backpropagation
- Return type:
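Illustrative usage during a SELU forward pass, where h is a hypothetical activation array:
>>> out, mask_dict = ActivationFunctions.alpha_dropout_with_mask(h, rate=0.1, training=True)
>>> out_eval, _ = ActivationFunctions.alpha_dropout_with_mask(h, rate=0.1, training=False)  # identity at inference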
Loss Functions¶
Loss Functions Module
Collection of loss functions for different machine learning tasks.
- class neuroscope.mlp.losses.LossFunctions[source]¶
Bases: object
Collection of loss functions for neural network training.
Provides implementations of common loss functions used in regression and classification tasks, with support for L2 regularization. All functions handle numerical stability and edge cases appropriately.
- static mse(y_true, y_pred)[source]¶
Compute mean squared error loss.
Standard regression loss function that penalizes squared differences between predictions and targets. Suitable for continuous target values.
- Parameters:
y_true (
NDArray[np.float64]) – Ground truth values of shape (N,) or (N, 1).y_pred (
NDArray[np.float64]) – Predicted values of shape (N,) or (N, 1).
- Returns:
Mean squared error loss (scalar).
- Return type:
Example
>>> loss = LossFunctions.mse(y_true, y_pred) >>> print(f"MSE Loss: {loss:.4f}")
- static bce(y_true, y_pred, eps=1e-12)[source]¶
Compute binary cross-entropy loss.
Standard loss function for binary classification problems. Applies numerical clipping to prevent log(0) errors and ensure stability.
- Parameters:
y_true (
NDArray[np.float64]) – Binary labels (0/1) of shape (N,).y_pred (
NDArray[np.float64]) – Predicted probabilities of shape (N,).eps (
float, optional) – Small value for numerical stability. Defaults to 1e-12.
- Returns:
Binary cross-entropy loss (scalar).
- Return type:
Example
>>> loss = LossFunctions.bce(y_true, y_pred) >>> print(f"BCE Loss: {loss:.4f}")
- static cce(y_true, y_pred, eps=1e-12)[source]¶
Compute categorical cross-entropy loss.
Standard loss function for multi-class classification. Handles both sparse labels (class indices) and one-hot encoded targets.
- Parameters:
y_true (
NDArray[np.float64]) – Class labels of shape (N,) for sparse labels or (N, C) for one-hot encoded targets.y_pred (
NDArray[np.float64]) – Predicted class probabilities of shape (N, C).eps (
float, optional) – Small value for numerical stability. Defaults to 1e-12.
- Returns:
Categorical cross-entropy loss (scalar).
- Return type:
Example
>>> loss = LossFunctions.cce(y_true, y_pred) >>> print(f"CCE Loss: {loss:.4f}")
Metrics¶
Metrics Module
Comprehensive evaluation metrics for regression and classification tasks.
- class neuroscope.mlp.metrics.Metrics[source]¶
Bases: object
Comprehensive collection of evaluation metrics for neural networks.
Provides implementations of standard metrics for both regression and classification tasks. All metrics handle edge cases and provide meaningful results for model evaluation.
- static accuracy_multiclass(y_true, y_pred)[source]¶
Compute multi-class classification accuracy.
Calculates the fraction of correctly predicted samples for multi-class classification problems. Handles both sparse labels and one-hot encoded inputs.
- Parameters:
y_true (
NDArray[np.float64]) – True class labels of shape (N,) for sparse labels or (N, C) for one-hot encoded.y_pred (
NDArray[np.float64]) – Predicted class probabilities of shape (N, C).
- Returns:
Classification accuracy as a fraction (0.0 to 1.0).
- Return type:
Example
>>> accuracy = Metrics.accuracy_multiclass(y_true, y_pred) >>> print(f"Accuracy: {accuracy:.2%}")
- static accuracy_binary(y_true, y_pred, thresh=0.5)[source]¶
Compute binary classification accuracy.
Calculates the fraction of correctly predicted samples for binary classification by applying a threshold to predicted probabilities.
- Parameters:
y_true (
NDArray[np.float64]) – Binary labels (0/1) of shape (N,) or (N, 1).y_pred (
NDArray[np.float64]) – Predicted probabilities of shape (N,) or (N, 1).thresh (
float, optional) – Classification threshold. Defaults to 0.5.
- Returns:
Binary classification accuracy as a fraction (0.0 to 1.0).
- Return type:
Example
>>> accuracy = Metrics.accuracy_binary(y_true, y_pred, thresh=0.5) >>> print(f"Binary Accuracy: {accuracy:.2%}")
- static mse(y_true, y_pred)[source]¶
Compute mean squared error metric.
Calculates the average squared differences between predicted and true values. Commonly used metric for regression problems.
- Parameters:
y_true (
NDArray[np.float64]) – Ground truth values of shape (N,) or (N, 1).y_pred (
NDArray[np.float64]) – Predicted values of shape (N,) or (N, 1).
- Returns:
Mean squared error (scalar).
- Return type:
Example
>>> mse_score = Metrics.mse(y_true, y_pred) >>> print(f"MSE: {mse_score:.4f}")
- static r2_score(y_true, y_pred)[source]¶
Compute coefficient of determination (R² score).
Measures the proportion of variance in the dependent variable that is predictable from the independent variables. R² = 1 indicates perfect fit, R² = 0 indicates the model performs as well as predicting the mean.
- Parameters:
y_true (
NDArray[np.float64]) – Ground truth values of shape (N,) or (N, 1).y_pred (
NDArray[np.float64]) – Predicted values of shape (N,) or (N, 1).
- Returns:
R² score (can be negative for very poor fits).
- Return type:
Example
>>> r2 = Metrics.r2_score(y_true, y_pred) >>> print(f"R² Score: {r2:.3f}")
- static precision(y_true, y_pred, average='weighted', threshold=0.5)[source]¶
Compute precision score: TP / (TP + FP)
- Parameters:
y_true – True labels
y_pred – Predicted probabilities or labels
average – ‘macro’, ‘weighted’, or None for per-class scores
threshold – Decision threshold for binary classification
- static recall(y_true, y_pred, average='weighted', threshold=0.5)[source]¶
Compute recall score: TP / (TP + FN)
- Parameters:
y_true – True labels
y_pred – Predicted probabilities or labels
average – ‘macro’, ‘weighted’, or None for per-class scores
threshold – Decision threshold for binary classification
- static f1_score(y_true, y_pred, average='weighted', threshold=0.5)[source]¶
Compute F1 score: 2 * (Precision * Recall) / (Precision + Recall)
- Parameters:
y_true – True labels
y_pred – Predicted probabilities or labels
average – ‘macro’, ‘weighted’, or None for per-class scores
threshold – Decision threshold for binary classification
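Illustrative usage of the three scores, assuming y_true holds class labels and y_pred holds predicted probabilities:
>>> p = Metrics.precision(y_true, y_pred, average='weighted')
>>> r = Metrics.recall(y_true, y_pred, average='weighted')
>>> f1 = Metrics.f1_score(y_true, y_pred, average='weighted')
>>> per_class_f1 = Metrics.f1_score(y_true, y_pred, average=None)  # one score per class
>>> print(f"P={p:.3f} R={r:.3f} F1={f1:.3f}")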
Weight Initializers¶
Weight Initialization Module
Professional weight initialization strategies for neural networks.
- class neuroscope.mlp.initializers.WeightInits[source]¶
Bases: object
Research-validated weight initialization strategies for neural networks.
Provides implementations of modern weight initialization methods that help maintain proper gradient flow and accelerate training convergence. All methods follow established theoretical foundations from deep learning research.
- static he_init(layer_dims: list, seed=42)[source]¶
He initialization for ReLU and ReLU-variant activations.
Optimal for ReLU-based networks as derived in He et al. (2015). Uses standard deviation of sqrt(2/fan_in) to maintain proper variance propagation through ReLU activations.
- Parameters:
layer_dims (
list[int]) – Layer dimensions [input_dim, hidden_dim, …, output_dim].seed (
int, optional) – Random seed for reproducibility. Defaults to 42.
- Returns:
- (weights, biases) where weights are initialized
according to He initialization and biases are zero-initialized.
- Return type:
Note
Based on “Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification” (He et al. 2015).
Example
>>> weights, biases = WeightInits.he_init([784, 128, 10])
- static xavier_init(layer_dims: list, seed=42)[source]¶
Xavier/Glorot initialization for sigmoid and tanh activations.
Optimal for symmetric activations like tanh and sigmoid. Uses standard deviation of sqrt(2/(fan_in + fan_out)) to maintain constant variance across layers.
- Parameters:
layer_dims (
list[int]) – Layer dimensions [input_dim, hidden_dim, …, output_dim].seed (
int, optional) – Random seed for reproducibility. Defaults to 42.
- Returns:
(weights, biases) with Xavier-initialized weights and zero biases.
- Return type:
Note
Based on “Understanding the difficulty of training deep feedforward neural networks” (Glorot & Bengio 2010).
Example
>>> weights, biases = WeightInits.xavier_init([784, 128, 10])
- static smart_init(layer_dims: list, hidden_activation='leaky_relu', seed=42)[source]¶
Intelligent initialization selection based on activation function.
Automatically selects the optimal initialization strategy based on the chosen activation function. Combines research-validated best practices to ensure proper gradient flow from the start of training.
- Parameters:
layer_dims (list[int]) – Layer dimensions [input_dim, hidden_dim, …, output_dim].
hidden_activation (str, optional) – Hidden-layer activation name used to select the strategy. Defaults to "leaky_relu".
seed (int, optional) – Random seed for reproducibility. Defaults to 42.
- Returns:
(weights, biases) with optimally initialized weights for the activation.
- Return type:
- Initialization Strategy:
ReLU/Leaky ReLU: He initialization
Tanh/Sigmoid: Xavier initialization
SELU: LeCun initialization
Unknown: Xavier initialization (safe default)
Example
>>> weights, biases = WeightInits.smart_init([784, 128, 10], 'relu')
Optimizers¶
Optimizers for NeuroScope MLP.
- class neuroscope.mlp.optimizers.Optimizer(learning_rate: float = 0.01)[source]¶
Bases: ABC
Base class for all optimizers.
Provides common interface for parameter updates and state management.
Initialize optimizer.
- Parameters:
learning_rate – Step size for parameter updates
- __init__(learning_rate: float = 0.01)[source]¶
Initialize optimizer.
- Parameters:
learning_rate – Step size for parameter updates
- abstractmethod update(weights: List[ndarray], biases: List[ndarray], weight_grads: List[ndarray], bias_grads: List[ndarray]) None[source]¶
Update parameters using gradients.
- Parameters:
weights – List of weight matrices
biases – List of bias vectors
weight_grads – Gradients for weights
bias_grads – Gradients for biases
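As a sketch of this interface only (not part of the library), a plain-SGD subclass could implement update as follows; it assumes the base class stores the step size as self.learning_rate:
>>> from neuroscope.mlp.optimizers import Optimizer
>>> class PlainSGD(Optimizer):
...     """Minimal illustration of the Optimizer interface (hypothetical subclass)."""
...     def update(self, weights, biases, weight_grads, bias_grads):
...         # In-place gradient step; learning_rate attribute name is an assumption
...         for W, dW in zip(weights, weight_grads):
...             W -= self.learning_rate * dW
...         for b, db in zip(biases, bias_grads):
...             b -= self.learning_rate * db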
- class neuroscope.mlp.optimizers.SGD(learning_rate: float = 0.01)[source]¶
Bases: Optimizer
Stochastic Gradient Descent optimizer.
- Implements basic gradient descent with fixed learning rate:
θ_t = θ_{t-1} - α * ∇L(θ_{t-1})
- Parameters:
learning_rate – Learning rate (step size), default: 0.01
References
Robbins & Monro (1951). “A Stochastic Approximation Method.” Annals of Mathematical Statistics.
Example
>>> from neuroscope import MLP
>>> model = MLP([10, 20, 5])
>>> model.compile(optimizer="sgd", lr=0.01)
>>> history = model.fit(X_train, y_train, epochs=100)
Initialize optimizer.
- Parameters:
learning_rate – Step size for parameter updates
- class neuroscope.mlp.optimizers.SGDMomentum(learning_rate: float = 0.01, momentum: float = 0.9, nesterov: bool = False)[source]¶
Bases: Optimizer
SGD with Momentum optimizer.
Implements momentum-accelerated gradient descent. Momentum accumulates gradients over time, allowing faster convergence and reduced oscillation.
- Standard Momentum (Polyak, 1964):
v_t = μ * v_{t-1} + ∇L(θ_{t-1})
θ_t = θ_{t-1} - α * v_t
- Nesterov Momentum (Nesterov, 1983):
v_t = μ * v_{t-1} + ∇L(θ_{t-1})
θ_t = θ_{t-1} - α * (μ * v_t + ∇L(θ_{t-1}))
- Parameters:
learning_rate – Learning rate (step size), default: 0.01
momentum – Momentum coefficient μ ∈ [0, 1), default: 0.9
nesterov – Enable Nesterov accelerated gradient, default: False
References
Polyak, B. T. (1964). “Some methods of speeding up the convergence of iteration methods.” USSR Computational Mathematics and Mathematical Physics, 4(5), 1-17.
Sutskever, I., Martens, J., Dahl, G., & Hinton, G. (2013). “On the importance of initialization and momentum in deep learning.” ICML 2013.
Nesterov, Y. (1983). “A method for unconstrained convex minimization problem with the rate of convergence O(1/k^2).” Doklady AN USSR, 269.
Example
>>> from neuroscope import MLP
>>> model = MLP([784, 128, 64, 10])
>>> # Standard momentum
>>> model.compile(optimizer="sgdm", lr=0.01)
>>> history = model.fit(X_train, y_train, epochs=100)
>>> # Nesterov momentum (recommended for deep networks)
>>> model.compile(optimizer="sgdnm", lr=0.01)
>>> history = model.fit(X_train, y_train, epochs=100)
Notes
Typical momentum values: 0.9 (default) or 0.95 (aggressive)
Nesterov momentum often converges faster than standard momentum
Momentum helps escape local minima and traverse flat regions
Initialize optimizer.
- Parameters:
learning_rate – Step size for parameter updates
- __init__(learning_rate: float = 0.01, momentum: float = 0.9, nesterov: bool = False)[source]¶
Initialize optimizer.
- Parameters:
learning_rate – Step size for parameter updates
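A per-parameter sketch of the two update rules above (illustrative only; theta, grad, and v stand for a parameter array, its gradient, and the velocity buffer):
>>> def momentum_step(theta, grad, v, lr=0.01, mu=0.9, nesterov=False):
...     v = mu * v + grad                              # v_t = mu * v_{t-1} + grad
...     if nesterov:
...         return theta - lr * (mu * v + grad), v     # Nesterov look-ahead update
...     return theta - lr * v, v                       # standard (Polyak) momentum update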
- class neuroscope.mlp.optimizers.Adam(learning_rate: float = 0.001, beta1: float = 0.9, beta2: float = 0.999, eps: float = 1e-08)[source]¶
Bases: Optimizer
Adam (Adaptive Moment Estimation) optimizer.
Combines momentum with adaptive learning rates. Maintains exponential moving averages of gradients (first moment) and squared gradients (second moment). Includes bias correction to account for initialization at zero.
- Algorithm (Kingma & Ba, 2014):
m_t = β₁ * m_{t-1} + (1 - β₁) * g_t          [First moment estimate]
v_t = β₂ * v_{t-1} + (1 - β₂) * g_t²          [Second moment estimate]
m̂_t = m_t / (1 - β₁ᵗ)                         [Bias-corrected first moment]
v̂_t = v_t / (1 - β₂ᵗ)                         [Bias-corrected second moment]
θ_t = θ_{t-1} - α * m̂_t / (√v̂_t + ε)          [Parameter update]
- Parameters:
learning_rate – Learning rate α, default: 0.001
beta1 – First moment decay rate β₁ ∈ [0, 1), default: 0.9
beta2 – Second moment decay rate β₂ ∈ [0, 1), default: 0.999
eps – Numerical stability constant ε, default: 1e-8
References
Kingma, D. P., & Ba, J. (2014). “Adam: A Method for Stochastic Optimization.” arXiv preprint arXiv:1412.6980.
Example
>>> from neuroscope import MLP
>>> model = MLP([784, 128, 64, 10])
>>> # Standard Adam (recommended default)
>>> model.compile(optimizer="adam", lr=0.001)
>>> history = model.fit(X_train, y_train, epochs=100)
>>> # Higher learning rate for faster convergence
>>> model.compile(optimizer="adam", lr=0.01)
>>> history = model.fit(X_train, y_train, epochs=100)
Notes
Default hyperparameters work well for most problems
Adam is particularly effective for sparse gradients and noisy data
Less sensitive to learning rate than SGD
Memory overhead: 2x parameters (stores m and v)
Initialize optimizer.
- Parameters:
learning_rate – Step size for parameter updates
- __init__(learning_rate: float = 0.001, beta1: float = 0.9, beta2: float = 0.999, eps: float = 1e-08)[source]¶
Initialize optimizer.
- Parameters:
learning_rate – Step size for parameter updates
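A per-parameter NumPy sketch of the algorithm above (illustrative only; t is the 1-based step count, m and v are the moment buffers):
>>> import numpy as np
>>> def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
...     m = beta1 * m + (1 - beta1) * grad          # first moment estimate
...     v = beta2 * v + (1 - beta2) * grad ** 2     # second moment estimate
...     m_hat = m / (1 - beta1 ** t)                # bias correction
...     v_hat = v / (1 - beta2 ** t)
...     return theta - lr * m_hat / (np.sqrt(v_hat) + eps), m, v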
- class neuroscope.mlp.optimizers.RMSprop(learning_rate: float = 0.001, rho: float = 0.9, eps: float = 1e-08, momentum: float = 0.0)[source]¶
Bases: Optimizer
RMSprop (Root Mean Square Propagation) optimizer.
Maintains moving average of squared gradients to normalize learning rates. Particularly effective for non-stationary objectives and recurrent networks. Can be seen as a precursor to Adam, using only second moment adaptation.
- Algorithm (Hinton, 2012; Tieleman & Hinton, 2012):
E[g²]_t = ρ * E[g²]_{t-1} + (1 - ρ) * g_t²    [Moving average of squared gradients]
θ_t = θ_{t-1} - α * g_t / (√E[g²]_t + ε)      [Parameter update]
- With momentum (optional):
v_t = μ * v_{t-1} + α * g_t / (√E[g²]_t + ε)  [Momentum accumulation]
θ_t = θ_{t-1} - v_t                            [Parameter update]
- Parameters:
learning_rate – Learning rate α, default: 0.001
rho – Decay rate for moving average ρ ∈ [0, 1), default: 0.9
eps – Numerical stability constant ε, default: 1e-8
momentum – Momentum coefficient μ ∈ [0, 1), default: 0.0 (disabled)
References
Tieleman, T., & Hinton, G. (2012). “Lecture 6.5 - RMSprop: Divide the gradient by a running average of its recent magnitude.” COURSERA: Neural Networks for Machine Learning.
Hinton, G., Srivastava, N., & Swersky, K. (2012). “Neural Networks for Machine Learning Lecture 6a Overview of mini-batch gradient descent.”
Example
>>> from neuroscope import MLP
>>> model = MLP([784, 128, 64, 10])
>>> # Standard RMSprop (recommended for RNNs)
>>> model.compile(optimizer="rmsprop", lr=0.001)
>>> history = model.fit(X_train, y_train, epochs=100)
>>> # Note: RMSprop uses built-in momentum=0.0 by default
>>> # For momentum-based training, use "sgdm" or "sgdnm" instead
Notes
Default rho=0.9 works well for most problems
RMSprop handles sparse gradients better than standard SGD
Adding momentum can improve convergence stability
Less memory intensive than Adam (no first moment)
Initialize optimizer.
- Parameters:
learning_rate – Step size for parameter updates
- __init__(learning_rate: float = 0.001, rho: float = 0.9, eps: float = 1e-08, momentum: float = 0.0)[source]¶
Initialize optimizer.
- Parameters:
learning_rate – Step size for parameter updates
- update(weights: List[ndarray], biases: List[ndarray], weight_grads: List[ndarray], bias_grads: List[ndarray]) None[source]¶
Apply RMSprop adaptive gradient update.
Implements the RMSprop algorithm from Tieleman & Hinton (2012) with optional momentum acceleration.
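A per-parameter NumPy sketch of the update rule above, without the optional momentum term (variable names are illustrative):
>>> import numpy as np
>>> def rmsprop_step(theta, grad, sq_avg, lr=1e-3, rho=0.9, eps=1e-8):
...     sq_avg = rho * sq_avg + (1 - rho) * grad ** 2         # E[g²]_t moving average
...     theta = theta - lr * grad / (np.sqrt(sq_avg) + eps)   # normalized parameter update
...     return theta, sq_avg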
Utilities¶
Utilities Module
Helper functions for training, validation, and data processing.
- class neuroscope.mlp.utils.Utils[source]¶
Bases: object
Utility functions for neural network training and data processing.
Provides essential helper functions for batch processing, gradient clipping, input validation, and numerical stability checks. All methods are static and can be used independently throughout the framework.
- static get_batches(X, y, batch_size=32, shuffle=True)[source]¶
Generate mini-batches for training.
Creates mini-batches from input data with optional shuffling for stochastic gradient descent training. Handles the last batch even if it contains fewer samples than batch_size.
- Parameters:
X – Input features array.
y – Target values array.
batch_size (int, optional) – Number of samples per mini-batch. Defaults to 32.
shuffle (bool, optional) – Whether to shuffle samples before batching. Defaults to True.
- Yields:
tuple[NDArray,NDArray]– (X_batch, y_batch) for each mini-batch.
Example
>>> for X_batch, y_batch in Utils.get_batches(X_train, y_train, batch_size=64):
...     # Process batch
...     pass
- static get_batches_fast(X, y, batch_size=32, shuffle=True)[source]¶
Generate mini-batches for training with optimized memory usage. Expected to be 2-5x faster than get_batches() for large datasets.
- Parameters:
X – Input features array.
y – Target values array.
batch_size (int, optional) – Number of samples per mini-batch. Defaults to 32.
shuffle (bool, optional) – Whether to shuffle samples before batching. Defaults to True.
- Yields:
tuple[NDArray,NDArray]– (X_batch, y_batch) for each mini-batch.
Note
Uses array views (slicing) instead of fancy indexing to avoid memory allocation. Pre-reshapes y to avoid repeated reshaping in training loop.
Example
>>> # Fast batch preprocessing for fast training
>>> for X_batch, y_batch in Utils.get_batches_fast(X_train, y_train, batch_size=64):
...     # Process batch
...     pass
- static gradient_clipping(gradients, max_norm=5.0)[source]¶
Apply gradient clipping to prevent exploding gradients.
Clips gradients by global norm as described in Pascanu et al. (2013). If the global norm exceeds max_norm, all gradients are scaled down proportionally to maintain their relative magnitudes.
- Parameters:
gradients (
list[NDArray[np.float64]]) – List of gradient arrays.max_norm (
float, optional) – Maximum allowed gradient norm. Defaults to 5.0.
- Returns:
Clipped gradient arrays.
- Return type:
list[NDArray[np.float64]]
Note
Based on “On the difficulty of training recurrent neural networks” (Pascanu et al. 2013) for gradient norm clipping.
Example
>>> clipped_grads = Utils.gradient_clipping(gradients, max_norm=5.0)
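A standalone NumPy sketch of the global-norm rule described in the note above (illustrative, not the library internals):
>>> import numpy as np
>>> def clip_by_global_norm(gradients, max_norm=5.0):
...     global_norm = np.sqrt(sum(np.sum(g ** 2) for g in gradients))  # L2 norm over all arrays
...     if global_norm > max_norm:
...         scale = max_norm / (global_norm + 1e-12)                   # shrink all gradients proportionally
...         gradients = [g * scale for g in gradients]
...     return gradients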
- static validate_array_input(arr, name, min_dims=1, max_dims=3, fast_mode=False)[source]¶
Optimized validation for neural network operations.
Performs efficient validation with optional fast mode for training. Automatically converts compatible inputs to numpy arrays when possible.
- Parameters:
arr – Input array or array-like object to validate.
name (
str) – Name of the array for error messages.min_dims (
int, optional) – Minimum allowed dimensions. Defaults to 1.max_dims (
int, optional) – Maximum allowed dimensions. Defaults to 3.fast_mode (
bool, optional) – Skip expensive NaN/inf checks for speed. Defaults to False.
- Returns:
Validated numpy array.
- Return type:
NDArray[np.float64]
- Raises:
TypeError – If input cannot be converted to numpy array.
ValueError – If dimensions, shape, or values are invalid.
Example
>>> X_valid = Utils.validate_array_input(X, "training_data", min_dims=2, max_dims=2) >>> X_fast = Utils.validate_array_input(X, "X_train", fast_mode=True) # For fit_fast()
- static check_numerical_stability(arrays, context='computation', fast_mode=False)[source]¶
Simple numerical stability check with user-friendly warnings.
Provides clear, actionable warnings for common training issues. Fast mode only checks for critical problems for performance.
- Parameters:
arrays – Arrays to check (e.g., activations or gradients).
context (str, optional) – Label used in warning messages. Defaults to "computation".
fast_mode (bool, optional) – Only check for critical problems, for speed. Defaults to False.
- Returns:
List of simple, actionable issue descriptions.
- Return type:
Example
>>> issues = Utils.check_numerical_stability(activations, "forward_pass")
>>> if issues:
...     print(f"Training Issue: {issues[0]}")
Diagnostics Module¶
Pre-Training Analysis¶
Pre-Training Analysis for NeuroScope MLP Framework
Focused pre-training analysis tools for neural network assessment before training.
- class neuroscope.diagnostics.pretraining.PreTrainingAnalyzer(model)[source]¶
Bases: object
Comprehensive pre-training diagnostic tool for neural networks.
Analyzes model architecture, weight initialization, and data compatibility before training begins. Implements research-validated checks to identify potential training issues early, based on established deep learning principles from Glorot & Bengio (2010), He et al. (2015), and others.
- Parameters:
model – Compiled MLP model instance with initialized weights.
- model¶
Reference to the neural network model.
Example
>>> from neuroscope.diagnostics import PreTrainingAnalyzer
>>> model = MLP([784, 128, 10])
>>> model.compile(lr=1e-3)
>>> analyzer = PreTrainingAnalyzer(model)
>>> results = analyzer.analyze(X_train, y_train)
Initialize analyzer with a compiled model.
- analyze_initial_loss(X: ndarray, y: ndarray) Dict[str, Any][source]¶
Validate initial loss against theoretical expectations.
Compares the model’s initial loss (before training) with theoretical baselines for different task types. For classification, expects loss near -log(1/num_classes). For regression, compares against variance-based baseline as described in Goodfellow et al. (2016).
- Parameters:
X (
NDArray[np.float64]) – Input data of shape (N, input_dim).y (
NDArray[np.float64]) – Target values of shape (N,) or (N, output_dim).
- Returns:
- Analysis results containing:
initial_loss: Computed initial loss value
expected_loss: Theoretical expected loss
ratio: initial_loss / expected_loss
task_type: Detected task type (regression/classification)
status: “PASS”, “WARN”, or “FAIL”
note: Diagnostic message
- Return type:
Example
>>> results = analyzer.analyze_initial_loss(X_train, y_train) >>> print(f"Initial loss check: {results['status']}")
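The theoretical baselines referenced above can be reproduced by hand; as a sketch, with C balanced classes the expected initial cross-entropy is -log(1/C), and a mean predictor gives a regression MSE equal to the target variance:
>>> import numpy as np
>>> expected_cce = -np.log(1.0 / 10)   # ~2.303 for a 10-class problem
>>> expected_bce = -np.log(0.5)        # ~0.693 for balanced binary labels
>>> expected_mse = np.var(y_train)     # variance baseline for regression targets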
- analyze_weight_init() Dict[str, Any][source]¶
Validate weight initialization against theoretical optima.
Analyzes weight initialization quality by comparing actual weight standard deviations against theoretically optimal values for different activation functions. Based on He initialization (He et al. 2015) for ReLU variants and Xavier initialization (Glorot & Bengio 2010) for sigmoid/tanh.
- Returns:
- Analysis results containing:
layers: List of per-layer initialization analysis
status: Overall initialization quality (“PASS”, “WARN”, “FAIL”)
note: Summary diagnostic message
- Return type:
Example
>>> results = analyzer.analyze_weight_init()
>>> for layer in results['layers']:
...     print(f"Layer {layer['layer']}: {layer['status']}")
- analyze_layer_capacity() Dict[str, Any][source]¶
Analyze information bottlenecks and layer capacity issues.
- analyze_architecture_sanity() Dict[str, Any][source]¶
Perform comprehensive architecture validation.
Validates network architecture against established deep learning principles and best practices. Checks for common architectural pitfalls such as incompatible activation functions, inappropriate depth, and problematic layer configurations based on research findings.
- Returns:
- Analysis results containing:
issues: List of critical architectural problems
warnings: List of potential concerns
status: Overall architecture quality (“PASS”, “WARN”, “FAIL”)
note: Summary diagnostic message
- Return type:
Note
Based on research from Bengio et al. (2009) on vanishing gradients, modern best practices for deep architectures, and activation function compatibility studies.
Example
>>> results = analyzer.analyze_architecture_sanity()
>>> if results['issues']:
...     print("Critical issues found:", results['issues'])
- analyze_capacity_data_ratio(X: ndarray, y: ndarray) Dict[str, Any][source]¶
Analyze parameter count relative to training data size.
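As a rough back-of-the-envelope sketch of the same idea (the analyzer's actual thresholds are not shown here), the parameter count and samples-per-parameter ratio can be computed as:
>>> layer_dims = [784, 128, 10]
>>> n_params = sum(d_in * d_out + d_out for d_in, d_out in zip(layer_dims[:-1], layer_dims[1:]))
>>> samples_per_param = X_train.shape[0] / n_params  # very small values hint at overfitting risk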
Training Monitoring¶
Training Monitors for NeuroScope MLP Framework
Real-time monitoring tools for neural network training based on modern deep learning research. Implements comprehensive training diagnostics with emoji-based status indicators.
- class neuroscope.diagnostics.training_monitors.TrainingMonitor(model=None, history_size=50)[source]¶
Bases: object
Comprehensive real-time training monitoring system for neural networks.
Monitors 10 key training health indicators:
- Dead ReLU neuron detection
- Vanishing Gradient Problem (VGP) detection
- Exploding Gradient Problem (EGP) detection
- Weight health analysis
- Learning progress
- Overfitting detection
- Gradient signal-to-noise ratio
- Activation saturation detection (tanh/sigmoid)
- Training plateau detection
- Weight update vs. weight magnitude ratios
Initialize comprehensive training monitor.
Sets up monitoring infrastructure for tracking 10 key training health indicators during neural network training. Uses research-validated thresholds and emoji-based status visualization.
- Parameters:
model – Optional MLP model instance (can be set later).
history_size (
int, optional) – Number of epochs to keep in rolling history for trend analysis. Defaults to 50.
Example
>>> monitor = TrainingMonitor(history_size=100) >>> results = model.fit(X, y, monitor=monitor)
- __init__(model=None, history_size=50)[source]¶
Initialize comprehensive training monitor.
Sets up monitoring infrastructure for tracking 10 key training health indicators during neural network training. Uses research-validated thresholds and emoji-based status visualization.
- Parameters:
model – Optional MLP model instance (can be set later).
history_size (
int, optional) – Number of epochs to keep in rolling history for trend analysis. Defaults to 50.
Example
>>> monitor = TrainingMonitor(history_size=100) >>> results = model.fit(X, y, monitor=monitor)
- monitor_relu_dead_neurons(activations: List[ndarray], activation_functions: List[str] | None = None) Tuple[float, str][source]¶
Monitor for dead ReLU neurons during training.
Detects neurons that have become inactive (always output zero) which indicates the “dying ReLU” problem. Uses activation-function-aware thresholds based on research by Glorot et al. (2011) and He et al. (2015).
Natural sparsity in ReLU networks is expected (~50%), but excessive sparsity (>90%) indicates dead neurons that cannot learn.
- Parameters:
activations (
list[NDArray[np.float64]]) – Layer activation outputs.activation_functions (
list[str], optional) – Activation function names per layer.
- Returns:
- (dead_percentage, status_emoji) where status is:
🟢: Healthy sparsity (<10% dead)
🟡: Moderate concern (10-30% dead)
🔴: Critical issue (>30% dead)
- Return type:
Note
Based on “Deep Sparse Rectifier Neural Networks” (Glorot et al. 2011) and “Delving Deep into Rectifiers” (He et al. 2015).
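One way to estimate the dead fraction for a single ReLU layer, in the spirit of this check (illustrative, not the monitor's exact implementation; a is a hypothetical activation array of shape (batch_size, n_units)):
>>> import numpy as np
>>> dead_units = np.all(a <= 0, axis=0)          # units that never fire on this batch
>>> dead_percentage = 100.0 * dead_units.mean()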
- monitor_vanishing_gradients(gradients: List[ndarray]) Tuple[float, str][source]¶
Detect vanishing gradient problem using research-validated metrics.
Monitors gradient flow through the network to detect vanishing gradients based on variance analysis from Glorot & Bengio (2010). Healthy networks maintain similar gradient variance across layers.
- Parameters:
gradients (
list[NDArray[np.float64]]) – Gradient arrays for each layer.- Returns:
- (vgp_severity, status_emoji) where:
vgp_severity: Float in [0,1] indicating severity
status: 🟢 (healthy), 🟡 (warning), 🔴 (critical)
- Return type:
Note
Implementation based on “Understanding the difficulty of training deep feedforward neural networks” (Glorot & Bengio 2010).
- monitor_exploding_gradients(gradients: List[ndarray]) Tuple[float, str][source]¶
Detect exploding gradient problem using gradient norm analysis.
Monitors gradient magnitudes to detect exploding gradients based on research by Pascanu et al. (2013). Uses both global norm and per-layer analysis to identify unstable training dynamics.
- Parameters:
gradients (
list[NDArray[np.float64]]) – Gradient arrays for each layer.- Returns:
- (egp_severity, status_emoji) where:
egp_severity: Float in [0,1] indicating severity
status: 🟢 (stable), 🟡 (elevated), 🔴 (exploding)
- Return type:
Note
Based on “On the difficulty of training recurrent neural networks” (Pascanu et al. 2013) gradient clipping and norm analysis.
- monitor_weight_health(weights: List[ndarray]) Tuple[float, str][source]¶
Simple, research-backed weight health monitor.
Based on Glorot & Bengio (2010) and He et al. (2015) initialization theory.
- Parameters:
weights – List of weight matrices
- Returns:
Tuple of (health_score, status)
- monitor_learning_progress(current_loss: float, val_loss: float | None = None) Tuple[float, str][source]¶
Research-based learning progress monitor.
Based on the optimization literature: Bottou (2010), Goodfellow et al. (2016), Smith (2017). Key insights:
- Progress = consistent loss reduction + convergence stability + generalization health
- Uses exponential moving averages and plateau detection from the literature
- Parameters:
current_loss – Current training loss
val_loss – Optional validation loss
- Returns:
Tuple of (progress_score, emoji_status)
- monitor_overfitting(train_loss: float, val_loss: float | None = None) Tuple[float, str][source]¶
Research-based overfitting detection.
Based on Prechelt (1998), Goodfellow et al. (2016), and Caruana et al. (2001). Key insight:
- Overfitting = increasing generalization gap + validation curve deterioration
- Parameters:
train_loss – Training loss
val_loss – Validation loss
- Returns:
Tuple of (overfitting_score, emoji_status)
- monitor_gradient_snr(gradients: List[ndarray]) Tuple[float, str][source]¶
Calculate the Gradient Signal-to-Noise Ratio (GSNR) for optimization health.
- Signal: RMS gradient magnitude (update strength)
- Noise: Coefficient of variation (relative inconsistency)
- GSNR = RMS_magnitude / (std_magnitude + ε)
This measures gradient update consistency.
- Parameters:
gradients – List of gradient arrays from each layer
- Returns:
Tuple of (gsnr_score, emoji_status)
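Following the formula above, a network-level GSNR could be sketched as follows (gradients is a hypothetical list of per-layer gradient arrays; the monitor's exact aggregation may differ):
>>> import numpy as np
>>> mags = np.array([np.sqrt(np.mean(g ** 2)) for g in gradients])   # RMS gradient magnitude per layer
>>> gsnr = np.sqrt(np.mean(mags ** 2)) / (np.std(mags) + 1e-12)      # signal vs. layer-to-layer spread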
- monitor_activation_saturation(activations: List[ndarray], activation_functions: List[str] = None) Tuple[float, str][source]¶
Research-based activation saturation detection.
Based on Glorot & Bengio (2010), Hochreiter (1991), and He et al. (2015). Key insights:
- Saturation = extreme activation values + poor gradient flow + skewed distributions
- Uses function-specific thresholds and statistical distribution analysis
- Tracks saturation propagation through network layers
- Parameters:
activations – List of activation arrays from each layer
activation_functions – List of activation function names for each layer
- Returns:
Tuple of (saturation_score, emoji_status)
- monitor_plateau(current_loss: float, val_loss: float | None = None, current_gradients: List[ndarray] | None = None) Tuple[float, str][source]¶
Research-based training plateau detection.
Based on Prechelt (1998), Bengio (2012), and Smith (2017). Key insights:
- Plateau = statistical stagnation + loss of learning momentum + gradient analysis
- Uses multi-scale analysis and statistical significance testing
- Integrates validation correlation and gradient magnitude trends
- Parameters:
current_loss – Current training loss
val_loss – Optional validation loss for correlation analysis
current_gradients – Optional gradient arrays for gradient-based detection
- Returns:
Tuple of (plateau_score, emoji_status)
- monitor_weight_update_ratio(weights: List[ndarray], weight_updates: List[ndarray]) Tuple[float, str][source]¶
Monitor weight-update-to-weight-magnitude ratios (WUR) to validate the learning rate.
Research-based implementation using:
- Smith (2015): the learning rate should produce WUR of roughly 10^-3 to 10^-2 for stable training
- Zeiler (2012): update magnitude should be proportional to weight magnitude
Formula: WUR = ||weight_update|| / ||weight|| per layer
- Parameters:
weights – Current weight matrices
weight_updates – Weight update matrices (gradients * learning_rate)
- Returns:
Tuple of (median_wur, status)
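A minimal sketch of the per-layer ratio and its median, following the formula above (weights and weight_updates are hypothetical lists of arrays):
>>> import numpy as np
>>> wurs = [np.linalg.norm(dW) / (np.linalg.norm(W) + 1e-12)
...         for W, dW in zip(weights, weight_updates)]
>>> median_wur = np.median(wurs)   # stable training typically sits around 1e-3 to 1e-2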
- monitor_step(epoch: int, train_loss: float, val_loss: float | None = None, activations: List[ndarray] | None = None, gradients: List[ndarray] | None = None, weights: List[ndarray] | None = None, weight_updates: List[ndarray] | None = None, activation_functions: List[str] | None = None) Dict[str, Any][source]¶
Perform one monitoring step and return all metrics.
- Parameters:
epoch – Current epoch number
train_loss – Training loss
val_loss – Validation loss (optional)
activations – Layer activations (optional)
gradients – Layer gradients (optional)
weights – Layer weights (optional)
weight_updates – Weight updates (optional)
activation_functions – List of activation function names (optional)
- Returns:
Dictionary containing all monitoring results
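Most users simply pass the monitor to fit(); for a custom training loop, a manual call might look like this sketch (the activations and gradients variables are assumed to come from your own forward/backward pass):
>>> results = monitor.monitor_step(
...     epoch=10,
...     train_loss=0.42,
...     val_loss=0.47,
...     activations=activations,
...     gradients=gradients,
...     weights=model.weights,
... )
>>> print(results)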
Post-Training Evaluation¶
Post-Training Evaluation for NeuroScope MLP Framework
Focused post-training evaluation tools for neural network assessment after training.
- class neuroscope.diagnostics.posttraining.PostTrainingEvaluator(model)[source]¶
Bases: object
Comprehensive post-training evaluation system for neural networks.
Provides thorough analysis of trained model performance including robustness testing, performance metrics evaluation, and diagnostic assessments. Designed to validate model quality and identify potential deployment issues after training completion.
- Parameters:
model – Trained and compiled MLP model instance with initialized weights.
- model¶
Reference to the trained neural network model.
Example
>>> from neuroscope.diagnostics import PostTrainingEvaluator
>>> model = MLP([784, 128, 10])
>>> model.compile(lr=1e-3)
>>> history = model.fit(X_train, y_train, epochs=100)
>>> evaluator = PostTrainingEvaluator(model)
>>> evaluator.evaluate(X_test, y_test)
>>> # Access detailed results
>>> robustness = evaluator.evaluate_robustness(X_test, y_test)
>>> performance = evaluator.evaluate_performance(X_test, y_test)
Initialize evaluator with a trained model.
- evaluate_robustness(X: ndarray, y: ndarray, noise_levels: List[float] = None) Dict[str, Any][source]¶
Evaluate model robustness against Gaussian noise.
- evaluate_performance(X: ndarray, y: ndarray) Dict[str, Any][source]¶
Evaluate model performance metrics.
Visualization Module¶
Plotting Tools¶
NeuroScope Visualization Module
High-quality plotting tools for neural network training analysis.
- class neuroscope.viz.plots.Visualizer(hist)[source]¶
Bases: object
High-quality visualization tool for neural network training analysis.
Provides comprehensive plotting capabilities for analyzing training dynamics, network behavior, and diagnostic information. Creates professional-grade figures suitable for research publications and presentations.
- Parameters:
hist (
dict) – Complete training history from model.fit() containing:
- history: Training/validation metrics per epoch
- weights/biases: Final network parameters
- activations/gradients: Sample network internals
- *_stats_over_epochs: Statistical evolution during training
- weights/biases
Final network parameters.
- activations/gradients
Representative network internals.
Example
>>> from neuroscope.viz import Visualizer
>>> history = model.fit(X_train, y_train, epochs=100)
>>> viz = Visualizer(history)
>>> viz.plot_learning_curves()
>>> viz.plot_activation_distribution()
>>> viz.plot_gradient_flow()
Initialize visualizer with comprehensive training history.
Sets up visualization infrastructure and applies publication-quality styling to all plots. Automatically extracts relevant data components for different types of analysis.
- Parameters:
hist (
dict) – Training history from model.fit() containing all training statistics, network states, and diagnostic information.
- __init__(hist)[source]¶
Initialize visualizer with comprehensive training history.
Sets up visualization infrastructure and applies publication-quality styling to all plots. Automatically extracts relevant data components for different types of analysis.
- Parameters:
hist (
dict) – Training history from model.fit() containing all training statistics, network states, and diagnostic information.
- plot_learning_curves(figsize=(9, 4), ci=False, markers=True, save_path=None, metric='accuracy')[source]¶
Plot training and validation learning curves for regular fit() results.
Creates highest quality subplot showing loss and metric evolution during training. Automatically detects available data and applies consistent styling with optional confidence intervals.
Note: For fit_fast() results, use plot_curves_fast() instead.
- Parameters:
figsize (
tuple[int,int], optional) – Figure dimensions (width, height). Defaults to (9, 4).ci (
bool, optional) – Whether to add confidence intervals using moving window statistics. Only available for regular fit() results. Defaults to False.markers (
bool, optional) – Whether to show markers on line plots. Defaults to True.save_path (
str, optional) – Path to save the figure. Defaults to None.metric (
str, optional) – Name of the metric for y-axis label. Defaults to ‘accuracy’.
Example
>>> viz.plot_learning_curves(figsize=(10, 5), ci=True, save_path='curves.png')
- plot_curves_fast(figsize=(10, 4), markers=True, save_path=None)[source]¶
Plot learning curves for fit_fast() results.
- plot_activation_hist(epoch=None, figsize=(9, 4), kde=False, last_layer=False, save_path=None)[source]¶
Plot activation value distributions across network layers.
Visualizes the distribution of activation values for each layer at a specific epoch, aggregated from all mini-batches. Useful for detecting activation saturation, dead neurons, and distribution shifts during training.
- Parameters:
epoch (
int, optional) – Specific epoch to plot. If None, uses last epoch. Defaults to None.figsize (
tuple[int,int], optional) – Figure dimensions. Defaults to (9, 4).kde (
bool, optional) – Whether to use KDE-style smoothing for smoother curves. Defaults to False.last_layer (
bool, optional) – Whether to include output layer. Defaults to False.save_path (
str, optional) – Path to save the figure. Defaults to None.
Example
>>> viz.plot_activation_hist(epoch=50, kde=True, save_path='activations.png')
- plot_gradient_hist(epoch=None, figsize=(9, 4), kde=False, last_layer=False, save_path=None)[source]¶
Plot gradient value distributions across network layers.
Visualizes gradient distributions to detect vanishing/exploding gradient problems, gradient flow issues, and training stability. Shows zero-line reference for assessing gradient symmetry and magnitude.
- Parameters:
epoch (
int, optional) – Specific epoch to plot. If None, uses last epoch. Defaults to None.figsize (
tuple[int,int], optional) – Figure dimensions. Defaults to (9, 4).kde (
bool, optional) – Whether to use KDE-style smoothing. Defaults to False.last_layer (
bool, optional) – Whether to include output layer gradients. Defaults to False.save_path (
str, optional) – Path to save the figure. Defaults to None.
Note
Gradient distributions should be roughly symmetric around zero for healthy training. Very narrow distributions may indicate vanishing gradients, while very wide distributions may indicate exploding gradients.
Example
>>> viz.plot_gradient_hist(epoch=25, kde=True, save_path='gradients.png')
- plot_weight_hist(epoch=None, figsize=(9, 4), kde=False, last_layer=False, save_path=None)[source]¶
Plot weight value distributions across network layers.
Uses aggregated samples from all mini-batches within an epoch to create representative distributions. Shows weight evolution patterns.
- Parameters:
epoch – Specific epoch to plot (default: last epoch)
figsize – Figure size tuple
kde – Whether to use KDE-style smoothing
last_layer – Whether to include output layer (default: False, hidden layers only)
save_path – Path to save figure
- plot_activation_stats(activation_stats=None, figsize=(12, 4), save_path=None, reference_lines=False)[source]¶
Plot activation statistics over time with both mean and std.
- Parameters:
activation_stats – Dict of layer activation stats, or None to use the collected class data. Format: {'layer_0': {'mean': [...], 'std': [...]}, ...}
figsize – Figure size tuple
save_path – Path to save figure
- plot_gradient_stats(figsize=(12, 4), save_path=None, reference_lines=False)[source]¶
Plot gradient statistics over time with both mean and std.
- plot_weight_stats(figsize=(12, 4), save_path=None, reference_lines=False)[source]¶
Plot weight statistics over time with both mean and std.
- plot_update_ratios(update_ratios=None, figsize=(12, 4), save_path=None, reference_lines=False)[source]¶
Plot weight update ratios per layer across epochs.
- Parameters:
update_ratios – Dict of layer update ratios, or None to use the collected data. Format: {'layer_0': [ratio_epoch_0, ratio_epoch_1, ...], ...}
figsize – Figure size tuple
save_path – Path to save figure
- plot_gradient_norms(gradient_norms=None, figsize=(12, 4), save_path=None, reference_lines=False)[source]¶
Plot gradient norms per layer over epochs.
- plot_training_animation(bg='dark', save_path=None)[source]¶
Creates a comprehensive 4-panel GIF animation showing:
1. Loss curves over time
2. Accuracy curves over time
3. Current metrics bar chart
4. Gradient flow analysis
Animation speed automatically adjusts to the epoch count for smooth motion.
- Parameters:
bg – Theme ('dark' or 'light')
save_path – Path to save the GIF (defaults to 'mlp_training_animation.gif')
- Returns:
Path to saved GIF file
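Illustrative usage (the output file name is arbitrary):
>>> gif_path = viz.plot_training_animation(bg='dark', save_path='training.gif')
>>> print(f"Animation saved to {gif_path}")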