API Reference¶
Complete reference for all NeuroScope modules, classes, and functions.
Core Module¶
neuroscope¶
NeuroScope: A comprehensive neural network framework for learning and prototyping.
NeuroScope provides a clean, education-oriented interface for building and analyzing multi-layer perceptrons with advanced diagnostic capabilities. Designed for rapid experimentation with comprehensive monitoring and visualization tools.
- Core Components:
MLP: Modern multi-layer perceptron implementation
Diagnostics: Pre-training, training, and post-training analysis tools
Visualization: Publication-quality plotting and analysis
Example
>>> from neuroscope.mlp import MLP, mse, accuracy_binary, relu
>>> from neuroscope.diagnostics import PreTrainingAnalyzer, TrainingMonitor
>>> from neuroscope.viz import Visualizer
>>>
>>> # Create and train model
>>> model = MLP([784, 128, 10], hidden_activation="relu", out_activation="softmax")
>>> model.compile(optimizer="adam", lr=1e-3)
>>>
>>> # Analyze before training
>>> analyzer = PreTrainingAnalyzer(model)
>>> analyzer.analyze(X_train, y_train)
>>>
>>> # Ultra-fast training for production
>>> history = model.fit_fast(X_train, y_train, X_val, y_val, epochs=100, batch_size=256)
>>>
>>> # Or train with full diagnostics for research
>>> monitor = TrainingMonitor()
>>> history = model.fit(X_train, y_train, monitor=monitor, epochs=100)
>>>
>>> # Use functions directly
>>> loss = mse(y_true, y_pred)
>>> acc = accuracy_binary(y_true, y_pred)
>>>
>>> # Visualize results
>>> viz = Visualizer(history)
>>> viz.plot_learning_curves()
- class neuroscope.MLP(layer_dims, hidden_activation='leaky_relu', out_activation=None, init_method='smart', init_seed=42, dropout_rate=0.0, dropout_type='normal')[source]¶
Bases:
object
Multi-layer perceptron for quick prototyping and experimentation.
This MLP supports arbitrary layer sizes, multiple activation functions, and modern optimization techniques. Use compile to set hyperparameters and fit to train the model. Includes comprehensive training monitoring and diagnostic capabilities.
- Parameters:
  - layer_dims (Sequence[int]) – Sizes of layers including input & output, e.g. [784, 128, 10].
  - hidden_activation (str, optional) – Activation function name for hidden layers. Options: “relu”, “leaky_relu”, “tanh”, “sigmoid”, “selu”. Defaults to “leaky_relu”.
  - out_activation (str, optional) – Output activation function. Options: “sigmoid” (binary), “softmax” (multiclass), None (regression). Defaults to None.
  - init_method (str, optional) – Weight initialization strategy. Options: “smart”, “he”, “xavier”, “random”, “selu_init”. Defaults to “smart”.
  - init_seed (int, optional) – Random seed for reproducible weight initialization. Defaults to 42.
  - dropout_rate (float, optional) – Dropout probability for hidden layers (0.0-1.0). Defaults to 0.0.
  - dropout_type (str, optional) – Dropout variant (“normal”, “alpha”). Defaults to “normal”.
- weights¶
Internal weight matrices for each layer.
- Type:
list[NDArray[np.float64]]
- biases¶
Internal bias vectors for each layer.
- Type:
list[NDArray[np.float64]]
Example
>>> from neuroscope.mlp import MLP
>>> model = MLP([784, 128, 64, 10], hidden_activation="relu", out_activation="softmax")
>>> model.compile(optimizer="adam", lr=1e-3)
>>> history = model.fit(X_train, y_train, epochs=100)
>>> predictions = model.predict(X_test)
- __init__(layer_dims, hidden_activation='leaky_relu', out_activation=None, init_method='smart', init_seed=42, dropout_rate=0.0, dropout_type='normal')[source]¶
- compile(optimizer='adam', lr=0.001, reg=None, lamda=0.01, gradient_clip=None)[source]¶
Configure the model for training.
Sets up the optimizer, learning rate, regularization, and other training hyperparameters. Must be called before training the model.
- Parameters:
  - optimizer (str, optional) – Optimization algorithm (“sgd”, “adam”). Defaults to “adam”.
  - lr (float, optional) – Learning rate for parameter updates. Defaults to 0.001.
  - reg (str, optional) – Regularization type (“l2”, None). Defaults to None.
  - lamda (float, optional) – Regularization strength (lambda parameter). Defaults to 0.01.
  - gradient_clip (float, optional) – Maximum gradient norm for clipping. Defaults to None.
- Raises:
ValueError – If invalid optimizer is specified.
Example
>>> model.compile(optimizer="adam", lr=1e-3, reg="l2", lamda=0.01)
- evaluate(X, y, metric='smart', binary_thresh=0.5)[source]¶
Evaluate model performance on given data.
Computes loss and evaluation metric on the provided dataset. Automatically selects appropriate loss function based on output activation.
- Parameters:
  - X (NDArray[np.float64]) – Input data of shape (N, input_dim).
  - y (NDArray[np.float64]) – Target values of shape (N,) or (N, output_dim).
  - metric (str, optional) – Evaluation metric (“smart”, “accuracy”, “mse”, “rmse”, “mae”, “r2”, “f1”, “precision”, “recall”). Defaults to “smart”.
  - binary_thresh (float, optional) – Threshold for binary classification. Defaults to 0.5.
- Returns:
(loss, metric_score) where metric_score depends on the metric type.
- Return type:
Example
>>> loss, accuracy = model.evaluate(X_test, y_test, metric="accuracy")
>>> print(f"Test Loss: {loss:.4f}, Accuracy: {accuracy:.2%}")
- fit(X_train, y_train, X_val=None, y_val=None, epochs=10, batch_size=32, verbose=True, log_every=1, early_stopping_patience=50, lr_decay=None, numerical_check_freq=100, metric='smart', reset_before_training=True, monitor=None, monitor_freq=1)[source]¶
Train the neural network on provided data.
Implements full training loop with support for validation, early stopping, learning rate decay, and comprehensive monitoring. Returns detailed training history and statistics for analysis.
- Parameters:
  - X_train (NDArray[np.float64]) – Training input data of shape (N, input_dim).
  - y_train (NDArray[np.float64]) – Training targets of shape (N,) or (N, output_dim).
  - X_val (NDArray[np.float64], optional) – Validation input data. Defaults to None.
  - y_val (NDArray[np.float64], optional) – Validation targets. Defaults to None.
  - epochs (int, optional) – Number of training epochs. Defaults to 10.
  - batch_size (int, optional) – Mini-batch size. If None, uses full batch. Defaults to 32.
  - verbose (bool, optional) – Whether to print training progress. Defaults to True.
  - log_every (int, optional) – Frequency of progress logging in epochs. Defaults to 1.
  - early_stopping_patience (int, optional) – Epochs to wait for improvement before stopping. Defaults to 50.
  - lr_decay (float, optional) – Learning rate decay factor per epoch. Defaults to None.
  - numerical_check_freq (int, optional) – Frequency of numerical stability checks. Defaults to 100.
  - metric (str, optional) – Evaluation metric for monitoring. Defaults to “smart”.
  - reset_before_training (bool, optional) – Whether to reset weights before training. Defaults to True.
  - monitor (TrainingMonitor, optional) – Real-time training monitor. Defaults to None.
  - monitor_freq (int, optional) – Monitoring frequency in epochs. Defaults to 1.
- Returns:
- Comprehensive training results containing:
weights: Final trained weight matrices
biases: Final trained bias vectors
history: Training/validation loss and metrics per epoch
activations: Sample activations from middle epoch
gradients: Sample gradients from middle epoch
weight_stats_over_epochs: Weight statistics evolution
activation_stats_over_epochs: Activation statistics evolution
gradient_stats_over_epochs: Gradient statistics evolution
- Return type:
- Raises:
ValueError – If model is not compiled or if input dimensions are incompatible.
Example
>>> history = model.fit(X_train, y_train, X_val, y_val,
...                     epochs=100, batch_size=32,
...                     early_stopping_patience=10)
>>> print(f"Final training loss: {history['history']['train_loss'][-1]:.4f}")
- fit_batch(X_batch, y_batch, epochs=10, verbose=True, metric='smart')[source]¶
Train on a single batch for the specified number of epochs. Uses 2-8 samples of the given batch.
Note
The 2-8 sample range is based on the PyTorch implementation and literature such as Karpathy’s blog “A Recipe for Training Neural Networks”, the Universal Approximation Theorem (Hornik et al., 1989), and Empirical Risk Minimization (Vapnik, 1998).
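A minimal overfit-one-batch sketch (assuming a compiled model and NumPy training arrays are in scope; the slice size of 4 samples is illustrative):
>>> X_small, y_small = X_train[:4], y_train[:4]
>>> model.fit_batch(X_small, y_small, epochs=200, verbose=False)
>>> # A correctly wired model should drive the loss on this tiny batch close to zero.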
- fit_fast(X_train, y_train, X_val=None, y_val=None, epochs=10, batch_size=32, verbose=True, log_every=1, early_stopping_patience=50, lr_decay=None, numerical_check_freq=100, metric='smart', reset_before_training=True, eval_freq=5)[source]¶
High-performance training method optimized for fast training.
Ultra-fast training loop that eliminates statistics collection overhead and monitoring bottlenecks. Provides 10-100x speedup over standard fit() while maintaining identical API and training quality.
Key performance optimizations:
- Eliminates expensive statistics collection (main bottleneck)
- Uses optimized batch processing with array views
- Streamlined training loop with only essential operations
- Configurable evaluation frequency to reduce overhead
Expected performance:
- 10-100x faster than the fit() method
- 60-80% less memory usage
- Parameters:
  - X_train (NDArray[np.float64]) – Training input data of shape (N, input_dim).
  - y_train (NDArray[np.float64]) – Training targets of shape (N,) or (N, output_dim).
  - X_val (NDArray[np.float64], optional) – Validation input data. Defaults to None.
  - y_val (NDArray[np.float64], optional) – Validation targets. Defaults to None.
  - epochs (int, optional) – Number of training epochs. Defaults to 10.
  - batch_size (int, optional) – Mini-batch size. If None, uses full batch. Defaults to 32.
  - verbose (bool, optional) – Whether to print training progress. Defaults to True.
  - log_every (int, optional) – Frequency of progress logging in epochs. Defaults to 1.
  - early_stopping_patience (int, optional) – Epochs to wait for improvement before stopping. Defaults to 50.
  - lr_decay (float, optional) – Learning rate decay factor per epoch. Defaults to None.
  - numerical_check_freq (int, optional) – Frequency of numerical stability checks. Defaults to 100.
  - metric (str, optional) – Evaluation metric for monitoring. Defaults to “smart”.
  - reset_before_training (bool, optional) – Whether to reset weights before training. Defaults to True.
  - eval_freq (int, optional) – Evaluation frequency in epochs for performance. Defaults to 5.
- Returns:
- Streamlined training results containing:
weights: Final trained weight matrices
biases: Final trained bias vectors
history: Training/validation loss and metrics per epoch
performance_stats: Training time and speed metrics
- Return type:
- Raises:
ValueError – If model is not compiled or if input dimensions are incompatible.
Example
>>> # Ultra-fast training
>>> history = model.fit_fast(X_train, y_train, X_val, y_val,
...                          epochs=100, batch_size=256, eval_freq=5)
Note
For research and debugging with full diagnostics, use the standard fit() method. This method prioritizes speed over detailed monitoring capabilities.
- predict(X)[source]¶
Generate predictions for input samples.
Performs forward propagation through the network without dropout to generate predictions on new data.
- Parameters:
  - X (NDArray[np.float64]) – Input data of shape (N, input_dim).
- Returns:
- Model predictions of shape (N, output_dim).
For regression: continuous values. For binary classification: probabilities (0-1). For multiclass: class probabilities.
- Return type:
NDArray[np.float64]
Example
>>> predictions = model.predict(X_test)
>>> binary_preds = (predictions > 0.5).astype(int)  # For binary classification
- class neuroscope.PreTrainingAnalyzer(model)[source]¶
Bases:
object
Comprehensive pre-training diagnostic tool for neural networks.
Analyzes model architecture, weight initialization, and data compatibility before training begins. Implements research-validated checks to identify potential training issues early, based on established deep learning principles from Glorot & Bengio (2010), He et al. (2015), and others.
- Parameters:
model – Compiled MLP model instance with initialized weights.
- model¶
Reference to the neural network model.
Example
>>> from neuroscope.diagnostics import PreTrainingAnalyzer
>>> model = MLP([784, 128, 10])
>>> model.compile(lr=1e-3)
>>> analyzer = PreTrainingAnalyzer(model)
>>> results = analyzer.analyze(X_train, y_train)
Initialize analyzer with a compiled model.
- analyze(X: ndarray, y: ndarray) None [source]¶
Comprehensive pre-training analysis with clean tabular output.
- analyze_architecture_sanity() Dict[str, Any] [source]¶
Perform comprehensive architecture validation.
Validates network architecture against established deep learning principles and best practices. Checks for common architectural pitfalls such as incompatible activation functions, inappropriate depth, and problematic layer configurations based on research findings.
- Returns:
- Analysis results containing:
issues: List of critical architectural problems
warnings: List of potential concerns
status: Overall architecture quality (“PASS”, “WARN”, “FAIL”)
note: Summary diagnostic message
- Return type:
Note
Based on research from Bengio et al. (2009) on vanishing gradients, modern best practices for deep architectures, and activation function compatibility studies.
Example
>>> results = analyzer.analyze_architecture_sanity()
>>> if results['issues']:
...     print("Critical issues found:", results['issues'])
- analyze_capacity_data_ratio(X: ndarray, y: ndarray) Dict[str, Any] [source]¶
Analyze parameter count relative to training data size.
- analyze_convergence_feasibility(X: ndarray, y: ndarray) Dict[str, Any] [source]¶
Assess whether the current setup can theoretically converge.
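Each check can also be run individually; a brief sketch (assuming, as with the other analyzers documented here, that the returned dicts carry a 'status' key):
>>> capacity = analyzer.analyze_capacity_data_ratio(X_train, y_train)
>>> feasibility = analyzer.analyze_convergence_feasibility(X_train, y_train)
>>> print(capacity['status'], feasibility['status'])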
- analyze_initial_loss(X: ndarray, y: ndarray) Dict[str, Any] [source]¶
Validate initial loss against theoretical expectations.
Compares the model’s initial loss (before training) with theoretical baselines for different task types. For classification, expects loss near -log(1/num_classes). For regression, compares against variance-based baseline as described in Goodfellow et al. (2016).
- Parameters:
  - X (NDArray[np.float64]) – Input data of shape (N, input_dim).
  - y (NDArray[np.float64]) – Target values of shape (N,) or (N, output_dim).
- Returns:
- Analysis results containing:
initial_loss: Computed initial loss value
expected_loss: Theoretical expected loss
ratio: initial_loss / expected_loss
task_type: Detected task type (regression/classification)
status: “PASS”, “WARN”, or “FAIL”
note: Diagnostic message
- Return type:
Example
>>> results = analyzer.analyze_initial_loss(X_train, y_train)
>>> print(f"Initial loss check: {results['status']}")
- analyze_layer_capacity() Dict[str, Any] [source]¶
Analyze information bottlenecks and layer capacity issues.
- analyze_weight_init() Dict[str, Any] [source]¶
Validate weight initialization against theoretical optima.
Analyzes weight initialization quality by comparing actual weight standard deviations against theoretically optimal values for different activation functions. Based on He initialization (He et al. 2015) for ReLU variants and Xavier initialization (Glorot & Bengio 2010) for sigmoid/tanh.
- Returns:
- Analysis results containing:
layers: List of per-layer initialization analysis
status: Overall initialization quality (“PASS”, “WARN”, “FAIL”)
note: Summary diagnostic message
- Return type:
Example
>>> results = analyzer.analyze_weight_init()
>>> for layer in results['layers']:
...     print(f"Layer {layer['layer']}: {layer['status']}")
- class neuroscope.TrainingMonitor(model=None, history_size=50)[source]¶
Bases:
object
Comprehensive real-time training monitoring system for neural networks.
Monitors 10 key training health indicators:
- Dead ReLU neuron detection
- Vanishing Gradient Problem (VGP) detection
- Exploding Gradient Problem (EGP) detection
- Weight health analysis
- Learning progress
- Overfitting detection
- Gradient signal-to-noise ratio
- Activation saturation detection (tanh/sigmoid)
- Training plateau detection
- Weight update vs. magnitude ratios
Initialize comprehensive training monitor.
Sets up monitoring infrastructure for tracking 10 key training health indicators during neural network training. Uses research-validated thresholds and emoji-based status visualization.
- Parameters:
  - model – Optional MLP model instance (can be set later).
  - history_size (int, optional) – Number of epochs to keep in rolling history for trend analysis. Defaults to 50.
Example
>>> monitor = TrainingMonitor(history_size=100)
>>> results = model.fit(X, y, monitor=monitor)
- __init__(model=None, history_size=50)[source]¶
Initialize comprehensive training monitor.
Sets up monitoring infrastructure for tracking 10 key training health indicators during neural network training. Uses research-validated thresholds and emoji-based status visualization.
- Parameters:
  - model – Optional MLP model instance (can be set later).
  - history_size (int, optional) – Number of epochs to keep in rolling history for trend analysis. Defaults to 50.
Example
>>> monitor = TrainingMonitor(history_size=100)
>>> results = model.fit(X, y, monitor=monitor)
- monitor_activation_saturation(activations: List[ndarray], activation_functions: List[str] = None) Tuple[float, str] [source]¶
Research-accurate activation saturation detection. Based on Glorot & Bengio (2010), Hochreiter (1991), and He et al. (2015).
Key insights:
- Saturation = extreme activation values + poor gradient flow + skewed distributions
- Uses function-specific thresholds and statistical distribution analysis
- Tracks saturation propagation through network layers
- Parameters:
  - activations – List of activation arrays from each layer.
  - activation_functions – List of activation function names for each layer.
- Returns:
Tuple of (saturation_score, emoji_status)
- monitor_exploding_gradients(gradients: List[ndarray]) Tuple[float, str] [source]¶
Detect exploding gradient problem using gradient norm analysis.
Monitors gradient magnitudes to detect exploding gradients based on research by Pascanu et al. (2013). Uses both global norm and per-layer analysis to identify unstable training dynamics.
- Parameters:
  - gradients (list[NDArray[np.float64]]) – Gradient arrays for each layer.
- Returns:
- (egp_severity, status_emoji) where:
egp_severity: Float in [0,1] indicating severity
status: 🟢 (stable), 🟡 (elevated), 🔴 (exploding)
- Return type:
Note
Based on “On the difficulty of training recurrent neural networks” (Pascanu et al. 2013) gradient clipping and norm analysis.
- monitor_gradient_snr(gradients: List[ndarray]) Tuple[float, str] [source]¶
Calculate Gradient Signal-to-Noise Ratio (GSNR) for optimization health.
- Signal: RMS gradient magnitude (update strength)
- Noise: Coefficient of variation (relative inconsistency)
- GSNR = RMS_magnitude / (std_magnitude + ε)
This measures gradient update consistency.
- Parameters:
  - gradients – List of gradient arrays from each layer.
- Returns:
Tuple of (gsnr_score, emoji_status)
- monitor_learning_progress(current_loss: float, val_loss: float | None = None) Tuple[float, str] [source]¶
Research-accurate learning progress monitor. Based on optimization literature: Bottou (2010), Goodfellow et al. (2016), Smith (2017).
Key insights:
- Progress = consistent loss reduction + convergence stability + generalization health
- Uses exponential moving averages and plateau detection from the literature
- Parameters:
  - current_loss – Current training loss.
  - val_loss – Optional validation loss.
- Returns:
Tuple of (progress_score, emoji_status)
- monitor_overfitting(train_loss: float, val_loss: float | None = None) Tuple[float, str] [source]¶
Research-accurate overfitting detection. Based on Prechelt (1998), Goodfellow et al. (2016), and Caruana et al. (2001).
Key insight:
- Overfitting = increasing generalization gap + validation curve deterioration
- Parameters:
  - train_loss – Training loss.
  - val_loss – Validation loss.
- Returns:
Tuple of (overfitting_score, emoji_status)
- monitor_plateau(current_loss: float, val_loss: float | None = None, current_gradients: List[ndarray] | None = None) Tuple[float, str] [source]¶
Research-accurate training plateau detection. Based on Prechelt (1998), Bengio (2012), and Smith (2017).
Key insights:
- Plateau = statistical stagnation + loss of learning momentum + gradient analysis
- Uses multi-scale analysis and statistical significance testing
- Integrates validation correlation and gradient magnitude trends
- Parameters:
  - current_loss – Current training loss.
  - val_loss – Optional validation loss for correlation analysis.
  - current_gradients – Optional gradient arrays for gradient-based detection.
- Returns:
Tuple of (plateau_score, emoji_status)
- monitor_relu_dead_neurons(activations: List[ndarray], activation_functions: List[str] | None = None) Tuple[float, str] [source]¶
Monitor for dead ReLU neurons during training.
Detects neurons that have become inactive (always output zero) which indicates the “dying ReLU” problem. Uses activation-function-aware thresholds based on research by Glorot et al. (2011) and He et al. (2015).
Natural sparsity in ReLU networks is expected (~50%), but excessive sparsity (>90%) indicates dead neurons that cannot learn.
- Parameters:
  - activations (list[NDArray[np.float64]]) – Layer activation outputs.
  - activation_functions (list[str], optional) – Activation function names per layer.
- Returns:
- (dead_percentage, status_emoji) where status is:
🟢: Healthy sparsity (<10% dead)
🟡: Moderate concern (10-30% dead)
🔴: Critical issue (>30% dead)
- Return type:
Note
Based on “Deep Sparse Rectifier Neural Networks” (Glorot et al. 2011) and “Delving Deep into Rectifiers” (He et al. 2015).
- monitor_step(epoch: int, train_loss: float, val_loss: float | None = None, activations: List[ndarray] | None = None, gradients: List[ndarray] | None = None, weights: List[ndarray] | None = None, weight_updates: List[ndarray] | None = None, activation_functions: List[str] | None = None) Dict[str, Any] [source]¶
Perform one monitoring step and return all metrics.
- Parameters:
  - epoch – Current epoch number.
  - train_loss – Training loss.
  - val_loss – Validation loss (optional).
  - activations – Layer activations (optional).
  - gradients – Layer gradients (optional).
  - weights – Layer weights (optional).
  - weight_updates – Weight updates (optional).
  - activation_functions – List of activation function names (optional).
- Returns:
Dictionary containing all monitoring results
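A manual monitoring sketch for custom training loops (the per-layer activation and gradient lists, and the loss values, are illustrative placeholders):
>>> monitor = TrainingMonitor(model)
>>> results = monitor.monitor_step(epoch=10,
...                                train_loss=0.42, val_loss=0.47,
...                                activations=layer_activations,
...                                gradients=layer_gradients,
...                                weights=model.weights)
>>> # results aggregates the outputs of the individual monitor_* methods above.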
- monitor_vanishing_gradients(gradients: List[ndarray]) Tuple[float, str] [source]¶
Detect vanishing gradient problem using research-validated metrics.
Monitors gradient flow through the network to detect vanishing gradients based on variance analysis from Glorot & Bengio (2010). Healthy networks maintain similar gradient variance across layers.
- Parameters:
  - gradients (list[NDArray[np.float64]]) – Gradient arrays for each layer.
- Returns:
- (vgp_severity, status_emoji) where:
vgp_severity: Float in [0,1] indicating severity
status: 🟢 (healthy), 🟡 (warning), 🔴 (critical)
- Return type:
Note
Implementation based on “Understanding the difficulty of training deep feedforward neural networks” (Glorot & Bengio 2010).
- monitor_weight_health(weights: List[ndarray]) Tuple[float, str] [source]¶
Simple, research-backed weight health monitor. Based on Glorot & Bengio (2010) and He et al. (2015) initialization theory.
- Parameters:
  - weights – List of weight matrices.
- Returns:
Tuple of (health_score, status)
- monitor_weight_update_ratio(weights: List[ndarray], weight_updates: List[ndarray]) Tuple[float, str] [source]¶
Monitor Weight Update to Weight magnitude Ratios (WUR) for learning rate validation.
Research-based implementation using:
- Smith (2015): Learning rate should produce WUR ~10^-3 to 10^-2 for stable training
- Zeiler (2012): Update magnitude should be proportional to weight magnitude
Formula: WUR = ||weight_update|| / ||weight|| per layer
- Parameters:
  - weights – Current weight matrices.
  - weight_updates – Weight update matrices (gradients * learning_rate).
- Returns:
Tuple of (median_wur, status)
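A worked NumPy sketch of the ratio itself, independent of the monitor (layer_gradients is an illustrative placeholder for per-layer gradient arrays):
>>> import numpy as np
>>> lr = 1e-3
>>> updates = [lr * g for g in layer_gradients]  # weight_update = lr * gradient
>>> wurs = [np.linalg.norm(u) / (np.linalg.norm(w) + 1e-12)
...         for w, u in zip(model.weights, updates)]
>>> # Values around 1e-3 to 1e-2 suggest a reasonable learning rate (Smith, 2015).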
- class neuroscope.PostTrainingEvaluator(model)[source]¶
Bases:
object
Comprehensive post-training evaluation system for neural networks.
Provides thorough analysis of trained model performance including robustness testing, performance metrics evaluation, and diagnostic assessments. Designed to validate model quality and identify potential deployment issues after training completion.
- Parameters:
model – Trained and compiled MLP model instance with initialized weights.
- model¶
Reference to the trained neural network model.
Example
>>> from neuroscope.diagnostics import PostTrainingEvaluator
>>> model = MLP([784, 128, 10])
>>> model.compile(lr=1e-3)
>>> history = model.fit(X_train, y_train, epochs=100)
>>> evaluator = PostTrainingEvaluator(model)
>>> evaluator.evaluate(X_test, y_test)
>>> # Access detailed results
>>> robustness = evaluator.evaluate_robustness(X_test, y_test)
>>> performance = evaluator.evaluate_performance(X_test, y_test)
Initialize evaluator with a trained model.
- evaluate(X_test: ndarray, y_test: ndarray, X_train: ndarray | None = None, y_train: ndarray | None = None)[source]¶
Run comprehensive model evaluation and generate summary report.
- evaluate_performance(X: ndarray, y: ndarray) Dict[str, Any] [source]¶
Evaluate model performance metrics.
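A brief sketch (the exact keys of the returned dict are not documented here, so they are simply printed):
>>> performance = evaluator.evaluate_performance(X_test, y_test)
>>> for name, value in performance.items():
...     print(name, value)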
- class neuroscope.Visualizer(hist)[source]¶
Bases:
object
High-quality visualization tool for neural network training analysis.
Provides comprehensive plotting capabilities for analyzing training dynamics, network behavior, and diagnostic information. Creates professional-grade figures suitable for research publications and presentations.
- Parameters:
  - hist (dict) – Complete training history from model.fit() containing:
    - history: Training/validation metrics per epoch
    - weights/biases: Final network parameters
    - activations/gradients: Sample network internals
    - *_stats_over_epochs: Statistical evolution during training
- weights/biases
Final network parameters.
- activations/gradients
Representative network internals.
Example
>>> from neuroscope.viz import Visualizer
>>> history = model.fit(X_train, y_train, epochs=100)
>>> viz = Visualizer(history)
>>> viz.plot_learning_curves()
>>> viz.plot_activation_distribution()
>>> viz.plot_gradient_flow()
Initialize visualizer with comprehensive training history.
Sets up visualization infrastructure and applies publication-quality styling to all plots. Automatically extracts relevant data components for different types of analysis.
- Parameters:
  - hist (dict) – Training history from model.fit() containing all training statistics, network states, and diagnostic information.
- __init__(hist)[source]¶
Initialize visualizer with comprehensive training history.
Sets up visualization infrastructure and applies publication-quality styling to all plots. Automatically extracts relevant data components for different types of analysis.
- Parameters:
  - hist (dict) – Training history from model.fit() containing all training statistics, network states, and diagnostic information.
- plot_activation_hist(epoch=None, figsize=(9, 4), kde=False, last_layer=False, save_path=None)[source]¶
Plot activation value distributions across network layers.
Visualizes the distribution of activation values for each layer at a specific epoch, aggregated from all mini-batches. Useful for detecting activation saturation, dead neurons, and distribution shifts during training.
- Parameters:
  - epoch (int, optional) – Specific epoch to plot. If None, uses last epoch. Defaults to None.
  - figsize (tuple[int, int], optional) – Figure dimensions. Defaults to (9, 4).
  - kde (bool, optional) – Whether to use KDE-style smoothing for smoother curves. Defaults to False.
  - last_layer (bool, optional) – Whether to include output layer. Defaults to False.
  - save_path (str, optional) – Path to save the figure. Defaults to None.
Example
>>> viz.plot_activation_hist(epoch=50, kde=True, save_path='activations.png')
- plot_activation_stats(activation_stats=None, figsize=(12, 4), save_path=None, reference_lines=False)[source]¶
Plot activation statistics over time with both mean and std.
- Parameters:
activation_stats – Dict of layer activation stats, or None to use the data collected during training. Format: {‘layer_0’: {‘mean’: […], ‘std’: […]}, …}
figsize – Figure size tuple
save_path – Path to save figure
- plot_curves_fast(figsize=(10, 4), markers=True, save_path=None)[source]¶
Plot learning curves for fit_fast() results.
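A brief sketch pairing fit_fast() output with this plot (file name is illustrative):
>>> history = model.fit_fast(X_train, y_train, X_val, y_val, epochs=100, batch_size=256)
>>> viz = Visualizer(history)
>>> viz.plot_curves_fast(save_path='curves_fast.png')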
- plot_gradient_hist(epoch=None, figsize=(9, 4), kde=False, last_layer=False, save_path=None)[source]¶
Plot gradient value distributions across network layers.
Visualizes gradient distributions to detect vanishing/exploding gradient problems, gradient flow issues, and training stability. Shows zero-line reference for assessing gradient symmetry and magnitude.
- Parameters:
  - epoch (int, optional) – Specific epoch to plot. If None, uses last epoch. Defaults to None.
  - figsize (tuple[int, int], optional) – Figure dimensions. Defaults to (9, 4).
  - kde (bool, optional) – Whether to use KDE-style smoothing. Defaults to False.
  - last_layer (bool, optional) – Whether to include output layer gradients. Defaults to False.
  - save_path (str, optional) – Path to save the figure. Defaults to None.
Note
Gradient distributions should be roughly symmetric around zero for healthy training. Very narrow distributions may indicate vanishing gradients, while very wide distributions may indicate exploding gradients.
Example
>>> viz.plot_gradient_hist(epoch=25, kde=True, save_path='gradients.png')
- plot_gradient_norms(gradient_norms=None, figsize=(12, 4), save_path=None, reference_lines=False)[source]¶
Plot gradient norms per layer over epochs.
- plot_gradient_stats(figsize=(12, 4), save_path=None, reference_lines=False)[source]¶
Plot gradient statistics over time with both mean and std.
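Both gradient summaries read from the statistics collected by the regular fit() method; a brief sketch (file names are illustrative):
>>> viz.plot_gradient_norms(reference_lines=True, save_path='grad_norms.png')
>>> viz.plot_gradient_stats(save_path='grad_stats.png')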
- plot_learning_curves(figsize=(9, 4), ci=False, markers=True, save_path=None, metric='accuracy')[source]¶
Plot training and validation learning curves for regular fit() results.
Creates a publication-quality subplot showing loss and metric evolution during training. Automatically detects available data and applies consistent styling with optional confidence intervals.
Note: For fit_fast() results, use plot_curves_fast() instead.
- Parameters:
  - figsize (tuple[int, int], optional) – Figure dimensions (width, height). Defaults to (9, 4).
  - ci (bool, optional) – Whether to add confidence intervals using moving window statistics. Only available for regular fit() results. Defaults to False.
  - markers (bool, optional) – Whether to show markers on line plots. Defaults to True.
  - save_path (str, optional) – Path to save the figure. Defaults to None.
  - metric (str, optional) – Name of the metric for y-axis label. Defaults to ‘accuracy’.
Example
>>> viz.plot_learning_curves(figsize=(10, 5), ci=True, save_path='curves.png')
- plot_training_animation(bg='dark', save_path=None)[source]¶
Creates a comprehensive 4-panel GIF animation showing:
1. Loss curves over time
2. Accuracy curves over time
3. Current metrics bar chart
4. Gradient flow analysis
Speed automatically adjusts based on epoch count for a smooth motion feel.
- Parameters:
  - bg – Theme (‘dark’ or ‘light’).
  - save_path – Path to save GIF (defaults to ‘mlp_training_animation.gif’).
- Returns:
Path to saved GIF file
- plot_update_ratios(update_ratios=None, figsize=(12, 4), save_path=None, reference_lines=False)[source]¶
Plot weight update ratios per layer across epochs.
- Parameters:
update_ratios – Dict of layer update ratios (optional; uses collected data if None). Format: {‘layer_0’: [ratio_epoch_0, ratio_epoch_1, …], …}
figsize – Figure size tuple
save_path – Path to save figure
- plot_weight_hist(epoch=None, figsize=(9, 4), kde=False, last_layer=False, save_path=None)[source]¶
Plot weight value distributions across network layers. Uses aggregated samples from all mini-batches within an epoch to create representative distributions and show weight evolution patterns.
- Parameters:
epoch – Specific epoch to plot (default: last epoch)
figsize – Figure size tuple
kde – Whether to use KDE-style smoothing
last_layer – Whether to include output layer (default: False, hidden layers only)
save_path – Path to save figure
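A brief sketch mirroring the activation and gradient histogram examples above (file name is illustrative):
>>> viz.plot_weight_hist(epoch=50, kde=True, save_path='weights.png')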
- neuroscope.PTA¶
alias of
PreTrainingAnalyzer
- neuroscope.TM¶
alias of
TrainingMonitor
- neuroscope.PTE¶
alias of
PostTrainingEvaluator
- neuroscope.VIZ¶
alias of
Visualizer
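The short aliases are interchangeable with the full class names; a brief sketch:
>>> from neuroscope import PTA, TM, PTE, VIZ
>>> analyzer = PTA(model)           # same as PreTrainingAnalyzer(model)
>>> monitor = TM(history_size=100)  # same as TrainingMonitor(history_size=100)
>>> evaluator = PTE(model)          # same as PostTrainingEvaluator(model)
>>> viz = VIZ(history)              # same as Visualizer(history)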
- neuroscope.mse(y_true, y_pred)¶
Compute mean squared error loss.
Standard regression loss function that penalizes squared differences between predictions and targets. Suitable for continuous target values.
- Parameters:
  - y_true (NDArray[np.float64]) – Ground truth values of shape (N,) or (N, 1).
  - y_pred (NDArray[np.float64]) – Predicted values of shape (N,) or (N, 1).
- Returns:
Mean squared error loss (scalar).
- Return type:
Example
>>> loss = LossFunctions.mse(y_true, y_pred)
>>> print(f"MSE Loss: {loss:.4f}")
- neuroscope.bce(y_true, y_pred, eps=1e-12)¶
Compute binary cross-entropy loss.
Standard loss function for binary classification problems. Applies numerical clipping to prevent log(0) errors and ensure stability.
- Parameters:
  - y_true (NDArray[np.float64]) – Binary labels (0/1) of shape (N,).
  - y_pred (NDArray[np.float64]) – Predicted probabilities of shape (N,).
  - eps (float, optional) – Small value for numerical stability. Defaults to 1e-12.
- Returns:
Binary cross-entropy loss (scalar).
- Return type:
Example
>>> loss = LossFunctions.bce(y_true, y_pred)
>>> print(f"BCE Loss: {loss:.4f}")
- neuroscope.cce(y_true, y_pred, eps=1e-12)¶
Compute categorical cross-entropy loss.
Standard loss function for multi-class classification. Handles both sparse labels (class indices) and one-hot encoded targets.
- Parameters:
  - y_true (NDArray[np.float64]) – Class labels of shape (N,) for sparse labels or (N, C) for one-hot encoded targets.
  - y_pred (NDArray[np.float64]) – Predicted class probabilities of shape (N, C).
  - eps (float, optional) – Small value for numerical stability. Defaults to 1e-12.
- Returns:
Categorical cross-entropy loss (scalar).
- Return type:
Example
>>> loss = LossFunctions.cce(y_true, y_pred)
>>> print(f"CCE Loss: {loss:.4f}")
- neuroscope.mse_with_reg(y_true, y_pred, weights, lamda=0.01)¶
- neuroscope.bce_with_reg(y_true, y_pred, weights, lamda=0.01, eps=1e-12)¶
- neuroscope.cce_with_reg(y_true, y_pred, weights, lamda=0.01, eps=1e-12)¶
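These regularized variants are not documented in detail here; a hedged sketch, assuming they mirror mse/bce/cce but add an L2 penalty computed from the supplied weight matrices and scaled by lamda:
>>> from neuroscope import mse_with_reg
>>> loss = mse_with_reg(y_true, y_pred, model.weights, lamda=0.01)
>>> # Assumed to equal mse(y_true, y_pred) plus an L2 penalty over model.weights.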
- neuroscope.accuracy_binary(y_true, y_pred, thresh=0.5)¶
Compute binary classification accuracy.
Calculates the fraction of correctly predicted samples for binary classification by applying a threshold to predicted probabilities.
- Parameters:
  - y_true (NDArray[np.float64]) – Binary labels (0/1) of shape (N,) or (N, 1).
  - y_pred (NDArray[np.float64]) – Predicted probabilities of shape (N,) or (N, 1).
  - thresh (float, optional) – Classification threshold. Defaults to 0.5.
- Returns:
Binary classification accuracy as a fraction (0.0 to 1.0).
- Return type:
Example
>>> accuracy = Metrics.accuracy_binary(y_true, y_pred, thresh=0.5)
>>> print(f"Binary Accuracy: {accuracy:.2%}")
- neuroscope.accuracy_multiclass(y_true, y_pred)¶
Compute multi-class classification accuracy.
Calculates the fraction of correctly predicted samples for multi-class classification problems. Handles both sparse labels and one-hot encoded inputs.
- Parameters:
  - y_true (NDArray[np.float64]) – True class labels of shape (N,) for sparse labels or (N, C) for one-hot encoded.
  - y_pred (NDArray[np.float64]) – Predicted class probabilities of shape (N, C).
- Returns:
Classification accuracy as a fraction (0.0 to 1.0).
- Return type:
Example
>>> accuracy = Metrics.accuracy_multiclass(y_true, y_pred)
>>> print(f"Accuracy: {accuracy:.2%}")
- neuroscope.rmse(y_true, y_pred)¶
- neuroscope.mae(y_true, y_pred)¶
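Both follow the same call convention as mse; a brief sketch:
>>> from neuroscope import rmse, mae
>>> print(f"RMSE: {rmse(y_true, y_pred):.4f}, MAE: {mae(y_true, y_pred):.4f}")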
- neuroscope.r2_score(y_true, y_pred)¶
Compute coefficient of determination (R² score).
Measures the proportion of variance in the dependent variable that is predictable from the independent variables. R² = 1 indicates perfect fit, R² = 0 indicates the model performs as well as predicting the mean.
- Parameters:
  - y_true (NDArray[np.float64]) – Ground truth values of shape (N,) or (N, 1).
  - y_pred (NDArray[np.float64]) – Predicted values of shape (N,) or (N, 1).
- Returns:
R² score (can be negative for very poor fits).
- Return type:
Example
>>> r2 = Metrics.r2_score(y_true, y_pred)
>>> print(f"R² Score: {r2:.3f}")
- neuroscope.f1_score(y_true, y_pred, average='weighted', threshold=0.5)¶
Compute F1 score: 2 * (Precision * Recall) / (Precision + Recall)
- Parameters:
y_true – True labels
y_pred – Predicted probabilities or labels
average – ‘macro’, ‘weighted’, or None for per-class scores
threshold – Decision threshold for binary classification
- neuroscope.precision(y_true, y_pred, average='weighted', threshold=0.5)¶
Compute precision score: TP / (TP + FP)
- Parameters:
y_true – True labels
y_pred – Predicted probabilities or labels
average – ‘macro’, ‘weighted’, or None for per-class scores
threshold – Decision threshold for binary classification
- neuroscope.recall(y_true, y_pred, average='weighted', threshold=0.5)¶
Compute recall score: TP / (TP + FN)
- Parameters:
y_true – True labels
y_pred – Predicted probabilities or labels
average – ‘macro’, ‘weighted’, or None for per-class scores
threshold – Decision threshold for binary classification
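A brief combined sketch (the ‘macro’ averaging choice is illustrative):
>>> from neuroscope import f1_score, precision, recall
>>> f1 = f1_score(y_true, y_pred, average="macro")
>>> prec = precision(y_true, y_pred, average="macro")
>>> rec = recall(y_true, y_pred, average="macro")
>>> print(f"F1: {f1:.3f}, Precision: {prec:.3f}, Recall: {rec:.3f}")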
- neuroscope.relu(x)¶
Compute ReLU (Rectified Linear Unit) activation.
Applies the rectified linear activation function that outputs the input for positive values and zero for negative values. Most popular activation for hidden layers in modern neural networks.
- Parameters:
  - x (NDArray[np.float64]) – Input array of any shape.
- Returns:
ReLU-activated values (non-negative).
- Return type:
NDArray[np.float64]
Example
>>> activated = ActivationFunctions.relu(z)
>>> # Negative values become 0, positive values unchanged
- neuroscope.leaky_relu(x, negative_slope=0.01)¶
Compute Leaky ReLU activation function.
Variant of ReLU that allows small negative values to flow through, helping to mitigate the “dying ReLU” problem where neurons can become permanently inactive.
- Parameters:
  - x (NDArray[np.float64]) – Input array of any shape.
  - negative_slope (float, optional) – Slope for negative values. Defaults to 0.01.
- Returns:
Leaky ReLU-activated values.
- Return type:
NDArray[np.float64]
Example
>>> activated = ActivationFunctions.leaky_relu(z, negative_slope=0.01)
>>> # Positive values unchanged, negative values scaled by 0.01
- neuroscope.sigmoid(x)¶
Compute sigmoid activation function.
Applies the logistic sigmoid function that maps input to (0, 1) range. Includes numerical clipping to prevent overflow in exponential computation.
- Parameters:
  - x (NDArray[np.float64]) – Input array of any shape.
- Returns:
Sigmoid-activated values in range (0, 1).
- Return type:
NDArray[np.float64]
Example
>>> activated = ActivationFunctions.sigmoid(z)
>>> # Values are now between 0 and 1
- neuroscope.tanh(x)¶
- neuroscope.selu(x)¶
- neuroscope.softmax(z)¶
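These are undocumented here; a hedged sketch assuming tanh and selu are element-wise and softmax normalizes each row of an (N, C) array:
>>> from neuroscope import tanh, selu, softmax
>>> import numpy as np
>>> z = np.array([[1.0, -2.0, 0.5]])
>>> h = tanh(z)     # element-wise, values in (-1, 1)
>>> s = selu(z)     # element-wise, self-normalizing variant of ELU
>>> p = softmax(z)  # assumed: each row sums to 1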
- neuroscope.he_init(layer_dims: list, seed=42)¶
He initialization for ReLU and ReLU-variant activations.
Optimal for ReLU-based networks as derived in He et al. (2015). Uses standard deviation of sqrt(2/fan_in) to maintain proper variance propagation through ReLU activations.
- Parameters:
  - layer_dims (list[int]) – Layer dimensions [input_dim, hidden_dim, …, output_dim].
  - seed (int, optional) – Random seed for reproducibility. Defaults to 42.
- Returns:
- (weights, biases) where weights are initialized
according to He initialization and biases are zero-initialized.
- Return type:
Note
Based on “Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification” (He et al. 2015).
Example
>>> weights, biases = WeightInits.he_init([784, 128, 10])
- neuroscope.xavier_init(layer_dims: list, seed=42)¶
Xavier/Glorot initialization for sigmoid and tanh activations.
Optimal for symmetric activations like tanh and sigmoid. Uses standard deviation of sqrt(2/(fan_in + fan_out)) to maintain constant variance across layers.
- Parameters:
  - layer_dims (list[int]) – Layer dimensions [input_dim, hidden_dim, …, output_dim].
  - seed (int, optional) – Random seed for reproducibility. Defaults to 42.
- Returns:
(weights, biases) with Xavier-initialized weights and zero biases.
- Return type:
Note
Based on “Understanding the difficulty of training deep feedforward neural networks” (Glorot & Bengio 2010).
Example
>>> weights, biases = WeightInits.xavier_init([784, 128, 10])
MLP Module¶
Multi-Layer Perceptron¶
MLP Neural Network Main neural network class integrating all framework components.
- class neuroscope.mlp.mlp.MLP(layer_dims, hidden_activation='leaky_relu', out_activation=None, init_method='smart', init_seed=42, dropout_rate=0.0, dropout_type='normal')[source]¶
Bases:
object
Multi-layer perceptron for quick prototyping and experimentation.
This MLP supports arbitrary layer sizes, multiple activation functions, and modern optimization techniques. Use compile to set hyperparameters and fit to train the model. Includes comprehensive training monitoring and diagnostic capabilities.
- Parameters:
  - layer_dims (Sequence[int]) – Sizes of layers including input & output, e.g. [784, 128, 10].
  - hidden_activation (str, optional) – Activation function name for hidden layers. Options: “relu”, “leaky_relu”, “tanh”, “sigmoid”, “selu”. Defaults to “leaky_relu”.
  - out_activation (str, optional) – Output activation function. Options: “sigmoid” (binary), “softmax” (multiclass), None (regression). Defaults to None.
  - init_method (str, optional) – Weight initialization strategy. Options: “smart”, “he”, “xavier”, “random”, “selu_init”. Defaults to “smart”.
  - init_seed (int, optional) – Random seed for reproducible weight initialization. Defaults to 42.
  - dropout_rate (float, optional) – Dropout probability for hidden layers (0.0-1.0). Defaults to 0.0.
  - dropout_type (str, optional) – Dropout variant (“normal”, “alpha”). Defaults to “normal”.
- weights¶
Internal weight matrices for each layer.
- Type:
list[NDArray[np.float64]]
- biases¶
Internal bias vectors for each layer.
- Type:
list[NDArray[np.float64]]
Example
>>> from neuroscope.mlp import MLP
>>> model = MLP([784, 128, 64, 10], hidden_activation="relu", out_activation="softmax")
>>> model.compile(optimizer="adam", lr=1e-3)
>>> history = model.fit(X_train, y_train, epochs=100)
>>> predictions = model.predict(X_test)
- __init__(layer_dims, hidden_activation='leaky_relu', out_activation=None, init_method='smart', init_seed=42, dropout_rate=0.0, dropout_type='normal')[source]¶
- compile(optimizer='adam', lr=0.001, reg=None, lamda=0.01, gradient_clip=None)[source]¶
Configure the model for training.
Sets up the optimizer, learning rate, regularization, and other training hyperparameters. Must be called before training the model.
- Parameters:
  - optimizer (str, optional) – Optimization algorithm (“sgd”, “adam”). Defaults to “adam”.
  - lr (float, optional) – Learning rate for parameter updates. Defaults to 0.001.
  - reg (str, optional) – Regularization type (“l2”, None). Defaults to None.
  - lamda (float, optional) – Regularization strength (lambda parameter). Defaults to 0.01.
  - gradient_clip (float, optional) – Maximum gradient norm for clipping. Defaults to None.
- Raises:
ValueError – If invalid optimizer is specified.
Example
>>> model.compile(optimizer="adam", lr=1e-3, reg="l2", lamda=0.01)
- predict(X)[source]¶
Generate predictions for input samples.
Performs forward propagation through the network without dropout to generate predictions on new data.
- Parameters:
  - X (NDArray[np.float64]) – Input data of shape (N, input_dim).
- Returns:
- Model predictions of shape (N, output_dim).
For regression: continuous values. For binary classification: probabilities (0-1). For multiclass: class probabilities.
- Return type:
NDArray[np.float64]
Example
>>> predictions = model.predict(X_test)
>>> binary_preds = (predictions > 0.5).astype(int)  # For binary classification
- evaluate(X, y, metric='smart', binary_thresh=0.5)[source]¶
Evaluate model performance on given data.
Computes loss and evaluation metric on the provided dataset. Automatically selects appropriate loss function based on output activation.
- Parameters:
  - X (NDArray[np.float64]) – Input data of shape (N, input_dim).
  - y (NDArray[np.float64]) – Target values of shape (N,) or (N, output_dim).
  - metric (str, optional) – Evaluation metric (“smart”, “accuracy”, “mse”, “rmse”, “mae”, “r2”, “f1”, “precision”, “recall”). Defaults to “smart”.
  - binary_thresh (float, optional) – Threshold for binary classification. Defaults to 0.5.
- Returns:
(loss, metric_score) where metric_score depends on the metric type.
- Return type:
Example
>>> loss, accuracy = model.evaluate(X_test, y_test, metric="accuracy")
>>> print(f"Test Loss: {loss:.4f}, Accuracy: {accuracy:.2%}")
- fit(X_train, y_train, X_val=None, y_val=None, epochs=10, batch_size=32, verbose=True, log_every=1, early_stopping_patience=50, lr_decay=None, numerical_check_freq=100, metric='smart', reset_before_training=True, monitor=None, monitor_freq=1)[source]¶
Train the neural network on provided data.
Implements full training loop with support for validation, early stopping, learning rate decay, and comprehensive monitoring. Returns detailed training history and statistics for analysis.
- Parameters:
  - X_train (NDArray[np.float64]) – Training input data of shape (N, input_dim).
  - y_train (NDArray[np.float64]) – Training targets of shape (N,) or (N, output_dim).
  - X_val (NDArray[np.float64], optional) – Validation input data. Defaults to None.
  - y_val (NDArray[np.float64], optional) – Validation targets. Defaults to None.
  - epochs (int, optional) – Number of training epochs. Defaults to 10.
  - batch_size (int, optional) – Mini-batch size. If None, uses full batch. Defaults to 32.
  - verbose (bool, optional) – Whether to print training progress. Defaults to True.
  - log_every (int, optional) – Frequency of progress logging in epochs. Defaults to 1.
  - early_stopping_patience (int, optional) – Epochs to wait for improvement before stopping. Defaults to 50.
  - lr_decay (float, optional) – Learning rate decay factor per epoch. Defaults to None.
  - numerical_check_freq (int, optional) – Frequency of numerical stability checks. Defaults to 100.
  - metric (str, optional) – Evaluation metric for monitoring. Defaults to “smart”.
  - reset_before_training (bool, optional) – Whether to reset weights before training. Defaults to True.
  - monitor (TrainingMonitor, optional) – Real-time training monitor. Defaults to None.
  - monitor_freq (int, optional) – Monitoring frequency in epochs. Defaults to 1.
- Returns:
- Comprehensive training results containing:
weights: Final trained weight matrices
biases: Final trained bias vectors
history: Training/validation loss and metrics per epoch
activations: Sample activations from middle epoch
gradients: Sample gradients from middle epoch
weight_stats_over_epochs: Weight statistics evolution
activation_stats_over_epochs: Activation statistics evolution
gradient_stats_over_epochs: Gradient statistics evolution
- Return type:
- Raises:
ValueError – If model is not compiled or if input dimensions are incompatible.
Example
>>> history = model.fit(X_train, y_train, X_val, y_val,
...                     epochs=100, batch_size=32,
...                     early_stopping_patience=10)
>>> print(f"Final training loss: {history['history']['train_loss'][-1]:.4f}")
- fit_fast(X_train, y_train, X_val=None, y_val=None, epochs=10, batch_size=32, verbose=True, log_every=1, early_stopping_patience=50, lr_decay=None, numerical_check_freq=100, metric='smart', reset_before_training=True, eval_freq=5)[source]¶
High-performance training method optimized for fast training.
Ultra-fast training loop that eliminates statistics collection overhead and monitoring bottlenecks. Provides 10-100x speedup over standard fit() while maintaining identical API and training quality.
Key performance optimizations:
- Eliminates expensive statistics collection (main bottleneck)
- Uses optimized batch processing with array views
- Streamlined training loop with only essential operations
- Configurable evaluation frequency to reduce overhead
Expected performance:
- 10-100x faster than the fit() method
- 60-80% less memory usage
- Parameters:
  - X_train (NDArray[np.float64]) – Training input data of shape (N, input_dim).
  - y_train (NDArray[np.float64]) – Training targets of shape (N,) or (N, output_dim).
  - X_val (NDArray[np.float64], optional) – Validation input data. Defaults to None.
  - y_val (NDArray[np.float64], optional) – Validation targets. Defaults to None.
  - epochs (int, optional) – Number of training epochs. Defaults to 10.
  - batch_size (int, optional) – Mini-batch size. If None, uses full batch. Defaults to 32.
  - verbose (bool, optional) – Whether to print training progress. Defaults to True.
  - log_every (int, optional) – Frequency of progress logging in epochs. Defaults to 1.
  - early_stopping_patience (int, optional) – Epochs to wait for improvement before stopping. Defaults to 50.
  - lr_decay (float, optional) – Learning rate decay factor per epoch. Defaults to None.
  - numerical_check_freq (int, optional) – Frequency of numerical stability checks. Defaults to 100.
  - metric (str, optional) – Evaluation metric for monitoring. Defaults to “smart”.
  - reset_before_training (bool, optional) – Whether to reset weights before training. Defaults to True.
  - eval_freq (int, optional) – Evaluation frequency in epochs for performance. Defaults to 5.
- Returns:
- Streamlined training results containing:
weights: Final trained weight matrices
biases: Final trained bias vectors
history: Training/validation loss and metrics per epoch
performance_stats: Training time and speed metrics
- Return type:
- Raises:
ValueError – If model is not compiled or if input dimensions are incompatible.
Example
>>> # Ultra-fast training
>>> history = model.fit_fast(X_train, y_train, X_val, y_val,
...                          epochs=100, batch_size=256, eval_freq=5)
Note
For research and debugging with full diagnostics, use the standard fit() method. This method prioritizes speed over detailed monitoring capabilities.
- fit_batch(X_batch, y_batch, epochs=10, verbose=True, metric='smart')[source]¶
Train on a single batch for the specified number of epochs. Uses 2-8 samples of the given batch.
Note
The 2-8 sample range is based on the PyTorch implementation and literature such as Karpathy’s blog “A Recipe for Training Neural Networks”, the Universal Approximation Theorem (Hornik et al., 1989), and Empirical Risk Minimization (Vapnik, 1998).
Activation Functions¶
Activation Functions Module A comprehensive collection of activation functions and their derivatives for neural networks.
- class neuroscope.mlp.activations.ActivationFunctions[source]¶
Bases:
object
Comprehensive collection of activation functions and their derivatives.
Provides implementations of popular activation functions used in neural networks, including their derivatives for backpropagation. All functions are numerically stable and handle edge cases appropriately.
- static sigmoid(x)[source]¶
Compute sigmoid activation function.
Applies the logistic sigmoid function that maps input to (0, 1) range. Includes numerical clipping to prevent overflow in exponential computation.
- Parameters:
  - x (NDArray[np.float64]) – Input array of any shape.
- Returns:
Sigmoid-activated values in range (0, 1).
- Return type:
NDArray[np.float64]
Example
>>> activated = ActivationFunctions.sigmoid(z)
>>> # Values are now between 0 and 1
- static relu(x)[source]¶
Compute ReLU (Rectified Linear Unit) activation.
Applies the rectified linear activation function that outputs the input for positive values and zero for negative values. Most popular activation for hidden layers in modern neural networks.
- Parameters:
  - x (NDArray[np.float64]) – Input array of any shape.
- Returns:
ReLU-activated values (non-negative).
- Return type:
NDArray[np.float64]
Example
>>> activated = ActivationFunctions.relu(z)
>>> # Negative values become 0, positive values unchanged
- static leaky_relu(x, negative_slope=0.01)[source]¶
Compute Leaky ReLU activation function.
Variant of ReLU that allows small negative values to flow through, helping to mitigate the “dying ReLU” problem where neurons can become permanently inactive.
- Parameters:
  - x (NDArray[np.float64]) – Input array of any shape.
  - negative_slope (float, optional) – Slope for negative values. Defaults to 0.01.
- Returns:
Leaky ReLU-activated values.
- Return type:
NDArray[np.float64]
Example
>>> activated = ActivationFunctions.leaky_relu(z, negative_slope=0.01)
>>> # Positive values unchanged, negative values scaled by 0.01
Loss Functions¶
Loss Functions Module Collection of loss functions for different machine learning tasks.
- class neuroscope.mlp.losses.LossFunctions[source]¶
Bases:
object
Collection of loss functions for neural network training.
Provides implementations of common loss functions used in regression and classification tasks, with support for L2 regularization. All functions handle numerical stability and edge cases appropriately.
- static mse(y_true, y_pred)[source]¶
Compute mean squared error loss.
Standard regression loss function that penalizes squared differences between predictions and targets. Suitable for continuous target values.
- Parameters:
  - y_true (NDArray[np.float64]) – Ground truth values of shape (N,) or (N, 1).
  - y_pred (NDArray[np.float64]) – Predicted values of shape (N,) or (N, 1).
- Returns:
Mean squared error loss (scalar).
- Return type:
Example
>>> loss = LossFunctions.mse(y_true, y_pred)
>>> print(f"MSE Loss: {loss:.4f}")
- static bce(y_true, y_pred, eps=1e-12)[source]¶
Compute binary cross-entropy loss.
Standard loss function for binary classification problems. Applies numerical clipping to prevent log(0) errors and ensure stability.
- Parameters:
  - y_true (NDArray[np.float64]) – Binary labels (0/1) of shape (N,).
  - y_pred (NDArray[np.float64]) – Predicted probabilities of shape (N,).
  - eps (float, optional) – Small value for numerical stability. Defaults to 1e-12.
- Returns:
Binary cross-entropy loss (scalar).
- Return type:
Example
>>> loss = LossFunctions.bce(y_true, y_pred)
>>> print(f"BCE Loss: {loss:.4f}")
- static cce(y_true, y_pred, eps=1e-12)[source]¶
Compute categorical cross-entropy loss.
Standard loss function for multi-class classification. Handles both sparse labels (class indices) and one-hot encoded targets.
- Parameters:
  - y_true (NDArray[np.float64]) – Class labels of shape (N,) for sparse labels or (N, C) for one-hot encoded targets.
  - y_pred (NDArray[np.float64]) – Predicted class probabilities of shape (N, C).
  - eps (float, optional) – Small value for numerical stability. Defaults to 1e-12.
- Returns:
Categorical cross-entropy loss (scalar).
- Return type:
Example
>>> loss = LossFunctions.cce(y_true, y_pred)
>>> print(f"CCE Loss: {loss:.4f}")
Metrics¶
Metrics Module Comprehensive evaluation metrics for regression and classification tasks.
- class neuroscope.mlp.metrics.Metrics[source]¶
Bases:
object
Comprehensive collection of evaluation metrics for neural networks.
Provides implementations of standard metrics for both regression and classification tasks. All metrics handle edge cases and provide meaningful results for model evaluation.
- static accuracy_multiclass(y_true, y_pred)[source]¶
Compute multi-class classification accuracy.
Calculates the fraction of correctly predicted samples for multi-class classification problems. Handles both sparse labels and one-hot encoded inputs.
- Parameters:
  - y_true (NDArray[np.float64]) – True class labels of shape (N,) for sparse labels or (N, C) for one-hot encoded.
  - y_pred (NDArray[np.float64]) – Predicted class probabilities of shape (N, C).
- Returns:
Classification accuracy as a fraction (0.0 to 1.0).
- Return type:
Example
>>> accuracy = Metrics.accuracy_multiclass(y_true, y_pred)
>>> print(f"Accuracy: {accuracy:.2%}")
- static accuracy_binary(y_true, y_pred, thresh=0.5)[source]¶
Compute binary classification accuracy.
Calculates the fraction of correctly predicted samples for binary classification by applying a threshold to predicted probabilities.
- Parameters:
  - y_true (NDArray[np.float64]) – Binary labels (0/1) of shape (N,) or (N, 1).
  - y_pred (NDArray[np.float64]) – Predicted probabilities of shape (N,) or (N, 1).
  - thresh (float, optional) – Classification threshold. Defaults to 0.5.
- Returns:
Binary classification accuracy as a fraction (0.0 to 1.0).
- Return type:
Example
>>> accuracy = Metrics.accuracy_binary(y_true, y_pred, thresh=0.5)
>>> print(f"Binary Accuracy: {accuracy:.2%}")
- static mse(y_true, y_pred)[source]¶
Compute mean squared error metric.
Calculates the average squared differences between predicted and true values. Commonly used metric for regression problems.
- Parameters:
y_true (NDArray[np.float64]) – Ground truth values of shape (N,) or (N, 1).
y_pred (NDArray[np.float64]) – Predicted values of shape (N,) or (N, 1).
- Returns:
Mean squared error (scalar).
- Return type:
float
Example
>>> mse_score = Metrics.mse(y_true, y_pred)
>>> print(f"MSE: {mse_score:.4f}")
- static r2_score(y_true, y_pred)[source]¶
Compute coefficient of determination (R² score).
Measures the proportion of variance in the dependent variable that is predictable from the independent variables. R² = 1 indicates perfect fit, R² = 0 indicates the model performs as well as predicting the mean.
- Parameters:
y_true (NDArray[np.float64]) – Ground truth values of shape (N,) or (N, 1).
y_pred (NDArray[np.float64]) – Predicted values of shape (N,) or (N, 1).
- Returns:
R² score (can be negative for very poor fits).
- Return type:
float
Example
>>> r2 = Metrics.r2_score(y_true, y_pred)
>>> print(f"R² Score: {r2:.3f}")
- static precision(y_true, y_pred, average='weighted', threshold=0.5)[source]¶
Compute precision score: TP / (TP + FP)
- Parameters:
y_true – True labels
y_pred – Predicted probabilities or labels
average – ‘macro’, ‘weighted’, or None for per-class scores
threshold – Decision threshold for binary classification
- static recall(y_true, y_pred, average='weighted', threshold=0.5)[source]¶
Compute recall score: TP / (TP + FN)
- Parameters:
y_true – True labels
y_pred – Predicted probabilities or labels
average – ‘macro’, ‘weighted’, or None for per-class scores
threshold – Decision threshold for binary classification
- static f1_score(y_true, y_pred, average='weighted', threshold=0.5)[source]¶
Compute F1 score: 2 * (Precision * Recall) / (Precision + Recall)
- Parameters:
y_true – True labels
y_pred – Predicted probabilities or labels
average – ‘macro’, ‘weighted’, or None for per-class scores
threshold – Decision threshold for binary classification
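A short usage sketch for the three classification metrics above, assuming y_true holds binary labels and y_pred holds predicted probabilities as NumPy arrays:
>>> precision = Metrics.precision(y_true, y_pred, average='weighted', threshold=0.5)
>>> recall = Metrics.recall(y_true, y_pred, average='weighted', threshold=0.5)
>>> f1 = Metrics.f1_score(y_true, y_pred, average='weighted', threshold=0.5)
>>> print(f"P={precision:.3f}  R={recall:.3f}  F1={f1:.3f}")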
Weight Initializers¶
Weight Initialization Module. Professional weight initialization strategies for neural networks.
- class neuroscope.mlp.initializers.WeightInits[source]¶
Bases:
object
Research-validated weight initialization strategies for neural networks.
Provides implementations of modern weight initialization methods that help maintain proper gradient flow and accelerate training convergence. All methods follow established theoretical foundations from deep learning research.
- static he_init(layer_dims: list, seed=42)[source]¶
He initialization for ReLU and ReLU-variant activations.
Optimal for ReLU-based networks as derived in He et al. (2015). Uses standard deviation of sqrt(2/fan_in) to maintain proper variance propagation through ReLU activations.
- Parameters:
layer_dims (list[int]) – Layer dimensions [input_dim, hidden_dim, …, output_dim].
seed (int, optional) – Random seed for reproducibility. Defaults to 42.
- Returns:
(weights, biases) where weights are initialized according to He initialization and biases are zero-initialized.
- Return type:
tuple[list, list]
Note
Based on “Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification” (He et al. 2015).
Example
>>> weights, biases = WeightInits.he_init([784, 128, 10])
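For intuition, a minimal NumPy sketch of the He formula described above; layout details such as the (fan_in, fan_out) weight shape are illustrative assumptions, not the library's internals:
>>> import numpy as np
>>> rng = np.random.default_rng(42)
>>> layer_dims = [784, 128, 10]
>>> # std = sqrt(2 / fan_in) for each weight matrix; biases start at zero
>>> weights = [rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))
...            for fan_in, fan_out in zip(layer_dims[:-1], layer_dims[1:])]
>>> biases = [np.zeros((1, fan_out)) for fan_out in layer_dims[1:]]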
- static xavier_init(layer_dims: list, seed=42)[source]¶
Xavier/Glorot initialization for sigmoid and tanh activations.
Optimal for symmetric activations like tanh and sigmoid. Uses standard deviation of sqrt(2/(fan_in + fan_out)) to maintain constant variance across layers.
- Parameters:
layer_dims (list[int]) – Layer dimensions [input_dim, hidden_dim, …, output_dim].
seed (int, optional) – Random seed for reproducibility. Defaults to 42.
- Returns:
(weights, biases) with Xavier-initialized weights and zero biases.
- Return type:
tuple[list, list]
Note
Based on “Understanding the difficulty of training deep feedforward neural networks” (Glorot & Bengio 2010).
Example
>>> weights, biases = WeightInits.xavier_init([784, 128, 10])
- static smart_init(layer_dims: list, hidden_activation='leaky_relu', seed=42)[source]¶
Intelligent initialization selection based on activation function.
Automatically selects the optimal initialization strategy based on the chosen activation function. Combines research-validated best practices to ensure proper gradient flow from the start of training.
- Parameters:
layer_dims (list[int]) – Layer dimensions [input_dim, hidden_dim, …, output_dim].
hidden_activation (str, optional) – Activation function name used to select the initialization strategy. Defaults to “leaky_relu”.
seed (int, optional) – Random seed for reproducibility. Defaults to 42.
- Returns:
(weights, biases) with optimally initialized weights for the activation.
- Return type:
tuple[list, list]
- Initialization Strategy:
ReLU/Leaky ReLU: He initialization
Tanh/Sigmoid: Xavier initialization
SELU: LeCun initialization
Unknown: Xavier initialization (safe default)
Example
>>> weights, biases = WeightInits.smart_init([784, 128, 10], 'relu')
Utilities¶
Utilities Module. Helper functions for training, validation, and data processing.
- class neuroscope.mlp.utils.Utils[source]¶
Bases:
object
Utility functions for neural network training and data processing.
Provides essential helper functions for batch processing, gradient clipping, input validation, and numerical stability checks. All methods are static and can be used independently throughout the framework.
- static get_batches(X, y, batch_size=32, shuffle=True)[source]¶
Generate mini-batches for training.
Creates mini-batches from input data with optional shuffling for stochastic gradient descent training. Handles the last batch even if it contains fewer samples than batch_size.
- Parameters:
X (NDArray[np.float64]) – Input data of shape (N, input_dim).
y (NDArray[np.float64]) – Target values of shape (N,) or (N, output_dim).
batch_size (int, optional) – Number of samples per batch. Defaults to 32.
shuffle (bool, optional) – Whether to shuffle samples each epoch. Defaults to True.
- Yields:
tuple[NDArray, NDArray] – (X_batch, y_batch) for each mini-batch.
Example
>>> for X_batch, y_batch in Utils.get_batches(X_train, y_train, batch_size=64):
...     # Process batch
...     pass
- static get_batches_fast(X, y, batch_size=32, shuffle=True)[source]¶
Generate mini-batches for training with optimized memory usage. Expected to be 2-5x faster than get_batches() for large datasets.
- Parameters:
X (NDArray[np.float64]) – Input data of shape (N, input_dim).
y (NDArray[np.float64]) – Target values of shape (N,) or (N, output_dim).
batch_size (int, optional) – Number of samples per batch. Defaults to 32.
shuffle (bool, optional) – Whether to shuffle samples each epoch. Defaults to True.
- Yields:
tuple[NDArray, NDArray] – (X_batch, y_batch) for each mini-batch.
Note
Uses array views (slicing) instead of fancy indexing to avoid memory allocation. Pre-reshapes y to avoid repeated reshaping in training loop.
Example
>>> # Fast batch processing for fast training
>>> for X_batch, y_batch in Utils.get_batches_fast(X_train, y_train, batch_size=64):
...     # Process batch
...     pass
- static gradient_clipping(gradients, max_norm=5.0)[source]¶
Apply gradient clipping to prevent exploding gradients.
Clips gradients by global norm as described in Pascanu et al. (2013). If the global norm exceeds max_norm, all gradients are scaled down proportionally to maintain their relative magnitudes.
- Parameters:
gradients (list[NDArray[np.float64]]) – List of gradient arrays.
max_norm (float, optional) – Maximum allowed gradient norm. Defaults to 5.0.
- Returns:
Clipped gradient arrays.
- Return type:
list[NDArray[np.float64]]
Note
Based on “On the difficulty of training recurrent neural networks” (Pascanu et al. 2013) for gradient norm clipping.
Example
>>> clipped_grads = Utils.gradient_clipping(gradients, max_norm=5.0)
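For intuition, a minimal NumPy sketch of global-norm clipping as described above (an illustrative re-implementation, not the library source):
>>> import numpy as np
>>> def clip_by_global_norm(gradients, max_norm=5.0):
...     # Scale all gradients by the same factor when the global norm is too large,
...     # preserving their relative magnitudes
...     global_norm = np.sqrt(sum(np.sum(g ** 2) for g in gradients))
...     if global_norm > max_norm:
...         scale = max_norm / (global_norm + 1e-12)
...         gradients = [g * scale for g in gradients]
...     return gradients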
- static validate_array_input(arr, name, min_dims=1, max_dims=3, fast_mode=False)[source]¶
Optimized validation for neural network operations.
Performs efficient validation with optional fast mode for training. Automatically converts compatible inputs to numpy arrays when possible.
- Parameters:
arr – Input array or array-like object to validate.
name (str) – Name of the array for error messages.
min_dims (int, optional) – Minimum allowed dimensions. Defaults to 1.
max_dims (int, optional) – Maximum allowed dimensions. Defaults to 3.
fast_mode (bool, optional) – Skip expensive NaN/inf checks for speed. Defaults to False.
- Returns:
Validated numpy array.
- Return type:
NDArray[np.float64]
- Raises:
TypeError – If input cannot be converted to numpy array.
ValueError – If dimensions, shape, or values are invalid.
Example
>>> X_valid = Utils.validate_array_input(X, "training_data", min_dims=2, max_dims=2)
>>> X_fast = Utils.validate_array_input(X, "X_train", fast_mode=True)  # For fit_fast()
- static check_numerical_stability(arrays, context='computation', fast_mode=False)[source]¶
Simple numerical stability check with user-friendly warnings.
Provides clear, actionable warnings for common training issues. Fast mode only checks for critical problems for performance.
- Parameters:
arrays (list[NDArray[np.float64]]) – Arrays to check (e.g., activations or gradients).
context (str, optional) – Label used in warning messages. Defaults to “computation”.
fast_mode (bool, optional) – Only check for critical problems, for speed. Defaults to False.
- Returns:
List of simple, actionable issue descriptions.
- Return type:
list[str]
Example
>>> issues = Utils.check_numerical_stability(activations, "forward_pass")
>>> if issues:
...     print(f"Training Issue: {issues[0]}")
Diagnostics Module¶
Pre-Training Analysis¶
Pre-Training Analysis for NeuroScope MLP Framework. Focused pre-training analysis tools for neural network assessment before training.
- class neuroscope.diagnostics.pretraining.PreTrainingAnalyzer(model)[source]¶
Bases:
object
Comprehensive pre-training diagnostic tool for neural networks.
Analyzes model architecture, weight initialization, and data compatibility before training begins. Implements research-validated checks to identify potential training issues early, based on established deep learning principles from Glorot & Bengio (2010), He et al. (2015), and others.
- Parameters:
model – Compiled MLP model instance with initialized weights.
- model¶
Reference to the neural network model.
Example
>>> from neuroscope.diagnostics import PreTrainingAnalyzer
>>> model = MLP([784, 128, 10])
>>> model.compile(lr=1e-3)
>>> analyzer = PreTrainingAnalyzer(model)
>>> results = analyzer.analyze(X_train, y_train)
Initialize analyzer with a compiled model.
- analyze_initial_loss(X: ndarray, y: ndarray) Dict[str, Any] [source]¶
Validate initial loss against theoretical expectations.
Compares the model’s initial loss (before training) with theoretical baselines for different task types. For classification, expects loss near -log(1/num_classes). For regression, compares against variance-based baseline as described in Goodfellow et al. (2016).
- Parameters:
X (NDArray[np.float64]) – Input data of shape (N, input_dim).
y (NDArray[np.float64]) – Target values of shape (N,) or (N, output_dim).
- Returns:
- Analysis results containing:
initial_loss: Computed initial loss value
expected_loss: Theoretical expected loss
ratio: initial_loss / expected_loss
task_type: Detected task type (regression/classification)
status: “PASS”, “WARN”, or “FAIL”
note: Diagnostic message
- Return type:
Dict[str, Any]
Example
>>> results = analyzer.analyze_initial_loss(X_train, y_train)
>>> print(f"Initial loss check: {results['status']}")
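As a quick sanity check of the classification baseline mentioned above, the expected initial loss for a balanced C-class problem is -log(1/C); a short sketch:
>>> import numpy as np
>>> num_classes = 10
>>> expected_loss = -np.log(1.0 / num_classes)  # ≈ 2.3026 for 10 classes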
- analyze_weight_init() Dict[str, Any] [source]¶
Validate weight initialization against theoretical optima.
Analyzes weight initialization quality by comparing actual weight standard deviations against theoretically optimal values for different activation functions. Based on He initialization (He et al. 2015) for ReLU variants and Xavier initialization (Glorot & Bengio 2010) for sigmoid/tanh.
- Returns:
- Analysis results containing:
layers: List of per-layer initialization analysis
status: Overall initialization quality (“PASS”, “WARN”, “FAIL”)
note: Summary diagnostic message
- Return type:
Dict[str, Any]
Example
>>> results = analyzer.analyze_weight_init()
>>> for layer in results['layers']:
...     print(f"Layer {layer['layer']}: {layer['status']}")
- analyze_layer_capacity() Dict[str, Any] [source]¶
Analyze information bottlenecks and layer capacity issues.
- analyze_architecture_sanity() Dict[str, Any] [source]¶
Perform comprehensive architecture validation.
Validates network architecture against established deep learning principles and best practices. Checks for common architectural pitfalls such as incompatible activation functions, inappropriate depth, and problematic layer configurations based on research findings.
- Returns:
- Analysis results containing:
issues: List of critical architectural problems
warnings: List of potential concerns
status: Overall architecture quality (“PASS”, “WARN”, “FAIL”)
note: Summary diagnostic message
- Return type:
Dict[str, Any]
Note
Based on research from Bengio et al. (2009) on vanishing gradients, modern best practices for deep architectures, and activation function compatibility studies.
Example
>>> results = analyzer.analyze_architecture_sanity()
>>> if results['issues']:
...     print("Critical issues found:", results['issues'])
- analyze_capacity_data_ratio(X: ndarray, y: ndarray) Dict[str, Any] [source]¶
Analyze parameter count relative to training data size.
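A brief usage sketch for the two capacity checks above (result keys are not shown here because they are not documented in this reference):
>>> capacity = analyzer.analyze_layer_capacity()
>>> ratio = analyzer.analyze_capacity_data_ratio(X_train, y_train)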
Training Monitoring¶
Training Monitors for NeuroScope MLP Framework. Real-time monitoring tools for neural network training based on modern deep learning research. Implements comprehensive training diagnostics with emoji-based status indicators.
- class neuroscope.diagnostics.training_monitors.TrainingMonitor(model=None, history_size=50)[source]¶
Bases:
object
Comprehensive real-time training monitoring system for neural networks.
Monitors 10 key training health indicators:
- Dead ReLU neuron detection
- Vanishing gradient problem (VGP) detection
- Exploding gradient problem (EGP) detection
- Weight health analysis
- Learning progress
- Overfitting detection
- Gradient signal-to-noise ratio
- Activation saturation detection (tanh/sigmoid)
- Training plateau detection
- Weight update vs. weight magnitude ratios
Initialize comprehensive training monitor.
Sets up monitoring infrastructure for tracking 10 key training health indicators during neural network training. Uses research-validated thresholds and emoji-based status visualization.
- Parameters:
model – Optional MLP model instance (can be set later).
history_size (int, optional) – Number of epochs to keep in rolling history for trend analysis. Defaults to 50.
Example
>>> monitor = TrainingMonitor(history_size=100)
>>> results = model.fit(X, y, monitor=monitor)
- __init__(model=None, history_size=50)[source]¶
- monitor_relu_dead_neurons(activations: List[ndarray], activation_functions: List[str] | None = None) Tuple[float, str] [source]¶
Monitor for dead ReLU neurons during training.
Detects neurons that have become inactive (always output zero) which indicates the “dying ReLU” problem. Uses activation-function-aware thresholds based on research by Glorot et al. (2011) and He et al. (2015).
Natural sparsity in ReLU networks is expected (~50%), but excessive sparsity (>90%) indicates dead neurons that cannot learn.
- Parameters:
activations (list[NDArray[np.float64]]) – Layer activation outputs.
activation_functions (list[str], optional) – Activation function names per layer.
- Returns:
- (dead_percentage, status_emoji) where status is:
🟢: Healthy sparsity (<10% dead)
🟡: Moderate concern (10-30% dead)
🔴: Critical issue (>30% dead)
- Return type:
Tuple[float, str]
Note
Based on “Deep Sparse Rectifier Neural Networks” (Glorot et al. 2011) and “Delving Deep into Rectifiers” (He et al. 2015).
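A usage sketch, assuming per-layer activations were captured during a forward pass (the scale of the returned value is as described above):
>>> dead_pct, status = monitor.monitor_relu_dead_neurons(
...     activations, activation_functions=["relu", "relu"]
... )
>>> print(f"{status} dead ReLU neurons: {dead_pct}")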
- monitor_vanishing_gradients(gradients: List[ndarray]) Tuple[float, str] [source]¶
Detect vanishing gradient problem using research-validated metrics.
Monitors gradient flow through the network to detect vanishing gradients based on variance analysis from Glorot & Bengio (2010). Healthy networks maintain similar gradient variance across layers.
- Parameters:
gradients (list[NDArray[np.float64]]) – Gradient arrays for each layer.
- Returns:
- (vgp_severity, status_emoji) where:
vgp_severity: Float in [0,1] indicating severity
status: 🟢 (healthy), 🟡 (warning), 🔴 (critical)
- Return type:
Tuple[float, str]
Note
Implementation based on “Understanding the difficulty of training deep feedforward neural networks” (Glorot & Bengio 2010).
- monitor_exploding_gradients(gradients: List[ndarray]) Tuple[float, str] [source]¶
Detect exploding gradient problem using gradient norm analysis.
Monitors gradient magnitudes to detect exploding gradients based on research by Pascanu et al. (2013). Uses both global norm and per-layer analysis to identify unstable training dynamics.
- Parameters:
gradients (list[NDArray[np.float64]]) – Gradient arrays for each layer.
- Returns:
- (egp_severity, status_emoji) where:
egp_severity: Float in [0,1] indicating severity
status: 🟢 (stable), 🟡 (elevated), 🔴 (exploding)
- Return type:
Tuple[float, str]
Note
Based on “On the difficulty of training recurrent neural networks” (Pascanu et al. 2013) gradient clipping and norm analysis.
- monitor_weight_health(weights: List[ndarray]) Tuple[float, str] [source]¶
Simple, research-backed weight health monitor, based on Glorot & Bengio (2010) and He et al. (2015) initialization theory.
- Parameters:
weights (list[NDArray[np.float64]]) – List of weight matrices.
- Returns:
Tuple of (health_score, status).
- Return type:
Tuple[float, str]
- monitor_learning_progress(current_loss: float, val_loss: float | None = None) Tuple[float, str] [source]¶
Research-accurate learning progress monitor, based on optimization literature: Bottou (2010), Goodfellow et al. (2016), Smith (2017).
Key insights:
- Progress = consistent loss reduction + convergence stability + generalization health
- Uses exponential moving averages and plateau detection from the literature
- Parameters:
current_loss (float) – Current training loss.
val_loss (float, optional) – Optional validation loss.
- Returns:
Tuple of (progress_score, emoji_status).
- Return type:
Tuple[float, str]
- monitor_overfitting(train_loss: float, val_loss: float | None = None) Tuple[float, str] [source]¶
Research-accurate overfitting detection, based on Prechelt (1998), Goodfellow et al. (2016), and Caruana et al. (2001).
Key insight:
- Overfitting = increasing generalization gap + validation curve deterioration
- Parameters:
train_loss (float) – Training loss.
val_loss (float, optional) – Validation loss.
- Returns:
Tuple of (overfitting_score, emoji_status).
- Return type:
Tuple[float, str]
- monitor_gradient_snr(gradients: List[ndarray]) Tuple[float, str] [source]¶
Calculate the Gradient Signal-to-Noise Ratio (GSNR) for optimization health.
- Signal: RMS gradient magnitude (update strength)
- Noise: Coefficient of variation (relative inconsistency)
- GSNR = RMS_magnitude / (std_magnitude + ε)
This measures gradient update consistency.
- Parameters:
gradients (list[NDArray[np.float64]]) – List of gradient arrays from each layer.
- Returns:
Tuple of (gsnr_score, emoji_status).
- Return type:
Tuple[float, str]
- monitor_activation_saturation(activations: List[ndarray], activation_functions: List[str] = None) Tuple[float, str] [source]¶
Research-accurate activation saturation detection, based on Glorot & Bengio (2010), Hochreiter (1991), and He et al. (2015).
Key insights:
- Saturation = extreme activation values + poor gradient flow + skewed distributions
- Uses function-specific thresholds and statistical distribution analysis
- Tracks saturation propagation through network layers
- Parameters:
activations (list[NDArray[np.float64]]) – List of activation arrays from each layer.
activation_functions (list[str], optional) – List of activation function names for each layer.
- Returns:
Tuple of (saturation_score, emoji_status).
- Return type:
Tuple[float, str]
- monitor_plateau(current_loss: float, val_loss: float | None = None, current_gradients: List[ndarray] | None = None) Tuple[float, str] [source]¶
Research-accurate training plateau detection, based on Prechelt (1998), Bengio (2012), and Smith (2017).
Key insights:
- Plateau = statistical stagnation + loss of learning momentum + gradient analysis
- Uses multi-scale analysis and statistical significance testing
- Integrates validation correlation and gradient magnitude trends
- Parameters:
current_loss (float) – Current training loss.
val_loss (float, optional) – Optional validation loss for correlation analysis.
current_gradients (list[NDArray[np.float64]], optional) – Optional gradient arrays for gradient-based detection.
- Returns:
Tuple of (plateau_score, emoji_status).
- Return type:
Tuple[float, str]
- monitor_weight_update_ratio(weights: List[ndarray], weight_updates: List[ndarray]) Tuple[float, str] [source]¶
Monitor weight-update-to-weight-magnitude ratios (WUR) for learning rate validation.
Research-based implementation using:
- Smith (2015): learning rate should produce WUR of roughly 10^-3 to 10^-2 for stable training
- Zeiler (2012): update magnitude should be proportional to weight magnitude
Formula: WUR = ||weight_update|| / ||weight|| per layer
- Parameters:
weights (list[NDArray[np.float64]]) – Current weight matrices.
weight_updates (list[NDArray[np.float64]]) – Weight update matrices (gradients * learning_rate).
- Returns:
Tuple of (median_wur, status).
- Return type:
Tuple[float, str]
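For intuition, a minimal NumPy sketch of the per-layer ratio defined above, assuming weights and weight_updates (learning_rate * gradients) are already available as lists of arrays; an illustration, not the library source:
>>> import numpy as np
>>> wurs = [np.linalg.norm(du) / (np.linalg.norm(w) + 1e-12)
...         for w, du in zip(weights, weight_updates)]
>>> median_wur = float(np.median(wurs))  # roughly 1e-3 to 1e-2 suggests a stable learning rate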
- monitor_step(epoch: int, train_loss: float, val_loss: float | None = None, activations: List[ndarray] | None = None, gradients: List[ndarray] | None = None, weights: List[ndarray] | None = None, weight_updates: List[ndarray] | None = None, activation_functions: List[str] | None = None) Dict[str, Any] [source]¶
Perform one monitoring step and return all metrics.
- Parameters:
epoch (int) – Current epoch number.
train_loss (float) – Training loss.
val_loss (float, optional) – Validation loss.
activations (list[NDArray[np.float64]], optional) – Layer activations.
gradients (list[NDArray[np.float64]], optional) – Layer gradients.
weights (list[NDArray[np.float64]], optional) – Layer weights.
weight_updates (list[NDArray[np.float64]], optional) – Weight updates.
activation_functions (list[str], optional) – List of activation function names.
- Returns:
Dictionary containing all monitoring results.
- Return type:
Dict[str, Any]
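A usage sketch for a manual training loop, assuming the per-epoch tensors (loss values, activations, gradients, and so on) have already been collected:
>>> results = monitor.monitor_step(
...     epoch=epoch,
...     train_loss=train_loss,
...     val_loss=val_loss,
...     activations=activations,
...     gradients=gradients,
...     weights=weights,
... )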
Post-Training Evaluation¶
Post-Training Evaluation for NeuroScope MLP Framework. Focused post-training evaluation tools for neural network assessment after training.
- class neuroscope.diagnostics.posttraining.PostTrainingEvaluator(model)[source]¶
Bases:
object
Comprehensive post-training evaluation system for neural networks.
Provides thorough analysis of trained model performance including robustness testing, performance metrics evaluation, and diagnostic assessments. Designed to validate model quality and identify potential deployment issues after training completion.
- Parameters:
model – Trained and compiled MLP model instance with initialized weights.
- model¶
Reference to the trained neural network model.
Example
>>> from neuroscope.diagnostics import PostTrainingEvaluator
>>> model = MLP([784, 128, 10])
>>> model.compile(lr=1e-3)
>>> history = model.fit(X_train, y_train, epochs=100)
>>> evaluator = PostTrainingEvaluator(model)
>>> evaluator.evaluate(X_test, y_test)
>>> # Access detailed results
>>> robustness = evaluator.evaluate_robustness(X_test, y_test)
>>> performance = evaluator.evaluate_performance(X_test, y_test)
Initialize evaluator with a trained model.
- evaluate_robustness(X: ndarray, y: ndarray, noise_levels: List[float] = None) Dict[str, Any] [source]¶
Evaluate model robustness against Gaussian noise.
- evaluate_performance(X: ndarray, y: ndarray) Dict[str, Any] [source]¶
Evaluate model performance metrics.
Visualization Module¶
Plotting Tools¶
NeuroScope Visualization Module. High-quality plotting tools for neural network training analysis.
- class neuroscope.viz.plots.Visualizer(hist)[source]¶
Bases:
object
High-quality visualization tool for neural network training analysis.
Provides comprehensive plotting capabilities for analyzing training dynamics, network behavior, and diagnostic information. Creates professional-grade figures suitable for research publications and presentations.
- Parameters:
hist (dict) – Complete training history from model.fit() containing:
- history: Training/validation metrics per epoch
- weights/biases: Final network parameters
- activations/gradients: Sample network internals
- *_stats_over_epochs: Statistical evolution during training
- weights/biases
Final network parameters.
- activations/gradients
Representative network internals.
Example
>>> from neuroscope.viz import Visualizer
>>> history = model.fit(X_train, y_train, epochs=100)
>>> viz = Visualizer(history)
>>> viz.plot_learning_curves()
>>> viz.plot_activation_distribution()
>>> viz.plot_gradient_flow()
Initialize visualizer with comprehensive training history.
Sets up visualization infrastructure and applies publication-quality styling to all plots. Automatically extracts relevant data components for different types of analysis.
- Parameters:
hist (dict) – Training history from model.fit() containing all training statistics, network states, and diagnostic information.
- __init__(hist)[source]¶
- plot_learning_curves(figsize=(9, 4), ci=False, markers=True, save_path=None, metric='accuracy')[source]¶
Plot training and validation learning curves for regular fit() results.
Creates a high-quality subplot figure showing loss and metric evolution during training. Automatically detects available data and applies consistent styling with optional confidence intervals.
Note: For fit_fast() results, use plot_curves_fast() instead.
- Parameters:
figsize (tuple[int, int], optional) – Figure dimensions (width, height). Defaults to (9, 4).
ci (bool, optional) – Whether to add confidence intervals using moving window statistics. Only available for regular fit() results. Defaults to False.
markers (bool, optional) – Whether to show markers on line plots. Defaults to True.
save_path (str, optional) – Path to save the figure. Defaults to None.
metric (str, optional) – Name of the metric for y-axis label. Defaults to ‘accuracy’.
Example
>>> viz.plot_learning_curves(figsize=(10, 5), ci=True, save_path='curves.png')
- plot_curves_fast(figsize=(10, 4), markers=True, save_path=None)[source]¶
Plot learning curves for fit_fast() results.
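A usage sketch, assuming the history dictionary was produced by fit_fast() as noted above (the output path is illustrative):
>>> history = model.fit_fast(X_train, y_train, X_val, y_val, epochs=100, batch_size=256)
>>> viz = Visualizer(history)
>>> viz.plot_curves_fast(save_path='curves_fast.png')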
- plot_activation_hist(epoch=None, figsize=(9, 4), kde=False, last_layer=False, save_path=None)[source]¶
Plot activation value distributions across network layers.
Visualizes the distribution of activation values for each layer at a specific epoch, aggregated from all mini-batches. Useful for detecting activation saturation, dead neurons, and distribution shifts during training.
- Parameters:
epoch (int, optional) – Specific epoch to plot. If None, uses last epoch. Defaults to None.
figsize (tuple[int, int], optional) – Figure dimensions. Defaults to (9, 4).
kde (bool, optional) – Whether to use KDE-style smoothing for smoother curves. Defaults to False.
last_layer (bool, optional) – Whether to include output layer. Defaults to False.
save_path (str, optional) – Path to save the figure. Defaults to None.
Example
>>> viz.plot_activation_hist(epoch=50, kde=True, save_path='activations.png')
- plot_gradient_hist(epoch=None, figsize=(9, 4), kde=False, last_layer=False, save_path=None)[source]¶
Plot gradient value distributions across network layers.
Visualizes gradient distributions to detect vanishing/exploding gradient problems, gradient flow issues, and training stability. Shows zero-line reference for assessing gradient symmetry and magnitude.
- Parameters:
epoch (int, optional) – Specific epoch to plot. If None, uses last epoch. Defaults to None.
figsize (tuple[int, int], optional) – Figure dimensions. Defaults to (9, 4).
kde (bool, optional) – Whether to use KDE-style smoothing. Defaults to False.
last_layer (bool, optional) – Whether to include output layer gradients. Defaults to False.
save_path (str, optional) – Path to save the figure. Defaults to None.
Note
Gradient distributions should be roughly symmetric around zero for healthy training. Very narrow distributions may indicate vanishing gradients, while very wide distributions may indicate exploding gradients.
Example
>>> viz.plot_gradient_hist(epoch=25, kde=True, save_path='gradients.png')
- plot_weight_hist(epoch=None, figsize=(9, 4), kde=False, last_layer=False, save_path=None)[source]¶
Plot weight value distributions across network layers.
Uses aggregated samples from all mini-batches within an epoch to create representative distributions. Shows weight evolution patterns.
- Parameters:
epoch – Specific epoch to plot (default: last epoch)
figsize – Figure size tuple
kde – Whether to use KDE-style smoothing
last_layer – Whether to include output layer (default: False, hidden layers only)
save_path – Path to save figure
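A usage sketch mirroring the activation and gradient histogram examples above (the output path is illustrative):
>>> viz.plot_weight_hist(epoch=50, kde=True, save_path='weights.png')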
- plot_activation_stats(activation_stats=None, figsize=(12, 4), save_path=None, reference_lines=False)[source]¶
Plot activation statistics over time with both mean and std.
- Parameters:
activation_stats – Dict of layer activation stats, or None to use the data collected by this Visualizer. Format: {‘layer_0’: {‘mean’: […], ‘std’: […]}, …}
figsize – Figure size tuple
save_path – Path to save figure
- plot_gradient_stats(figsize=(12, 4), save_path=None, reference_lines=False)[source]¶
Plot gradient statistics over time with both mean and std.
- plot_weight_stats(figsize=(12, 4), save_path=None, reference_lines=False)[source]¶
Plot weight statistics over time with both mean and std.
- plot_update_ratios(update_ratios=None, figsize=(12, 4), save_path=None, reference_lines=False)[source]¶
Plot weight update ratios per layer across epochs.
- Parameters:
update_ratios – Dict of layer update ratios (optional; uses collected data if None). Format: {‘layer_0’: [ratio_epoch_0, ratio_epoch_1, …], …}
figsize – Figure size tuple
save_path – Path to save figure
- plot_gradient_norms(gradient_norms=None, figsize=(12, 4), save_path=None, reference_lines=False)[source]¶
Plot gradient norms per layer over epochs.
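A usage sketch for the statistics plots above, assuming the history passed to Visualizer came from the full-diagnostics fit() (output paths are illustrative):
>>> viz.plot_activation_stats()
>>> viz.plot_gradient_stats(save_path='grad_stats.png')
>>> viz.plot_update_ratios()
>>> viz.plot_gradient_norms(save_path='grad_norms.png')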
- plot_training_animation(bg='dark', save_path=None)[source]¶
Create a comprehensive 4-panel GIF animation showing:
1. Loss curves over time
2. Accuracy curves over time
3. Current metrics bar chart
4. Gradient flow analysis
Animation speed automatically adjusts based on epoch count for a smooth motion feel.
- Parameters:
bg (str, optional) – Theme (‘dark’ or ‘light’). Defaults to ‘dark’.
save_path (str, optional) – Path to save the GIF (defaults to ‘mlp_training_animation.gif’).
- Returns:
Path to the saved GIF file.
- Return type:
str
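A usage sketch (the output path is illustrative):
>>> gif_path = viz.plot_training_animation(bg='dark', save_path='training.gif')
>>> print(f"Saved animation to {gif_path}")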