Optimizer API Reference

The Optimizer API provides algorithms for updating trainable parameters based on computed gradients. Optimizers automatically collect parameters from the computation graph and apply update rules during training.
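
A typical lifecycle, condensed from the full examples later in this reference (the loop body comments stand in for your own forward/backward code):

tofu_graph *g = tofu_graph_create();
tofu_graph_node *W_node = tofu_graph_param(g, W);   // W is an existing tofu_tensor

tofu_optimizer *opt = tofu_optimizer_sgd_create(g, 0.01);  // collects all PARAM nodes

for (int epoch = 0; epoch < num_epochs; epoch++) {
    tofu_optimizer_zero_grad(opt);   // clear accumulated gradients
    // ... forward pass, loss, tofu_graph_backward(g, loss) ...
    tofu_optimizer_step(opt);        // apply the update rule
}

tofu_optimizer_free(opt);            // free before the graph and tensors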

Table of Contents

  • Data Structures
  • Creating Optimizers
  • Cleanup
  • Training Operations
  • Parameter Management
  • Usage Patterns
  • Hyperparameter Guidance
  • Training Tips
  • Common Pitfalls
  • Notes

Data Structures

tofu_optimizer

The optimizer structure that manages parameters and their update strategy.

struct tofu_optimizer {
    tofu_optim_type type;            // Optimizer type
    tofu_graph* graph;               // Associated computation graph

    tofu_graph_node** params;        // Array of parameter nodes
    int num_params;                  // Number of parameters
    int capacity_params;             // Allocated capacity

    double learning_rate;            // Learning rate

    void* state;                     // Optimizer state (momentum buffers, etc.)

    tofu_optim_step_fn step_fn;      // Parameter update function
};

Optimizer Types (tofu_optim_type)

Available optimization algorithms:

  • TOFU_OPTIM_SGD - Vanilla Stochastic Gradient Descent
  • TOFU_OPTIM_SGD_MOMENTUM - SGD with momentum
  • TOFU_OPTIM_ADAM - Adam optimizer (planned, not yet implemented)
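
A plausible shape for the enum, inferred from the constants listed above (a sketch only; the actual header may declare it differently):

typedef enum {
    TOFU_OPTIM_SGD,           // vanilla stochastic gradient descent
    TOFU_OPTIM_SGD_MOMENTUM,  // SGD with momentum
    TOFU_OPTIM_ADAM           // Adam (planned)
} tofu_optim_type;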

Creating Optimizers

tofu_optimizer_sgd_create

Create SGD (Stochastic Gradient Descent) optimizer.

tofu_optimizer* tofu_optimizer_sgd_create(tofu_graph* g, double learning_rate);

Parameters:

  • g - Computation graph containing parameters (cannot be NULL)
  • learning_rate - Learning rate (step size) (must be > 0)

Returns: Pointer to newly allocated optimizer (caller owns, must call tofu_optimizer_free)

Preconditions:

  • g must not be NULL
  • learning_rate > 0

Behavior:

  • Implements vanilla SGD: param = param - learning_rate * grad
  • Automatically collects all PARAM nodes from graph
  • Caller must call tofu_optimizer_free to free optimizer

Algorithm:

for each parameter θ:
    θ ← θ - η * ∇θL
where:
    η = learning_rate
    ∇θL = gradient of loss w.r.t. parameter
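
A conceptual sketch of what one step applies per element, assuming float parameters and using the accessor macros shown later in this reference (the real step_fn is internal to the library and may differ):

for (int i = 0; i < opt->num_params; i++) {
    tofu_tensor *param = tofu_graph_get_value(opt->params[i]);
    tofu_tensor *grad  = tofu_graph_get_grad(opt->params[i]);
    if (!grad) continue;   // parameter received no gradient this pass

    for (int j = 0; j < param->len; j++) {
        float p, dp;
        TOFU_TENSOR_DATA_TO(param, j, p, TOFU_FLOAT);
        TOFU_TENSOR_DATA_TO(grad, j, dp, TOFU_FLOAT);
        p -= (float)(opt->learning_rate * dp);   // θ ← θ - η * ∇θL
        TOFU_TENSOR_DATA_FROM(param, j, p, TOFU_FLOAT);
    }
}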

Example:

tofu_graph *g = tofu_graph_create();

// Add parameters to graph
tofu_tensor *W = tofu_tensor_zeros(2, (int[]){784, 10}, TOFU_FLOAT);
tofu_graph_node *W_node = tofu_graph_param(g, W);

// Create optimizer
tofu_optimizer *opt = tofu_optimizer_sgd_create(g, 0.01);

// Training loop
for (int epoch = 0; epoch < num_epochs; epoch++) {
    tofu_optimizer_zero_grad(opt);
    // ... forward and backward pass ...
    tofu_optimizer_step(opt);
}

// Cleanup
tofu_optimizer_free(opt);
tofu_graph_free(g);
tofu_tensor_free_data_too(W);

Notes:

  • Simple and robust, good baseline optimizer
  • No momentum or adaptive learning rates
  • May converge slowly on complex problems
  • Violating preconditions triggers assert() and crashes

See also: tofu_optimizer_sgd_momentum_create for SGD with momentum


tofu_optimizer_sgd_momentum_create

Create SGD optimizer with momentum.

tofu_optimizer* tofu_optimizer_sgd_momentum_create(tofu_graph* g, double learning_rate, double momentum);

Parameters:

  • g - Computation graph containing parameters (cannot be NULL)
  • learning_rate - Learning rate (step size) (must be > 0)
  • momentum - Momentum coefficient (typically 0.9) (must be >= 0 and < 1)

Returns: Pointer to newly allocated optimizer (caller owns, must call tofu_optimizer_free)

Preconditions:

  • g must not be NULL
  • learning_rate > 0
  • 0 <= momentum < 1

Behavior:

  • Implements SGD with momentum:
    • velocity = momentum * velocity - learning_rate * grad
    • param = param + velocity
  • Momentum helps accelerate training and reduces oscillations
  • Automatically collects all PARAM nodes from graph
  • Caller must call tofu_optimizer_free to free optimizer

Algorithm:

for each parameter θ:
    v ← μ * v - η * ∇θL
    θ ← θ + v
where:
    η = learning_rate
    μ = momentum
    v = velocity (accumulated gradients)
    ∇θL = gradient of loss w.r.t. parameter

Note: with a constant learning rate this is equivalent to classical momentum
(v = μ*v + ∇θL, θ = θ - η*v); the only difference is that the learning
rate is folded into the velocity update rather than the parameter update.
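
A conceptual sketch of the per-element update for a single parameter tensor. The velocity buffer here is a hypothetical stand-in for the per-parameter state the optimizer keeps in opt->state; its actual layout is internal to the library:

// velocity has param->len entries and is zero-initialized before training
void sgd_momentum_update(tofu_tensor *param, tofu_tensor *grad,
                         float *velocity, double lr, double mu) {
    for (int j = 0; j < param->len; j++) {
        float p, dp;
        TOFU_TENSOR_DATA_TO(param, j, p, TOFU_FLOAT);
        TOFU_TENSOR_DATA_TO(grad, j, dp, TOFU_FLOAT);
        velocity[j] = (float)(mu * velocity[j] - lr * dp);  // v ← μ*v - η*∇θL
        p += velocity[j];                                   // θ ← θ + v
        TOFU_TENSOR_DATA_FROM(param, j, p, TOFU_FLOAT);
    }
}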

Example:

tofu_graph *g = tofu_graph_create();

// Add parameters
tofu_tensor *W = tofu_tensor_zeros(2, (int[]){784, 10}, TOFU_FLOAT);
tofu_graph_node *W_node = tofu_graph_param(g, W);

// Create optimizer with momentum
tofu_optimizer *opt = tofu_optimizer_sgd_momentum_create(g, 0.01, 0.9);

// Training loop
for (int epoch = 0; epoch < num_epochs; epoch++) {
    tofu_optimizer_zero_grad(opt);
    // ... forward and backward pass ...
    tofu_optimizer_step(opt);
}

// Cleanup
tofu_optimizer_free(opt);

Notes:

  • Momentum helps escape local minima and speeds up convergence
  • Typical momentum values: 0.9 (standard), 0.99 (high momentum)
  • More effective than vanilla SGD for deep networks
  • Violating preconditions triggers assert() and crashes

See also: tofu_optimizer_sgd_create for vanilla SGD


Cleanup

tofu_optimizer_free

Free optimizer and its state.

void tofu_optimizer_free(tofu_optimizer* opt);

Parameters:

  • opt - Optimizer to free (can be NULL, no-op if NULL)

Behavior:

  • Frees optimizer structure and internal state (momentum buffers, etc.)
  • Does NOT free the graph or parameters (graph owns them)
  • Safe to call multiple times (idempotent)

Cleanup Order:

// CORRECT order:
tofu_optimizer_free(opt);           // 1. Free optimizer
tofu_graph_free(g);                 // 2. Free graph
tofu_tensor_free_data_too(weights);  // 3. Free tensors

// INCORRECT order (may crash):
tofu_graph_free(g);                 // DON'T free graph before optimizer!
tofu_optimizer_free(opt);           // Optimizer may access freed memory

Training Operations

tofu_optimizer_step

Perform one optimization step (update parameters).

void tofu_optimizer_step(tofu_optimizer* opt);

Parameters:

  • opt - Optimizer (cannot be NULL)

Preconditions:

  • opt must not be NULL
  • Gradients must be computed (call tofu_graph_backward first)

Behavior:

  • Updates all parameters using computed gradients
  • Algorithm depends on optimizer type (SGD, SGD+momentum, etc.)
  • Call after backward pass: forward → backward → step
  • Does NOT zero gradients - call tofu_optimizer_zero_grad if needed

Training Sequence:

for (int iteration = 0; iteration < num_iterations; iteration++) {
    // 1. Zero gradients
    tofu_optimizer_zero_grad(opt);

    // 2. Forward pass
    tofu_graph_node *x = tofu_graph_input(g, input_data);
    tofu_graph_node *pred = forward_pass(g, x);
    tofu_graph_node *loss = tofu_graph_mse_loss(g, pred, target);

    // 3. Backward pass
    tofu_graph_backward(g, loss);

    // 4. Update parameters
    tofu_optimizer_step(opt);

    // 5. Clear operations for next iteration
    tofu_graph_clear_ops(g);
}

Notes:

  • Must call tofu_graph_backward() before this function
  • Modifies parameter tensors in-place
  • Violating preconditions triggers assert() and crashes

See also: tofu_graph_backward, tofu_optimizer_zero_grad


tofu_optimizer_zero_grad

Zero out all parameter gradients.

void tofu_optimizer_zero_grad(tofu_optimizer* opt);

Parameters:

  • opt - Optimizer (cannot be NULL)

Preconditions:

  • opt must not be NULL

Behavior:

  • Sets gradients to zero for all tracked parameters
  • Call before each training iteration to prevent gradient accumulation
  • Equivalent to tofu_graph_zero_grad but works via optimizer

Example:

for (int epoch = 0; epoch < num_epochs; epoch++) {
    // Zero gradients before forward pass
    tofu_optimizer_zero_grad(opt);

    // Forward pass
    tofu_graph_node *pred = forward_pass(g, input);
    tofu_graph_node *loss = compute_loss(g, pred, target);

    // Backward pass
    tofu_graph_backward(g, loss);

    // Update parameters
    tofu_optimizer_step(opt);
}

Notes:

  • Essential for correct training - prevents gradient accumulation
  • Must call before each training iteration
  • Violating preconditions triggers assert() and crashes

See also: tofu_graph_zero_grad


Parameter Management

Most users won't need these functions - parameters are automatically collected during optimizer creation. These are useful for advanced use cases like dynamic network architectures.

tofu_optimizer_add_param

Manually add parameter node to optimizer.

int tofu_optimizer_add_param(tofu_optimizer* opt, tofu_graph_node* param);

Parameters:

  • opt - Optimizer (cannot be NULL)
  • param - Parameter node to track (cannot be NULL)

Returns: 0 on success, non-zero on error

Preconditions:

  • opt and param must not be NULL
  • param must be a PARAM node (requires gradient)

Behavior:

  • Usually not needed - optimizer auto-collects params at creation
  • Use if you need to add parameters dynamically

Example:

// Create optimizer
tofu_optimizer *opt = tofu_optimizer_sgd_create(g, 0.01);

// Add parameters dynamically (rare use case)
tofu_tensor *new_weight = tofu_tensor_zeros(2, (int[]){10, 5}, TOFU_FLOAT);
tofu_graph_node *W_new = tofu_graph_param(g, new_weight);
tofu_optimizer_add_param(opt, W_new);

Notes:

  • Rarely needed - use only for dynamic architectures
  • Violating preconditions triggers assert() and crashes

See also: tofu_optimizer_collect_params to scan graph for all params


tofu_optimizer_collect_params

Collect all parameter nodes from graph.

void tofu_optimizer_collect_params(tofu_optimizer* opt);

Parameters:

  • opt - Optimizer (cannot be NULL)

Preconditions:

  • opt must not be NULL

Behavior:

  • Scans graph and adds all PARAM nodes to optimizer
  • Called automatically during optimizer creation
  • Use if graph structure changes and you need to rescan
  • Clears existing parameter list before collecting

Example:

tofu_optimizer *opt = tofu_optimizer_sgd_create(g, 0.01);

// Add more parameters to graph later
tofu_tensor *W2 = tofu_tensor_zeros(2, (int[]){10, 5}, TOFU_FLOAT);
tofu_graph_node *W2_node = tofu_graph_param(g, W2);

// Rescan graph to include new parameters
tofu_optimizer_collect_params(opt);

Notes:

  • Rarely needed - parameters auto-collected at creation
  • Use only if network structure changes dynamically
  • Violating preconditions triggers assert() and crashes

Usage Patterns

Basic Training Loop

// Setup
tofu_graph *g = tofu_graph_create();

// Create parameters
tofu_tensor *W = tofu_tensor_zeros(2, (int[]){784, 10}, TOFU_FLOAT);
tofu_tensor *b = tofu_tensor_zeros(1, (int[]){10}, TOFU_FLOAT);

// Add to graph
tofu_graph_node *W_node = tofu_graph_param(g, W);
tofu_graph_node *b_node = tofu_graph_param(g, b);

// Create optimizer
tofu_optimizer *opt = tofu_optimizer_sgd_create(g, 0.01);

// Training loop
for (int epoch = 0; epoch < num_epochs; epoch++) {
    for (int batch = 0; batch < num_batches; batch++) {
        // 1. Zero gradients
        tofu_optimizer_zero_grad(opt);

        // 2. Forward pass
        tofu_graph_node *x = tofu_graph_input(g, batch_data[batch]);
        tofu_graph_node *h = tofu_graph_matmul(g, x, W_node);
        tofu_graph_node *pred = tofu_graph_add(g, h, b_node);

        // 3. Compute loss
        tofu_graph_node *target = tofu_graph_input(g, batch_targets[batch]);
        tofu_graph_node *loss = tofu_graph_mse_loss(g, pred, target);

        // 4. Backward pass
        tofu_graph_backward(g, loss);

        // 5. Update parameters
        tofu_optimizer_step(opt);

        // 6. Clear operations for next batch
        tofu_graph_clear_ops(g);
    }
}

// Cleanup
tofu_optimizer_free(opt);
tofu_graph_free(g);
tofu_tensor_free_data_too(W);
tofu_tensor_free_data_too(b);

Training with Momentum

// Setup with momentum optimizer
tofu_graph *g = tofu_graph_create();

// Network parameters
tofu_tensor *W1 = tofu_tensor_zeros(2, (int[]){784, 128}, TOFU_FLOAT);
tofu_tensor *b1 = tofu_tensor_zeros(1, (int[]){128}, TOFU_FLOAT);
tofu_tensor *W2 = tofu_tensor_zeros(2, (int[]){128, 10}, TOFU_FLOAT);
tofu_tensor *b2 = tofu_tensor_zeros(1, (int[]){10}, TOFU_FLOAT);

// Add to graph
tofu_graph_node *W1_node = tofu_graph_param(g, W1);
tofu_graph_node *b1_node = tofu_graph_param(g, b1);
tofu_graph_node *W2_node = tofu_graph_param(g, W2);
tofu_graph_node *b2_node = tofu_graph_param(g, b2);

// Create optimizer with momentum
tofu_optimizer *opt = tofu_optimizer_sgd_momentum_create(g, 0.01, 0.9);

// Training loop
for (int epoch = 0; epoch < num_epochs; epoch++) {
    for (int batch = 0; batch < num_batches; batch++) {
        tofu_optimizer_zero_grad(opt);

        // Forward pass
        tofu_graph_node *x = tofu_graph_input(g, batch_data[batch]);

        // Layer 1
        tofu_graph_node *h1 = tofu_graph_matmul(g, x, W1_node);
        h1 = tofu_graph_add(g, h1, b1_node);
        h1 = tofu_graph_relu(g, h1);

        // Layer 2
        tofu_graph_node *h2 = tofu_graph_matmul(g, h1, W2_node);
        h2 = tofu_graph_add(g, h2, b2_node);

        // Loss
        tofu_graph_node *target = tofu_graph_input(g, batch_targets[batch]);
        tofu_graph_node *loss = tofu_graph_mse_loss(g, h2, target);

        // Backward and update
        tofu_graph_backward(g, loss);
        tofu_optimizer_step(opt);

        tofu_graph_clear_ops(g);
    }
}

// Cleanup
tofu_optimizer_free(opt);
tofu_graph_free(g);
tofu_tensor_free_data_too(W1);
tofu_tensor_free_data_too(b1);
tofu_tensor_free_data_too(W2);
tofu_tensor_free_data_too(b2);

Learning Rate Scheduling

Manual learning rate adjustment during training:

// Create optimizer
tofu_optimizer *opt = tofu_optimizer_sgd_create(g, 0.1);

for (int epoch = 0; epoch < num_epochs; epoch++) {
    // Reduce learning rate every 10 epochs
    if (epoch % 10 == 0 && epoch > 0) {
        opt->learning_rate *= 0.5;
        printf("Epoch %d: Reduced learning rate to %.6f\n", epoch, opt->learning_rate);
    }

    // Training loop for this epoch
    for (int batch = 0; batch < num_batches; batch++) {
        tofu_optimizer_zero_grad(opt);
        // ... forward, backward, step ...
    }
}

Monitoring Gradients

Useful for debugging and understanding training dynamics:

// After backward pass, before optimizer step
tofu_tensor *W_grad = tofu_graph_get_grad(W_node);

// Compute gradient statistics
double grad_sum = 0.0;
double grad_max = -INFINITY;
for (int i = 0; i < W_grad->len; i++) {
    float val;
    TOFU_TENSOR_DATA_TO(W_grad, i, val, TOFU_FLOAT);
    grad_sum += fabs(val);
    if (fabs(val) > grad_max) grad_max = fabs(val);
}

printf("Gradient mean: %.6f, max: %.6f\n",
       grad_sum / W_grad->len, grad_max);

// Now update parameters
tofu_optimizer_step(opt);

Gradient Clipping (Manual)

Prevent exploding gradients:

void clip_gradients(tofu_optimizer *opt, double max_norm) {
    for (int i = 0; i < opt->num_params; i++) {
        tofu_tensor *grad = tofu_graph_get_grad(opt->params[i]);
        if (!grad) continue;

        // Compute gradient norm
        double norm = 0.0;
        for (int j = 0; j < grad->len; j++) {
            float val;
            TOFU_TENSOR_DATA_TO(grad, j, val, TOFU_FLOAT);
            norm += val * val;
        }
        norm = sqrt(norm);

        // Clip if necessary
        if (norm > max_norm) {
            double scale = max_norm / norm;
            for (int j = 0; j < grad->len; j++) {
                float val;
                TOFU_TENSOR_DATA_TO(grad, j, val, TOFU_FLOAT);
                val *= scale;
                TOFU_TENSOR_DATA_FROM(grad, j, val, TOFU_FLOAT);
            }
        }
    }
}

// Usage in training loop
tofu_graph_backward(g, loss);
clip_gradients(opt, 1.0);  // Clip to max norm of 1.0
tofu_optimizer_step(opt);

Hyperparameter Guidance

Learning Rate

The learning rate is the most important hyperparameter. It controls the step size of parameter updates.

Guidelines:

Problem Type     | Recommended Range | Notes
-----------------|-------------------|------------------------------------------
Small networks   | 0.01 - 0.1        | Can use larger learning rates
Deep networks    | 0.001 - 0.01      | Need smaller learning rates
Fine-tuning      | 0.0001 - 0.001    | Very small to preserve learned features

Common values:

  • 0.1 - Starting point for small networks
  • 0.01 - Default safe choice for most problems
  • 0.001 - Deep networks, complex problems
  • 0.0001 - Fine-tuning pre-trained models

Signs of incorrect learning rate:

  • Too high: Loss diverges (increases), NaN values, training unstable
  • Too low: Very slow convergence, loss decreases too slowly
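
A quick programmatic check for the "too high" case is to inspect the loss each iteration; a minimal sketch, assuming loss_value has already been read from the loss tensor inside the training loop:

// isnan()/isinf() come from <math.h>
if (isnan(loss_value) || isinf(loss_value)) {
    printf("Loss diverged at epoch %d - learning rate is likely too high\n", epoch);
    break;   // stop and restart training with a smaller learning rate
}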

Example - Finding good learning rate:

// Try multiple learning rates
double learning_rates[] = {0.001, 0.01, 0.1};

for (int lr_idx = 0; lr_idx < 3; lr_idx++) {
    printf("\n=== Testing LR: %.4f ===\n", learning_rates[lr_idx]);

    // Reset parameters
    reinitialize_parameters(W, b);

    // Create optimizer with this learning rate
    tofu_optimizer *opt = tofu_optimizer_sgd_create(g, learning_rates[lr_idx]);

    // Train for a few epochs
    for (int epoch = 0; epoch < 10; epoch++) {
        // ... training loop ...
        printf("Epoch %d, Loss: %.6f\n", epoch, loss_value);
    }

    tofu_optimizer_free(opt);
}

Momentum

Momentum helps accelerate convergence and dampen oscillations.

Guidelines:

Scenario        | Recommended Value | Effect
----------------|-------------------|---------------------------------------
Default         | 0.9               | Good balance for most problems
High momentum   | 0.95 - 0.99       | Faster convergence, may overshoot
Low momentum    | 0.5 - 0.8         | More stable, slower convergence
No momentum     | 0.0               | Vanilla SGD, most stable but slowest

Common values:

  • 0.9 - Standard choice for most problems
  • 0.95 - Deep networks, when convergence is slow
  • 0.99 - Very deep networks (ResNet, Transformers)
  • 0.5 - Noisy gradients, unstable training

Example:

// Standard momentum for a deep network
tofu_optimizer *opt_standard = tofu_optimizer_sgd_momentum_create(g, 0.01, 0.9);

// Higher momentum for a very deep network
tofu_optimizer *opt_high = tofu_optimizer_sgd_momentum_create(g, 0.01, 0.99);

// Low momentum for noisy gradients
tofu_optimizer *opt_low = tofu_optimizer_sgd_momentum_create(g, 0.01, 0.5);

Batch Size Considerations

Batch size affects effective learning rate:

// Larger batches → more stable gradients → can use higher learning rate
int batch_size = 128;
double lr = 0.01;

tofu_optimizer *opt = tofu_optimizer_sgd_create(g, lr);

// If you increase batch size, consider increasing learning rate proportionally
// batch_size = 256 → lr = 0.02
// batch_size = 512 → lr = 0.04

Learning Rate Schedules

Common strategies for adjusting learning rate during training:

Step Decay:

// Reduce learning rate every N epochs
if (epoch % 30 == 0 && epoch > 0) {
    opt->learning_rate *= 0.1;  // Reduce by 10x
}

Exponential Decay:

// Decay gradually every epoch
double initial_lr = 0.1;
double decay_rate = 0.96;
opt->learning_rate = initial_lr * pow(decay_rate, epoch);

Cosine Annealing:

// Smooth decay following cosine curve
double initial_lr = 0.1;
double min_lr = 0.001;
opt->learning_rate = min_lr + (initial_lr - min_lr) *
                     (1 + cos(M_PI * epoch / num_epochs)) / 2;

Training Tips

1. Start with a reasonable learning rate:

// Good defaults:
tofu_optimizer *opt_sgd = tofu_optimizer_sgd_create(g, 0.01);
tofu_optimizer *opt_momentum = tofu_optimizer_sgd_momentum_create(g, 0.01, 0.9);

2. Monitor loss and adjust:

double prev_loss = INFINITY;

for (int epoch = 0; epoch < num_epochs; epoch++) {
    // ... training ...

    // Check if loss is improving
    if (loss_value > prev_loss * 1.1) {
        printf("Loss increased! Consider reducing learning rate.\n");
    }

    prev_loss = loss_value;
}

3. Use learning rate warmup for large learning rates:

double target_lr = 0.1;
int warmup_epochs = 5;

for (int epoch = 0; epoch < num_epochs; epoch++) {
    if (epoch < warmup_epochs) {
        // Gradually increase learning rate
        opt->learning_rate = target_lr * (epoch + 1) / warmup_epochs;
    } else {
        opt->learning_rate = target_lr;
    }

    // ... training ...
}

4. Weight decay (L2 regularization) - manual implementation:

double weight_decay = 0.0001;

void apply_weight_decay(tofu_optimizer *opt, double weight_decay) {
    for (int i = 0; i < opt->num_params; i++) {
        tofu_tensor *param = tofu_graph_get_value(opt->params[i]);

        for (int j = 0; j < param->len; j++) {
            float val;
            TOFU_TENSOR_DATA_TO(param, j, val, TOFU_FLOAT);
            val *= (1.0 - weight_decay * opt->learning_rate);
            TOFU_TENSOR_DATA_FROM(param, j, val, TOFU_FLOAT);
        }
    }
}

// Use before optimizer step
tofu_graph_backward(g, loss);
apply_weight_decay(opt, 0.0001);
tofu_optimizer_step(opt);

Common Pitfalls

Forgetting to Zero Gradients

Problem:

// WRONG: Gradients accumulate indefinitely
for (int i = 0; i < num_iterations; i++) {
    // forward, backward, step...
    // Gradients keep accumulating!
}

Solution:

// CORRECT: Zero gradients each iteration
for (int i = 0; i < num_iterations; i++) {
    tofu_optimizer_zero_grad(opt);  // Clear gradients
    // forward, backward, step...
}

Incorrect Cleanup Order

Problem:

// WRONG: Freeing graph before optimizer
tofu_graph_free(g);        // Graph freed
tofu_optimizer_free(opt);  // Optimizer tries to access freed graph!

Solution:

// CORRECT: Free optimizer before graph
tofu_optimizer_free(opt);  // Free optimizer first
tofu_graph_free(g);        // Then free graph

Learning Rate Too High

Symptoms:

  • Loss becomes NaN
  • Loss diverges (increases)
  • Training unstable

Solution:

// Reduce learning rate by 10x
double new_lr = opt->learning_rate * 0.1;
tofu_optimizer_free(opt);
opt = tofu_optimizer_sgd_create(g, new_lr);

Learning Rate Too Low

Symptoms:

  • Loss decreases very slowly
  • Training takes many epochs
  • No progress after many iterations

Solution:

// Increase learning rate by 10x
double new_lr = opt->learning_rate * 10.0;
tofu_optimizer_free(opt);
opt = tofu_optimizer_sgd_create(g, new_lr);

Notes

Optimizer State Persistence

Optimizer state (like momentum buffers) persists across training iterations:

// Momentum accumulates across iterations
tofu_optimizer *opt = tofu_optimizer_sgd_momentum_create(g, 0.01, 0.9);

for (int epoch = 0; epoch < num_epochs; epoch++) {
    // Momentum from previous epochs affects current updates
    // ... training ...
}

Parameter Collection

Optimizers automatically collect parameters when created:

// All PARAM nodes are collected automatically
tofu_graph_node *W1 = tofu_graph_param(g, weights1);
tofu_graph_node *W2 = tofu_graph_param(g, weights2);

tofu_optimizer *opt = tofu_optimizer_sgd_create(g, 0.01);
// opt now tracks both W1 and W2

Memory Management

Optimizer owns its internal state but not the graph or parameters:

// Optimizer allocates momentum buffers (if using momentum)
tofu_optimizer *opt = tofu_optimizer_sgd_momentum_create(g, 0.01, 0.9);

// When freed, optimizer releases momentum buffers
tofu_optimizer_free(opt);

// Graph and parameters remain valid
// (must be freed separately)