Graph API Reference
The Graph API provides computational graph construction and automatic differentiation for training neural networks. Gradients are computed with reverse-mode automatic differentiation (backpropagation).
Table of Contents
- Data Structures
- Graph Lifecycle
- Leaf Nodes
- Operations
- Loss Functions
- Backward Pass
- Utilities
- Usage Patterns
Data Structures
tofu_graph
The computation graph structure that manages all nodes and their relationships.
struct tofu_graph {
tofu_graph_node** nodes; // All nodes in graph
int num_nodes; // Number of nodes
int capacity; // Allocated capacity
tofu_graph_node** topo_order; // Nodes in reverse topological order
int topo_size; // Size of topo_order
int topo_capacity; // Allocated capacity
int next_id; // Next available node ID
};
tofu_graph_node
A node in the computation graph representing an operation or leaf value.
struct tofu_graph_node {
int id; // Unique node ID within graph
tofu_op_type op; // Operation type
tofu_tensor* value; // Forward pass result
tofu_tensor* grad; // Gradient (∂L/∂value)
tofu_graph_node** inputs; // Input nodes
int num_inputs; // Number of inputs
int capacity_inputs; // Allocated capacity for inputs
tofu_backward_fn backward_fn; // Backward pass function
void* backward_ctx; // Context for backward (saved tensors, etc.)
int requires_grad; // Does this need gradient computation?
int visited; // For topological sort
tofu_graph* graph; // Parent graph
};
Operation Types (tofu_op_type)
Enumeration of all supported operations:
- Leaf nodes:
  - TOFU_OP_INPUT - Input node (no gradient)
  - TOFU_OP_PARAM - Trainable parameter (requires gradient)
- Binary operations:
  - TOFU_OP_MATMUL - Matrix multiplication
  - TOFU_OP_ADD - Element-wise addition
  - TOFU_OP_MUL - Element-wise multiplication
- Activations:
  - TOFU_OP_RELU - ReLU activation
  - TOFU_OP_SOFTMAX - Softmax activation
  - TOFU_OP_LAYER_NORM - Layer normalization
- Shape operations:
  - TOFU_OP_RESHAPE - Reshape operation
  - TOFU_OP_TRANSPOSE - Transpose operation
- Reductions:
  - TOFU_OP_MEAN - Mean reduction
  - TOFU_OP_SUM - Sum reduction
- Loss functions:
  - TOFU_OP_MSE_LOSS - Mean squared error loss
  - TOFU_OP_CE_LOSS - Cross-entropy loss
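A small debugging helper built only from the constants listed above can make graph dumps readable. This is a sketch; the helper itself (tofu_op_name) is hypothetical and not part of the API:
// Hypothetical helper: map an op type to a printable name
const char* tofu_op_name(tofu_op_type op) {
    switch (op) {
        case TOFU_OP_INPUT:      return "input";
        case TOFU_OP_PARAM:      return "param";
        case TOFU_OP_MATMUL:     return "matmul";
        case TOFU_OP_ADD:        return "add";
        case TOFU_OP_MUL:        return "mul";
        case TOFU_OP_RELU:       return "relu";
        case TOFU_OP_SOFTMAX:    return "softmax";
        case TOFU_OP_LAYER_NORM: return "layer_norm";
        case TOFU_OP_RESHAPE:    return "reshape";
        case TOFU_OP_TRANSPOSE:  return "transpose";
        case TOFU_OP_MEAN:       return "mean";
        case TOFU_OP_SUM:        return "sum";
        case TOFU_OP_MSE_LOSS:   return "mse_loss";
        case TOFU_OP_CE_LOSS:    return "ce_loss";
        default:                 return "unknown";
    }
}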
Graph Lifecycle
tofu_graph_create
Create a new empty computation graph.
tofu_graph* tofu_graph_create(void);
Returns: Pointer to newly allocated graph (caller owns, must call tofu_graph_free)
Behavior:
- Graph starts empty - add nodes via tofu_graph_input, tofu_graph_param, etc.
- Graph does NOT take ownership of tensors passed to tofu_graph_param
- Caller must call tofu_graph_free to free graph and all nodes
Example:
tofu_graph *g = tofu_graph_create();
// Build graph...
tofu_graph_free(g);
tofu_graph_free
Free computation graph and all nodes.
void tofu_graph_free(tofu_graph* g);
Parameters:
g - Graph to free (can be NULL, no-op if NULL)
Behavior:
- Frees all graph nodes and their gradients
- Frees intermediate operation results (matmul, add, etc.)
- Does NOT free INPUT or PARAM tensors (caller owns them)
- Caller must separately free tensors passed to input/param functions
- Safe to call multiple times (idempotent)
Ownership Pattern:
// Create tensors
tofu_tensor *input = tofu_tensor_zeros(2, (int[]){1, 4}, TOFU_FLOAT);
tofu_tensor *weights = tofu_tensor_zeros(2, (int[]){4, 3}, TOFU_FLOAT);
// Build graph
tofu_graph *g = tofu_graph_create();
tofu_graph_node *x = tofu_graph_input(g, input);
tofu_graph_node *W = tofu_graph_param(g, weights);
// ... more operations ...
// Cleanup (important order!)
tofu_graph_free(g); // 1. Free graph first
tofu_tensor_free_data_too(input); // 2. Then free tensors
tofu_tensor_free_data_too(weights);
See also: tofu_graph_clear_ops
tofu_graph_clear_ops
Clear all operation nodes but keep parameter nodes.
void tofu_graph_clear_ops(tofu_graph* g);
Parameters:
g - Graph to clear (cannot be NULL)
Behavior:
- Frees all nodes except PARAM and INPUT nodes
- Preserves trainable parameters for next forward pass
- Use between training iterations to reset computation graph
Use Case:
tofu_graph *g = tofu_graph_create();
// Add parameters (preserved across iterations)
tofu_graph_node *W = tofu_graph_param(g, weights);
tofu_graph_node *b = tofu_graph_param(g, bias);
for (int epoch = 0; epoch < num_epochs; epoch++) {
// Build forward graph for this batch
tofu_graph_node *x = tofu_graph_input(g, batch_data);
tofu_graph_node *y = tofu_graph_matmul(g, x, W);
tofu_graph_node *out = tofu_graph_add(g, y, b);
// Backward and optimize...
// Clear operations for next iteration (W and b are preserved)
tofu_graph_clear_ops(g);
}
Notes:
- More efficient than creating new graph each iteration
- Parameters maintain their values and gradients
- Violating preconditions triggers assert() and crashes
Leaf Nodes
Leaf nodes are the starting points of computation - they have no inputs and represent either data or learnable parameters.
tofu_graph_input
Create input node (non-trainable data source).
tofu_graph_node* tofu_graph_input(tofu_graph* g, tofu_tensor* data);
Parameters:
g - Graph to add node to (cannot be NULL)
data - Input tensor data (cannot be NULL)
Returns: Pointer to newly created graph node (graph owns node, caller owns tensor)
Behavior:
- Input nodes do NOT compute gradients
- IMPORTANT: Graph does NOT take ownership of data tensor
- Caller must free data tensor separately after tofu_graph_free()
- Use for input data that doesn't require backpropagation
Example:
float input_data[] = {1.0f, 2.0f, 3.0f, 4.0f};
tofu_tensor *x_tensor = tofu_tensor_create(input_data, 2, (int[]){1, 4}, TOFU_FLOAT);
tofu_graph *g = tofu_graph_create();
tofu_graph_node *x = tofu_graph_input(g, x_tensor);
// Use x in graph operations...
tofu_graph_free(g);
tofu_tensor_free(x_tensor); // Caller must free tensor
Notes:
- Typical pattern: create tensor → input node → use → graph_free → free tensor
- Violating preconditions triggers assert() and crashes
See also: tofu_graph_param for trainable parameters
tofu_graph_param
Create parameter node (trainable weights/biases).
tofu_graph_node* tofu_graph_param(tofu_graph* g, tofu_tensor* data);
Parameters:
g - Graph to add node to (cannot be NULL)
data - Parameter tensor data (cannot be NULL)
Returns: Pointer to newly created graph node (graph owns node, caller owns tensor)
Behavior:
- IMPORTANT: Graph does NOT take ownership of data tensor
- Caller must free data tensor separately after tofu_graph_free()
- Parameter nodes compute gradients during backward pass
- Use for trainable weights, biases, etc.
Example:
// Create trainable weights
tofu_tensor *W = tofu_tensor_zeros(2, (int[]){4, 3}, TOFU_FLOAT);
tofu_tensor *b = tofu_tensor_zeros(1, (int[]){3}, TOFU_FLOAT);
tofu_graph *g = tofu_graph_create();
tofu_graph_node *W_node = tofu_graph_param(g, W);
tofu_graph_node *b_node = tofu_graph_param(g, b);
// Build network...
// Training loop with backward pass computes W->grad and b->grad
// Cleanup
tofu_graph_free(g);
tofu_tensor_free_data_too(W); // Caller must free tensors
tofu_tensor_free_data_too(b);
Notes:
- Typical pattern: create tensor → param node → free tensor after graph_free
- Gradients are stored in the node, accessible via tofu_graph_get_grad()
- Violating preconditions triggers assert() and crashes
See also: tofu_graph_input for non-trainable inputs, tofu_graph_get_grad to access gradients
Operations
Operations create new nodes in the graph that compute values during forward pass and gradients during backward pass.
tofu_graph_matmul
Add matrix multiplication node to graph.
tofu_graph_node* tofu_graph_matmul(tofu_graph* g, tofu_graph_node* a, tofu_graph_node* b);
Parameters:
g - Graph to add node to (cannot be NULL)
a - Left operand node (cannot be NULL)
b - Right operand node (cannot be NULL)
Returns: Pointer to result node (graph owns, freed by tofu_graph_free)
Preconditions:
a->value->dims[last] must equal b->value->dims[second-to-last]
Behavior:
- Computes matrix multiplication with broadcasting
- Implements backward pass for gradient computation
- Result node requires gradient if any input requires gradient
Example:
// Neural network layer: y = x @ W
tofu_graph_node *x = tofu_graph_input(g, input_tensor);
tofu_graph_node *W = tofu_graph_param(g, weights_tensor);
tofu_graph_node *y = tofu_graph_matmul(g, x, W);
Notes:
- Most commonly used operation for neural networks
- Follows same semantics as tofu_tensor_matmul (see Tensor API)
- Violating preconditions triggers assert() and crashes
tofu_graph_add
Add element-wise addition node to graph.
tofu_graph_node* tofu_graph_add(tofu_graph* g, tofu_graph_node* a, tofu_graph_node* b);
Parameters:
g - Graph to add node to (cannot be NULL)
a - First operand node (cannot be NULL)
b - Second operand node (cannot be NULL)
Returns: Pointer to result node (graph owns, freed by tofu_graph_free)
Preconditions:
- a and b must be broadcastable (NumPy rules)
Behavior:
- Computes element-wise addition with broadcasting
- Implements backward pass for gradient computation
Example:
// Add bias: y = x + b
tofu_graph_node *x = tofu_graph_matmul(g, input, weights);
tofu_graph_node *b = tofu_graph_param(g, bias_tensor);
tofu_graph_node *y = tofu_graph_add(g, x, b);
Notes:
- Supports NumPy-style broadcasting
- Common for adding biases to layer outputs
tofu_graph_mul
Add element-wise multiplication node to graph.
tofu_graph_node* tofu_graph_mul(tofu_graph* g, tofu_graph_node* a, tofu_graph_node* b);
Parameters:
g - Graph to add node to (cannot be NULL)
a - First operand node (cannot be NULL)
b - Second operand node (cannot be NULL)
Returns: Pointer to result node (graph owns, freed by tofu_graph_free)
Preconditions:
- a and b must be broadcastable (NumPy rules)
Behavior:
- Computes element-wise multiplication with broadcasting
- Implements backward pass for gradient computation
Example:
// Attention mechanism: scaled dot product
tofu_graph_node *qk = tofu_graph_matmul(g, q, k);
tofu_graph_node *scale = tofu_graph_param(g, scale_tensor);
tofu_graph_node *scaled = tofu_graph_mul(g, qk, scale);
tofu_graph_relu
Add ReLU activation node to graph.
tofu_graph_node* tofu_graph_relu(tofu_graph* g, tofu_graph_node* x);
Parameters:
g - Graph to add node to (cannot be NULL)
x - Input node (cannot be NULL)
Returns: Pointer to result node (graph owns, freed by tofu_graph_free)
Behavior:
- Computes ReLU: max(0, x)
- Implements backward pass for gradient computation
- Gradient is 1 where x > 0, else 0
Example:
// Hidden layer with ReLU
tofu_graph_node *h1 = tofu_graph_matmul(g, x, W1);
tofu_graph_node *h1_bias = tofu_graph_add(g, h1, b1);
tofu_graph_node *h1_relu = tofu_graph_relu(g, h1_bias);
tofu_graph_softmax
Add softmax activation node to graph.
tofu_graph_node* tofu_graph_softmax(tofu_graph* g, tofu_graph_node* x, int axis);
Parameters:
g - Graph to add node to (cannot be NULL)
x - Input node (cannot be NULL)
axis - Axis along which to apply softmax
Returns: Pointer to result node (graph owns, freed by tofu_graph_free)
Preconditions:
axis < x->value->ndim
Behavior:
- Computes softmax along specified axis (exp normalization)
- Implements backward pass for gradient computation
- Numerically stable (subtracts max before exp)
Example:
// Classification output layer
tofu_graph_node *logits = tofu_graph_matmul(g, h, W_out);
tofu_graph_node *probs = tofu_graph_softmax(g, logits, 1);
Notes:
- Typically used for classification tasks
- Output values sum to 1.0 along specified axis
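To sanity-check that the rows are proper probability distributions, the softmax output can be printed after the forward pass; a minimal sketch reusing probs from the example above:
// Each row of the result should sum to ~1.0 along axis 1
tofu_tensor *p = tofu_graph_get_value(probs);
tofu_tensor_print(p, "%.6f"); // do NOT free p (node owns it)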
tofu_graph_layer_norm
Add layer normalization node to graph.
tofu_graph_node* tofu_graph_layer_norm(tofu_graph* g, tofu_graph_node* x,
tofu_graph_node* gamma, tofu_graph_node* beta,
int axis, double eps);
Parameters:
g - Graph to add node to (cannot be NULL)
x - Input node (cannot be NULL)
gamma - Scale parameter node (can be NULL for no scaling)
beta - Shift parameter node (can be NULL for no shift)
axis - Axis along which to normalize
eps - Small constant for numerical stability (typically 1e-5)
Returns: Pointer to result node (graph owns, freed by tofu_graph_free)
Preconditions:
axis < x->value->ndim
eps > 0
Behavior:
- Normalizes: (x - mean) / sqrt(variance + eps)
- Then applies: gamma * normalized + beta (if gamma/beta non-NULL)
- Implements backward pass for gradient computation
Example:
// Transformer-style layer norm
tofu_graph_node *gamma = tofu_graph_param(g, gamma_tensor);
tofu_graph_node *beta = tofu_graph_param(g, beta_tensor);
tofu_graph_node *normalized = tofu_graph_layer_norm(g, x, gamma, beta, 1, 1e-5);
Notes:
- Common in transformer architectures
- Helps stabilize training of deep networks
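The example above assumes gamma_tensor and beta_tensor already exist. A sketch of one way to set them up for a feature dimension of 4 (the 1-D shape convention and ones/zeros initialization are assumptions, not library requirements):
// Scale starts at 1, shift starts at 0 - common layer-norm defaults
float gamma_data[] = {1.0f, 1.0f, 1.0f, 1.0f};
tofu_tensor *gamma_tensor = tofu_tensor_create(gamma_data, 1, (int[]){4}, TOFU_FLOAT);
tofu_tensor *beta_tensor = tofu_tensor_zeros(1, (int[]){4}, TOFU_FLOAT);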
tofu_graph_reshape
Add reshape node to graph.
tofu_graph_node* tofu_graph_reshape(tofu_graph* g, tofu_graph_node* x, int ndim, const int* dims);
Parameters:
g - Graph to add node to (cannot be NULL)
x - Input node (cannot be NULL)
ndim - Number of dimensions for reshaped tensor
dims - Array of new dimension sizes
Returns: Pointer to result node (graph owns, freed by tofu_graph_free)
Preconditions:
- Product of dims must equal the total number of elements in x->value
Behavior:
- View operation (no data copy) - reshaped tensor shares data with input
- Implements backward pass for gradient computation
Example:
// Flatten for fully connected layer
// Input: [batch, channels, height, width]
// Output: [batch, channels * height * width]
int flat_dim = channels * height * width;
tofu_graph_node *flat = tofu_graph_reshape(g, x, 2, (int[]){batch_size, flat_dim});
tofu_graph_transpose
Add transpose node to graph.
tofu_graph_node* tofu_graph_transpose(tofu_graph* g, tofu_graph_node* x, const int* axes);
Parameters:
g - Graph to add node to (cannot be NULL)
x - Input node (cannot be NULL)
axes - Permutation array (can be NULL for reverse order)
Returns: Pointer to result node (graph owns, freed by tofu_graph_free)
Preconditions:
- If axes is non-NULL, it must be a valid permutation of [0, ..., ndim-1]
Behavior:
- If axes is NULL, reverses dimension order
- Implements backward pass for gradient computation
Example:
// Transpose a weight matrix (reverse dimension order)
tofu_graph_node *W_T = tofu_graph_transpose(g, W, NULL);
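For higher-rank tensors, an explicit permutation can be given instead of NULL; a sketch that swaps the last two axes of a 3-D node (the shape names are illustrative):
// x has shape [batch, seq, dim]; the result has shape [batch, dim, seq]
tofu_graph_node *x_T = tofu_graph_transpose(g, x, (int[]){0, 2, 1});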
Loss Functions
Loss functions compute scalar values representing model error. They're typically the final nodes in a computation graph before calling tofu_graph_backward().
tofu_graph_mse_loss
Add mean squared error loss node to graph.
tofu_graph_node* tofu_graph_mse_loss(tofu_graph* g, tofu_graph_node* pred, tofu_graph_node* target);
Parameters:
g - Graph to add node to (cannot be NULL)
pred - Prediction node (cannot be NULL)
target - Target/ground truth node (cannot be NULL)
Returns: Pointer to scalar loss node (graph owns, freed by tofu_graph_free)
Preconditions:
- pred and target must have same shape
Behavior:
- Computes: mean((pred - target)^2)
- Returns scalar (average over all elements)
- Use for regression tasks
- Implements backward pass for gradient computation
Example:
// Regression task
tofu_graph_node *pred = tofu_graph_matmul(g, x, W);
tofu_graph_node *target = tofu_graph_input(g, target_tensor);
tofu_graph_node *loss = tofu_graph_mse_loss(g, pred, target);
// Compute gradients
tofu_graph_backward(g, loss);
See also: tofu_graph_ce_loss for classification
tofu_graph_ce_loss
Add cross-entropy loss node to graph.
tofu_graph_node* tofu_graph_ce_loss(tofu_graph* g, tofu_graph_node* pred, tofu_graph_node* target);
Parameters:
g - Graph to add node to (cannot be NULL)
pred - Prediction node (softmax probabilities) (cannot be NULL)
target - Target/ground truth node (class indices or one-hot) (cannot be NULL)
Returns: Pointer to scalar loss node (graph owns, freed by tofu_graph_free)
Behavior:
- Computes: -sum(target * log(pred))
- Returns scalar (average over batch)
- Use for classification tasks
- Numerically stable implementation
- Implements backward pass for gradient computation
Example:
// Classification task
tofu_graph_node *logits = tofu_graph_matmul(g, x, W);
tofu_graph_node *probs = tofu_graph_softmax(g, logits, 1);
tofu_graph_node *target = tofu_graph_input(g, target_tensor);
tofu_graph_node *loss = tofu_graph_ce_loss(g, probs, target);
// Compute gradients
tofu_graph_backward(g, loss);
See also: tofu_graph_mse_loss for regression
Backward Pass
tofu_graph_backward
Perform backward pass (backpropagation) from loss node.
void tofu_graph_backward(tofu_graph* g, tofu_graph_node* loss);
Parameters:
g - Graph containing loss node (cannot be NULL)
loss - Loss node to backpropagate from (cannot be NULL)
Preconditions:
- loss must be scalar (single element tensor)
Behavior:
- Computes gradients for all nodes requiring gradient
- Populates node->grad for all PARAM nodes
- Uses reverse-mode automatic differentiation
- Call after forward pass, before optimizer step
- Gradients accumulate across multiple backward passes and from multiple computational paths
- Always call tofu_graph_zero_grad before each training iteration unless you intentionally want gradients to accumulate (e.g., across mini-batches)
Example:
// Training iteration
tofu_graph *g = tofu_graph_create();
// Forward pass
tofu_graph_node *x = tofu_graph_input(g, input_data);
tofu_graph_node *W = tofu_graph_param(g, weights);
tofu_graph_node *pred = tofu_graph_matmul(g, x, W);
tofu_graph_node *target = tofu_graph_input(g, target_data);
tofu_graph_node *loss = tofu_graph_mse_loss(g, pred, target);
// Backward pass
tofu_graph_backward(g, loss);
// Now W->grad contains ∂loss/∂W
tofu_tensor *W_grad = tofu_graph_get_grad(W);
Notes:
- Automatically builds topological sort for efficient gradient computation
- Violating preconditions triggers assert() and crashes
See also: tofu_graph_zero_grad to clear gradients, tofu_graph_get_grad to access gradients
Utilities
tofu_graph_get_value
Get forward pass result from graph node.
tofu_tensor* tofu_graph_get_value(tofu_graph_node* node);
Parameters:
node - Graph node (cannot be NULL)
Returns: Pointer to result tensor (node owns, do NOT free)
Behavior:
- Returns tensor computed during forward pass
- Do NOT free returned tensor (node owns it)
Example:
tofu_graph_node *pred = tofu_graph_matmul(g, x, W);
tofu_tensor *pred_value = tofu_graph_get_value(pred);
// Print predictions
tofu_tensor_print(pred_value, "%.6f");
Warning: Do not free the returned tensor!
tofu_graph_get_grad
Get gradient from graph node.
tofu_tensor* tofu_graph_get_grad(tofu_graph_node* node);
Parameters:
node - Graph node (cannot be NULL)
Returns: Pointer to gradient tensor (node owns, do NOT free), or NULL if no gradient
Behavior:
- Returns gradient computed during backward pass
- Returns NULL if backward hasn't been called yet
- Do NOT free returned tensor (node owns it)
Example:
tofu_graph_node *W = tofu_graph_param(g, weights);
// ... build graph and backward pass ...
tofu_tensor *W_grad = tofu_graph_get_grad(W);
if (W_grad) {
// Use gradient for parameter update
// (or use optimizer which handles this automatically)
}
Warning: Do not free the returned tensor!
tofu_graph_zero_grad
Zero out all gradients in graph.
void tofu_graph_zero_grad(tofu_graph* g);
Parameters:
g - Graph to zero gradients for (cannot be NULL)
Behavior:
- Sets all node->grad tensors to zero
- Call before each training iteration to prevent gradient accumulation
- Does NOT free gradient tensors, just zeros values
Example:
for (int epoch = 0; epoch < num_epochs; epoch++) {
// Zero gradients before forward pass
tofu_graph_zero_grad(g);
// Forward pass
// ... build graph ...
// Backward pass
tofu_graph_backward(g, loss);
// Update parameters
tofu_optimizer_step(optimizer);
}
Notes:
- Essential for correct training - prevents gradient accumulation
- Typically called by the optimizer's zero_grad() function
See also: tofu_optimizer_zero_grad
Usage Patterns
Basic Training Loop
// Setup
tofu_graph *g = tofu_graph_create();
tofu_tensor *W = tofu_tensor_zeros(2, (int[]){4, 3}, TOFU_FLOAT);
tofu_tensor *b = tofu_tensor_zeros(1, (int[]){3}, TOFU_FLOAT);
tofu_graph_node *W_node = tofu_graph_param(g, W);
tofu_graph_node *b_node = tofu_graph_param(g, b);
tofu_optimizer *opt = tofu_optimizer_sgd_create(g, 0.01);
// Training loop
for (int epoch = 0; epoch < num_epochs; epoch++) {
for (int batch = 0; batch < num_batches; batch++) {
// Zero gradients
tofu_optimizer_zero_grad(opt);
// Forward pass
tofu_graph_node *x = tofu_graph_input(g, batch_data[batch]);
tofu_graph_node *y = tofu_graph_matmul(g, x, W_node);
tofu_graph_node *out = tofu_graph_add(g, y, b_node);
tofu_graph_node *target = tofu_graph_input(g, batch_targets[batch]);
tofu_graph_node *loss = tofu_graph_mse_loss(g, out, target);
// Backward pass
tofu_graph_backward(g, loss);
// Update parameters
tofu_optimizer_step(opt);
// Clear operations for next batch (keeps parameters)
tofu_graph_clear_ops(g);
}
}
// Cleanup
tofu_optimizer_free(opt);
tofu_graph_free(g);
tofu_tensor_free_data_too(W);
tofu_tensor_free_data_too(b);
Multi-Layer Neural Network
// Define network architecture
typedef struct {
tofu_tensor *W1, *b1;
tofu_tensor *W2, *b2;
tofu_tensor *W3, *b3;
} Network;
// Forward pass function
tofu_graph_node* forward_pass(tofu_graph *g, tofu_graph_node *x, Network *net) {
// Layer 1
tofu_graph_node *W1 = tofu_graph_param(g, net->W1);
tofu_graph_node *b1 = tofu_graph_param(g, net->b1);
tofu_graph_node *h1 = tofu_graph_matmul(g, x, W1);
h1 = tofu_graph_add(g, h1, b1);
h1 = tofu_graph_relu(g, h1);
// Layer 2
tofu_graph_node *W2 = tofu_graph_param(g, net->W2);
tofu_graph_node *b2 = tofu_graph_param(g, net->b2);
tofu_graph_node *h2 = tofu_graph_matmul(g, h1, W2);
h2 = tofu_graph_add(g, h2, b2);
h2 = tofu_graph_relu(g, h2);
// Output layer
tofu_graph_node *W3 = tofu_graph_param(g, net->W3);
tofu_graph_node *b3 = tofu_graph_param(g, net->b3);
tofu_graph_node *out = tofu_graph_matmul(g, h2, W3);
out = tofu_graph_add(g, out, b3);
return out;
}
// Usage
tofu_graph *g = tofu_graph_create();
tofu_graph_node *x = tofu_graph_input(g, input_data);
tofu_graph_node *pred = forward_pass(g, x, &network);
tofu_graph_node *target = tofu_graph_input(g, target_data);
tofu_graph_node *loss = tofu_graph_mse_loss(g, pred, target);
tofu_graph_backward(g, loss);
Classification with Softmax and Cross-Entropy
// Classification network
tofu_graph_node *x = tofu_graph_input(g, input_data);
tofu_graph_node *W = tofu_graph_param(g, weights);
tofu_graph_node *b = tofu_graph_param(g, bias);
// Logits
tofu_graph_node *logits = tofu_graph_matmul(g, x, W);
logits = tofu_graph_add(g, logits, b);
// Softmax (probabilities)
tofu_graph_node *probs = tofu_graph_softmax(g, logits, 1);
// Cross-entropy loss
tofu_graph_node *target = tofu_graph_input(g, target_data);
tofu_graph_node *loss = tofu_graph_ce_loss(g, probs, target);
// Backward and optimize
tofu_graph_backward(g, loss);
tofu_optimizer_step(optimizer);
Memory Management Best Practices
// 1. Create tensors for parameters (library manages data)
tofu_tensor *weights = tofu_tensor_zeros(2, (int[]){784, 10}, TOFU_FLOAT);
tofu_tensor *bias = tofu_tensor_zeros(1, (int[]){10}, TOFU_FLOAT);
// 2. Create graph and add parameters
tofu_graph *g = tofu_graph_create();
tofu_graph_node *W = tofu_graph_param(g, weights);
tofu_graph_node *b = tofu_graph_param(g, bias);
// 3. Training loop
for (int epoch = 0; epoch < num_epochs; epoch++) {
// Create input tensors (user manages data)
float *batch_data = load_batch(epoch);
tofu_tensor *x_tensor = tofu_tensor_create(batch_data, 2, (int[]){32, 784}, TOFU_FLOAT);
// Build graph
tofu_graph_node *x = tofu_graph_input(g, x_tensor);
// ... forward pass ...
// Training step
tofu_graph_backward(g, loss);
tofu_optimizer_step(opt);
// Clean up batch resources
tofu_tensor_free(x_tensor); // Free tensor structure
free(batch_data); // Free data buffer
// Clear operations (keeps parameters)
tofu_graph_clear_ops(g);
}
// 4. Cleanup (IMPORTANT ORDER!)
tofu_optimizer_free(opt); // Free optimizer first
tofu_graph_free(g); // Then free graph
tofu_tensor_free_data_too(weights); // Finally free parameter tensors
tofu_tensor_free_data_too(bias);
Efficient Batch Processing
tofu_graph *g = tofu_graph_create();
// Add parameters once (persists across batches)
tofu_graph_node *W = tofu_graph_param(g, weights);
tofu_graph_node *b = tofu_graph_param(g, bias);
for (int batch = 0; batch < num_batches; batch++) {
tofu_optimizer_zero_grad(opt);
// Add input for this batch
tofu_graph_node *x = tofu_graph_input(g, batch_data[batch]);
// Build forward graph
tofu_graph_node *out = tofu_graph_add(g, tofu_graph_matmul(g, x, W), b);
tofu_graph_node *loss = tofu_graph_mse_loss(g, out, batch_targets[batch]);
// Backward and update
tofu_graph_backward(g, loss);
tofu_optimizer_step(opt);
// Clear operations for next batch (W and b are preserved)
tofu_graph_clear_ops(g);
}
Notes
Gradient Accumulation
Gradients accumulate by default. Always call tofu_graph_zero_grad() or tofu_optimizer_zero_grad() before each training iteration:
// CORRECT: Zero gradients before each iteration
for (int i = 0; i < num_iterations; i++) {
tofu_optimizer_zero_grad(opt); // Clear previous gradients
// ... forward and backward ...
}
// INCORRECT: Gradients accumulate indefinitely
for (int i = 0; i < num_iterations; i++) {
// ... forward and backward ...
// Gradients from all iterations accumulate!
}
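When accumulation is intentional (e.g., simulating a larger batch), the same mechanism can be used deliberately. A sketch assuming parameter nodes W_node and b_node, an optimizer opt, and micro-batch arrays micro_data/micro_targets (all placeholders, not part of the API):
// Accumulate gradients over k micro-batches, then take one optimizer step
tofu_optimizer_zero_grad(opt);
for (int m = 0; m < k; m++) {
    tofu_graph_node *x = tofu_graph_input(g, micro_data[m]);
    tofu_graph_node *out = tofu_graph_add(g, tofu_graph_matmul(g, x, W_node), b_node);
    tofu_graph_node *target = tofu_graph_input(g, micro_targets[m]);
    tofu_graph_node *loss = tofu_graph_mse_loss(g, out, target);
    tofu_graph_backward(g, loss);  // gradients add onto existing grads
    tofu_graph_clear_ops(g);       // drop op nodes, keep params and their grads
}
tofu_optimizer_step(opt);          // single update using the accumulated gradients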
Dynamic Graphs
Tofu uses dynamic computation graphs (define-by-run). The graph structure can change between iterations:
for (int epoch = 0; epoch < num_epochs; epoch++) {
tofu_graph_node *out;
if (epoch < 10) {
// Simple network
out = tofu_graph_matmul(g, x, W1);
} else {
// More complex network
tofu_graph_node *h = tofu_graph_relu(g, tofu_graph_matmul(g, x, W1));
out = tofu_graph_matmul(g, h, W2);
}
tofu_graph_node *loss = tofu_graph_mse_loss(g, out, target); // target: input node with labels
tofu_graph_backward(g, loss);
tofu_graph_clear_ops(g); // Clear for next iteration
}
Error Checking
Most functions use assert() for precondition checking. In release builds with assertions disabled, violating preconditions leads to undefined behavior. Always ensure:
- Pointers are non-NULL
- Shapes are compatible
- Tensors are broadcastable
- Loss is scalar before calling backward
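When assertions may be compiled out, explicit checks before building nodes can catch shape mistakes early. A minimal sketch using only fields referenced elsewhere in this document; the guard function itself (can_matmul) is hypothetical and not part of the API:
// Hypothetical guard: verify matmul inner dimensions agree
// (assumes both operands are at least 2-D)
int can_matmul(const tofu_graph_node *a, const tofu_graph_node *b) {
    if (a == NULL || b == NULL || a->value == NULL || b->value == NULL) return 0;
    const tofu_tensor *ta = a->value;
    const tofu_tensor *tb = b->value;
    if (ta->ndim < 2 || tb->ndim < 2) return 0;
    return ta->dims[ta->ndim - 1] == tb->dims[tb->ndim - 2];
}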