Tensor API Reference

The Tensor API provides the core data structures and operations for Tofu. Tensors are multi-dimensional arrays that support automatic differentiation when used with the Graph API.

Data Structures
Creation Functions
Cleanup Functions
Shape Operations
Mathematical Operations
Element-wise Operations
Reductions
Activation Functions
Utilities
Common Patterns

Data Structures

`tofu_tensor`

The core tensor structure representing a multi-dimensional array.

struct tofu_tensor {
    tofu_dtype dtype;           // Data type (TOFU_FLOAT, TOFU_INT32, etc.)
    int len;                    // Total number of elements
    int ndim;                   // Number of dimensions
    int *dims;                  // Array of dimension sizes
    void *data;                 // Pointer to data buffer
    struct tofu_tensor *owner;  // Data owner (NULL if self-owned)
    void *backend_data;         // Backend-specific data
};

Data Types (`tofu_dtype`)

Supported tensor data types:

TOFU_FLOAT - 32-bit floating point (most common for neural networks)
TOFU_DOUBLE - 64-bit floating point
TOFU_INT32 - 32-bit signed integer
TOFU_INT64 - 64-bit signed integer
TOFU_INT16 - 16-bit signed integer
TOFU_INT8 - 8-bit signed integer
TOFU_UINT32 - 32-bit unsigned integer
TOFU_UINT64 - 64-bit unsigned integer
TOFU_UINT16 - 16-bit unsigned integer
TOFU_UINT8 - 8-bit unsigned integer
TOFU_BOOL - Boolean type

Element-wise Operations (`tofu_elew_op`)

TOFU_MUL - Multiplication (*)
TOFU_DIV - Division (/)
TOFU_SUM - Addition (+)
TOFU_SUB - Subtraction (-)
TOFU_MAX - Element-wise maximum
TOFU_MIN - Element-wise minimum
TOFU_POW - Power (^)

Creation Functions

`tofu_tensor_create`

Create a tensor with an existing data buffer.

tofu_tensor *tofu_tensor_create(void *data, int ndim, const int *dims, tofu_dtype dtype);

Parameters:

data - Pointer to data buffer (cannot be NULL)
ndim - Number of dimensions (must be > 0 and <= TOFU_MAXDIM, which is 8)
dims - Array of dimension sizes, length must be ndim
dtype - Data type (TOFU_FLOAT, TOFU_INT32, etc.)

Returns: Pointer to newly allocated tensor (caller owns, must call tofu_tensor_free)

Ownership:

The tensor does NOT take ownership of the data buffer
Caller must manage data lifetime and free both tensor and data separately
Even when passed to tofu_graph_param(), caller still owns tensor

Example:

float data[] = {1.0f, 2.0f, 3.0f, 4.0f};
int dims[] = {2, 2};
tofu_tensor *t = tofu_tensor_create(data, 2, dims, TOFU_FLOAT);

// Use tensor...

tofu_tensor_free(t);  // Free tensor structure
// data is still valid - free manually if needed

Notes:

Typical pattern: create tensor → use in graph → tofu_graph_free() → tofu_tensor_free() → free data
Violating these preconditions triggers an assert() failure, which aborts the program

See also: tofu_tensor_zeros, tofu_tensor_create_with_values

`tofu_tensor_create_with_values`

Create a tensor with a heap-allocated copy of the provided values.

tofu_tensor *tofu_tensor_create_with_values(const float *values, int ndim, const int *dims);

Parameters:

values - Array of initial values (cannot be NULL)
ndim - Number of dimensions (must be > 0 and <= TOFU_MAXDIM)
dims - Array of dimension sizes, length must be ndim

Returns: Pointer to newly allocated tensor with copied data (caller owns, must call tofu_tensor_free_data_too)

Important:

Creates heap-allocated copy of values (safe for gradients)
DO NOT keep pointers to compound literals like (float[]){1.0f} beyond their enclosing block: inside a function they have automatic (stack) storage
Number of values must match product of dims
Caller must call tofu_tensor_free_data_too to free both tensor and data

Example:

float values[] = {1.0f, 2.0f, 3.0f, 4.0f};
int dims[] = {2, 2};
tofu_tensor *t = tofu_tensor_create_with_values(values, 2, dims);

// Use tensor...

tofu_tensor_free_data_too(t);  // Free both tensor and data

`tofu_tensor_zeros`

Create a zero-initialized tensor with allocated data buffer.

tofu_tensor *tofu_tensor_zeros(int ndim, const int *dims, tofu_dtype dtype);

Parameters:

ndim - Number of dimensions (must be > 0 and <= TOFU_MAXDIM)
dims - Array of dimension sizes, length must be ndim
dtype - Data type (TOFU_FLOAT, TOFU_INT32, etc.)

Returns: Pointer to newly allocated zero-filled tensor (caller owns, must call tofu_tensor_free_data_too)

Ownership:

Allocates both tensor structure and data buffer
Caller must call tofu_tensor_free_data_too to free both
Even when passed to tofu_graph_param(), caller still owns tensor

Example:

int dims[] = {3, 4};
tofu_tensor *t = tofu_tensor_zeros(2, dims, TOFU_FLOAT);

// All elements are 0.0f
// Use tensor...

tofu_tensor_free_data_too(t);  // Free both tensor and data

See also: tofu_tensor_create, tofu_tensor_clone

`tofu_tensor_clone`

Create a deep copy of a tensor.

tofu_tensor *tofu_tensor_clone(const tofu_tensor *src);

Parameters:

src - Source tensor to clone (cannot be NULL)

Returns: Pointer to newly allocated tensor (caller owns, must call tofu_tensor_free_data_too)

Behavior:

Creates both new tensor structure and new data buffer
Copies all data from source to new tensor
Preserves shape and data type

Example:

tofu_tensor *original = tofu_tensor_zeros(2, (int[]){2, 3}, TOFU_FLOAT);
tofu_tensor *copy = tofu_tensor_clone(original);

// copy is independent of original
tofu_tensor_free_data_too(copy);
tofu_tensor_free_data_too(original);

`tofu_tensor_repeat`

Create a tensor by repeating data multiple times.

tofu_tensor *tofu_tensor_repeat(const tofu_tensor *src, int times);

Parameters:

src - Source tensor to repeat (cannot be NULL)
times - Number of repetitions (must be > 0)

Returns: Pointer to newly allocated tensor (caller owns, must call tofu_tensor_free_data_too)

Behavior:

Creates new tensor with size = src->len * times
Repeats source data sequentially

Example:

float data[] = {1.0f, 2.0f};
tofu_tensor *t = tofu_tensor_create(data, 1, (int[]){2}, TOFU_FLOAT);
tofu_tensor *repeated = tofu_tensor_repeat(t, 3);
// repeated contains: [1.0, 2.0, 1.0, 2.0, 1.0, 2.0]

tofu_tensor_free_data_too(repeated);
tofu_tensor_free(t);

`tofu_tensor_arange`

Create a 1-D tensor with evenly spaced values (similar to NumPy arange).

tofu_tensor *tofu_tensor_arange(double start, double stop, double step, tofu_dtype dtype);

Parameters:

start - Starting value (inclusive)
stop - Ending value (exclusive)
step - Step size between values
dtype - Data type for the resulting tensor

Returns: Pointer to newly allocated 1-D tensor (caller owns, must call tofu_tensor_free_data_too)

Behavior:

Creates values [start, start+step, start+2*step, ..., stop)
Number of elements = ceil((stop - start) / step)

Example:

tofu_tensor *t = tofu_tensor_arange(0.0, 5.0, 1.0, TOFU_FLOAT);
// t contains: [0.0, 1.0, 2.0, 3.0, 4.0]

tofu_tensor_free_data_too(t);

See also: tofu_tensor_rearange for in-place filling
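
The element-count rule above can be checked with a small standalone sketch. It does not call Tofu; `sketch_arange_count` and `sketch_arange_fill` are hypothetical helper names, and returning 0 for an empty or reversed range is an assumption, since the reference does not say how Tofu handles that case.

```c
#include <assert.h>
#include <stddef.h>

/* Number of values in [start, stop) for a given step, i.e.
 * ceil((stop - start) / step), computed without <math.h>.
 * ASSUMPTION: an empty or reversed range yields 0 elements. */
size_t sketch_arange_count(double start, double stop, double step)
{
    assert(step != 0.0);
    double q = (stop - start) / step;
    if (q <= 0.0)
        return 0;
    size_t n = (size_t)q;      /* truncates toward zero */
    if ((double)n < q)
        n++;                   /* round up when there is a remainder */
    return n;
}

/* Fill out[0..n-1] with start, start+step, start+2*step, ... */
void sketch_arange_fill(float *out, size_t n, double start, double step)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (float)(start + (double)i * step);
}
```

For start = 0, stop = 5, step = 2 the quotient is 2.5, so the count rounds up to 3 and the values are 0, 2, 4.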

`tofu_tensor_rearange`

Fill existing tensor with evenly spaced values (in-place arange).

void tofu_tensor_rearange(tofu_tensor *src, double start, double stop, double step);

Parameters:

src - Tensor to fill (cannot be NULL)
start - Starting value (inclusive)
stop - Ending value (exclusive)
step - Step size between values

Behavior:

Fills tensor with [start, start+step, start+2*step, ...]
Number of values written is min(tensor size, ceil((stop-start)/step))
Modifies tensor data in-place

Cleanup Functions

`tofu_tensor_free`

Free tensor structure (does NOT free data buffer).

void tofu_tensor_free(tofu_tensor *t);

Parameters:

t - Tensor to free (can be NULL, no-op if NULL)

Behavior:

Frees only the tensor structure and dims array
Does NOT free the data buffer - caller must free data separately
Safe to call even if tensor was used with tofu_graph_param()
Call AFTER tofu_graph_free() if tensor was used with graph

Example:

float data[4] = {1.0f, 2.0f, 3.0f, 4.0f};
tofu_tensor *t = tofu_tensor_create(data, 1, (int[]){4}, TOFU_FLOAT);

tofu_tensor_free(t);  // Free tensor structure only
// data is still valid

See also: tofu_tensor_free_data_too

`tofu_tensor_free_data_too`

Free both tensor structure and data buffer.

void tofu_tensor_free_data_too(tofu_tensor *t);

Parameters:

t - Tensor to free (can be NULL, no-op if NULL)

Behavior:

Frees both the tensor and its associated data buffer
Only use if tensor owns its data (created with tofu_tensor_zeros, tofu_tensor_clone, etc.)
Do NOT use if tensor was created with tofu_tensor_create() (use tofu_tensor_free)
Safe to call if tensor was used with tofu_graph_param()
Call AFTER tofu_graph_free() if tensor was used with graph

Example:

tofu_tensor *t = tofu_tensor_zeros(2, (int[]){2, 3}, TOFU_FLOAT);

tofu_tensor_free_data_too(t);  // Free both tensor and data

Warning: Using this on tensors created with tofu_tensor_create() will cause undefined behavior!

Shape Operations

`tofu_tensor_size`

Get total number of elements in tensor.

size_t tofu_tensor_size(tofu_tensor *t);

Parameters:

t - Tensor (cannot be NULL)

Returns: Total element count (product of all dimensions)

Example:

tofu_tensor *t = tofu_tensor_zeros(2, (int[]){3, 4}, TOFU_FLOAT);
size_t size = tofu_tensor_size(t);  // Returns 12

`tofu_tensor_reshape`

Reshape tensor to new dimensions (view operation, no data copy).

tofu_tensor *tofu_tensor_reshape(tofu_tensor *src, int ndim, const int *dims);

Parameters:

src - Source tensor (cannot be NULL)
ndim - Number of dimensions for reshaped tensor
dims - Array of new dimension sizes

Returns: New tensor structure sharing data with source (caller owns, must call tofu_tensor_free)

Behavior:

Does NOT copy data - result shares memory with source
Only changes shape metadata, not data layout
Source must outlive result tensor
Product of dims must equal tofu_tensor_size(src)

Warning: Do NOT call tofu_tensor_free_data_too on the reshaped view - this would free the shared data while the source tensor still references it! Only use tofu_tensor_free on views.

Example:

tofu_tensor *t = tofu_tensor_zeros(1, (int[]){12}, TOFU_FLOAT);
tofu_tensor *reshaped = tofu_tensor_reshape(t, 2, (int[]){3, 4});

// reshaped is a view of t with shape [3, 4]
tofu_tensor_free(reshaped);  // Free view
tofu_tensor_free_data_too(t);  // Free original

See also: tofu_tensor_reshape_src for in-place reshape
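
To make the view semantics concrete, here is a minimal standalone sketch of a reshape that shares its data buffer. The `sketch_view` struct and both functions are hypothetical stand-ins written for this illustration; they are not the Tofu types or its implementation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a tensor header; NOT the Tofu struct. */
typedef struct sketch_view {
    int ndim;
    int *dims;
    float *data;   /* may be shared with another header */
} sketch_view;

/* Build a new shape header over the same buffer (no data copy).
 * Returns NULL if allocation fails. */
sketch_view *sketch_reshape(const sketch_view *src, int ndim, const int *dims)
{
    int old_len = 1, new_len = 1;
    for (int i = 0; i < src->ndim; i++) old_len *= src->dims[i];
    for (int i = 0; i < ndim; i++)      new_len *= dims[i];
    assert(old_len == new_len);   /* element count must be preserved */

    sketch_view *v = malloc(sizeof *v);
    if (v == NULL)
        return NULL;
    v->dims = malloc((size_t)ndim * sizeof *v->dims);
    if (v->dims == NULL) {
        free(v);
        return NULL;
    }
    memcpy(v->dims, dims, (size_t)ndim * sizeof *v->dims);
    v->ndim = ndim;
    v->data = src->data;          /* shared, not copied */
    return v;
}

/* Free only the header; the shared buffer still belongs to the source. */
void sketch_view_free(sketch_view *v)
{
    if (v == NULL)
        return;
    free(v->dims);
    free(v);
}
```

Because the view and the source point at the same buffer, a write through either one is visible through the other, and freeing the buffer through the view would leave the source dangling.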

`tofu_tensor_reshape_src`

Reshape tensor in-place (modifies source tensor metadata).

void tofu_tensor_reshape_src(tofu_tensor *src, int ndim, const int *dims);

Parameters:

src - Tensor to reshape (cannot be NULL)
ndim - Number of dimensions for reshaped tensor
dims - Array of new dimension sizes

Behavior:

Modifies src tensor structure in-place
Does NOT copy or reallocate data
Only changes shape metadata
Product of dims must equal tofu_tensor_size(src)

Example:

tofu_tensor *t = tofu_tensor_zeros(1, (int[]){12}, TOFU_FLOAT);
tofu_tensor_reshape_src(t, 2, (int[]){3, 4});
// t now has shape [3, 4]

`tofu_tensor_transpose`

Transpose tensor by permuting dimensions.

tofu_tensor *tofu_tensor_transpose(const tofu_tensor *src, tofu_tensor *dst, const int *axes);

Parameters:

src - Source tensor (cannot be NULL)
dst - Destination tensor (can be NULL to allocate new)
axes - Permutation array (can be NULL for reverse order)

Returns: Result tensor (caller owns if dst was NULL)

Behavior:

If axes is NULL, reverses dimension order (e.g., [2,3,4] → [4,3,2])
If axes is non-NULL, permutes according to axes (e.g., axes=[1,0] swaps dims)
For 2-D matrix, axes=NULL transposes (rows ↔ columns)

Example:

// Matrix transpose
tofu_tensor *matrix = tofu_tensor_zeros(2, (int[]){3, 4}, TOFU_FLOAT);
tofu_tensor *transposed = tofu_tensor_transpose(matrix, NULL, NULL);
// transposed has shape [4, 3]

// Custom permutation
int axes[] = {2, 0, 1};
tofu_tensor *t3d = tofu_tensor_zeros(3, (int[]){2, 3, 4}, TOFU_FLOAT);
tofu_tensor *permuted = tofu_tensor_transpose(t3d, NULL, axes);
// permuted has shape [4, 2, 3]
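
The permutation rule can be sketched on plain arrays. The following standalone code assumes row-major storage (the reference does not state the layout) and uses hypothetical names; `SKETCH_MAXDIM` only mirrors the TOFU_MAXDIM limit of 8 quoted earlier.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAXDIM 8  /* mirrors the TOFU_MAXDIM = 8 limit quoted above */

/* Copy src (row-major, shape dims) into dst with axes permuted so that
 * out_dims[i] = dims[axes[i]]. axes == NULL reverses the dimension order.
 * ASSUMPTION: row-major layout. */
void sketch_transpose(const float *src, int ndim, const int *dims,
                      const int *axes, float *dst, int *out_dims)
{
    assert(ndim > 0 && ndim <= SKETCH_MAXDIM);
    int perm[SKETCH_MAXDIM];
    int src_stride[SKETCH_MAXDIM];
    int coord[SKETCH_MAXDIM] = {0};
    int total = 1;

    /* Row-major strides of the source, built from the last axis backwards. */
    for (int d = ndim - 1; d >= 0; d--) {
        src_stride[d] = total;
        total *= dims[d];
    }
    for (int i = 0; i < ndim; i++) {
        perm[i] = (axes != NULL) ? axes[i] : ndim - 1 - i;
        assert(perm[i] >= 0 && perm[i] < ndim);
        out_dims[i] = dims[perm[i]];
    }

    /* Walk dst in row-major order, advancing coord like an odometer. */
    for (int flat = 0; flat < total; flat++) {
        int src_index = 0;
        for (int i = 0; i < ndim; i++)
            src_index += coord[i] * src_stride[perm[i]];
        dst[flat] = src[src_index];
        for (int i = ndim - 1; i >= 0; i--) {
            if (++coord[i] < out_dims[i])
                break;
            coord[i] = 0;
        }
    }
}
```

With dims = {2, 3, 4} and axes = {2, 0, 1}, the output shape is {4, 2, 3}, matching the custom permutation example above.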

`tofu_tensor_slice`

Extract slice from tensor (copies data).

tofu_tensor *tofu_tensor_slice(const tofu_tensor *src, tofu_tensor *dst,
                               int axis, int start, int len);

Parameters:

src - Source tensor (cannot be NULL)
dst - Destination tensor (can be NULL to allocate new)
axis - Axis along which to slice
start - Starting index along axis
len - Length of slice

Returns: Result tensor (caller owns if dst was NULL)

Preconditions:

axis < src->ndim
start >= 0 and start + len <= src->dims[axis]
If dst is non-NULL, it must have correct shape for slice

Example:

tofu_tensor *t = tofu_tensor_arange(0.0, 10.0, 1.0, TOFU_FLOAT);
tofu_tensor *slice = tofu_tensor_slice(t, NULL, 0, 2, 5);
// slice contains: [2.0, 3.0, 4.0, 5.0, 6.0]

See also: tofu_tensor_slice_nocopy for view without copying
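
A copying slice along an arbitrary axis can be sketched as a series of contiguous block copies. This standalone code assumes row-major storage and uses a hypothetical function name; it illustrates the preconditions listed above rather than reproducing Tofu's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy `len` entries starting at `start` along `axis` from a row-major
 * array into dst. dst must hold (product of dims) / dims[axis] * len floats.
 * ASSUMPTION: row-major layout. */
void sketch_slice(const float *src, int ndim, const int *dims,
                  int axis, int start, int len, float *dst)
{
    assert(axis >= 0 && axis < ndim);
    assert(start >= 0 && len > 0 && start + len <= dims[axis]);

    size_t outer = 1, inner = 1;
    for (int d = 0; d < axis; d++)        outer *= (size_t)dims[d];
    for (int d = axis + 1; d < ndim; d++) inner *= (size_t)dims[d];
    size_t n = (size_t)dims[axis];
    size_t block = (size_t)len * inner;   /* contiguous run per outer index */

    for (size_t o = 0; o < outer; o++)
        memcpy(dst + o * block,
               src + (o * n + (size_t)start) * inner,
               block * sizeof *dst);
}
```

Slicing a [3, 4] array holding 0..11 along axis 1 with start = 1 and len = 2 yields the [3, 2] result 1, 2, 5, 6, 9, 10.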

`tofu_tensor_slice_nocopy`

Create view of tensor slice (no data copy).

tofu_tensor *tofu_tensor_slice_nocopy(tofu_tensor *src, tofu_tensor *dst,
                                      int axis, int start, int len);

Parameters:

src - Source tensor (cannot be NULL)
dst - Destination tensor (can be NULL to allocate new)
axis - Axis along which to slice
start - Starting index along axis
len - Length of slice

Returns: Result tensor sharing data with source (caller owns if dst was NULL)

Behavior:

Does NOT copy data - result shares memory with source
Modifying result will modify source tensor
Source must outlive result tensor

Warning: This is a view operation - changes affect the original tensor!

`tofu_tensor_concat`

Concatenate two tensors along specified axis.

tofu_tensor *tofu_tensor_concat(const tofu_tensor *src1, const tofu_tensor *src2,
                                tofu_tensor *dst, int axis);

Parameters:

src1 - First tensor (cannot be NULL)
src2 - Second tensor (cannot be NULL)
dst - Destination tensor (can be NULL to allocate new)
axis - Axis along which to concatenate

Returns: Result tensor (caller owns if dst was NULL)

Preconditions:

All dimensions except axis must match between src1 and src2

Behavior:

Result dims[axis] = src1->dims[axis] + src2->dims[axis]

Example:

tofu_tensor *a = tofu_tensor_zeros(2, (int[]){2, 3}, TOFU_FLOAT);
tofu_tensor *b = tofu_tensor_zeros(2, (int[]){2, 3}, TOFU_FLOAT);
tofu_tensor *concat = tofu_tensor_concat(a, b, NULL, 0);
// concat has shape [4, 3]

Mathematical Operations

`tofu_tensor_matmul`

Compute matrix multiplication with broadcasting.

tofu_tensor *tofu_tensor_matmul(const tofu_tensor *src1, const tofu_tensor *src2,
                                tofu_tensor *dst);

Parameters:

src1 - Left operand tensor (cannot be NULL)
src2 - Right operand tensor (cannot be NULL)
dst - Destination tensor (can be NULL to allocate new)

Returns: Result tensor (caller owns if dst was NULL)

Preconditions:

For 1-D @ 1-D: src1->dims[0] must equal src2->dims[0]
For 2-D and higher: src1->dims[src1->ndim-1] must equal src2->dims[src2->ndim-2]

Behavior:

1-D @ 1-D: Dot product → scalar
2-D @ 2-D: Standard matrix multiplication
N-D @ 1-D: Matrix-vector (drops last dim)
1-D @ N-D: Vector-matrix (drops first dim)
N-D @ N-D: Batch matmul with broadcasting

Example:

// Matrix multiplication
tofu_tensor *A = tofu_tensor_zeros(2, (int[]){3, 4}, TOFU_FLOAT);
tofu_tensor *B = tofu_tensor_zeros(2, (int[]){4, 5}, TOFU_FLOAT);
tofu_tensor *C = tofu_tensor_matmul(A, B, NULL);
// C has shape [3, 5]

// Batch matrix multiplication
tofu_tensor *batch_A = tofu_tensor_zeros(3, (int[]){2, 3, 4}, TOFU_FLOAT);
tofu_tensor *batch_B = tofu_tensor_zeros(3, (int[]){2, 4, 5}, TOFU_FLOAT);
tofu_tensor *batch_C = tofu_tensor_matmul(batch_A, batch_B, NULL);
// batch_C has shape [2, 3, 5]

Notes:

Most commonly used operation for neural networks
Broadcasts batch dimensions automatically

See also: tofu_tensor_inner for inner product
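
For reference, the 2-D @ 2-D case reduces to the textbook triple loop. The sketch below is standalone, assumes row-major storage, and covers only plain matrices; it does not attempt the batching or broadcasting that tofu_tensor_matmul is described as supporting. The function name is hypothetical.

```c
#include <assert.h>

/* c[m x n] = a[m x k] * b[k x n], all stored row-major.
 * The shared dimension k corresponds to the precondition
 * src1->dims[ndim-1] == src2->dims[ndim-2]. */
void sketch_matmul2d(const float *a, const float *b, float *c,
                     int m, int k, int n)
{
    assert(m > 0 && k > 0 && n > 0);
    for (int i = 0; i < m; i++) {
        for (int j = 0; j < n; j++) {
            float acc = 0.0f;
            for (int p = 0; p < k; p++)
                acc += a[i * k + p] * b[p * n + j];
            c[i * n + j] = acc;
        }
    }
}
```

Multiplying [[1, 2, 3], [4, 5, 6]] by [[7, 8], [9, 10], [11, 12]] gives [[58, 64], [139, 154]].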

`tofu_tensor_inner`

Compute inner product (sum-product over last axes).

tofu_tensor *tofu_tensor_inner(const tofu_tensor *src1, const tofu_tensor *src2,
                               tofu_tensor *dst);

Parameters:

src1 - First tensor (cannot be NULL)
src2 - Second tensor (cannot be NULL)
dst - Destination tensor (can be NULL to allocate new)

Returns: Result tensor (caller owns if dst was NULL)

Preconditions:

src1->dims[src1->ndim-1] must equal src2->dims[src2->ndim-1]

Behavior:

1-D × 1-D: Dot product → scalar
2-D × 2-D: result[i,j] = sum(a[i,:] * b[j,:])
N-D × N-D: Cartesian product of non-last dimensions
Output shape: (*a.shape[:-1], *b.shape[:-1])

Example:

tofu_tensor *a = tofu_tensor_arange(0.0, 3.0, 1.0, TOFU_FLOAT);  // [0, 1, 2]
tofu_tensor *b = tofu_tensor_arange(1.0, 4.0, 1.0, TOFU_FLOAT);  // [1, 2, 3]
tofu_tensor *result = tofu_tensor_inner(a, b, NULL);
// result = 0*1 + 1*2 + 2*3 = 8.0

See also: tofu_tensor_matmul, tofu_tensor_outer
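
The 2-D rule result[i,j] = sum(a[i,:] * b[j,:]) can be written out as follows. This is a standalone sketch with a hypothetical name, assuming row-major storage; it pairs every row of a with every row of b, which is why both inputs must share the same last dimension.

```c
#include <assert.h>

/* result[i * rows_b + j] = sum over k of a[i][k] * b[j][k].
 * a is rows_a x last, b is rows_b x last, both row-major. */
void sketch_inner2d(const float *a, int rows_a,
                    const float *b, int rows_b,
                    int last, float *result)
{
    assert(rows_a > 0 && rows_b > 0 && last > 0);
    for (int i = 0; i < rows_a; i++) {
        for (int j = 0; j < rows_b; j++) {
            float acc = 0.0f;
            for (int k = 0; k < last; k++)
                acc += a[i * last + k] * b[j * last + k];
            result[i * rows_b + j] = acc;
        }
    }
}
```

With one row per input this degenerates to the dot product from the example above (0*1 + 1*2 + 2*3 = 8).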

`tofu_tensor_outer`

Compute outer product (Cartesian product without summation).

tofu_tensor *tofu_tensor_outer(const tofu_tensor *src1, const tofu_tensor *src2,
                               tofu_tensor *dst);

Parameters:

src1 - First tensor (cannot be NULL)
src2 - Second tensor (cannot be NULL)
dst - Destination tensor (can be NULL to allocate new)

Returns: Result tensor (caller owns if dst was NULL)

Behavior:

Flattens both input tensors
Computes: result[i,j] = a[i] * b[j]
Always produces 2-D output
Output shape: [a.size, b.size] where size is total element count

Example:

tofu_tensor *a = tofu_tensor_arange(0.0, 3.0, 1.0, TOFU_FLOAT);  // [0, 1, 2]
tofu_tensor *b = tofu_tensor_arange(1.0, 3.0, 1.0, TOFU_FLOAT);  // [1, 2]
tofu_tensor *result = tofu_tensor_outer(a, b, NULL);
// result shape [3, 2]:
// [[0, 0],
//  [1, 2],
//  [2, 4]]
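
The flattened outer product is a two-level loop over the element lists. The standalone sketch below uses a hypothetical name and plain arrays in place of tensors.

```c
#include <assert.h>

/* result[i * nb + j] = a[i] * b[j]; result is an na x nb row-major matrix. */
void sketch_outer(const float *a, int na, const float *b, int nb, float *result)
{
    assert(na > 0 && nb > 0);
    for (int i = 0; i < na; i++)
        for (int j = 0; j < nb; j++)
            result[i * nb + j] = a[i] * b[j];
}
```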

Element-wise Operations

`tofu_tensor_elew`

Apply element-wise binary operation with broadcasting.

tofu_tensor *tofu_tensor_elew(const tofu_tensor *src1, const tofu_tensor *src2,
                              tofu_tensor *dst, tofu_elew_op elew_op);

Parameters:

src1 - First tensor (cannot be NULL)
src2 - Second tensor (cannot be NULL)
dst - Destination tensor (can be NULL to allocate new)
elew_op - Operation to apply (TOFU_MUL, TOFU_DIV, TOFU_SUM, TOFU_SUB, TOFU_POW, etc.)

Returns: Result tensor (caller owns if dst was NULL)

Preconditions:

src1 and src2 must be broadcastable (NumPy rules)

Operations:

TOFU_MUL - Element-wise multiplication (*)
TOFU_DIV - Element-wise division (/)
TOFU_SUM - Element-wise addition (+)
TOFU_SUB - Element-wise subtraction (-)
TOFU_POW - Element-wise power (^)
TOFU_MAX - Element-wise maximum
TOFU_MIN - Element-wise minimum

Example:

tofu_tensor *a = tofu_tensor_arange(1.0, 5.0, 1.0, TOFU_FLOAT);  // [1, 2, 3, 4]
tofu_tensor *b = tofu_tensor_arange(2.0, 6.0, 1.0, TOFU_FLOAT);  // [2, 3, 4, 5]

tofu_tensor *sum = tofu_tensor_elew(a, b, NULL, TOFU_SUM);
// sum = [3, 5, 7, 9]

tofu_tensor *prod = tofu_tensor_elew(a, b, NULL, TOFU_MUL);
// prod = [2, 6, 12, 20]

// Broadcasting example
tofu_tensor *matrix = tofu_tensor_zeros(2, (int[]){3, 4}, TOFU_FLOAT);
float scalar_data[] = {2.0f};
tofu_tensor *scalar = tofu_tensor_create(scalar_data, 1, (int[]){1}, TOFU_FLOAT);
tofu_tensor *scaled = tofu_tensor_elew(matrix, scalar, NULL, TOFU_MUL);
// All elements of matrix multiplied by 2.0

See also: tofu_tensor_elew_param, tofu_tensor_elew_broadcast

`tofu_tensor_elew_param`

Apply element-wise operation between tensor and scalar.

tofu_tensor *tofu_tensor_elew_param(const tofu_tensor *src, double param,
                                    tofu_tensor *dst, tofu_elew_op elew_op);

Parameters:

src - Source tensor (cannot be NULL)
param - Scalar parameter
dst - Destination tensor (can be NULL to allocate new)
elew_op - Operation to apply

Returns: Result tensor with same shape as src (caller owns if dst was NULL)

Behavior:

Applies operation element-wise: op(tensor_element, param)

Example:

tofu_tensor *t = tofu_tensor_arange(1.0, 5.0, 1.0, TOFU_FLOAT);  // [1, 2, 3, 4]

tofu_tensor *scaled = tofu_tensor_elew_param(t, 2.0, NULL, TOFU_MUL);
// scaled = [2, 4, 6, 8]

tofu_tensor *shifted = tofu_tensor_elew_param(t, 10.0, NULL, TOFU_SUM);
// shifted = [11, 12, 13, 14]

tofu_tensor *squared = tofu_tensor_elew_param(t, 2.0, NULL, TOFU_POW);
// squared = [1, 4, 9, 16]

`tofu_tensor_elew_broadcast`

Apply element-wise operation with automatic broadcasting.

tofu_tensor *tofu_tensor_elew_broadcast(const tofu_tensor *src1, const tofu_tensor *src2,
                                        tofu_tensor *dst, tofu_elew_op elew_op);

Parameters:

src1 - First tensor (cannot be NULL)
src2 - Second tensor (cannot be NULL)
dst - Destination tensor (can be NULL to allocate new)
elew_op - Operation to apply

Returns: Result tensor with broadcast shape (caller owns if dst was NULL)

Notes:

Automatically broadcasts inputs to compatible shape
Equivalent to tofu_tensor_elew but with explicit broadcast handling
Follows NumPy broadcasting rules

Reductions

`tofu_tensor_sumreduce`

Reduce tensor along axis using sum operation.

tofu_tensor *tofu_tensor_sumreduce(const tofu_tensor *src, tofu_tensor *dst, int axis);

Parameters:

src - Source tensor (cannot be NULL)
dst - Destination tensor (can be NULL to allocate new)
axis - Axis along which to reduce

Returns: Result tensor with dims[axis] removed (caller owns if dst was NULL)

Behavior:

Output shape: src->dims with dims[axis] removed
Computes sum of all elements along specified axis

Example:

tofu_tensor *t = tofu_tensor_zeros(2, (int[]){3, 4}, TOFU_FLOAT);
// Fill with 1.0
for (int i = 0; i < 12; i++) {
    float val = 1.0f;
    TOFU_TENSOR_DATA_FROM(t, i, val, TOFU_FLOAT);
}

tofu_tensor *row_sum = tofu_tensor_sumreduce(t, NULL, 1);
// row_sum has shape [3], each element = 4.0

tofu_tensor *col_sum = tofu_tensor_sumreduce(t, NULL, 0);
// col_sum has shape [4], each element = 3.0

See also: tofu_tensor_meanreduce, tofu_tensor_maxreduce
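
An axis reduction can be pictured as splitting the shape into an outer part, the reduced axis, and an inner part. The standalone sketch below assumes row-major storage and uses a hypothetical function name; it shows the indexing only and is not Tofu's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Sum a row-major array along `axis`. `out` must hold
 * (product of dims) / dims[axis] floats.
 * ASSUMPTION: row-major layout. */
void sketch_sumreduce(const float *src, int ndim, const int *dims,
                      int axis, float *out)
{
    assert(axis >= 0 && axis < ndim);
    size_t outer = 1, inner = 1;
    for (int d = 0; d < axis; d++)        outer *= (size_t)dims[d];
    for (int d = axis + 1; d < ndim; d++) inner *= (size_t)dims[d];
    size_t n = (size_t)dims[axis];

    for (size_t o = 0; o < outer; o++) {
        for (size_t i = 0; i < inner; i++) {
            float acc = 0.0f;
            /* Element (o, k, i) lives at flat index (o * n + k) * inner + i. */
            for (size_t k = 0; k < n; k++)
                acc += src[(o * n + k) * inner + i];
            out[o * inner + i] = acc;
        }
    }
}
```

Dividing each accumulated value by dims[axis] turns the same loop into a mean reduction.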

`tofu_tensor_meanreduce`

Reduce tensor along axis using mean operation.

tofu_tensor *tofu_tensor_meanreduce(const tofu_tensor *src, tofu_tensor *dst, int axis);

Parameters:

src - Source tensor (cannot be NULL)
dst - Destination tensor (can be NULL to allocate new)
axis - Axis along which to reduce

Returns: Result tensor with dims[axis] removed (caller owns if dst was NULL)

Behavior:

Output shape: src->dims with dims[axis] removed
Computes arithmetic mean of all elements along specified axis

Example:

tofu_tensor *t = tofu_tensor_arange(0.0, 12.0, 1.0, TOFU_FLOAT);
tofu_tensor_reshape_src(t, 2, (int[]){3, 4});

tofu_tensor *row_mean = tofu_tensor_meanreduce(t, NULL, 1);
// row_mean has shape [3]
// row_mean[0] = mean([0,1,2,3]) = 1.5
// row_mean[1] = mean([4,5,6,7]) = 5.5
// row_mean[2] = mean([8,9,10,11]) = 9.5

`tofu_tensor_maxreduce`

Reduce tensor along axis using max operation.

tofu_tensor *tofu_tensor_maxreduce(const tofu_tensor *src, tofu_tensor *dst,
                                   tofu_tensor *arg, int axis);

Parameters:

src - Source tensor (cannot be NULL)
dst - Destination tensor (can be NULL to allocate new)
arg - Argmax indices tensor (can be NULL if indices not needed)
axis - Axis along which to reduce

Returns: Result tensor with dims[axis] removed (caller owns if dst was NULL)

Behavior:

Output shape: src->dims with dims[axis] removed
If arg is non-NULL, fills it with indices of maximum values

Example:

float data[] = {3.0f, 1.0f, 4.0f, 1.0f, 5.0f, 9.0f};
tofu_tensor *t = tofu_tensor_create(data, 2, (int[]){2, 3}, TOFU_FLOAT);

tofu_tensor *max_vals = tofu_tensor_maxreduce(t, NULL, NULL, 1);
// max_vals = [4.0, 9.0]

tofu_tensor *indices = tofu_tensor_zeros(1, (int[]){2}, TOFU_INT32);
// Store the second result in its own variable so the first one is not leaked
tofu_tensor *max_with_idx = tofu_tensor_maxreduce(t, NULL, indices, 1);
// indices = [2, 2] (position of max in each row)
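
The relationship between the maximum values and the optional argmax output can be sketched for the row-wise case. This is standalone code with a hypothetical name; resolving ties to the first occurrence is an assumption, since the reference does not specify tie-breaking.

```c
#include <assert.h>
#include <stddef.h>

/* Row-wise maximum of a rows x cols row-major matrix.
 * If arg is non-NULL it receives the column index of each maximum.
 * ASSUMPTION: ties resolve to the first occurrence. */
void sketch_rowmax(const float *src, int rows, int cols,
                   float *max_out, int *arg)
{
    assert(rows > 0 && cols > 0);
    for (int r = 0; r < rows; r++) {
        int best = 0;
        for (int c = 1; c < cols; c++)
            if (src[r * cols + c] > src[r * cols + best])
                best = c;
        max_out[r] = src[r * cols + best];
        if (arg != NULL)
            arg[r] = best;
    }
}
```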

`tofu_tensor_sub_broadcast`

Subtract reduced tensor from source with broadcasting.

tofu_tensor *tofu_tensor_sub_broadcast(const tofu_tensor *src, const tofu_tensor *reduced,
                                       tofu_tensor *dst, int axis);

Parameters:

src - Source tensor (cannot be NULL)
reduced - Reduced tensor to subtract (cannot be NULL)
dst - Destination tensor (can be NULL to allocate new)
axis - Axis along which reduction was performed

Returns: Result tensor with same shape as src (caller owns if dst was NULL)

Preconditions:

reduced->ndim = src->ndim - 1 (one dimension removed)

Behavior:

Broadcasts reduced tensor back along axis and subtracts
Useful for normalization operations (subtract mean, etc.)
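
As an illustration of the mean-subtraction use case, the sketch below subtracts one reduced value per row from a 2-D array, which corresponds to a reduction over axis 1. It is standalone, uses a hypothetical name, and assumes row-major storage.

```c
#include <assert.h>

/* out[r][c] = src[r][c] - reduced[r] for a rows x cols row-major matrix,
 * where reduced holds one value per row (the result of reducing axis 1). */
void sketch_sub_rows(const float *src, const float *reduced,
                     int rows, int cols, float *out)
{
    assert(rows > 0 && cols > 0);
    for (int r = 0; r < rows; r++)
        for (int c = 0; c < cols; c++)
            out[r * cols + c] = src[r * cols + c] - reduced[r];
}
```

Feeding it the row means makes every output row sum to zero, the first step of most normalization schemes.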

Activation Functions

`tofu_tensor_lrelu`

Apply Leaky ReLU activation function.

tofu_tensor *tofu_tensor_lrelu(const tofu_tensor *src, tofu_tensor *dst, float negslope);

Parameters:

src - Source tensor (cannot be NULL)
dst - Destination tensor (can be NULL to allocate new)
negslope - Slope for negative values (typically 0.01)

Returns: Result tensor with same shape as src (caller owns if dst was NULL)

Behavior:

Computes: x if x >= 0, else negslope * x
Standard ReLU equivalent when negslope = 0

Example:

float data[] = {-2.0f, -1.0f, 0.0f, 1.0f, 2.0f};
tofu_tensor *t = tofu_tensor_create(data, 1, (int[]){5}, TOFU_FLOAT);

tofu_tensor *relu = tofu_tensor_lrelu(t, NULL, 0.0f);
// relu = [0.0, 0.0, 0.0, 1.0, 2.0]

tofu_tensor *leaky = tofu_tensor_lrelu(t, NULL, 0.01f);
// leaky = [-0.02, -0.01, 0.0, 1.0, 2.0]

Note: For use in computation graphs with automatic differentiation, use tofu_graph_relu() instead.
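
The formula above translates directly into a per-element branch. The following standalone sketch uses a hypothetical name and operates on a plain float array.

```c
#include <assert.h>
#include <stddef.h>

/* out[i] = x[i] if x[i] >= 0, else negslope * x[i]. */
void sketch_lrelu(const float *x, float *out, size_t n, float negslope)
{
    assert(x != NULL && out != NULL);
    for (size_t i = 0; i < n; i++)
        out[i] = (x[i] >= 0.0f) ? x[i] : negslope * x[i];
}
```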

`tofu_tensor_softmax`

Apply softmax activation along specified axis.

tofu_tensor *tofu_tensor_softmax(const tofu_tensor *src, tofu_tensor *dst, int axis);

Parameters:

src - Source tensor (cannot be NULL)
dst - Destination tensor (can be NULL to allocate new)
axis - Axis along which to apply softmax

Returns: Result tensor with same shape as src (caller owns if dst was NULL)

Behavior:

Computes: exp(x_i) / sum(exp(x_j)) along axis
Uses numerically stable implementation (subtracts max before exp)
Output values sum to 1.0 along specified axis

Example:

float logits[] = {1.0f, 2.0f, 3.0f};
tofu_tensor *t = tofu_tensor_create(logits, 1, (int[]){3}, TOFU_FLOAT);
tofu_tensor *probs = tofu_tensor_softmax(t, NULL, 0);
// probs ≈ [0.09, 0.24, 0.67] (sums to 1.0)

Note: For use in computation graphs with automatic differentiation, use tofu_graph_softmax() instead.

`tofu_tensor_layer_norm`

Apply layer normalization with learnable affine transform.

tofu_tensor *tofu_tensor_layer_norm(const tofu_tensor *src, tofu_tensor *dst,
                                    const tofu_tensor *gamma, const tofu_tensor *beta,
                                    int axis, double eps);

Parameters:

src - Source tensor (cannot be NULL)
dst - Destination tensor (can be NULL to allocate new)
gamma - Scale parameter tensor (can be NULL for no scaling)
beta - Shift parameter tensor (can be NULL for no shift)
axis - Axis along which to normalize
eps - Small constant for numerical stability (typically 1e-5)

Returns: Result tensor with same shape as src (caller owns if dst was NULL)

Behavior:

Normalizes: (x - mean) / sqrt(variance + eps)
Then applies: gamma * normalized + beta (if gamma/beta non-NULL)
If gamma/beta are NULL, only normalization is applied

Example:

tofu_tensor *x = tofu_tensor_zeros(2, (int[]){2, 4}, TOFU_FLOAT);
float gamma_data[] = {1.0f, 1.0f, 1.0f, 1.0f};
float beta_data[] = {0.0f, 0.0f, 0.0f, 0.0f};
tofu_tensor *gamma = tofu_tensor_create(gamma_data, 1, (int[]){4}, TOFU_FLOAT);
tofu_tensor *beta = tofu_tensor_create(beta_data, 1, (int[]){4}, TOFU_FLOAT);

tofu_tensor *normalized = tofu_tensor_layer_norm(x, NULL, gamma, beta, 1, 1e-5);

Utilities

`tofu_tensor_issameshape`

Check if two tensors have the same shape.

int tofu_tensor_issameshape(const tofu_tensor *t1, const tofu_tensor *t2);

Parameters:

t1 - First tensor (cannot be NULL)
t2 - Second tensor (cannot be NULL)

Returns: 1 if same shape, 0 otherwise

`tofu_tensor_isbroadcastable`

Check if two tensors can be broadcast together (NumPy semantics).

int tofu_tensor_isbroadcastable(const tofu_tensor *t1, const tofu_tensor *t2);

Parameters:

t1 - First tensor (cannot be NULL)
t2 - Second tensor (cannot be NULL)

Returns: 1 if broadcastable, 0 otherwise

Broadcasting Rules:

Arrays with fewer dimensions are prepended with size-1 dimensions
Size-1 dimensions are stretched to match the other array
Dimensions must match or one must be 1

Example:

tofu_tensor *a = tofu_tensor_zeros(2, (int[]){3, 4}, TOFU_FLOAT);
tofu_tensor *b = tofu_tensor_zeros(1, (int[]){4}, TOFU_FLOAT);
int can_broadcast = tofu_tensor_isbroadcastable(a, b);  // Returns 1

tofu_tensor *c = tofu_tensor_zeros(1, (int[]){3}, TOFU_FLOAT);
can_broadcast = tofu_tensor_isbroadcastable(a, c);  // Returns 0
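
The three rules can be checked by comparing the shapes from the right. The standalone sketch below operates on raw dimension arrays and uses a hypothetical function name; it follows the NumPy rules as stated above.

```c
#include <assert.h>

/* Returns 1 if two shapes are broadcast-compatible under NumPy rules. */
int sketch_is_broadcastable(int ndim1, const int *dims1,
                            int ndim2, const int *dims2)
{
    assert(ndim1 > 0 && ndim2 > 0);
    /* Compare right-aligned; a missing leading dimension counts as 1. */
    for (int i = 1; i <= ndim1 || i <= ndim2; i++) {
        int d1 = (i <= ndim1) ? dims1[ndim1 - i] : 1;
        int d2 = (i <= ndim2) ? dims2[ndim2 - i] : 1;
        if (d1 != d2 && d1 != 1 && d2 != 1)
            return 0;
    }
    return 1;
}
```

Shape [3] fails against [3, 4] because alignment starts at the last dimension, so 3 is compared with 4; shape [3, 1] would succeed.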

`tofu_tensor_broadcast_to`

Broadcast tensor to specified shape (NumPy semantics).

tofu_tensor *tofu_tensor_broadcast_to(const tofu_tensor *src, tofu_tensor *dst,
                                      int ndim, const int *dims);

Parameters:

src - Source tensor (cannot be NULL)
dst - Destination tensor (can be NULL to allocate new)
ndim - Number of dimensions for target shape
dims - Target dimension sizes

Returns: Result tensor with target shape (caller owns if dst was NULL)

Preconditions:

src must be broadcastable to target shape (NumPy rules)

Behavior:

Follows NumPy broadcasting rules
Size-1 dimensions are stretched to match target
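
Stretching a size-1 dimension amounts to reading the source with a stride of zero along that axis. The standalone sketch below materializes a broadcast copy under that idea; it assumes row-major storage, and all names are hypothetical.

```c
#include <assert.h>

#define SKETCH_MAXDIM 8  /* mirrors the TOFU_MAXDIM = 8 limit quoted above */

/* Copy src (shape src_dims) into dst (shape dims), stretching size-1 and
 * missing leading dimensions. ASSUMPTION: row-major layout. */
void sketch_broadcast_to(const float *src, int src_ndim, const int *src_dims,
                         float *dst, int ndim, const int *dims)
{
    assert(src_ndim > 0 && src_ndim <= ndim && ndim <= SKETCH_MAXDIM);
    int stride[SKETCH_MAXDIM];
    int coord[SKETCH_MAXDIM] = {0};
    int total = 1, step = 1;

    /* Right-align src against the target; stretched axes get stride 0. */
    for (int d = ndim - 1; d >= 0; d--) {
        int s = d - (ndim - src_ndim);        /* matching src axis, or < 0 */
        int sd = (s >= 0) ? src_dims[s] : 1;
        assert(sd == dims[d] || sd == 1);
        stride[d] = (sd == 1) ? 0 : step;
        step *= sd;
        total *= dims[d];
    }

    /* Walk dst in row-major order, advancing coord like an odometer. */
    for (int flat = 0; flat < total; flat++) {
        int si = 0;
        for (int d = 0; d < ndim; d++)
            si += coord[d] * stride[d];
        dst[flat] = src[si];
        for (int d = ndim - 1; d >= 0; d--) {
            if (++coord[d] < dims[d])
                break;
            coord[d] = 0;
        }
    }
}
```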

`tofu_tensor_print`

Print tensor to stdout with custom format.

void tofu_tensor_print(const tofu_tensor *t, const char *fmt);

Parameters:

t - Tensor to print (cannot be NULL)
fmt - Format string for each element (e.g., "%.6f", "%d")

Example:

tofu_tensor *t = tofu_tensor_arange(0.0, 6.0, 1.0, TOFU_FLOAT);
tofu_tensor_reshape_src(t, 2, (int[]){2, 3});

tofu_tensor_print(t, "%.1f");
// Output:
// [[0.0, 1.0, 2.0],
//  [3.0, 4.0, 5.0]]

See also: tofu_tensor_fprint for printing to arbitrary stream, tofu_tensor_save for saving to file

`tofu_tensor_fprint`

Print tensor to file stream with custom format.

void tofu_tensor_fprint(FILE *stream, const tofu_tensor *t, const char *fmt);

Parameters:

stream - File stream to write to (cannot be NULL)
t - Tensor to print (cannot be NULL)
fmt - Format string for each element

`tofu_tensor_save`

Save tensor to file with custom format.

int tofu_tensor_save(const char *file_name, const tofu_tensor *t, const char *fmt);

Parameters:

file_name - Path to output file (cannot be NULL)
t - Tensor to save (cannot be NULL)
fmt - Format string for each element

Returns: 0 on success, non-zero on error

`tofu_tensor_convert`

Convert tensor to different data type.

tofu_tensor *tofu_tensor_convert(const tofu_tensor *src, tofu_tensor *dst,
                                 tofu_dtype dtype_d);

Parameters:

src - Source tensor (cannot be NULL)
dst - Destination tensor (can be NULL to allocate new)
dtype_d - Target data type

Returns: Result tensor with same shape as src but different dtype (caller owns if dst was NULL)

Behavior:

Converts each element to target type with appropriate casting
May lose precision (e.g., float to int truncates)

Example:

float data[] = {1.7f, 2.3f, 3.9f};
tofu_tensor *floats = tofu_tensor_create(data, 1, (int[]){3}, TOFU_FLOAT);
tofu_tensor *ints = tofu_tensor_convert(floats, NULL, TOFU_INT32);
// ints = [1, 2, 3]
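
The truncation in the example matches what a plain C cast does. The standalone sketch below shows that behavior for float to 32-bit integer; the function name is hypothetical, and whether Tofu uses a bare cast internally is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert floats to int32_t with a C cast, which discards the fractional
 * part (rounds toward zero). Values outside the int32_t range are undefined
 * behavior in C, so callers must keep inputs in range. */
void sketch_float_to_int32(const float *src, int32_t *dst, size_t n)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = (int32_t)src[i];
}
```

Note that rounding toward zero means -1.7 becomes -1, not -2.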

`tofu_tensor_index`

Convert multi-dimensional coordinates to flat index.

int tofu_tensor_index(const tofu_tensor *t, int *coords);

Parameters:

t - Tensor (cannot be NULL)
coords - Array of coordinates, length must be t->ndim

Returns: Flat index into tensor data array

`tofu_tensor_coords`

Convert flat index to multi-dimensional coordinates.

void tofu_tensor_coords(const tofu_tensor *t, int index, int *coords);

Parameters:

t - Tensor (cannot be NULL)
index - Flat index into tensor data array
coords - Output array for coordinates, length must be t->ndim
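
The two conversions are inverses of each other. The standalone sketch below shows the arithmetic under the assumption of row-major (C-order) storage, where the last dimension varies fastest; the reference does not state the layout explicitly, and the function names are hypothetical.

```c
#include <assert.h>

/* Row-major flat index from coordinates.
 * ASSUMPTION: row-major layout. */
int sketch_flat_index(int ndim, const int *dims, const int *coords)
{
    int index = 0;
    for (int d = 0; d < ndim; d++) {
        assert(coords[d] >= 0 && coords[d] < dims[d]);
        index = index * dims[d] + coords[d];
    }
    return index;
}

/* Inverse mapping: peel off the fastest-varying (last) dimension first. */
void sketch_coords(int ndim, const int *dims, int index, int *coords)
{
    for (int d = ndim - 1; d >= 0; d--) {
        coords[d] = index % dims[d];
        index /= dims[d];
    }
}
```

For a [3, 4] shape, coordinates (1, 2) map to flat index 1 * 4 + 2 = 6, and index 6 maps back to (1, 2).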

Common Patterns

Working with Tensor Memory

// Pattern 1: User manages data buffer
float data[4] = {1.0f, 2.0f, 3.0f, 4.0f};
tofu_tensor *t = tofu_tensor_create(data, 1, (int[]){4}, TOFU_FLOAT);
// Use tensor...
tofu_tensor_free(t);
// data is still valid

// Pattern 2: Library manages data buffer
tofu_tensor *owned = tofu_tensor_zeros(1, (int[]){4}, TOFU_FLOAT);
// Use tensor...
tofu_tensor_free_data_too(owned);
// Both tensor and data are freed

Accessing Tensor Elements

tofu_tensor *t = tofu_tensor_zeros(2, (int[]){3, 4}, TOFU_FLOAT);

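int i = 5;  // Flat element index; must be less than tofu_tensor_size(t)

// Read element at flat index i into value
float value;
TOFU_TENSOR_DATA_TO(t, i, value, TOFU_FLOAT);

// Write value to element at flat index i
value = 42.0f;
TOFU_TENSOR_DATA_FROM(t, i, value, TOFU_FLOAT);

// Copy element si of tensor src to element di of tensor dst
// (src, dst, si, and di are placeholders for your own tensors and indices)
TOFU_TENSOR_DATA_ASSIGN(dst, di, src, si);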

Broadcasting Example

// Add scalar to matrix (broadcasting)
tofu_tensor *matrix = tofu_tensor_zeros(2, (int[]){3, 4}, TOFU_FLOAT);
tofu_tensor *result = tofu_tensor_elew_param(matrix, 5.0, NULL, TOFU_SUM);

// Add vector to matrix rows (broadcasting)
tofu_tensor *row_vec = tofu_tensor_zeros(1, (int[]){4}, TOFU_FLOAT);
// Use a separate variable so the first result can still be freed
tofu_tensor *row_result = tofu_tensor_elew_broadcast(matrix, row_vec, NULL, TOFU_SUM);
