自動微分

自動微分#

在本節中，您將學習 JAX 中自動微分（autodiff）的基本應用。JAX 具有相當通用的自動微分系統。計算梯度是現代機器學習方法的重要組成部分，本教學將引導您了解一些自動微分的入門主題，例如

1. 使用 jax.grad 取得梯度
2. 在線性邏輯迴歸中計算梯度
3. 對巢狀列表、元組和字典進行微分
4. 使用 jax.value_and_grad 評估函數及其梯度
5. 對數值差異進行檢查

請務必查看進階自動微分教學，以了解更進階的主題。

雖然在大多數情況下，理解自動微分的「底層」運作方式對於使用 JAX 並非至關重要，但我們鼓勵您觀看這個相當容易理解的影片，以更深入地了解其運作原理。

1. 使用 `jax.grad` 取得梯度#

在 JAX 中，您可以使用 jax.grad() 轉換來微分純量值函數

import jax
import jax.numpy as jnp
from jax import grad

grad_tanh = grad(jnp.tanh)
print(grad_tanh(2.0))

0.070650816

jax.grad() 接受一個函數並返回一個函數。如果您有一個 Python 函數 f，用於評估數學函數 \(f\)，那麼 jax.grad(f) 是一個 Python 函數，用於評估數學函數 \(\nabla f\)。這表示 grad(f)(x) 代表值 \(\nabla f(x)\)。

由於 jax.grad() 對函數進行操作，您可以將其應用於自身的輸出，以根據需要微分多次

print(grad(grad(jnp.tanh))(2.0))
print(grad(grad(grad(jnp.tanh)))(2.0))

-0.13621868
0.25265405

JAX 的自動微分功能使計算高階導數變得容易，因為計算導數的函數本身是可微分的。因此，高階導數就像堆疊轉換一樣容易。這可以在單變數情況下說明

\(f(x) = x^3 + 2x^2 - 3x + 1\) 的導數可以計算為

f = lambda x: x**3 + 2*x**2 - 3*x + 1

dfdx = jax.grad(f)

\(f\) 的高階導數為

\[\begin{split} \begin{array}{l} f'(x) = 3x^2 + 4x -3\\ f''(x) = 6x + 4\\ f'''(x) = 6\\ f^{iv}(x) = 0 \end{array} \end{split}\]

在 JAX 中計算其中任何一個都像鏈接 jax.grad() 函數一樣容易

d2fdx = jax.grad(dfdx)
d3fdx = jax.grad(d2fdx)
d4fdx = jax.grad(d3fdx)

在 \(x=1\) 中評估上述內容會得到

\[\begin{split} \begin{array}{l} f'(1) = 4\\ f''(1) = 10\\ f'''(1) = 6\\ f^{iv}(1) = 0 \end{array} \end{split}\]

使用 JAX

print(dfdx(1.))
print(d2fdx(1.))
print(d3fdx(1.))
print(d4fdx(1.))

2. 在線性邏輯迴歸中計算梯度#

下一個範例展示如何在線性邏輯迴歸模型中使用 jax.grad() 計算梯度。首先，設定

key = jax.random.key(0)

def sigmoid(x):
  return 0.5 * (jnp.tanh(x / 2) + 1)

# Outputs probability of a label being true.
def predict(W, b, inputs):
  return sigmoid(jnp.dot(inputs, W) + b)

# Build a toy dataset.
inputs = jnp.array([[0.52, 1.12,  0.77],
                    [0.88, -1.08, 0.15],
                    [0.52, 0.06, -1.30],
                    [0.74, -2.49, 1.39]])
targets = jnp.array([True, True, False, True])

# Training loss is the negative log-likelihood of the training examples.
def loss(W, b):
  preds = predict(W, b, inputs)
  label_probs = preds * targets + (1 - preds) * (1 - targets)
  return -jnp.sum(jnp.log(label_probs))

# Initialize random model coefficients
key, W_key, b_key = jax.random.split(key, 3)
W = jax.random.normal(W_key, (3,))
b = jax.random.normal(b_key, ())

使用 jax.grad() 函數及其 argnums 參數，以針對位置引數微分函數。

# Differentiate `loss` with respect to the first positional argument:
W_grad = grad(loss, argnums=0)(W, b)
print(f'{W_grad=}')

# Since argnums=0 is the default, this does the same thing:
W_grad = grad(loss)(W, b)
print(f'{W_grad=}')

# But you can choose different values too, and drop the keyword:
b_grad = grad(loss, 1)(W, b)
print(f'{b_grad=}')

# Including tuple values
W_grad, b_grad = grad(loss, (0, 1))(W, b)
print(f'{W_grad=}')
print(f'{b_grad=}')

W_grad=Array([-0.43314594, -0.7354604 , -1.2598921 ], dtype=float32)
W_grad=Array([-0.43314594, -0.7354604 , -1.2598921 ], dtype=float32)
b_grad=Array(-0.6900177, dtype=float32)
W_grad=Array([-0.43314594, -0.7354604 , -1.2598921 ], dtype=float32)
b_grad=Array(-0.6900177, dtype=float32)

jax.grad() API 與 Spivak 經典著作 Calculus on Manifolds (1965) 中出色的符號表示法直接對應，Sussman 和 Wisdom 的 Structure and Interpretation of Classical Mechanics (2015) 以及他們的 Functional Differential Geometry (2013) 中也使用了這種表示法。這兩本書都是開放存取的。特別參閱 Functional Differential Geometry 的「序言」部分，以了解對此符號表示法的辯護。

本質上，當使用 argnums 引數時，如果 f 是用於評估數學函數 \(f\) 的 Python 函數，那麼 Python 表達式 jax.grad(f, i) 會評估為用於評估 \(\partial_i f\) 的 Python 函數。

3. 對巢狀列表、元組和字典進行微分#

由於 JAX 的 PyTree 抽象化（請參閱使用 pytrees），對標準 Python 容器進行微分可以直接運作，因此請隨意使用元組、列表和字典（以及任意巢狀結構）。

繼續先前的範例

def loss2(params_dict):
    preds = predict(params_dict['W'], params_dict['b'], inputs)
    label_probs = preds * targets + (1 - preds) * (1 - targets)
    return -jnp.sum(jnp.log(label_probs))

print(grad(loss2)({'W': W, 'b': b}))

{'W': Array([-0.43314594, -0.7354604 , -1.2598921 ], dtype=float32), 'b': Array(-0.6900177, dtype=float32)}

您可以建立自訂 pytree 節點，不僅可以使用 jax.grad()，還可以與其他 JAX 轉換（jax.jit()、jax.vmap() 等）一起使用。

4. 使用 `jax.value_and_grad` 評估函數及其梯度#

另一個方便的函數是 jax.value_and_grad()，用於一次有效率地計算函數的值及其梯度的值。

繼續先前的範例

loss_value, Wb_grad = jax.value_and_grad(loss, (0, 1))(W, b)
print('loss value', loss_value)
print('loss value', loss(W, b))

loss value 2.9729187
loss value 2.9729187

5. 對數值差異進行檢查#

關於導數的一件很棒的事情是，它們可以使用有限差分法直接進行檢查。

繼續先前的範例

# Set a step size for finite differences calculations
eps = 1e-4

# Check b_grad with scalar finite differences
b_grad_numerical = (loss(W, b + eps / 2.) - loss(W, b - eps / 2.)) / eps
print('b_grad_numerical', b_grad_numerical)
print('b_grad_autodiff', grad(loss, 1)(W, b))

# Check W_grad with finite differences in a random direction
key, subkey = jax.random.split(key)
vec = jax.random.normal(subkey, W.shape)
unitvec = vec / jnp.sqrt(jnp.vdot(vec, vec))
W_grad_numerical = (loss(W + eps / 2. * unitvec, b) - loss(W - eps / 2. * unitvec, b)) / eps
print('W_dirderiv_numerical', W_grad_numerical)
print('W_dirderiv_autodiff', jnp.vdot(grad(loss)(W, b), unitvec))

b_grad_numerical -0.6890297
b_grad_autodiff -0.6900177
W_dirderiv_numerical 1.3017654
W_dirderiv_autodiff 1.3006743

JAX 提供了一個簡單的便利函數，它基本上做同樣的事情，但可以檢查您喜歡的任何微分階數

from jax.test_util import check_grads

check_grads(loss, (W, b), order=2)  # check up to 2nd order derivatives

下一步#

進階自動微分教學提供了更進階和更詳細的說明，介紹了本文檔中涵蓋的想法如何在 JAX 後端中實作。某些功能，例如用於 JAX 可轉換 Python 函數的自訂導數規則，取決於對進階自動微分的理解，因此如果您有興趣，請務必查看進階自動微分教學中的該部分。

自動微分

目錄

自動微分#

1. 使用 jax.grad 取得梯度#

2. 在線性邏輯迴歸中計算梯度#

3. 對巢狀列表、元組和字典進行微分#

4. 使用 jax.value_and_grad 評估函數及其梯度#

5. 對數值差異進行檢查#

下一步#

1. 使用 `jax.grad` 取得梯度#

4. 使用 `jax.value_and_grad` 評估函數及其梯度#