Chainer v3

Transcript of Chainer v3

  1. Chainer v3. Chainer Meetup #06 @ PFN, Sep. 30, 2017. Seiya Tokui @ Preferred Networks
  2. Recent/coming releases
     - Chainer v3.0.0 RC and v2.1.0: Sep. 12 (the v3 RC was the 50th release!)
     - CuPy v2.0.0 RC and v1.0.3 on the same day
     - Next release: Chainer v3.0.0 and v4.0.0a1 on Oct. 17; CuPy v2.0.0 and v3.0.0a1 on the same day
     - Today I mainly talk about the features of CuPy v2.0.0 RC and Chainer v3.0.0 RC
  3. Chainer v3.0.0rc1
     - For most users, backward compatibility is maintained. See the release notes of v3.0.0rc1 for some small breaking changes that do not affect most users.
     - The inner workings have been greatly changed; this may break existing code that directly touches the computational graph.
     - Thanks to this change, we now support double backprop (a.k.a. gradient of gradients), as announced.
  4. Double backprop
     - Automatic backpropagation through gradients.
     - When is it needed? Consider a loss function that includes a gradient computation as a term/factor.
     - E.g. the loss function for WGAN-GP:

           L = \mathbb{E}[D(\tilde{x})] - \mathbb{E}[D(x)] + \lambda \, \mathbb{E}\big[(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1)^2\big]

     - To take the gradient of this loss function, we need to backprop through the gradient term \nabla_{\hat{x}} D(\hat{x}), which itself we want to compute with backprop!
  5. Double backprop in Chainer v3
     - Many functions now support double backprop.
     - Those functions are rewritten to implement a new interface named FunctionNode (such functions are called new-style Functions).
     - backward() takes Variables instead of ndarrays as grad_outputs and also returns Variables, which means backward() itself can be differentiated.
     - Variable now has an attribute grad_var, which represents the gradient as a Variable (so that it can be used in the computational graph).
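     For example, a minimal sketch of double backprop with these APIs (a toy function, not taken from the slides):

         import numpy as np
         import chainer

         x = chainer.Variable(np.array([2.0], dtype=np.float32))
         y = x * x * x
         # Keep the graph of the backward pass so the gradient can be differentiated again.
         y.backward(enable_double_backprop=True)
         gx = x.grad_var   # dy/dx = 3x^2 = 12, held as a Variable on the graph
         x.cleargrad()     # clear the first-order gradient before the second backward pass
         gx.backward()
         print(x.grad)     # d2y/dx2 = 6x = [12.]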
  6. How to implement WGAN-GP (1): using Variable.backward()

        x_tilde = generator(z)
        x_hat = x + u * (x_tilde - x)
        D(x_hat).backward(enable_double_backprop=True)  # 1st diff
        gp = lam * (x_hat.grad_var - 1) ** 2  # lam: penalty coefficient ("lambda" is a Python keyword)
        loss = D(x_tilde) - D(x) + gp
        model.cleargrads()  # to clear the 1st diff of params
        loss.backward()  # 2nd diff
  7. How to implement WGAN-GP (2): using grad()

        x_tilde = generator(z)
        x_hat = x + u * (x_tilde - x)
        gx_hat, = chainer.grad([D(x_hat)], [x_hat], enable_double_backprop=True)  # 1st diff
        gp = lam * (gx_hat - 1) ** 2
        loss = D(x_tilde) - D(x) + gp
        loss.backward()  # 2nd diff

     This version is more efficient because grad() can skip the gradient computation for parameters (so we can also drop cleargrads()).
  8. New-style Function support
     - Most standard functions are now ported to the new-style interface: +, -, *, Convolution2D, Deconvolution2D, EmbedID, Linear, LSTM, BatchNormalization, sigmoid, relu, leaky_relu, softmax, log_softmax, tanh, exp, mean_squared_error, softmax_cross_entropy, dropout, layer_normalization, transpose, reshape, broadcast_to, sum, concat, __getitem__, etc.
     - We are still working on widening the double backprop support. Contributions are also welcome!!
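     As a rough illustration of the new-style interface (a hypothetical Square function, not part of Chainer), a FunctionNode subclass might look like this:

         import chainer
         from chainer import FunctionNode

         class Square(FunctionNode):
             """y = x^2 as a new-style function (hypothetical example)."""

             def forward(self, inputs):
                 x, = inputs                      # forward() works on raw ndarrays
                 self.retain_inputs((0,))         # keep x for the backward pass
                 return x * x,

             def backward(self, target_input_indexes, grad_outputs):
                 x, = self.get_retained_inputs()  # retained inputs come back as Variables
                 gy, = grad_outputs               # grad_outputs are Variables, so backward is differentiable
                 return 2 * x * gy,

         def square(x):
             y, = Square().apply((x,))            # apply() builds the graph and returns Variables
             return y

     Because backward() is written entirely in terms of Variables, the gradient of square() can itself be backpropagated through.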
  9. Other features
     - Functions: layer_normalization, selu, arctan2, prod, NumPy-compatible matmul
     - Links: ChildSumTreeLSTM, NaryTreeLSTM, BatchRenormalization
     - Other new features: LeCunNormal, as_variable(), Variable.array, the strict option of load_npz(), etc.
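     Two of these small additions in use (a minimal sketch):

         import numpy as np
         import chainer

         x = chainer.as_variable(np.ones((2, 3), dtype=np.float32))  # wrap an ndarray as a Variable (no-op for Variables)
         print(type(x))        # <class 'chainer.variable.Variable'>
         print(x.array.shape)  # .array exposes the underlying ndarray (same object as .data)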
  10. CuPy v2.0.0rc1
     - Sparse matrix support
     - Complex number support
     - Improved memory allocator
     - Many new functions, esp. linear algebra routines
  11. Sparse matrix support
     - cupy.sparse: sparse matrix support with APIs compatible with scipy.sparse
     - CSR/CSC/COO and diagonal formats
     - Basic arithmetic, matrix product, element indexing
     - Slicing along the major axis
     - Dense <-> sparse conversion
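     A rough sketch of the scipy.sparse-style usage (hypothetical values; assumes the csr_matrix constructor and matrix-vector product described above):

         import cupy
         import cupy.sparse

         # 2x3 CSR matrix built from the usual (data, indices, indptr) triplet
         data = cupy.array([1, 2, 3], dtype=cupy.float32)
         indices = cupy.array([1, 0, 2], dtype=cupy.int32)
         indptr = cupy.array([0, 1, 3], dtype=cupy.int32)
         a = cupy.sparse.csr_matrix((data, indices, indptr), shape=(2, 3))

         v = cupy.array([1, 2, 3], dtype=cupy.float32)
         print(a.dot(v))     # sparse-dense matrix-vector product
         print(a.toarray())  # sparse -> dense conversion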
  12. Complex number support
     - CuPy now supports complex numbers!
     - Dtypes complex32, complex64, and complex128 are now available.
     - Routines related to complex numbers: angle, conj, imag, real
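     For instance (a minimal sketch using the routines listed above):

         import cupy

         z = cupy.array([1 + 2j, 3 - 4j], dtype=cupy.complex64)
         print(cupy.real(z))   # [ 1.  3.]
         print(cupy.imag(z))   # [ 2. -4.]
         print(cupy.conj(z))   # [ 1.-2.j  3.+4.j]
         print(cupy.angle(z))  # phase angles in radians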
  13. Linear algebra routines
     - Solvers, matrix inversion, determinant, eigenvalues, etc.: solve, tensorsolve, inv, pinv, det, slogdet, eigh, eigvalsh, matrix_rank
     - All under the cupy.linalg namespace
     - einsum is also supported (thanks, @fukatani!): flexible tensor product/reduction based on the Einstein summation convention
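     A minimal sketch using a few of the routines listed above (hypothetical values):

         import cupy

         a = cupy.array([[3.0, 1.0], [1.0, 2.0]])
         b = cupy.array([9.0, 8.0])

         x = cupy.linalg.solve(a, b)            # solve the linear system a @ x = b
         sign, logdet = cupy.linalg.slogdet(a)  # sign and log of the determinant
         r = cupy.einsum('ij,j->i', a, x)       # Einstein-summation product; reproduces b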
  14. Improved memory allocator
     - The memory pool is greatly improved: it now uses a best-fit with coalescing algorithm.
     - A memory region is reused even if the requested size does not exactly match.
     - It may also improve speed, thanks to the reduced number of reallocations.
     - Example: the new seq2seq example originally used all the memory of a 12 GB GPU; its usage is reduced to 3 GB, and the execution time is reduced by approx. 25%.
  15. Next versions
     - As you may know, we slightly changed the release policy again; stable releases may now include some new features (hence v2.1.0 instead of v2.0.3).
     - v4 is scheduled based on our release policy: v4.0.0 will come three months after v3.0.0 (mid January if there is no delay).
     - The core features of v4 are not determined yet; let's have discussions!