Gets the CUDA stream handle (cudaStream_t).
The return value can be passed as the stream argument of a CUDA kernel launch, which is the fourth parameter of the <<<grid, block, sharedMemBytes, stream>>> execution configuration.
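
A minimal sketch of how such a handle might be used in a kernel launch. The kernel scale and the helper scale_on_stream are hypothetical names made up for illustration and are not part of this API. The stream parameter stands for whatever this function returns.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: multiplies each of the n elements of data by factor.
__global__ void scale(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical helper: scales n host floats in place, queuing all work on `stream`.
// `stream` is assumed to be the cudaStream_t handle obtained from this function.
cudaError_t scale_on_stream(float* host, int n, float factor, cudaStream_t stream) {
    if (n <= 0) return cudaSuccess;

    float* dev = nullptr;
    size_t bytes = static_cast<size_t>(n) * sizeof(float);
    cudaError_t err = cudaMalloc(&dev, bytes);
    if (err != cudaSuccess) return err;

    // Copy the input to the device on the same stream as the kernel.
    err = cudaMemcpyAsync(dev, host, bytes, cudaMemcpyHostToDevice, stream);
    if (err == cudaSuccess) {
        int block = 256;
        int grid = (n + block - 1) / block;
        // Execution configuration: <<<grid, block, dynamic shared memory bytes, stream>>>.
        // The stream handle is the fourth parameter.
        scale<<<grid, block, 0, stream>>>(dev, n, factor);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess) {
        err = cudaMemcpyAsync(host, dev, bytes, cudaMemcpyDeviceToHost, stream);
    }

    // Wait for everything queued on this stream before freeing the buffer.
    cudaError_t syncErr = cudaStreamSynchronize(stream);
    if (err == cudaSuccess) err = syncErr;

    cudaFree(dev);
    return err;
}
```

Because the copies and the kernel are issued on the same stream, they run in the order they were queued. The third launch parameter (0 here) is the dynamic shared memory size, and it has to be given explicitly whenever a stream is passed.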