Errors with 0.13.0 master or vcl

Discussion:

Trevor Grant

2016-10-28 20:28:38 UTC

My Desktop

Branch: master
Build Command: mvn clean install -Phadoop2
Built and tested successfully

Branch: andrewpalumbo/mahout/viennacl-opmmul-a
Build Command: mvn clean install -Pviennacl -Phadoop2 -DskipTests && mvn test

Builds successfully.
Tests are still riddled with:

[INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
[WARN] Unable to create class GPUMMul: attempting OpenMP version
[INFO] Creating org.apache.mahout.viennacl.openmp.OMPMMul solver
[INFO] Unable to create class OMPMMul: falling back to java version

This would be fixed by the pom hack I presented earlier (I developed the hack
on this box).
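The log lines above describe a fallback chain: try the OpenCL solver, then the OpenMP solver, then plain JVM multiplication. Below is a minimal sketch of that pattern, assuming the solvers are located reflectively by class name. The two class names are copied from the log; `SolverFallback`, `pick`, and the reflective lookup are hypothetical and are not Mahout's actual code. Run outside a Mahout build, neither class is on the classpath, so it ends at the JVM fallback, which is the same outcome the tests report.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the GPU -> OpenMP -> JVM solver fallback seen in the test logs.
// Not Mahout's implementation; it only illustrates the pattern.
class SolverFallback {

    // Candidate solver classes, tried in order (names copied from the log output).
    static final List<String> CANDIDATES = Arrays.asList(
            "org.apache.mahout.viennacl.opencl.GPUMMul",
            "org.apache.mahout.viennacl.openmp.OMPMMul");

    static final String JVM_FALLBACK = "java";

    /** Returns the first candidate that loads and instantiates, else JVM_FALLBACK. */
    static String pick(List<String> candidates) {
        for (String name : candidates) {
            System.out.println("[INFO] Creating " + name + " solver");
            try {
                Class<?> cls = Class.forName(name);
                cls.getDeclaredConstructor().newInstance();
                System.out.println("[INFO] Successfully created " + name + " solver");
                return name;
            } catch (ReflectiveOperationException | LinkageError e) {
                // ClassNotFoundException, or UnsatisfiedLinkError when the class exists
                // but its native library cannot be loaded.
                System.out.println("[WARN] Unable to create class " + simpleName(name));
            }
        }
        System.out.println("[INFO] falling back to java version");
        return JVM_FALLBACK;
    }

    /** "a.b.C" -> "C". */
    static String simpleName(String fqcn) {
        return fqcn.substring(fqcn.lastIndexOf('.') + 1);
    }

    public static void main(String[] args) {
        // Without the Mahout native classes on the classpath this prints "java".
        System.out.println(pick(CANDIDATES));
    }
}
```

Catching `LinkageError` matters for this pattern: a missing or incompatible native library usually surfaces as `UnsatisfiedLinkError`, not as `ClassNotFoundException`.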

Machine Info:
================================================================================

os major/minor:
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 14.04.3 LTS
Release: 14.04
Codename: trusty

chip arch:
$ uname -a
Linux tower1 3.19.0-31-generic #36~14.04.1-Ubuntu SMP Thu Oct 8 10:21:08 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

$ cat /proc/cpuinfo
...
model name : Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz
...
^^ 8 logical cores (4 physical cores with Hyper-Threading)

GPU:
$ sudo nvidia-smi
Sun Oct 16 22:14:49 2016
+------------------------------------------------------+
| NVIDIA-SMI 352.63     Driver Version: 352.63         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GT 740      Off  | 0000:02:00.0     N/A |                  N/A |
| 33%   34C    P8    N/A /  N/A |    330MiB /  1021MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

clinfo output:
$ clinfo
Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 1.2 CUDA 7.5.23
Platform Name: NVIDIA CUDA
Platform Vendor: NVIDIA Corporation
Platform Extensions: cl_khr_byte_addressable_store cl_khr_icd
cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query
cl_nv_pragma_unroll cl_nv_copy_opts

Platform Name: NVIDIA CUDA
Number of devices: 1
Device Type: CL_DEVICE_TYPE_GPU
Device ID: 4318
Max compute units: 2
Max work items dimensions: 3
Max work items[0]: 1024
Max work items[1]: 1024
Max work items[2]: 64
Max work group size: 1024
Preferred vector width char: 1
Preferred vector width short: 1
Preferred vector width int: 1
Preferred vector width long: 1
Preferred vector width float: 1
Preferred vector width double: 1
Native vector width char: 1
Native vector width short: 1
Native vector width int: 1
Native vector width long: 1
Native vector width float: 1
Native vector width double: 1
Max clock frequency: 1071Mhz
Address bits: 64
Max memory allocation: 267873536
Image support: Yes
Max number of images read arguments: 256
Max number of images write arguments: 16
Max image 2D width: 16384
Max image 2D height: 16384
Max image 3D width: 4096
Max image 3D height: 4096
Max image 3D depth: 4096
Max samplers within kernel: 32
Max size of kernel argument: 4352
Alignment (bits) of base address: 4096
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: Yes
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: Yes
Cache type: Read/Write
Cache line size: 128
Cache size: 32768
Global memory size: 1071494144
Constant buffer size: 65536
Max number of constant args: 9
Local memory type: Local
Local memory size: 49152
Error correction support: 0
Unified memory for Host and Device: 0
Profiling timer resolution: 1000
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: No
Queue properties:
Out-of-Order: Yes
Profiling : Yes
Platform ID: 0xab2160
Name: GeForce GT 740
Vendor: NVIDIA Corporation
Device OpenCL C version: OpenCL C 1.2
Driver version: 352.63
Profile: FULL_PROFILE
Version: OpenCL 1.2 CUDA
Extensions: cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing
cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll
cl_nv_copy_opts cl_khr_global_int32_base_atomics
cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics
cl_khr_local_int32_extended_atomics cl_khr_fp64

Java major/minor and vendor:
$ java -version
java version "1.7.0_85"
OpenJDK Runtime Environment (IcedTea 2.6.1) (7u85-2.6.1-5ubuntu0.14.04.1)
OpenJDK 64-Bit Server VM (build 24.85-b03, mixed mode)

vcl version:
(from /usr/include/viennacl/version.hpp)
1.7.0

gcc version:
$ gcc --version
gcc (Ubuntu 5.4.1-2ubuntu1~14.04) 5.4.1 20160904

Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things." -Virgil*

Could everybody please post any errors and stack traces that they are
getting on master or the vcl PR branch, along with full system info?
I.e. branch, OS major/minor, chip arch, GPU, clinfo output, Java
major/minor and vendor, vcl version, gcc version, and anything else that
may be useful, so that we can compare?
Thx
Andy
Sent from my Galaxy Tab A
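Several of the requested fields can be read from inside the JVM, which also shows what the test JVM actually sees (for example a Java 7 vs. Java 8 runtime, or the maximum heap size, which matters for heap OOMs). A small sketch using only standard `System.getProperty` keys and `Runtime`; `SysInfo` is a hypothetical helper name. On Linux `os.version` is the kernel version, not the distribution release, and `availableProcessors()` counts logical CPUs. GPU, clinfo, ViennaCL and gcc details still have to come from the command-line tools.

```java
// Hypothetical helper: prints the JVM-visible part of the system info requested above.
// GPU, clinfo, ViennaCL and gcc versions are not visible from the JVM.
class SysInfo {
    static final String[] KEYS = {
        "os.name", "os.version", "os.arch", "java.version", "java.vendor", "java.vm.name"
    };

    static String report() {
        StringBuilder sb = new StringBuilder();
        for (String key : KEYS) {
            sb.append(key).append(": ").append(System.getProperty(key)).append('\n');
        }
        Runtime rt = Runtime.getRuntime();
        // Logical CPUs, so hyper-threads are counted.
        sb.append("available processors: ").append(rt.availableProcessors()).append('\n');
        // Upper bound on heap for this JVM (set by -Xmx or the JVM default).
        sb.append("max heap (MiB): ").append(rt.maxMemory() / (1024 * 1024)).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(report());
    }
}
```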

Andrew Musselman

2016-10-28 22:38:59 UTC

Same fallback warnings here, and also an OOM:
ViennaCLSuiteOMP:
- row-major viennacl::matrix
[INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
[WARN] Unable to create class GPUMMul: attempting OpenMP version
[INFO] Creating org.apache.mahout.viennacl.openmp.OMPMMul solver
[INFO] Unable to create class OMPMMul: falling back to java version
- mmul microbenchmark
+ Mahout multiplication time: 8875 ms.
+ ViennaCL/cpu/OpenMP multiplication time: 599 ms.
- trans
[INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
[WARN] Unable to create class GPUMMul: attempting OpenMP version
[INFO] Creating org.apache.mahout.viennacl.openmp.OMPMMul solver
[INFO] Unable to create class OMPMMul: falling back to java version
*** RUN ABORTED ***
java.lang.OutOfMemoryError: Java heap space
at it.unimi.dsi.fastutil.ints.Int2DoubleOpenHashMap.rehash(Int2DoubleOpenHashMap.java:1059)
at it.unimi.dsi.fastutil.ints.Int2DoubleOpenHashMap.insert(Int2DoubleOpenHashMap.java:295)
at it.unimi.dsi.fastutil.ints.Int2DoubleOpenHashMap.put(Int2DoubleOpenHashMap.java:301)
at org.apache.mahout.math.RandomAccessSparseVector.setQuick(RandomAccessSparseVector.java:130)
at org.apache.mahout.math.SparseRowMatrix.setQuick(SparseRowMatrix.java:105)
at org.apache.mahout.math.AbstractMatrix.assign(AbstractMatrix.java:256)
at org.apache.mahout.math.scalabindings.MatrixOps.$colon$eq(MatrixOps.scala:192)
at org.apache.mahout.math.scalabindings.MatrixOps.cloned(MatrixOps.scala:260)
at org.apache.mahout.math.scalabindings.MatrixOps.$minus(MatrixOps.scala:66)
at org.apache.mahout.viennacl.openmp.ViennaCLSuiteOMP$$anonfun$4.apply$mcV$sp(ViennaCLSuiteOMP.scala:152)
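The trace shows the `-` operator cloning a sparse matrix (`MatrixOps.cloned` calls `:=`, which goes through `AbstractMatrix.assign` into `SparseRowMatrix.setQuick`). So the heap appears to run out while a copy is being built row by row in `Int2DoubleOpenHashMap`-backed vectors, and it fails inside `rehash`, where an open-addressing table typically allocates its larger arrays while the old ones are still live. The sketch below gives a rough, hypothetical lower bound on the key/value array memory such rows need. It assumes a power-of-two table, a 0.75 load factor, one spare slot, 4-byte int keys and 8-byte double values, and it ignores object headers. `SparseRowHeapEstimate` and the matrix shape in `main` are made up, because the test's dimensions are not given here.

```java
// Rough, hypothetical lower-bound estimate of the key/value array memory held by
// open-addressing int->double hash maps (one per sparse row).
// Assumptions: power-of-two table, load factor 0.75, one extra slot, 4-byte int keys,
// 8-byte double values. Object headers and other per-row overhead are ignored.
class SparseRowHeapEstimate {
    static final double LOAD_FACTOR = 0.75;

    /** Smallest power of two >= ceil(expected / LOAD_FACTOR), and at least 2. */
    static long tableSize(long expected) {
        long needed = (long) Math.ceil(expected / LOAD_FACTOR);
        long n = 2;
        while (n < needed) n <<= 1;
        return n;
    }

    /** Bytes for the key and value arrays of one row holding nnz entries. */
    static long rowBytes(long nnz) {
        long slots = tableSize(nnz) + 1;
        return slots * (4 + 8);               // int key + double value per slot
    }

    /** Lower bound for a matrix with `rows` rows of nnzPerRow entries each. */
    static long matrixBytes(long rows, long nnzPerRow) {
        return rows * rowBytes(nnzPerRow);
    }

    public static void main(String[] args) {
        long rows = 5000, nnzPerRow = 5000;   // hypothetical shape, not the test's real one
        long oneCopy = matrixBytes(rows, nnzPerRow);
        long maxHeap = Runtime.getRuntime().maxMemory();
        // Cloning keeps the original alive, so at least two copies coexist.
        System.out.printf("one copy >= %d MiB, two copies >= %d MiB, max heap = %d MiB%n",
                oneCopy >> 20, (2 * oneCopy) >> 20, maxHeap >> 20);
    }
}
```

If the two-copy estimate for the real test shape comes close to the printed max heap, the OOM is about heap sizing rather than about ViennaCL.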

================================================================================

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.1 LTS
Release: 16.04
Codename: xenial

$ uname -a
Linux Bob 4.4.0-34-generic #53-Ubuntu SMP Wed Jul 27 16:06:39 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

$ cat /proc/cpuinfo
...
...
^^ 6 cores

$ sudo nvidia-smi

$ clinfo
Number of platforms 1
Platform Name NVIDIA CUDA
Platform Vendor NVIDIA Corporation
Platform Version OpenCL 1.2 CUDA 8.0.0
Platform Profile FULL_PROFILE
Platform Extensions
cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics
cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics
cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing
cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll
cl_nv_copy_opts
Platform Extensions function suffix NV

Platform Name NVIDIA CUDA
Number of devices 1
Device Name GeForce GTX 750 Ti
Device Vendor NVIDIA Corporation
Device Vendor ID 0x10de
Device Version OpenCL 1.2 CUDA
Driver Version 367.44
Device OpenCL C Version OpenCL C 1.2
Device Type GPU
Device Profile FULL_PROFILE
Device Topology (NV) PCI-E, 01:00.0
Max compute units 5
Max clock frequency 1150MHz
Compute Capability (NV) 5.0
Device Partition (core)
Max number of sub-devices 1
Supported partition types None
Max work item dimensions 3
Max work item sizes 1024x1024x64
Max work group size 1024
Preferred work group size multiple 32
Warp size (NV) 32
Preferred / native vector sizes
char 1 / 1
short 1 / 1
int 1 / 1
long 1 / 1
half 0 / 0 (n/a)
float 1 / 1
double 1 / 1
(cl_khr_fp64)
Half-precision Floating-point support (n/a)
Single-precision Floating-point support (core)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations Yes
Double-precision Floating-point support (cl_khr_fp64)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Address bits 64, Little-Endian
Global memory size 2095841280 (1.952GiB)
Error Correction support No
Max memory allocation 523960320 (499.7MiB)
Unified memory for Host and Device No
Integrated memory (NV) No
Minimum alignment for any data type 128 bytes
Alignment of base address 4096 bits (512 bytes)
Global Memory cache type Read/Write
Global Memory cache size 81920
Global Memory cache line 128 bytes
Image support Yes
Max number of samplers per kernel 32
Max size for 1D images from buffer 134217728 pixels
Max 1D or 2D image array size 2048 images
Max 2D image size 16384x16384 pixels
Max 3D image size 4096x4096x4096 pixels
Max number of read image args 256
Max number of write image args 16
Local memory type Local
Local memory size 49152 (48KiB)
Registers per block (NV) 65536
Max constant buffer size 65536 (64KiB)
Max number of constant args 9
Max size of kernel argument 4352 (4.25KiB)
Queue properties
Out-of-order execution Yes
Profiling Yes
Prefer user sync for interop No
Profiling timer resolution 1000ns
Execution capabilities
Run OpenCL kernels Yes
Run native kernels No
Kernel execution timeout (NV) Yes
Concurrent copy and kernel execution (NV) Yes
Number of async copy engines 1
printf() buffer size 1048576 (1024KiB)
Built-in kernels
Device Available Yes
Compiler Available Yes
Linker Available Yes
Device Extensions
cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics
cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics
cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing
cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll
cl_nv_copy_opts

NULL platform behavior
clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) NVIDIA CUDA
clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) Success [NV]
clCreateContext(NULL, ...) [default] Success [NV]
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) No platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) No platform

ICD loader properties
ICD loader Name OpenCL ICD Loader
ICD loader Vendor OCL Icd free software
ICD loader Version 2.2.8
ICD loader Profile OpenCL 1.2
NOTE: your OpenCL library declares to support OpenCL 1.2,
but it seems to support up to OpenCL 2.1 too.

$ java -version
openjdk version "1.8.0_91"
OpenJDK Runtime Environment (build 1.8.0_91-8u91-b14-3ubuntu1~16.04.1-b14)
OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)

vcl version:
(Ignoring what is in /usr/include/viennacl/version.hpp)
1.7.1

$ gcc --version
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.2) 5.4.0 20160609

Trevor Grant

2016-10-28 23:01:26 UTC

I was finally able to hit an OOM error.

Initially, I was running everything from the top-level directory.

When I run mvn test from the viennacl directory, I get:

/mahout/viennacl$ mvn test
[INFO] Scanning for projects...
[WARNING]
[WARNING] Some problems were encountered while building the effective model for org.apache.mahout:mahout-native-viennacl_2.10:jar:0.13.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ org.apache.mahout:mahout-native-viennacl_${scala.compat.version}:[unknown-version], /home/trevor/gits/mahout/viennacl/pom.xml, line 31, column 15
[WARNING] 'build.plugins.plugin.(groupId:artifactId)' must be unique but found duplicate declaration of plugin org.codehaus.mojo:exec-maven-plugin @ org.apache.mahout:mahout-native-viennacl_${scala.compat.version}:[unknown-version], /home/trevor/gits/mahout/viennacl/pom.xml, line 181, column 15
[WARNING]
[WARNING] It is highly recommended to fix these problems because they threaten the stability of your build.
[WARNING]
[WARNING] For this reason, future Maven versions might no longer support building such malformed projects.
[WARNING]
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building Mahout Native VienniaCL OpenCL Bindings 0.13.0-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] --- maven-enforcer-plugin:1.4:enforce (enforce-versions) @ mahout-native-viennacl_2.10 ---
[INFO]
[INFO] --- scala-maven-plugin:3.2.0:add-source (add-scala-sources) @ mahout-native-viennacl_2.10 ---
[INFO] Add Source directory: /home/trevor/gits/mahout/viennacl/src/main/scala
[INFO] Add Test Source directory: /home/trevor/gits/mahout/viennacl/src/test/scala
[INFO]
[INFO] --- maven-dependency-plugin:2.3:properties (default) @ mahout-native-viennacl_2.10 ---
[INFO]
[INFO] --- maven-remote-resources-plugin:1.5:process (default) @ mahout-native-viennacl_2.10 ---
[INFO]
[INFO] --- maven-resources-plugin:2.7:resources (default-resources) @ mahout-native-viennacl_2.10 ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /home/trevor/gits/mahout/viennacl/src/main/resources
[INFO] Copying 3 resources
[INFO]
[INFO] --- scala-maven-plugin:3.2.0:compile (scala-compile) @ mahout-native-viennacl_2.10 ---
[INFO] Nothing to compile - all classes are up to date
[INFO]
[INFO] --- maven-compiler-plugin:3.3:compile (default-compile) @ mahout-native-viennacl_2.10 ---
[INFO] Nothing to compile - all classes are up to date
[INFO]
[INFO] --- exec-maven-plugin:1.2.1:exec (javacpp) @ mahout-native-viennacl_2.10 ---
Warning: Could not load platform properties for class org.apache.mahout.viennacl.opencl.GPUMMul
Warning: Could not load platform properties for class org.apache.mahout.viennacl.opencl.GPUMMul$
Generating /home/trevor/gits/mahout/viennacl/target/classes/org/apache/mahout/viennacl/opencl/javacpp/jniViennaCL.cpp
Compiling /home/trevor/gits/mahout/viennacl/target/classes/org/apache/mahout/viennacl/opencl/javacpp/linux-x86_64/libjniViennaCL.so
g++ -I/usr/include/viennacl -I/usr/lib/jvm/java-7-openjdk-amd64/include -I/usr/lib/jvm/java-7-openjdk-amd64/include/linux /home/trevor/gits/mahout/viennacl/target/classes/org/apache/mahout/viennacl/opencl/javacpp/jniViennaCL.cpp -msse3 -ffast-math -fopenmp -fpermissive -Wl,-rpath,$ORIGIN/ -Wl,-z,noexecstack -Wl,-Bsymbolic -march=native -m64 -Wall -Ofast -fPIC -shared -s -o libjniViennaCL.so -lOpenCL
Deleting /home/trevor/gits/mahout/viennacl/target/classes/org/apache/mahout/viennacl/opencl/javacpp/jniViennaCL.cpp
[INFO]
[INFO] --- maven-resources-plugin:2.7:testResources (default-testResources) @ mahout-native-viennacl_2.10 ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /home/trevor/gits/mahout/viennacl/src/test/resources
[INFO] Copying 3 resources
[INFO]
[INFO] --- scala-maven-plugin:3.2.0:testCompile (scala-test-compile) @ mahout-native-viennacl_2.10 ---
[INFO] Nothing to compile - all classes are up to date
[INFO]
[INFO] --- maven-compiler-plugin:3.3:testCompile (default-testCompile) @ mahout-native-viennacl_2.10 ---
[INFO] Nothing to compile - all classes are up to date
[INFO]
[INFO] --- maven-surefire-plugin:2.18.1:test (default-test) @ mahout-native-viennacl_2.10 ---
[INFO] Tests are skipped.
[INFO]
[INFO] --- scalatest-maven-plugin:1.0:test (test) @ mahout-native-viennacl_2.10 ---
Discovery starting.
Discovery completed in 201 milliseconds.
Run starting. Expected test count is: 7
ViennaCLSuiteVCL:
- row-major viennacl::matrix
+ OCL matrix memory domain after assgn=2
[INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
[INFO] Successfully created org.apache.mahout.viennacl.opencl.GPUMMul solver
jvmRWRW
gpuRWCW
log4j:WARN No appenders could be found for logger (org.apache.mahout.viennacl.opencl.GPUMMul$).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
[INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
[INFO] Successfully created org.apache.mahout.viennacl.opencl.GPUMMul solver
jvmRWRW
gpuRWCW
- dense vcl mmul with fast_copy
[INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
[INFO] Successfully created org.apache.mahout.viennacl.opencl.GPUMMul solver
jvmRWRW
gpuRWCW
- mmul microbenchmark
+ Mahout multiplication time: 2630 ms.
+ ViennaCL/OpenCL multiplication time: 2072 ms.
+ ViennaCL/cpu/OpenMP multiplication time: 813 ms.
- trans
[INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
[INFO] Successfully created org.apache.mahout.viennacl.opencl.GPUMMul solver
gpuSparseRWRW
ViennaCL: FATAL ERROR: Kernel start failed for 'spgemm_stage3'.
ViennaCL: Smaller work sizes could not solve the problem.
ViennaCL: FATAL ERROR: CL_MEM_OBJECT_ALLOCATION_FAILURE
ViennaCL could not allocate memory on the device. Most likely the device simply ran out of memory.
If you think that this is a bug in ViennaCL, please report it at viennacl-***@lists.sourceforge.net and supply at least the following information:
* Operating System
* Which OpenCL implementation (AMD, NVIDIA, etc.)
* ViennaCL version
Many thanks in advance!
falling back to JVM MMUL
ViennaCL: FATAL ERROR: Kernel start failed for 'spgemm_stage3'.
ViennaCL: Smaller work sizes could not solve the problem.
- sparse mmul microbenchmark *** FAILED ***
java.lang.RuntimeException: ViennaCL: FATAL ERROR: CL_MEM_OBJECT_ALLOCATION_FAILURE
ViennaCL could not allocate memory on the device. Most likely the device simply ran out of memory.
If you think that this is a bug in ViennaCL, please report it at viennacl-***@lists.sourceforge.net and supply at least the following information:
* Operating System
* Which OpenCL implementation (AMD, NVIDIA, etc.)
* ViennaCL version
Many thanks in advance!
at org.apache.mahout.viennacl.opencl.javacpp.CompressedMatrix.allocate(Native Method)
at org.apache.mahout.viennacl.opencl.javacpp.CompressedMatrix.<init>(CompressedMatrix.scala:61)
at org.apache.mahout.viennacl.opencl.ViennaCLSuiteVCL$$anonfun$5.apply$mcV$sp(ViennaCLSuiteVCL.scala:250)
at org.apache.mahout.viennacl.opencl.ViennaCLSuiteVCL$$anonfun$5.apply(ViennaCLSuiteVCL.scala:218)
at org.apache.mahout.viennacl.opencl.ViennaCLSuiteVCL$$anonfun$5.apply(ViennaCLSuiteVCL.scala:218)
at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
at org.scalatest.Transformer.apply(Transformer.scala:22)
at org.scalatest.Transformer.apply(Transformer.scala:20)
...
+ Mahout Sparse multiplication time: 13303 ms.
- VCL Dense Matrix %*% Dense vector
+ Mahout dense matrix %*% dense vector multiplication time: 0 ms.
+ ViennaCL/cpu/OpenMP dense matrix %*% dense vector multiplication time: 4 ms.
[INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
[INFO] Successfully created org.apache.mahout.viennacl.opencl.GPUMMul solver
gpuRWCW
- Sparse %*% Dense mmul microbenchmark
+ Mahout multiplication time: 821 ms.
+ ViennaCL/OpenCL multiplication time: 754 ms.
+ ViennaCL/cpu/OpenMP multiplication time: 1473 ms.
Run completed in 26 seconds, 944 milliseconds.
Total number of tests run: 7
Suites: completed 2, aborted 0
Tests: succeeded 6, failed 1, canceled 0, ignored 0, pending 0
*** 1 TEST FAILED ***
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 45.600 s
[INFO] Finished at: 2016-10-28T17:58:45-05:00
[INFO] Final Memory: 20M/309M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.scalatest:scalatest-maven-plugin:1.0:test (test) on project mahout-native-viennacl_2.10: There are test failures -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException

Trevor Grant

2016-10-28 23:15:08 UTC

Anecdotes on my OOM error:

It only fails on one test, so it is *sort of* working.

I had the NVIDIA monitor up while I ran the tests. I did this a few times
and watched the results (if anyone knows a CLI way to do this and pipe the
results to a file, that would be dope).

In general, the background memory utilization of my graphics card is 33%
(of 1 GB total). As the tests run, memory utilization peaks at about 50%,
and GPU utilization reaches 99% (in other tests). This makes the OOM error
even more curious.
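One possible contributor, not confirmed anywhere in this thread: besides total memory, OpenCL devices report a per-buffer limit, `CL_DEVICE_MAX_MEM_ALLOC_SIZE`. The GT 740 clinfo output above shows it as "Max memory allocation: 267873536", exactly one quarter of the 1071494144-byte global memory. The failing call in the trace is `CompressedMatrix.allocate`, so a quick sanity check is to estimate a CSR matrix's device footprint against both limits. The sketch assumes double values and 32-bit indices, each in its own buffer. That layout is an assumption, not checked against ViennaCL's source. It also ignores the extra buffers an SpGEMM kernel such as `spgemm_stage3` needs. `CsrDeviceEstimate` and the shape in `main` are hypothetical.

```java
// Hypothetical estimate of device memory for one CSR sparse matrix. Assumptions
// (not checked against ViennaCL's source): double values, 32-bit row/column
// indices, each array in its own device buffer. Extra buffers that an SpGEMM
// kernel allocates for intermediate results are NOT counted.
class CsrDeviceEstimate {
    static final long MIB = 1024L * 1024L;

    static long valuesBytes(long nnz)   { return nnz * 8L; }         // one double per nonzero
    static long colIndexBytes(long nnz) { return nnz * 4L; }         // one column index per nonzero
    static long rowPtrBytes(long rows)  { return (rows + 1) * 4L; }  // row start offsets

    static long totalBytes(long rows, long nnz) {
        return valuesBytes(nnz) + colIndexBytes(nnz) + rowPtrBytes(rows);
    }

    /** True if no single buffer exceeds the device's per-allocation limit. */
    static boolean fitsPerBufferLimit(long rows, long nnz, long maxAllocBytes) {
        return valuesBytes(nnz) <= maxAllocBytes
                && colIndexBytes(nnz) <= maxAllocBytes
                && rowPtrBytes(rows) <= maxAllocBytes;
    }

    public static void main(String[] args) {
        long maxAlloc = 267873536L;       // "Max memory allocation" in the GT 740 clinfo output
        long globalMem = 1071494144L;     // "Global memory size" in the same output
        long rows = 1000, nnz = 1000000;  // hypothetical shape, not the test's actual one
        System.out.printf("CSR footprint: %d MiB of %d MiB global memory; per-buffer limit ok: %b%n",
                totalBytes(rows, nnz) / MIB, globalMem / MIB,
                fitsPerBufferLimit(rows, nnz, maxAlloc));
    }
}
```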

Does anyone know a good utility for monitoring the card while the tests
are running?
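For command-line logging, `nvidia-smi` has a query mode that should write one CSV line per sample, e.g. `nvidia-smi --query-gpu=timestamp,memory.used,memory.total,utilization.gpu --format=csv,noheader,nounits -l 1 > gpu.log`. Check `nvidia-smi --help-query-gpu` for the fields your driver supports. On the GT 740, utilization may read N/A, as it does in the table above. Sampling once a second can miss short-lived allocations, so a 50% peak in the log does not rule out a brief spike; `-lms` takes the interval in milliseconds instead. A small sketch that summarizes such a log, assuming the field order of that command; `GpuLogSummary` and the default file name `gpu.log` are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper: summarizes a CSV log written by nvidia-smi in query mode with
// --format=csv,noheader,nounits. Assumed field order:
// timestamp, memory.used (MiB), memory.total (MiB), utilization.gpu
class GpuLogSummary {

    /** Returns {peak memory.used in MiB, that peak as % of memory.total}, or {-1, -1}. */
    static double[] peakMemory(Iterable<String> lines) {
        long peakUsed = -1, totalAtPeak = -1;
        for (String line : lines) {
            String[] f = line.split(",");
            if (f.length < 3) continue;                 // blank or truncated line
            try {
                long used = Long.parseLong(f[1].trim());
                long total = Long.parseLong(f[2].trim());
                if (used > peakUsed) { peakUsed = used; totalAtPeak = total; }
            } catch (NumberFormatException e) {
                // Fields such as "N/A" or "[Not Supported]" are skipped.
            }
        }
        if (peakUsed < 0) return new double[] {-1, -1};
        return new double[] {peakUsed, 100.0 * peakUsed / totalAtPeak};
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "gpu.log";
        List<String> lines = Files.readAllLines(Paths.get(path), StandardCharsets.UTF_8);
        double[] r = peakMemory(lines);
        System.out.printf("peak memory.used = %.0f MiB (%.1f%% of total)%n", r[0], r[1]);
    }
}
```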

Keep up the fight, Andrew; I feel like this is very close.

tg

Post by Trevor Grant
I was finally able to hit an OOM error.
I initially was running everything from the top.
When I run mvn test from the viennacl directory...
/mahout/viennacl$ mvn test
[INFO] Scanning for projects...
[WARNING]
[WARNING] Some problems were encountered while building the effective model for org.apache.mahout:mahout-native-viennacl_2.10:jar:0.13.0-SNAPSHOT
org.apache.mahout:mahout-native-viennacl_${scala.compat.version}:[unknown-version], /home/trevor/gits/mahout/viennacl/pom.xml, line 31, column 15
[WARNING] 'build.plugins.plugin.(groupId:artifactId)' must be unique but found duplicate declaration of plugin org.codehaus.mojo:exec-maven-plugin @ org.apache.mahout:mahout-native-viennacl_${scala.compat.version}:[unknown-version], /home/trevor/gits/mahout/viennacl/pom.xml, line 181, column 15
[WARNING]
[WARNING] It is highly recommended to fix these problems because they threaten the stability of your build.
[WARNING]
[WARNING] For this reason, future Maven versions might no longer support building such malformed projects.
[WARNING]
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building Mahout Native VienniaCL OpenCL Bindings 0.13.0-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO]
mahout-native-viennacl_2.10 ---
[INFO]
mahout-native-viennacl_2.10 ---
[INFO] Add Source directory: /home/trevor/gits/mahout/viennacl/src/main/scala
[INFO] Add Test Source directory: /home/trevor/gits/mahout/viennacl/src/test/scala
[INFO]
mahout-native-viennacl_2.10 ---
[INFO]
mahout-native-viennacl_2.10 ---
[INFO]
mahout-native-viennacl_2.10 ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /home/trevor/gits/mahout/viennacl/src/main/resources
[INFO] Copying 3 resources
[INFO]
mahout-native-viennacl_2.10 ---
[INFO] Nothing to compile - all classes are up to date
[INFO]
mahout-native-viennacl_2.10 ---
[INFO] Nothing to compile - all classes are up to date
[INFO]
mahout-native-viennacl_2.10 ---
Warning: Could not load platform properties for class org.apache.mahout.viennacl.opencl.GPUMMul
Warning: Could not load platform properties for class org.apache.mahout.viennacl.opencl.GPUMMul$
Generating /home/trevor/gits/mahout/viennacl/target/classes/org/apache/mahout/viennacl/opencl/javacpp/jniViennaCL.cpp
Compiling /home/trevor/gits/mahout/viennacl/target/classes/org/apache/mahout/viennacl/opencl/javacpp/linux-x86_64/libjniViennaCL.so
g++ -I/usr/include/viennacl -I/usr/lib/jvm/java-7-openjdk-amd64/include -I/usr/lib/jvm/java-7-openjdk-amd64/include/linux /home/trevor/gits/mahout/viennacl/target/classes/org/apache/mahout/viennacl/opencl/javacpp/jniViennaCL.cpp -msse3 -ffast-math -fopenmp -fpermissive -Wl,-rpath,$ORIGIN/ -Wl,-z,noexecstack -Wl,-Bsymbolic -march=native -m64 -Wall -Ofast -fPIC -shared -s -o libjniViennaCL.so -lOpenCL
Deleting /home/trevor/gits/mahout/viennacl/target/classes/org/apache/mahout/viennacl/opencl/javacpp/jniViennaCL.cpp
[INFO]
[INFO] --- maven-resources-plugin:2.7:testResources (default-testResources) @ mahout-native-viennacl_2.10 ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /home/trevor/gits/mahout/viennacl/src/test/resources
[INFO] Copying 3 resources
[INFO]
mahout-native-viennacl_2.10 ---
[INFO] Nothing to compile - all classes are up to date
[INFO]
mahout-native-viennacl_2.10 ---
[INFO] Nothing to compile - all classes are up to date
[INFO]
mahout-native-viennacl_2.10 ---
[INFO] Tests are skipped.
[INFO]
mahout-native-viennacl_2.10 ---
Discovery starting.
Discovery completed in 201 milliseconds.
Run starting. Expected test count is: 7
- row-major viennacl::matrix
+ OCL matrix memory domain after assgn=2
[INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
[INFO] Successfully created org.apache.mahout.viennacl.opencl.GPUMMul solver
jvmRWRW
gpuRWCW
log4j:WARN No appenders could be found for logger (org.apache.mahout.viennacl.opencl.GPUMMul$).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
[INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
[INFO] Successfully created org.apache.mahout.viennacl.opencl.GPUMMul solver
jvmRWRW
gpuRWCW
- dense vcl mmul with fast_copy
[INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
[INFO] Successfully created org.apache.mahout.viennacl.opencl.GPUMMul solver
jvmRWRW
gpuRWCW
- mmul microbenchmark
+ Mahout multiplication time: 2630 ms.
+ ViennaCL/OpenCL multiplication time: 2072 ms.
+ ViennaCL/cpu/OpenMP multiplication time: 813 ms.
- trans
[INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
[INFO] Successfully created org.apache.mahout.viennacl.opencl.GPUMMul solver
gpuSparseRWRW
ViennaCL: FATAL ERROR: Kernel start failed for 'spgemm_stage3'.
ViennaCL: Smaller work sizes could not solve the problem.
ViennaCL: FATAL ERROR: CL_MEM_OBJECT_ALLOCATION_FAILURE
ViennaCL could not allocate memory on the device. Most likely the device simply ran out of memory.
If you think that this is a bug in ViennaCL, please report it at
* Operating System
* Which OpenCL implementation (AMD, NVIDIA, etc.)
* ViennaCL version
Many thanks in advance!
falling back to JVM MMUL
ViennaCL: FATAL ERROR: Kernel start failed for 'spgemm_stage3'.
ViennaCL: Smaller work sizes could not solve the problem.
- sparse mmul microbenchmark *** FAILED ***
CL_MEM_OBJECT_ALLOCATION_FAILURE
ViennaCL could not allocate memory on the device. Most likely the device simply ran out of memory.
If you think that this is a bug in ViennaCL, please report it at
* Operating System
* Which OpenCL implementation (AMD, NVIDIA, etc.)
* ViennaCL version
Many thanks in advance!
at org.apache.mahout.viennacl.opencl.javacpp.CompressedMatrix.allocate(Native Method)
at org.apache.mahout.viennacl.opencl.javacpp.CompressedMatrix.<init>(CompressedMatrix.scala:61)
at org.apache.mahout.viennacl.opencl.ViennaCLSuiteVCL$$anonfun$5.apply$mcV$sp(ViennaCLSuiteVCL.scala:250)
at org.apache.mahout.viennacl.opencl.ViennaCLSuiteVCL$$anonfun$5.apply(ViennaCLSuiteVCL.scala:218)
at org.apache.mahout.viennacl.opencl.ViennaCLSuiteVCL$$anonfun$5.apply(ViennaCLSuiteVCL.scala:218)
at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
at org.scalatest.Transformer.apply(Transformer.scala:22)
at org.scalatest.Transformer.apply(Transformer.scala:20)
...
+ Mahout Sparse multiplication time: 13303 ms.
- VCL Dense Matrix %*% Dense vector
+ Mahout dense matrix %*% dense vector multiplication time: 0 ms.
4 ms.
[INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
[INFO] Successfully created org.apache.mahout.viennacl.opencl.GPUMMul solver
gpuRWCW
- Sparse %*% Dense mmul microbenchmark
+ Mahout multiplication time: 821 ms.
+ ViennaCL/OpenCL multiplication time: 754 ms.
+ ViennaCL/cpu/OpenMP multiplication time: 1473 ms.
Run completed in 26 seconds, 944 milliseconds.
Total number of tests run: 7
Suites: completed 2, aborted 0
Tests: succeeded 6, failed 1, canceled 0, ignored 0, pending 0
*** 1 TEST FAILED ***
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 45.600 s
[INFO] Finished at: 2016-10-28T17:58:45-05:00
[INFO] Final Memory: 20M/309M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.scalatest:scalatest-maven-plugin:1.0:test (test) on project mahout-native-viennacl_2.10: There are test failures -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions,
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
On Fri, Oct 28, 2016 at 5:38 PM, Andrew Musselman <

Post by Trevor Grant
My Desktop

Branch: master
Build Command: mvn clean install -Phadoop2
Built and tested successfully

Branch: andrewpalumbo/mahout/viennacl-opmmul-a
Build Command: mvn clean install -Pviennacl -Phadoop2

Builds successfully-
Tests still riddled with

[INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver

Also OOM
- row-major viennacl::matrix
[INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
[WARN] Unable to create class GPUMMul: attempting OpenMP version
[INFO] Creating org.apache.mahout.viennacl.openmp.OMPMMul solver
[INFO] Unable to create class OMPMMul: falling back to java version
- mmul microbenchmark
+ Mahout multiplication time: 8875 ms.
+ ViennaCL/cpu/OpenMP multiplication time: 599 ms.
- trans
[INFO] Creating org.apache.mahout.viennacl.opencl.GPUMMul solver
[WARN] Unable to create class GPUMMul: attempting OpenMP version
[INFO] Creating org.apache.mahout.viennacl.openmp.OMPMMul solver
[INFO] Unable to create class OMPMMul: falling back to java version
*** RUN ABORTED ***
java.lang.OutOfMemoryError: Java heap space
at it.unimi.dsi.fastutil.ints.Int2DoubleOpenHashMap.rehash(Int2DoubleOpenHashMap.java:1059)
at it.unimi.dsi.fastutil.ints.Int2DoubleOpenHashMap.insert(Int2DoubleOpenHashMap.java:295)
at it.unimi.dsi.fastutil.ints.Int2DoubleOpenHashMap.put(Int2DoubleOpenHashMap.java:301)
at org.apache.mahout.math.RandomAccessSparseVector.setQuick(RandomAccessSparseVector.java:130)
at org.apache.mahout.math.SparseRowMatrix.setQuick(SparseRowMatrix.java:105)
at org.apache.mahout.math.AbstractMatrix.assign(AbstractMatrix.java:256)
at org.apache.mahout.math.scalabindings.MatrixOps.$colon$eq(MatrixOps.scala:192)
at org.apache.mahout.math.scalabindings.MatrixOps.cloned(MatrixOps.scala:260)
at org.apache.mahout.math.scalabindings.MatrixOps.$minus(MatrixOps.scala:66)
at org.apache.mahout.viennacl.openmp.ViennaCLSuiteOMP$$anonfun$4.apply$mcV$sp(ViennaCLSuiteOMP.scala:152)
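Unlike the ViennaCL allocation failure, this trace is a real JVM heap exhaustion: A - B clones a SparseRowMatrix (MatrixOps.cloned), each row is filled through fastutil's Int2DoubleOpenHashMap, and the heap runs out while a row table is being rehashed. A rough, hypothetical estimator for the array payload of such a matrix is sketched below. The load factor, power-of-two table sizing and the extra slot for key 0 are assumptions about fastutil's internals, and object headers are ignored, so the result is only an approximation.

```python
import math

# Assumptions about fastutil's open-addressing maps (not checked against the
# exact fastutil version Mahout ships): default load factor 0.75, tables sized
# to a power of two, parallel int[] key and double[] value arrays, and one
# extra slot reserved for the key 0.
LOAD_FACTOR = 0.75
BYTES_PER_SLOT = 4 + 8  # one int key + one double value


def table_size(entries: int, load_factor: float = LOAD_FACTOR) -> int:
    """Smallest power of two (at least 2) with room for `entries` at the load factor."""
    needed = max(2, math.ceil(entries / load_factor))
    return 1 << (needed - 1).bit_length()


def row_bytes(entries: int) -> int:
    """Approximate array payload of one int -> double hash map row."""
    return (table_size(entries) + 1) * BYTES_PER_SLOT


def matrix_bytes(rows: int, entries_per_row: int) -> int:
    """Approximate payload of a hash-map-per-row sparse matrix (object headers ignored)."""
    return rows * row_bytes(entries_per_row)
```

Because the clone is built while the original operand is still live, the test needs roughly twice matrix_bytes(rows, entries_per_row) for the operand sizes used at ViennaCLSuiteOMP.scala:152, plus a transient second table during each rehash. That figure can be compared against the -Xmx the test JVM runs with.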

================================================================================

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.1 LTS
Release: 16.04
Codename: xenial

$ uname -a
Linux Bob 4.4.0-34-generic #53-Ubuntu SMP Wed Jul 27 16:06:39 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

$ cat /proc/cpuinfo
...
...
^^ 6 cores

$ sudo nvidia-smi

$ clinfo

Number of platforms 1
Platform Name NVIDIA CUDA
Platform Vendor NVIDIA Corporation
Platform Version OpenCL 1.2 CUDA 8.0.0
Platform Profile FULL_PROFILE
Platform Extensions
cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics
cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics
cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing
cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll
cl_nv_copy_opts
Platform Extensions function suffix NV
Platform Name NVIDIA CUDA
Number of devices 1
Device Name GeForce GTX 750 Ti
Device Vendor NVIDIA Corporation
Device Vendor ID 0x10de
Device Version OpenCL 1.2 CUDA
Driver Version 367.44
Device OpenCL C Version OpenCL C 1.2
Device Type GPU
Device Profile FULL_PROFILE
Device Topology (NV) PCI-E, 01:00.0
Max compute units 5
Max clock frequency 1150MHz
Compute Capability (NV) 5.0
Device Partition (core)
Max number of sub-devices 1
Supported partition types None
Max work item dimensions 3
Max work item sizes 1024x1024x64
Max work group size 1024
Preferred work group size multiple 32
Warp size (NV) 32
Preferred / native vector sizes
char 1 / 1
short 1 / 1
int 1 / 1
long 1 / 1
half 0 / 0 (n/a)
float 1 / 1
double 1 / 1 (cl_khr_fp64)
Half-precision Floating-point support (n/a)
Single-precision Floating-point support (core)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations Yes
Double-precision Floating-point support (cl_khr_fp64)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Address bits 64, Little-Endian
Global memory size 2095841280 (1.952GiB)
Error Correction support No
Max memory allocation 523960320 (499.7MiB)
Unified memory for Host and Device No
Integrated memory (NV) No
Minimum alignment for any data type 128 bytes
Alignment of base address 4096 bits (512 bytes)
Global Memory cache type Read/Write
Global Memory cache size 81920
Global Memory cache line 128 bytes
Image support Yes
Max number of samplers per kernel 32
Max size for 1D images from buffer 134217728 pixels
Max 1D or 2D image array size 2048 images
Max 2D image size 16384x16384 pixels
Max 3D image size 4096x4096x4096 pixels
Max number of read image args 256
Max number of write image args 16
Local memory type Local
Local memory size 49152 (48KiB)
Registers per block (NV) 65536
Max constant buffer size 65536 (64KiB)
Max number of constant args 9
Max size of kernel argument 4352 (4.25KiB)
Queue properties
Out-of-order execution Yes
Profiling Yes
Prefer user sync for interop No
Profiling timer resolution 1000ns
Execution capabilities
Run OpenCL kernels Yes
Run native kernels No
Kernel execution timeout (NV) Yes
Concurrent copy and kernel execution (NV) Yes
Number of async copy engines 1
printf() buffer size 1048576 (1024KiB)
Built-in kernels
Device Available Yes
Compiler Available Yes
Linker Available Yes
Device Extensions
cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics
cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics
cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing
cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll
cl_nv_copy_opts
NULL platform behavior
clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) NVIDIA CUDA
clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) Success [NV]
clCreateContext(NULL, ...) [default] Success [NV]
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) No platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) No platform
ICD loader properties
ICD loader Name OpenCL ICD Loader
ICD loader Vendor OCL Icd free software
ICD loader Version 2.2.8
ICD loader Profile OpenCL 1.2
NOTE: your OpenCL library declares to support OpenCL 1.2,
but it seems to support up to OpenCL 2.1 too.
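One detail in this clinfo output may help explain an allocation failure while overall memory use is only around 50%: Max memory allocation (523960320 bytes) is exactly a quarter of Global memory size (2095841280 bytes), and OpenCL rejects any single buffer larger than that limit no matter how much total memory is free. The sketch below checks the three buffers of a CSR (compressed sparse row) matrix against the limit. It assumes 32-bit indices and double values held in separate buffers, which may not match ViennaCL's compressed_matrix layout exactly; the helper names are hypothetical.

```python
from typing import Dict, List

# Copied from the clinfo output above (GeForce GTX 750 Ti).
GLOBAL_MEM_BYTES = 2_095_841_280  # Global memory size
MAX_ALLOC_BYTES = 523_960_320     # Max memory allocation (per buffer)


def csr_buffer_bytes(rows: int, nnz: int, index_bytes: int = 4,
                     value_bytes: int = 8) -> Dict[str, int]:
    """Per-buffer sizes of a CSR matrix.

    Assumption: row pointers, column indices and values live in separate
    device buffers, with 32-bit indices and double-precision values.
    """
    return {
        "row_ptr": (rows + 1) * index_bytes,
        "col_idx": nnz * index_bytes,
        "values": nnz * value_bytes,
    }


def oversized_buffers(rows: int, nnz: int, max_alloc: int = MAX_ALLOC_BYTES) -> List[str]:
    """Names of the buffers that exceed the single-allocation limit."""
    return [name for name, size in csr_buffer_bytes(rows, nnz).items() if size > max_alloc]


def max_nnz_for_values(max_alloc: int = MAX_ALLOC_BYTES, value_bytes: int = 8) -> int:
    """Largest nnz whose value buffer still fits in one allocation."""
    return max_alloc // value_bytes
```

With these assumptions, a matrix with 65,495,041 nonzeros already needs a values buffer just over the limit even though all three buffers together use well under half of global memory. For a sparse %*% sparse product the result can hold far more nonzeros than either input, so the output buffers are the ones worth checking. Trevor's 1 GiB GT 740 has its own limit, which its clinfo output would report.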

$ java -version
openjdk version "1.8.0_91"
OpenJDK Runtime Environment (build 1.8.0_91-8u91-b14-3ubuntu1~16.04.1-b14)
OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)

vcl version,
(Ignoring what is in /usr/include/viennacl/version.hpp)
1.7.1

$ gcc --version
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.2) 5.4.0 20160609

that may be useful so that we may compare?
Thx
Andy
Sent from my Galaxy Tab A

Andrew Musselman

2016-10-29 17:45:20 UTC

It's not reliable; I've had the -Pviennacl build both pass and fail, while
the -Pviennacl-omp build has failed but never passed.
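Since the outcome varies between runs, a small harness can rerun the same Maven invocation and count passes and failures. This is a hypothetical helper, not part of the Mahout build; it relies only on Maven exiting with a non-zero status on BUILD FAILURE.

```python
import subprocess
from collections import Counter
from typing import Optional, Sequence


def tally_runs(cmd: Sequence[str], runs: int, cwd: Optional[str] = None) -> Counter:
    """Run `cmd` `runs` times; count exit status 0 as "pass" and anything else as "fail"."""
    outcomes: Counter = Counter()
    for _ in range(runs):
        proc = subprocess.run(list(cmd), cwd=cwd,
                              stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        outcomes["pass" if proc.returncode == 0 else "fail"] += 1
    return outcomes
```

For example, tally_runs(["mvn", "test"], runs=5, cwd="viennacl") from the repository root reruns the ViennaCL module's tests five times and returns the pass/fail counts, which makes "sometimes" versus "every time" concrete.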

OK, this makes sense. It's not actually a (Java) OOM error; it's a blanket
GPU error, which is why we have the fallbacks to OpenMP and the JVM.
Does it happen each time you run that test? Sometimes these are one-off
failures where the GPU buffer has not been cleared when the next test
starts, or something like that.
Thanks, this is a big help in figuring out what's happening.
Sent from my Verizon Wireless 4G LTE smartphone
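The fallback Andrew describes matches the log lines: the suite tries GPUMMul, then OMPMMul, then the plain Java version, and after the spgemm_stage3 failure it prints "falling back to JVM MMUL". A minimal sketch of that pattern follows. It is not Mahout's implementation; the backend names and functions are hypothetical stand-ins.

```python
from typing import Callable, List, Sequence, Tuple

Matrix = List[List[float]]
Backend = Tuple[str, Callable[[Matrix, Matrix], Matrix]]


def jvm_mmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain nested-loop product, standing in for the always-available JVM path."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def mmul_with_fallback(a: Matrix, b: Matrix, backends: Sequence[Backend],
                       log: Callable[[str], None] = print) -> Tuple[str, Matrix]:
    """Try each backend in order; on any error, log it and move to the next one."""
    errors = []
    for name, mmul in backends:
        try:
            result = mmul(a, b)
        except Exception as exc:  # e.g. a device allocation failure surfaced as an exception
            errors.append(f"{name}: {exc!r}")
            log(f"{name} failed ({exc!r}); falling back")
            continue
        log(f"using {name}")
        return name, result
    raise RuntimeError("all backends failed: " + "; ".join(errors))
```

A call would look like mmul_with_fallback(a, b, [("gpu", gpu_mmul), ("openmp", omp_mmul), ("jvm", jvm_mmul)]), where gpu_mmul and omp_mmul are hypothetical wrappers. A fallback like this only protects the multiply call itself. In the failing test the exception is raised from CompressedMatrix.allocate, called directly at ViennaCLSuiteVCL.scala:250, which is presumably why the suite still records a failure even after the JVM fallback ran.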
-------- Original message --------
Date: 10/28/2016 4:01 PM (GMT-08:00)
Subject: Re: Errors when 0.13.0 master or vcl
$ sudo nvidia-smi
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 750 Ti Off | 0000:01:00.0 On | N/A |
| 29% 25C P8 1W / 38W | 307MiB / 1998MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1537 G ...C-EnableWebRtcEcdsa,WebRTC-H264WithOpenH2 87MiB |
| 0 1990 G /usr/lib/xorg/Xorg 101MiB |
| 0 2319 G /usr/bin/gnome-shell 104MiB |
| 0 4360 G /usr/lib/xorg/Xorg 12MiB |
+-----------------------------------------------------------------------------+
