Discussion:
[DISCUSS] Mahout Streaming Bindings
Trevor Grant
2016-12-09 16:04:35 UTC
Wanted to kick off a discussion about whether we're ready to start thinking
about coming up with some bindings for streaming engines.

Had a question at a meetup Tuesday in Seattle, and it got me thinking about
it.

In my mind they would be a discrete set of bindings: Flink Streaming /
Spark Streaming / (Beam?) / etc.

I have some fleeting thoughts, but didn't write any of the bindings and
have only attempted to grok them in passing.

Or maybe not something we want to pursue at this time? However, it is
something I'd be interested in tinkering with on a branch of my own...

tg

Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things." -Virgil*
Trevor Grant
2016-12-09 17:08:18 UTC
I was thinking, if it's a thing we want to pursue, let's maybe attempt some
initial stabs: perhaps start a branch and get a feel for how scary a
nightmare it's going to be. Then we can start talking timetables.

In my mind it will be one of:
1- Crazy easy: just creating a distributed context for the streaming
context, and everything else kind of falls into place
2- Crazy hard: nearly reinventing the wheel
3- Some mixture of the two

The upshot, with the imminent GPU acceleration: we'd be the first to offer
streaming + GPU, which would be quite dope.


It's that this is something that we've been talking about for a while...
Would you be thinking of this for before 1.0 or after 1.0?
Suneel Marthi
2016-12-09 17:13:20 UTC
-100

It doesn't make sense to add support for all of the batch and streaming
engines that are available. We presently have an H2O binding which has never
seen any use and which I have long been thinking of trashing.

It would be a more productive use of our very limited time to port the
legacy Recommender algorithms to Samsara than to add any more of these
streaming/batch frameworks.

I have had folks asking me if we support Akka Streams, Apex, etc., but most
folks don't know how they are going to use it or what it is they are trying
to accomplish.

How many Recommender Systems are out there today that need streaming?
Amazon doesn't do real-time streaming recommendations yet. Lambda
frameworks like Oryx 2.0, Summingbird, etc. support that, and Mahout-Spark
should be pluggable as the ML engine into those frameworks.

My 2¢.
Saikat Kanjilal
2016-12-09 22:09:35 UTC
Trevor,

Out of curiosity, is this JIRA item related to this discussion: https://www.mail-archive.com/***@mahout.apache.org/msg32584.html



I had wanted to help out on this and found the following: https://github.com/nativelibs4java/ScalaCL



Any interest in looking into hooking this into mahout-scala?



Trevor Grant
2016-12-10 04:18:53 UTC
Hey Saikat, not really.

I was opening the discussion on whether or not we wanted to pursue Mahout
on Streaming at this time.

The CL integration was just an aside.


<https://github.com/andrewpalumbo/mahout/tree/viennacl>


