Episode 18

Social Media Clarity Episode

The Social Media Clarity Podcast

15 minutes of concentrated analysis and advice about social media in platform and product design

Social is Big Data: Get Scala! - Alexy Khrabrov - Episode 18

After a greeting from David Spinks and updates to our previous episodes about Huffington Post comments and Facebook reach, Scott and Marc speak with Alexy Khrabrov from By The Bay (http://bythebay.io), who introduces the Scala programming language for large social datasets.

Facebook responds to organic reach claims

HuffPo switches Commenting System to Facebook

Interview Links

Transcript

David: Hi, I’m David Spinks. I’m the founder of Feast and CMX Summit, and this is the Social Media Clarity Podcast.

Alexy: If you work with social media and you want to play and see what you can do beyond what you can get as a packaged tool, Scala should be your choice. I would advocate that over Ruby or Python because Scala gives you access at scale. Whatever you can try on a small scale, it’s very easy to do at big scale, and it’s as easy as those other languages.

Randy: Welcome to the Social Media Clarity Podcast, 15 minutes of concentrated analysis and advice about social media in platform and product design.

Scott: Welcome to episode 18 of the Social Media Clarity Podcast, I’m Scott Moore.

Marc: And I’m Marc Smith.

Alexy: And I’m Alexy Khrabrov.


Scott: For this episode’s news, we have two items for you:

Facebook responds to organic reach claims

A mere two episodes ago, in Episode 16, we discussed concerns online marketers were voicing over changes in Facebook’s organic reach.

Well, Facebook has just responded to those concerns in a post by Brian Boland, who leads the Ads Product Marketing team at Facebook. He states that organic reach is declining because more content is being created every day and there is a practical limit to what can be shown to a Facebook user through their newsfeed.

It’s a simple equation. The newsfeed can only present so many items to a user during their session on Facebook, and how much is presented is pretty much fixed. Facebook claims it’s about 300 items. More content being produced by friends, family, coworkers, groups, and pages simply means more content that won’t make it into that top 300.

According to Facebook, pages that publish great content, something that adds value to users’ lives or, as Justin Isaf says, “Makes them more awesome,” can still make it into that top 300. And, of course, the lead of ad products is going to recommend backing that great content with smart use of ads. And it makes sense if you are trying to reach an audience.

Increasingly, expect platforms to tax your access to audience while, at the same time, they make it easier for their native relationship-based communities to thrive.

Which leads us back to the points regarding engagement and impact we made in Episode 16:

HuffPo switches Commenting System to Facebook

Also in recent news, Huffington Post has completely switched its US site to the Facebook commenting plugin.

Back in September of 2013, Huffington Post announced they would no longer allow anonymous accounts and a form of identity verification would be required.

This became the topic of Episodes 2 and 3 of this very podcast, where Randy, Bryce, and Marc laid out the reasons why anonymity and pseudo-anonymity could be preserved while still maintaining quality contributions to a community.

Now, as of June 2nd, 2014, Huffington Post has abandoned its in-house commenting and identity system and has switched its US site completely over to the Facebook comments plugin.

Otto Toth, CTO of the Huffington Post, couches this move as moving the conversation to where their community is already having it. Toth did not explore the decision behind the change any further in his post.

Amanda Schumacher, Head of Communications for Huffington Post, did follow up with the Nieman Journalism Lab at Harvard, saying, quote, “initial Facebook commenting tests across a selection of our verticals did not show a significant impact to the amount or quality of comments,” unquote. This change is currently only for the US site, though Huffington Post Canada and Huffington Post UK may likewise switch in the near term, according to Schumacher.

Clearly, Huffington Post is happy with its business decision, though in the landscape of online news and journal sites there is no clear consensus, with sites switching to Facebook comments, switching away from Facebook comments, or abandoning comments altogether.


Marc: With us today is Alexy Khrabrov. Alexy is the start-up founder of By the Bay, an organization devoted to training software developers in the latest development technologies for big data analysis. He’s a big fan of Scala, the programming language, and he seeks to work on engineering problems that actually address some of the hard problems of humanity. Of course his technology of choice is scalable functional programming and state-of-the-art computer science. He focuses on big data Scala systems, which include Hadoop-, Spark-, and Kafka-based flows where Scala is already either inside, as in Spark, or added as a DSL, such as in Scoobi or Scalding. He has built large-scale MapReduce systems using big data tools built out of Scala. We’re going to ask him today about his latest work and about the By the Bay organization, which creates learning opportunities for people seeking to develop skills in big data analysis.

Scott: Alexy, tell us about Scala.

Alexy: Scala is a programming language, and it’s often thought of as a successor to Java. It’s a very powerful language that draws on the Java platform: anywhere you can run Java, you can run Scala. Java underpins most of the big data companies: it’s LinkedIn, it’s Twitter, and any large bank most likely runs on Java. Scala comes in and it looks more like Ruby or Python. It’s much more appealing. It’s much more concise, so folks who cannot tolerate the verbosity of Java or its clunkiness can enjoy Scala much more. It’s also interactive, so there is a prompt, as in Python or Ruby, and you can experiment and play with your data. In a session you can do anything you can do in dynamic languages such as Python or Ruby, but you now have access to the power of the Java platform. You can also very easily work with big data, so if your data set is stored on a cluster of hundreds of computers, you have tools in Scala to work with it. It’s the language of choice for a lot of companies that process social media data. Twitter is almost entirely written in Scala now, which speaks to the power of Scala, and Scala is one of the reasons Twitter is so successful.

Scott: So Twitter is completely programmed in Scala?

Alexy: The backend of it is programmed in Scala. Twitter started as a Ruby on Rails implementation, and very quickly it hit the limits of scalability. There was something called the Fail Whale, a whale carried by a lot of birds, which was shown when Twitter experienced periods of overloading, such as when everybody would check their tweets after lunch in the early days. Once it was fully reengineered using Scala, the Fail Whale disappeared. So Scala was one of the languages they adopted early, and a lot of systems at Twitter are written in Scala: the backend, and the data mining tools which find, for instance, the recommendations of people you should follow. All of these tools are written in Scala.

Scott: So Scala is more than just scaled data delivery, there’s analytics to it also?

Alexy: Scala is a general-purpose programming language. You can write anything, but because it was adopted by big data engineers and data scientists, they wrote a bunch of things such as highly scalable APIs. Can we now do data mining and find the common words people are using? All these things are written as libraries, and in Scala there are systems you can take from open source and use to analyze text, to build APIs, and to provide a lot of the services needed by data mining companies.

Scott: For people who are interested in social media or in helping others get together online, what would you say is the key takeaway about Scala that we should know?

Alexy: In the next few years, no matter which tool you use as a social media manager, if you touch a bunch of tools, it’s very likely that underneath they talk to a system written in Scala or the tool itself is written in Scala. Ultimately, if you talk to Twitter, you are interacting with a giant system written in Scala. Most of the software that Twitter writes is open source. They put it quickly on GitHub, and they dogfood their own open source themselves.

So, if you are interested in trying various data science coding tools, you can very easily process this data yourself, if you want to experiment. If you want to interact with one of these APIs, you can easily get some templates written in Scala to work with them. You can integrate various libraries: if you want to analyze text in your tweets, there is a huge collection of natural language processing libraries written in Java and Scala, and they are interoperable. So it’s very easy for us, for instance, to try to detect sentiment, to look for specific words and collocations, or to see what the typical words are that person X is using versus person Y. It’s surprisingly easy to experiment with this using an interactive prompt. I would suggest that if you work with social media, which is a stream of text and images, and you want to play and see what you can do beyond what you can get as a packaged tool, Scala should be your choice. I would advocate that over Ruby or Python because Scala gives you access at scale. Whatever you can try on a small scale, it’s very easy to do at big scale, and it’s as easy as those other languages.
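
Editor's note: as a rough illustration of the kind of experiment Alexy describes, comparing the words person X uses with the words person Y uses, here is a minimal sketch in plain Scala that can be pasted into the interactive prompt. It uses only the standard library. The object name WordProfile, its method names, and the simple tokenizing rule are assumptions made for this example; none of them come from the interview or from any particular library.

```scala
object WordProfile {
  // Split text into lowercase word tokens, keeping hashtags (#) and mentions (@).
  def tokenize(text: String): Seq[String] =
    text.toLowerCase.split("[^a-z0-9#@']+").filter(_.nonEmpty).toSeq

  // Count how often each word appears across a collection of posts.
  def wordCounts(posts: Seq[String]): Map[String, Int] =
    posts.flatMap(tokenize).groupBy(identity).map { case (word, hits) => (word, hits.size) }

  // Words that one author uses and the other never does,
  // most frequent first, ties broken alphabetically.
  def distinctive(mine: Map[String, Int], theirs: Map[String, Int], n: Int): Seq[(String, Int)] =
    mine
      .filter { case (word, _) => !theirs.contains(word) }
      .toSeq
      .sortBy { case (word, count) => (-count, word) }
      .take(n)
}
```

Calling WordProfile.wordCounts on each person's posts and then WordProfile.distinctive on the two results gives a first, crude picture of how their vocabularies differ. A real analysis would likely add stop-word filtering and one of the natural language processing libraries mentioned above.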

Scott: Perfect, because that was going to be my question about Python. How would somebody get started with Scala?

Alexy: Scala has many levels, and the core of it is as easy as Python or Ruby. So, there are very basic things. There are multiple tutorials online. One of the things we are doing is the second Bay Area Scala conference in August, called Scala by the Bay, which is preceded by a three-day training, Foundations of Scala. So any person who wants to use big data or small data, or who wants to do computational linguistics or APIs, can benefit by taking the basics of Scala courses we are going to run for three days in August. That will be repeated throughout the year.

I would also suggest going to meetups. I’m also a founder and organizer of SF Scala, which regularly meets in the city. We meet once a month or more, and we often feature talks about foundations of Scala, various tools, and various libraries. I would highly recommend going to an SF Scala meetup, joining, and seeing what the events are.

Marc: Tell us about the conference.

Alexy: Scala by the Bay is the second Bay Area Scala conference. The last three years it started it was called FunConf or Silicon Valley Scala Symposium. We renamed it Scala By the Bay to showcase our beautiful surroundings. It will be hosted at Fort Mason, the historic location with a view of the Golden Gate Bridge, and will feature the bay in all its glory. About 300 developers will come from all over the world for two days of talks, and we’ll have multiple companies using Scala featured.

One of the main companies is Twitter, the king of social media. As I was telling Scott before, most of Twitter is written in Scala, certainly all the data processing parts, so it touches all the pieces of data you see on your Twitter. Each piece originated in, or was touched by, some Scala system somewhere, and the logic of Scala, its libraries, and its use cases connects through this area. For instance, Twitter writes all its recommender systems, recommending whom to follow, in Scala. It processes a large amount of tweets, which need to be searched and indexed, in Scala, and all of this information is stored on clusters of computers. Basically all of this, billions of tweets, needs to be scattered across hundreds of thousands of nodes, and if you look for a very specific keyword or hashtag, the matching tweets have to be found very quickly. A lot of systems answer these questions: how to count very quickly, or who has more mentions of one word versus the other. You need to use specific mathematical systems to actually be able to answer this on a scale of billions of tweets.
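
Editor's note: the idea of counting mentions across tweets that are scattered over many machines can be sketched in a few lines of plain Scala. In this sketch each shard of tweets is counted on its own, and the partial counts are then merged by addition, which works in any order because addition is associative and commutative. It illustrates the general map-and-merge pattern only and is not a description of Twitter's actual systems. The names ShardedCounts, countShard, merge, and countAll are assumptions made for this example, and the in-memory sequences stand in for real cluster nodes.

```scala
object ShardedCounts {
  type Counts = Map[String, Long]

  // "Map" step: one node counts how often each tracked term occurs in its own shard.
  // Tokens are split on whitespace only, so "#scala!" would not match "#scala".
  def countShard(tweets: Seq[String], terms: Set[String]): Counts =
    tweets
      .flatMap(tweet => tweet.toLowerCase.split("\\s+").toSeq)
      .filter(terms.contains)
      .groupBy(identity)
      .map { case (term, hits) => (term, hits.size.toLong) }

  // "Reduce" step: merge two partial results by adding counts term by term.
  def merge(a: Counts, b: Counts): Counts =
    b.foldLeft(a) { case (acc, (term, n)) => acc.updated(term, acc.getOrElse(term, 0L) + n) }

  // Count every shard independently, then fold the partial results together.
  def countAll(shards: Seq[Seq[String]], terms: Set[String]): Counts =
    shards.map(shard => countShard(shard, terms)).foldLeft(Map.empty[String, Long])(merge)
}
```

This version counts exactly and keeps everything in memory. At the scale of billions of tweets, systems often trade exactness for speed with approximate counting structures, which may be part of what Alexy means by specific mathematical systems.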

We’ll have talks from LinkedIn, another company using Scala heavily. Foursquare is written entirely in Scala, so a lot of very interesting companies in the social media space are using Scala. We will have talks for beginners. We will have talks about APIs which, for instance, let you learn how to easily write your own API or clone an existing API using Scala clients. We will have training preceding the conference, training folks in the foundations of Scala, and we’ll have some training after the conference using the Play framework, on how to write applications and code APIs using Scala. We’ll have a big data Scala track using Spark, an advanced system out of Berkeley. It is now taking the big data world by storm, and in a few years Spark will basically be the big data tool. It will be Hadoop, but I think it will be mostly Spark inside. Basically, a lot of topics that I think are of interest to folks who look at social media data on the Internet.

Marc: If we want to follow you on the web, where do we find you on Twitter, and where would I find good resources to learn more about Scala?

Alexy: My Twitter handle is my last name, Khrabrov, K-H-R-A-B-R-O-V, and the company organizing the conference is called By the Bay, LLC. So, if you go to BytheBay.io you will find various links; for instance, BytheBay.training provides some fundamental training in Scala and Spark. You will see links to the conference itself, which is Scalabythebay.org. There is an SF Scala meetup; if you just look for SF Scala on Google you will find it. It’s meetup.com/SF-Scala, and the meetup gathers regularly, so it’s a very good idea to join the meetup and find out firsthand what Scala is all about, what the community is all about, and follow Scala events.

Scott: Thanks so much for joining us, Alexy.

Alexy: Thanks for having me.

Randy: For links, transcripts, and more episodes, go to socialmediaclarity.net. Thanks for listening.