"Data Engineering": The Best Thing Happening to Social Media Platforms?
By Kamal Jacob
Is there anything that social media platforms like Twitter and Facebook have in common? Yes: the enormous amount of data they generate every day. According to a 2018 Forbes article, Facebook alone has 1.5 billion daily active users, who upload 300 million photographs each day. Similarly, Instagram has 400 million daily active users, who post 95 million photographs and videos each day.
Social media companies are not just generating large volumes of big data through content marketing and advertising. They are also using a variety of business analytics tools to extract insights about their users, such as age, location, gender, and buying patterns. Small retail companies, too, are using social media data to shape their online marketing campaigns. A prime example is the phone case manufacturer Peel, which used Facebook to grow its revenue 16x.
Data Science vs Data Engineering
Whether the aim is achieving business goals or anticipating business problems, big data analytics enables organizations to move forward more decisively and respond better to market challenges. However, because big data is highly complex, companies need to recruit professionals with deep data expertise to implement big data solutions.
The growing demand for (and shortage of) specialists in fields like data science and data analysis reflects the rising importance of data science across industries.
At its core, data science is the process of efficient collection, processing, and visualization of big data. A data scientist typically needs to perform the following functions:
1. Define the business problem and ask the right questions on a given dataset.
2. Use a variety of statistical, machine learning, and data mining tools (built with languages such as R, Python, or SQL) to extract the required answers from the dataset.
3. Communicate the results of the data analysis through data visualization tools.
Even with all the potential shown by big data projects and data scientists, Gartner reported that only a fraction of big data projects actually make it into production.
Data engineering is a specialized field that is critical for moving data science projects into production. Tomer Shiran, CEO of the big data middleware company Dremio, states that for the successful implementation of a data science project, companies will “typically take a ratio of one data engineer for every two data scientists.”
Image Source: http://community.datacamp.com.s3.amazonaws.com/community/production/ckeditor_assets/pictures/414/content_screenshot_2017-02-23_14_26_33.png
So, what is data engineering, and how do data engineers differ from data scientists? In simple terms, data engineers help data scientists do their jobs more efficiently by:
1. Building a data pipeline infrastructure for better data handling.
2. Improving the productivity of the data team by foreseeing production needs and removing any system bottlenecks.
3. Performing data collection and storage using various scripting languages.
4. Constructing data stores on database systems.
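As a rough illustration of items 1, 3, and 4, here is a toy extract-transform-load (ETL) pipeline in Python. It is a minimal sketch under stated assumptions and does not describe how any particular company builds its pipelines. The sample records, their field names, and the functions `extract`, `transform`, `load`, and `run_pipeline` are all hypothetical. An in-memory SQLite database stands in for a production data store.

```python
import sqlite3
from datetime import datetime

# Hypothetical raw records standing in for what an upstream API or log export might return.
RAW_POSTS = [
    {"post_id": "1", "user_id": "u1", "created_at": "2018-03-01T10:15:00", "likes": "12"},
    {"post_id": "2", "user_id": None, "created_at": "2018-03-01T11:00:00", "likes": "3"},
    {"post_id": "3", "user_id": "u2", "created_at": "2018-03-02T09:30:00", "likes": ""},
]


def extract():
    """Extract: here we simply copy the sample records; a real job would call an API."""
    return [dict(r) for r in RAW_POSTS]


def transform(records):
    """Transform: drop unattributable rows and coerce fields to proper types."""
    clean = []
    for r in records:
        if not r.get("user_id"):
            continue  # a post without an author cannot be attributed, so skip it
        clean.append((
            r["post_id"],
            r["user_id"],
            # Parsing validates the timestamp; re-serializing normalizes its format.
            datetime.fromisoformat(r["created_at"]).isoformat(),
            int(r["likes"] or 0),  # treat a missing like count as zero
        ))
    return clean


def load(rows, conn):
    """Load: upsert the cleaned rows into a relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts ("
        "post_id TEXT PRIMARY KEY, user_id TEXT, created_at TEXT, likes INTEGER)"
    )
    conn.executemany("INSERT OR REPLACE INTO posts VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def run_pipeline(conn):
    """Run extract -> transform -> load and return (row count, total likes)."""
    load(transform(extract()), conn)
    return conn.execute("SELECT COUNT(*), SUM(likes) FROM posts").fetchone()
```

Calling `run_pipeline(sqlite3.connect(":memory:"))` loads the two valid sample posts and returns their count and total likes, `(2, 12)`. Each stage is a separate function, so you could swap the sample data for a real API client, or the SQLite table for a warehouse, without touching the other stages.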
Whereas data scientists bring analytical and theoretical knowledge, data engineers have more hands-on data skills, which they use to provide clean, structured data to the business. Data engineers are also more conversant with best practices in software engineering, computer science, and database systems, as well as with data engineering technologies such as Hadoop and Kafka.
In the following sections, we discuss five future trends in data engineering and how they are shaping business enterprises, including social media companies.
Investments in Big Data
Companies investing in big data technologies must also be flexible enough to handle constant changes in how big data is processed and managed. Having started with Hadoop as the preferred environment, data engineering is moving toward Spark and, in the near future, possibly serverless environments. For example, the customer engagement company Freshworks is boosting its investments in data science to speed up its product development.
Improving ROI using Social Media Data
Be it for social media marketing or customer service, digital companies are increasing their presence on social media platforms to improve their ROI. A 2018 survey by Sprout Social concluded that improving the ROI on social media (or social ROI) is important for 55% of social media marketers.
In simple terms, social media data is used to show how online users are engaging with your social media content. Data engineers can use a variety of social media data including shares, hashtag usage, URL clicks, and keywords for performing data mining and analysis.
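Here is a simplified sketch of how hashtag usage and keyword mentions might be counted from raw post text before any deeper mining. The regular expressions, the sample posts, and the function names are assumptions made for illustration. Real platforms have their own rules for what counts as a hashtag. A production system would also typically pull text through each platform's API rather than from a hard-coded list.

```python
import re
from collections import Counter

# Simplified patterns; each platform has its own rules for hashtags and links.
HASHTAG_RE = re.compile(r"#(\w+)")
URL_RE = re.compile(r"https?://\S+")

# Hypothetical sample posts standing in for text pulled from a platform API.
SAMPLE_POSTS = [
    "Loving the new #PhoneCase from our shop! https://example.com/p/1",
    "#phonecase drop test passed #durable",
    "Is this case durable? Asking before I buy.",
]


def hashtag_counts(posts):
    """Count hashtag usage across posts, ignoring case."""
    counts = Counter()
    for text in posts:
        counts.update(tag.lower() for tag in HASHTAG_RE.findall(text))
    return counts


def keyword_counts(posts, keywords):
    """Count how many posts mention each keyword as a whole word (URLs are ignored)."""
    counts = Counter()
    for text in posts:
        words = set(re.findall(r"\w+", URL_RE.sub("", text).lower()))
        for kw in keywords:
            if kw.lower() in words:
                counts[kw] += 1
    return counts
```

With the sample posts, `hashtag_counts` reports `phonecase` twice and `durable` once. `keyword_counts(SAMPLE_POSTS, ["durable", "shop"])` finds `durable` in two posts and `shop` in one.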
Depending on your organizational goals and your business's preferred social media platforms, social ROI can be measured through various KPIs and metrics such as:
1. For Facebook:
a) User engagement including likes, comments, and shares
b) Impressions measuring the number of times users viewed your Facebook page
c) Organic likes generated without any online ad campaign
d) Page likes
e) Paid likes generated from a paid digital ad campaign
f) Types of reactions, including Like, Love, Sad, or Angry
g) Number of unlikes
2. For Instagram:
a) Account impressions measuring the number of views for your stories
b) Total reach measuring the number of unique views of your posts
c) Website clicks and Profile visits
d) Number of likes and comments on your posts
3. For Twitter:
a) User engagement including clicks, retweets, and replies
b) Number of Twitter followers
c) Others' use of your hashtags and mentions of your @username
d) Number of posted and viewed tweets
4. For LinkedIn:
a) User engagement including the number of clicks on posts and company name
b) Number of new followers
c) User interactions including likes, comments, and shares
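Most of the metrics above are raw counts, so teams usually normalize them before comparing posts or platforms. This sketch assumes two common but not universal definitions:

* Engagement rate is (likes + comments + shares) divided by impressions.
* ROI is (attributed revenue - spend) / spend x 100.

The `PostMetrics` fields and the function names are hypothetical. How revenue gets attributed to social media is itself an assumption that varies by organization.

```python
from dataclasses import dataclass


@dataclass
class PostMetrics:
    """Illustrative per-post counts; real analytics exports use their own field names."""
    likes: int
    comments: int
    shares: int
    impressions: int


def engagement_rate(m: PostMetrics) -> float:
    """Engagements per impression (one common definition; some teams divide by reach or followers)."""
    if m.impressions == 0:
        return 0.0
    return (m.likes + m.comments + m.shares) / m.impressions


def social_roi(attributed_revenue: float, spend: float) -> float:
    """ROI as a percentage: (return - cost) / cost * 100."""
    if spend <= 0:
        raise ValueError("spend must be positive")
    return (attributed_revenue - spend) / spend * 100
```

For example, take a post with 120 likes, 30 comments, 50 shares, and 10,000 impressions: `engagement_rate` returns 0.02 (2%). A campaign that cost 1,000 and is credited with 1,500 in revenue has a `social_roi` of 50.0 (%).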
Data Portability in the Cloud
To save cost and time, an increasing number of companies are moving their big data to cloud-based platforms and solutions. However, lack of portability remains a major challenge, because code written for one cloud vendor's services is often incompatible with another vendor's. To enable portability, Google is offering cloud portability solutions (in association with VMware) that allow software engineers to build and deploy applications across multiple cloud environments. Data portability across multiple cloud environments is going to be a major enabler of data science solutions in the future.
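A common way for engineering teams to reduce vendor lock-in, independent of the Google and VMware offering mentioned above, is to write pipeline code against a small vendor-neutral interface and supply one adapter per storage backend. The sketch below illustrates only that idea. The `ObjectStore` protocol, both adapters, `archive_daily_report`, and `demo` are hypothetical. A real adapter would wrap a specific cloud SDK instead of a dictionary or the local filesystem.

```python
import os
import tempfile
from typing import Dict, Protocol


class ObjectStore(Protocol):
    """Hypothetical vendor-neutral storage interface used by the pipeline code."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class InMemoryStore:
    """Adapter that keeps objects in a dict (handy for tests)."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]


class LocalDirStore:
    """Adapter that stores each object as a file under a root directory."""

    def __init__(self, root: str) -> None:
        self.root = root

    def _path(self, key: str) -> str:
        return os.path.join(self.root, *key.split("/"))

    def put(self, key: str, data: bytes) -> None:
        path = self._path(key)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(data)

    def get(self, key: str) -> bytes:
        with open(self._path(key), "rb") as f:
            return f.read()


def archive_daily_report(store: ObjectStore, day: str, csv_bytes: bytes) -> str:
    """Pipeline step that depends only on the interface, not on any vendor SDK."""
    key = f"reports/{day}.csv"
    store.put(key, csv_bytes)
    return key


def demo() -> bool:
    payload = b"date,likes\n2018-03-01,42\n"
    with tempfile.TemporaryDirectory() as tmp:
        # The same pipeline step runs unchanged against either backend.
        stores = [InMemoryStore(), LocalDirStore(tmp)]
        results = [s.get(archive_daily_report(s, "2018-03-01", payload)) for s in stores]
    return all(r == payload for r in results)
```

Moving to another provider would then mean writing one new adapter class, not rewriting every pipeline step that reads or writes data.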
How Data Engineers are using Social Media
Do data engineers use social media platforms to gather data for their work? The findings of a 2017 Engineering.com survey of nearly 1,200 engineers and design professionals seem to suggest that the answer is “yes.”
Image Source: https://marketing.engineering.com/hs-fs/hubfs/Blog_images/20171130%2030%20Day%20Use.jpg?width=711&height=425&name=20171130%2030%20Day%20Use.jpg
Of the surveyed engineers, 41% said that they collect engineering content from social media platforms. Additionally, 39% of the surveyed professionals get their information from the three largest platforms, namely Facebook, Twitter, and LinkedIn.
Another interesting trend is the growing availability of social media datasets that data specialists, including data engineers, can use. These large, free datasets are an integral requirement for data science projects and can be a reliable resource for developing new algorithms. One example is the Flickr Creative Commons dataset from Yahoo Webscope, which contains nearly 100 million images and 0.7 million videos. Similarly, the cloud service provider Infochimps offers the Twitter Census dataset product, which provides datasets derived from over 35 million tweets.
Managing a shortage of resources
Software and big data companies are adopting innovative ways to overcome the market shortage of qualified data engineers. For example, the retailer Overstock monitors its customers' buying behavior through its One-to-One marketing machine, which was built using a cloud-powered data analytics solution. Data scientists are also taking on some of the functions performed by data engineers. As a result, companies are looking at performing data integration instead of relying on data lakes.
Given the abundance of data generated by popular social media platforms like Facebook, Twitter, and Instagram, social media data is a valuable input for data scientists and engineers in business enterprises who implement big data projects and solutions.
This article evaluates the important role played by data engineers and how their skill set differs from that of data scientists. It also covers some visible future trends in data engineering and social media data, and how they are shaping modern business corporations.