Think data first. Data is long-term, applications are temporary. I recently read this in a blog post, and I couldn’t agree more. Data remains one of the most strategic initiatives for most companies.
Every fifth person you talk to, every other start-up you come across, and most job postings have something or other to say about data, analytics and so on. But when I speak to people in my ecosystem, a lot of them think it is only about doing cool stuff in R.
Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it.
If someone has been an application developer for the last 10 years, can he/she suddenly become an expert in statistics and algorithms, and start calling himself/herself a Data Scientist? Maybe… Nothing is impossible. But if that were your passion, you wouldn’t have been an application developer for the last 10 years. Right?
Is there anything else one can learn and contribute in the data world? I thought of sharing a few valuable links that give a very good idea of the various aspects and where one can fit in.
#1 Will Balkanization of Data Science lead to one Empire or many Republics? via http://www.kdnuggets.com/2015/11/balkanization-data-science.html
#2 Becoming a Data Scientist via http://nirvacana.com/thoughts/becoming-a-data-scientist/
#3 Difference between Data Engineering and Data Science via http://www.galvanize.com/blog/difference-between-data-engineering-and-data-science/
#4 The world of data science: Who does what in the data world? via http://cloudtweaks.com/2015/11/booming-world-data-science/
Data is one of the hottest stacks right now and it is growing at a crazy speed. It would be extremely difficult for any individual to keep up with this change unless one’s basics are right.
Once you have the basics right, it is about meta-learning and evolving from there.
Having worked on various large-scale data projects for the last 15 months, here is my high-level list of items one needs to know to have a reasonable understanding of data (big or small). This list is in no specific order. 😦
|General||A basic overview of Descriptive, Diagnostic, Prescriptive, Predictive and Cognitive Analytics: understand the concepts and the differences between them.|
|NoSQL Databases||
|Real-time Analytics||Working with streams: real-time analytics is something everyone talks about, but without understanding what stream processing means you will never be able to figure it out.
From an application background
If you don’t understand the above, it will be difficult to move forward. Spend time on these before moving on to the other items.
|Machine Learning||Machine Learning
A couple of very good posts to read on this
Once you are clear on the concepts, start implementing them using Apache Mahout
|Modern day HOLAP Engines||
|Data Visualization||Self-Service is the Mantra here. Read this article: Data Scientists Should be Good Storytellers
“Most of the people in an organization cannot understand the outcome of analytics, however they do need the proof of analysis and data. Data storytellers incorporate data and analytics in a compelling way as their stories involve real people and organizations” via https://dzone.com/articles/data-scientists-should-be-good-storytellers
|Deep Learning||Though it may or may not be important at this point, try to understand what deep learning is. Read this: Deep Learning in a Nutshell: Core Concepts via http://devblogs.nvidia.com/parallelforall/deep-learning-nutshell-core-concepts/|
|Data Lake||One of my favorite topics, and something I learnt only after burning my hands, is the data lake
|Language||Though there are plenty of things you can do with Python, R, Java etc., my choice is Scala (I love how expressive the language is. Wish someone could afford me as a developer again 🙂)|
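As a rough illustration of what stream processing means in the real-time analytics row of the list, here is a minimal Python sketch of a tumbling-window count: events are consumed one at a time and a result is emitted whenever a fixed time window closes, instead of waiting for the whole data set. It is not taken from any particular engine. The event shape, the function name `tumbling_window_counts`, and the sample `clicks` data are all assumptions made up for this example, and it assumes events arrive in timestamp order.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical event shape: (timestamp in seconds, key such as a page name).
Event = Tuple[float, str]


def tumbling_window_counts(
    events: Iterable[Event], window_seconds: float
) -> Iterator[Tuple[float, Dict[str, int]]]:
    """Yield (window_start, counts_per_key) each time a window closes.

    Assumes events arrive in timestamp order. Real stream engines also
    deal with late and out-of-order events, which this sketch ignores.
    """
    current_start = None
    counts: Dict[str, int] = defaultdict(int)
    for timestamp, key in events:
        # Align the event to the start of its fixed-size window.
        window_start = (timestamp // window_seconds) * window_seconds
        if current_start is None:
            current_start = window_start
        elif window_start != current_start:
            # The previous window is complete: emit it and start a new one.
            yield current_start, dict(counts)
            current_start = window_start
            counts = defaultdict(int)
        counts[key] += 1
    if current_start is not None:
        # Flush the last, still-open window when the input ends.
        yield current_start, dict(counts)


# Made-up sample data: page views with timestamps in seconds.
clicks = [(0.5, "home"), (1.2, "cart"), (4.9, "home"), (5.1, "home"), (11.0, "cart")]
results = list(tumbling_window_counts(clicks, window_seconds=5))
for window_start, window_counts in results:
    print(window_start, window_counts)
```

The point of the sketch is the shape of the computation: state is kept only for the window currently open, and answers come out continuously as data flows in. Batch analytics, by contrast, would load everything first and count afterwards.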
If you have a good grasp of the items in the list above, then it is time for you to figure out when to use what (creating solutions).
“If all you have is a hammer, everything looks like a nail”
Read this: The Ethics of Wielding an Analytical Hammer via http://sloanreview.mit.edu/article/the-ethics-of-wielding-an-analytical-hammer/
Data is having an impact on business models and profitability. It’s hard to find a non-trivial application that doesn’t use data in a significant manner ~ Ben Lorica, O’Reilly Media
Ok, this looks like a large list. Where do I start?
- Focus on the basics. Get a good overview of the ecosystem
- Decide your area of specialization.
- Focus on your specialization and build skills.
- Iterate and change course as required.
- If you have more than 10 years of experience, understand the business situation and figure out when to use what. Maybe pick one or two items and start implementing them in your environment.
- If you have less than 10 years of experience, pick a scenario, try to implement it, and see if it makes any business sense.
What have I not covered in the list? I haven’t gone into the details of:
- Hadoop Ecosystem and components (Pig/Hive etc.)
- Nearest Neighbour
- K-Means Clustering
- Linear Regression
- Decision Trees etc.
- R in detail
- Env Setup
- Zookeeper, Yarn, Mesos
- Vertical Industry Solutions
- Operational Systems (like Splunk)
- Data Governance
I keep hearing/seeing people who have never seen more than 1 GB of data saying that they do Big Data Analytics. Don’t learn or do something for the sake of doing it.
There is no shortcut to a place worth going.
My favorite books on this topic:
- Head First Data Analysis
- Data Warehousing Fundamentals: A Comprehensive Guide for IT Professionals
- Big Data: Principles and Best Practices of Scalable Real-Time Data Systems
If you want to know more about what I am learning, you can follow me on Twitter