Getting Started with Serverless Architecture


Serverless architecture is still relatively new. I have been exploring it of late for our new platform architecture. Though it is very interesting, there is obviously a reasonable learning curve, and I don't see a lot of best practices out there yet.

Everything looks greener on the other side. We will learn as we move forward.

Since we use AWS as our cloud provider, most of the examples you will see relate to AWS Lambda.

Specific Reasons for exploring Serverless Architecture 

  1. No operating systems to choose, secure, patch, or manage.
  2. No servers to right size, monitor, or scale out.
  3. No risk to your cost by over-provisioning.
  4. No risk to your performance by under-provisioning.

https://d0.awsstatic.com/whitepapers/AWS_Serverless_Multi-Tier_Architectures.pdf

One thing I have learnt in the last few years about developing distributed applications is that it is not about learning new things… it is always about unlearning what you have done in the past.

If you are concerned about vendor lock-in, then this may not be a choice for you at all…

Following is my reading list on Serverless Architecture.

What is Serverless?
https://auth0.com/blog/what-is-serverless/

Serverless Architectures
http://martinfowler.com/articles/serverless.html

What is Serverless Computing and Why is it Important?
https://www.iron.io/what-is-serverless-computing/

Serverless Architecture in short
https://specify.io/concepts/serverless-architecture

Is “Serverless” architecture just a finely-grained rebranding of PaaS?
http://www.ben-morris.com/is-serverless-architecture-just-a-finely-grained-rebranding-of-paas/

IAAS, PAAS, Serverless.
https://read.acloud.guru/iaas-paas-serverless-the-next-big-deal-in-cloud-computing-34b8198c98a2#.m9us1c5fe

Serverless Delivery: Architecture
https://stelligent.com/2016/03/17/serverless-delivery-architecture-part-1/

Principles of Serverless Architectures
There are five principles of serverless architecture that describe how an ideal serverless system should be built. Use these principles to help guide your decisions when you create serverless architecture.
1. Use a compute service to execute code on demand (no servers)
2. Write single-purpose stateless functions
3. Design push-based, event-driven pipelines
4. Create thicker, more powerful front ends
5. Embrace third-party services
https://dzone.com/articles/serverless-architectures-on-aws
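
To make principle 2 concrete, here is a minimal sketch of a single-purpose, stateless Lambda handler in Python. The function, event fields and table name are made up for illustration; the only real assumption is the boto3 SDK that ships with the AWS Lambda runtime.

import json
import boto3

# Hypothetical table; in practice the name would come from an environment variable.
dynamodb = boto3.resource("dynamodb")
orders_table = dynamodb.Table("orders")

def create_order(event, context):
    # Single purpose and stateless: validate the payload and persist it.
    # All state lives in DynamoDB, never in the function itself.
    body = json.loads(event.get("body", "{}"))
    if "order_id" not in body:
        return {"statusCode": 400, "body": json.dumps({"error": "order_id is required"})}
    orders_table.put_item(Item=body)
    return {"statusCode": 201, "body": json.dumps({"created": body["order_id"]})}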

Serverless Architectures – Building a Serverless system to solve a problem
https://serverless.zone/serverless-architectures-9e23af71097a#.j9z60nxw1

Serverless architecture: Driving toward autonomous operations
https://www.slalom.com/thinking/serverless-architecture

Serverless Developers
https://serverless-developers.com/

The essential guide to Serverless technologies and architectures
http://techbeacon.com/essential-guide-serverless-technologies-architectures

Using AWS Lambda and API Gateway to create a serverless schedule
https://www.import.io/post/using-amazon-lambda-and-api-gateway/

Five Reasons to Consider Amazon API Gateway for Your Next Microservices Project
http://thenewstack.io/five-reasons-to-consider-amazon-api-gateway-for-your-next-microservices-project/

AWS Lambda and the Evolution of the Cloud
https://blog.fugue.co/2016-01-31-aws-lambda-and-the-evolution-of-the-cloud.html

SquirrelBin: A Serverless Microservice Using AWS Lambda
https://aws.amazon.com/blogs/compute/the-squirrelbin-architecture-a-serverless-microservice-using-aws-lambda/

A Crash Course in Amazon Serverless Architecture
http://cloudacademy.com/blog/amazon-serverless-api-gateway-lambda-cloudfront-s3/
AWS Lambda and Endless Serverless Possibilities
https://abhishek-tiwari.com/post/aws-lambda-and-endless-serverless-possibilities

Awesome Serverless – A Curated List
https://github.com/JustServerless/awesome-serverless

Happy Learning!

Data Infrastructure, Data Pipeline and Analytics – Reading List – Sep 27, 2016

Splunk vs ELK: The Log Management Tools Decision Making Guide
Much like promises made by politicians during an election campaign, production environments produce massive files filled with endless lines of text in the form of log files. Unlike election periods, they’re doing it all year around, with multiple GBs of unstructured plain text data generated each day.
http://blog.takipi.com/splunk-vs-elk-the-log-management-tools-decision-making-guide/

Building a Modern Bank Backend
https://monzo.com/blog/2016/09/19/building-a-modern-bank-backend/

An awesome list of microservices architecture related principles and technologies.
https://github.com/mfornos/awesome-microservices#api-gateways–edge-services

Stream-based Architecture
Part of the Stream Architecture Book. An excellent overview on the topic.
https://www.mapr.com/ebooks/streaming-architecture/chapter-02-stream-based-architecture.html

The Hardest Part About Microservices: Your Data
Of the reasons we attempt a microservices architecture, chief among them is allowing your teams to be able to work on different parts of the system at different speeds with minimal impact across teams. So we want teams to be autonomous, capable of making decisions about how to best implement and operate their services, and free to make changes as quickly as the business may desire. If we have our teams organized to do this, then the reflection in our systems architecture will begin to evolve into something that looks like microservices.
http://blog.christianposta.com/microservices/the-hardest-part-about-microservices-data/

New Ways to Discover and Use Alexa Skills
Alexa, Amazon’s cloud-based voice service, powers voice experiences on millions of devices, including Amazon Echo and Echo Dot, Amazon Tap, Amazon Fire TV devices, and devices like Triby that use the Alexa Voice Service. One year ago, Amazon opened up Alexa to developers, enabling you to build Alexa skills with the Alexa Skills Kit and integrate Alexa into your own products with the Alexa Voice Service.
http://www.allthingsdistributed.com/2016/06/new-ways-to-discover-and-use-alexa-skills.html

Happy Learning!

Developing a Robust Data Platform: Key Considerations


Developing a robust data platform definitely requires more than HDFS, Hive, Sqoop and Pig. Today there is a real need to bring data and compute as close together as possible. More and more requirements are forcing us to deal with high-throughput/low-latency scenarios. Thanks to in-memory solutions, such things definitely seem possible right now.

One of the lessons I have learnt in the last few years is that it is hard to resist developing your own technology infrastructure while developing a platform infrastructure. It is always important to remind ourselves that we are here to build solutions, not technology infrastructure.

Some of the key questions that need to be considered while embarking on such a journey are:

  1. How do we handle the ever growing volume of data (Data Repository)?
  2. How do we deal with the growing variety of data (Polyglot Persistence)?
  3. How do we ingest large volumes of data as we start growing (Ingestion Pipelines/Write Efficient)?
  4. How do we scale in terms of faster data retrieval so that the analytics engine can provide something meaningful at a decent pace?
  5. How do we deal with the need for Interactive Analytics with a large dataset?
  6. How do we keep our cost per terabyte low while taking care of our platform growth?
  7. How do we move data securely between on-premise infrastructure and cloud infrastructure?
  8. How do we handle data governance, data lineage, data quality?
  9. What kind of monitoring infrastructure would be required to support distributed processing?
  10. How do we model metadata so that we can address domain specific problems?
  11. How do we test this infrastructure? What kind of automation is required?
  12. How do we create a service delivery platform for build and deployment?

One of the challenges I am seeing right now is the urge to use multiple technologies to solve similar problems. Though this gives my developers the edge to do things differently/efficiently, from a platform perspective it increases the total cost of operations.

  1. How do we support our customers in production?
  2. How can we make the lives of our operations teams better?
  3. How do we take care of the reliability, durability, scalability, extensibility and maintainability of this platform?

I will talk about the data repository and possible choices in the next post.

Happy Learning!

“Data is long-term, Applications are temporary.”

Think data first. Data is long-term, applications are temporary. I recently happened to read this in a blog post, and I couldn't agree more. Data remains one of the most strategic projects for most companies.

Every fifth person you talk to, every other start-up you come across, and most job postings have something or other to say about data, analytics, etc. But when I speak to the people I come across in my ecosystem, a lot of them think it is only about doing cool stuff in R.

Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it.

If someone has been an application developer for the last 10 years, can he/she suddenly become an expert in statistics and algorithms? Suddenly start calling yourself a Data Scientist? Maybe… nothing is impossible. But if that were really your passion, you wouldn't have been an application developer for the last 10 years. Right?

Is there anything else one can learn and contribute in the data world? I thought of sharing a couple of valuable links which can give you a very good idea of the various aspects and where one can fit in.

#1 Will Balkanization of Data Science lead to one Empire or many Republics? Via http://www.kdnuggets.com/2015/11/balkanization-data-science.html
#2 Becoming a Data Scientist via http://nirvacana.com/thoughts/becoming-a-data-scientist/
#3 Difference between Data Engineering and Data Science via http://www.galvanize.com/blog/difference-between-data-engineering-and-data-science/
#4 The world of data science: Who does what in the data world? Via http://cloudtweaks.com/2015/11/booming-world-data-science/

Data is one of the hottest stacks right now, and it is growing at a crazy speed. It would be extremely difficult for any single individual to cope with this change unless one's basics are right.

Once you have the basics right, it is about Meta learning and evolving from there.

Having worked on various large-scale data-related projects for the last 15 months, the following is my high-level list of items one needs to know to have a reasonable understanding of data (big or small). The list is in no specific order.

General
A basic overview of what Descriptive, Diagnostic, Prescriptive, Predictive and Cognitive Analytics are, and an understanding of the concepts and the differences between them.
Data Warehouses
  • OLAP VS OLTP
  • Dimensional Modelling (Star Schemas, Snowflake Schemas)
  • Difference between Multi-Dimensional, Relational, Hybrid
  • In-Memory OLAP
No SQL Databases
  • CAP Theorem
  • If you are from application development, this is where the most important change will be. So far, you will have dealt primarily with key-value stores and document stores. For analytics (write-efficient) workloads, it is important to start understanding column databases (e.g., Cassandra) and graph databases (e.g., Neo4j). This is again a big shift from what you would have done as an application developer. Spend some time on it.
  • In-Memory databases in general.
  • Apart from Cassandra and Neo4j, get an understanding of what MemSQL offers. Yes, it is MemSQL and not MySQL 🙂 and it seems very impressive.
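
To get a feel for the column-database mindset, here is a small sketch using the Python cassandra-driver against a local cluster. The keyspace, table and columns are made up; the point is that the table is modelled around the query (latest readings per device) rather than around an entity, which is what makes it write-efficient for analytics.

from cassandra.cluster import Cluster

# Hypothetical local cluster and schema, purely for illustration.
cluster = Cluster(["127.0.0.1"])
session = cluster.connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS telemetry
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")
session.execute("""
    CREATE TABLE IF NOT EXISTS telemetry.device_events (
        device_id text,
        event_time timestamp,
        reading double,
        PRIMARY KEY (device_id, event_time)  -- partitioned by device, clustered by time
    ) WITH CLUSTERING ORDER BY (event_time DESC)
""")

# Writes are cheap and append-friendly; reads are efficient for "latest N events per device".
session.execute(
    "INSERT INTO telemetry.device_events (device_id, event_time, reading) VALUES (%s, toTimestamp(now()), %s)",
    ("sensor-42", 21.7),
)
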
Outside EDWs
  • MPPs/PDWs – Difference between traditional EDWs and MPPs?
  • DWH on the cloud: AWS Redshift, Azure SQL Data Warehouse
Data Mining
  • What does it mean?
  • Data Mining Algorithms
Hadoop
  • Hadoop and Various Hadoop Components
  • When to use Hadoop?
  • Parallelization and Map Reduce Fundamentals
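
If MapReduce is new to you, this toy word count in plain Python mimics the three phases (map, shuffle, reduce) locally, which is essentially what Hadoop does, just across many machines and much larger files.

from itertools import groupby

def mapper(line):
    # Map phase: emit a (word, 1) pair for every word.
    for word in line.strip().lower().split():
        yield word, 1

def reducer(word, counts):
    # Reduce phase: sum the counts for a single key.
    return word, sum(counts)

def run_wordcount(lines):
    # Shuffle phase: group intermediate pairs by key (Hadoop does this between map and reduce).
    pairs = sorted(kv for line in lines for kv in mapper(line))
    grouped = groupby(pairs, key=lambda kv: kv[0])
    return [reducer(word, (c for _, c in group)) for word, group in grouped]

print(run_wordcount(["the quick brown fox", "the lazy dog", "the fox"]))
# [('brown', 1), ('dog', 1), ('fox', 2), ('lazy', 1), ('quick', 1), ('the', 3)]
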
Outside Hadoop
  • Difference between Hadoop, Spark and Storm (I personally prefer Spark. RDDs give me the same comfort I had with ADO.NET)
  • When to use Hadoop/Spark/Storm over MPP?
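
And here is the same word count expressed as Spark RDD transformations, assuming a local PySpark installation; notice that nothing runs until the final action is called, which is where Spark's lazy, in-memory execution model comes from.

from pyspark import SparkContext

sc = SparkContext("local[*]", "rdd-sketch")

# Word count as a chain of RDD transformations; collect() is the action that triggers execution.
counts = (
    sc.parallelize(["the quick brown fox", "the lazy dog", "the fox"])
      .flatMap(lambda line: line.split())
      .map(lambda word: (word, 1))
      .reduceByKey(lambda a, b: a + b)
)
print(counts.collect())
sc.stop()
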
ETL
  • Data Munging/Wrangling
  • Scrubbing
  • Transforming
  • Reading and Loading Data
  • Exception Handling
  • Jobs/Tasks
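
A small pandas sketch of the munging/scrubbing/transforming/loading steps above; the file and column names are hypothetical, and a real pipeline would add exception handling and job orchestration around it (the Parquet write also assumes pyarrow is installed).

import pandas as pd

# Hypothetical raw extract; column names are made up for the example.
raw = pd.read_csv("orders_raw.csv")

# Scrubbing: drop exact duplicates and rows missing the business key.
clean = raw.drop_duplicates().dropna(subset=["order_id"]).copy()

# Transforming: normalise types and derive a column.
clean["order_date"] = pd.to_datetime(clean["order_date"], errors="coerce")
clean["amount"] = pd.to_numeric(clean["amount"], errors="coerce").fillna(0.0)
clean["year_month"] = clean["order_date"].dt.to_period("M").astype(str)

# Loading: write the curated output for the next job in the pipeline.
clean.to_parquet("orders_curated.parquet", index=False)
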
Real-time Analytics: Working with Streams
Real-time analytics is something everyone talks about, but without understanding what stream processing means, you will never be able to figure it out.
Coming from an application background:

  • Reactive Architecture (Responsive, Resilient, Elastic and Message driven)
  • Understand the difference between an Event and a Transaction.
  • Event Processing(CQRS, Actor Model[Akka], Complex Event Processing)

If you don't understand the above, it will be difficult to move forward. Spend time on these before moving on to the other items.
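
A tiny sketch of the event-versus-transaction distinction: events are immutable facts, and the "transactional" current state is just a fold over them. This is the core idea behind event sourcing and CQRS (the account example is made up for illustration).

from dataclasses import dataclass
from functools import reduce

# Events are immutable facts about what happened; current state is derived from them.
@dataclass(frozen=True)
class Deposited:
    amount: float

@dataclass(frozen=True)
class Withdrawn:
    amount: float

def apply(balance, event):
    # Folding events over an initial state reproduces the current balance.
    if isinstance(event, Deposited):
        return balance + event.amount
    if isinstance(event, Withdrawn):
        return balance - event.amount
    return balance

events = [Deposited(100.0), Withdrawn(30.0), Deposited(5.0)]
print(reduce(apply, events, 0.0))  # 75.0, the "current state" view derived from events
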
Messaging/Data bus

  • Kafka
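
A minimal producer/consumer sketch using the kafka-python package against a local broker; the topic name and payload are made up, and the point is simply that the bus decouples producers from however many downstream consumers read the stream.

import json
from kafka import KafkaProducer, KafkaConsumer

# Producer side: publish events onto a topic (the data bus).
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("page-views", {"user": "u1", "page": "/home"})
producer.flush()

# Consumer side: any number of downstream jobs can read the same stream independently.
consumer = KafkaConsumer(
    "page-views",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
for message in consumer:
    print(message.value)
    break  # just show the first event for the sketch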

Processing Streams

  • Spark/Storm
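
For stream processing, here is a minimal Spark Streaming (DStream) sketch that counts words arriving on a local socket in 5-second micro-batches; it assumes a local PySpark setup and something like `nc -lk 9999` feeding the socket.

from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext("local[2]", "stream-sketch")
ssc = StreamingContext(sc, batchDuration=5)  # micro-batches of 5 seconds

# Count words arriving on a socket, one micro-batch at a time.
lines = ssc.socketTextStream("localhost", 9999)
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))
counts.pprint()

ssc.start()
ssc.awaitTermination()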

Lambda Architecture

Machine Learning

  • Difference between Data Mining and Machine Learning
  • ML Algorithms
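
If the algorithm names feel abstract, a few lines of scikit-learn make the supervised-learning loop (fit on training data, evaluate on held-out data) concrete; this uses the bundled iris dataset, so there is nothing to download.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Train a small decision tree on the bundled iris dataset and check hold-out accuracy.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = DecisionTreeClassifier(max_depth=3, random_state=42)
model.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))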

A couple of very good posts to read on this:
Machine Learning for Programmers: Leap from developer to machine learning practitioner via http://machinelearningmastery.com/machine-learning-for-programmers/
What Every Manager Should Know About Machine Learning via https://hbr.org/2015/07/what-every-manager-should-know-about-machine-learning
Most of what we are doing can be achieved at some level using Excel Analytics Data Pack. In fact, I would say Excel is the most powerful tool out there.

Recommendation Engines
  • Collaborative Filtering
  • Content-based Filtering
  • Hybrid

Once you are clear on the concepts, start implementing them using Apache Mahout.
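
Before reaching for Mahout, it helps to see how small the core idea of user-based collaborative filtering is; here is a toy sketch with NumPy and a made-up ratings matrix.

import numpy as np

# Toy user x item rating matrix (0 = not rated); purely illustrative numbers.
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 5, 0],
    [0, 1, 5, 4],
], dtype=float)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# User-based collaborative filtering: find the user most similar to user 0
# and recommend items that user rated highly but user 0 has not rated yet.
target = 0
others = [u for u in range(len(ratings)) if u != target]
similarities = [cosine(ratings[target], ratings[u]) for u in others]
neighbour = others[int(np.argmax(similarities))]

recommendations = [item for item in range(ratings.shape[1])
                   if ratings[target, item] == 0 and ratings[neighbour, item] >= 4]
print("recommend items:", recommendations)  # [2]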

Communication Protocols
  • JSON, Avro, Protocol Buffers, and Thrift: if you are from application development, you will have used JSON extensively. It is time to understand the other ones as well. I keep arguing this with my friend Sendhil (IMO, Avro seems to be the way to go where things are evolving and there is a need for self-documentation; cowboy friendly).
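
A quick taste of Avro using the fastavro package (my choice here; the official avro package works too): the schema is made up, but it shows how records become self-describing and can evolve with defaults, which is hard to get from plain JSON.

from io import BytesIO
from fastavro import writer, reader, parse_schema

# A made-up schema; the schema travels with the data, so records are self-describing.
schema = parse_schema({
    "type": "record",
    "name": "PageView",
    "fields": [
        {"name": "user", "type": "string"},
        {"name": "page", "type": "string"},
        {"name": "duration_ms", "type": "int", "default": 0},
    ],
})

buf = BytesIO()
writer(buf, schema, [{"user": "u1", "page": "/home", "duration_ms": 1200}])

buf.seek(0)
for record in reader(buf):
    print(record)
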
Time Series
  • Modelling
  • Databases (OpenTSDB)
  • Forecasting
  • Trend Analysis
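
A tiny pandas sketch of trend analysis on a synthetic daily series: a rolling mean smooths the noise, and the crudest possible forecast just carries the last smoothed value forward. Real forecasting (ARIMA, exponential smoothing, etc.) builds on the same ideas.

import numpy as np
import pandas as pd

# Synthetic daily series with an upward trend plus noise, purely for illustration.
idx = pd.date_range("2016-01-01", periods=90, freq="D")
series = pd.Series(np.linspace(100, 130, 90) + np.random.normal(0, 3, 90), index=idx)

# A 7-day rolling mean smooths out noise and makes the trend visible;
# a naive forecast simply carries the last smoothed value forward.
trend = series.rolling(window=7).mean()
naive_forecast = trend.dropna().iloc[-1]
print(trend.tail())
print("naive next-day forecast:", round(naive_forecast, 2))
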
Modern day HOLAP Engines
  • Apache Kylin (My favourite at this point)
Data Visualization
Self-service is the mantra here. Read this article: Data Scientists Should Be Good Storytellers

“Most of the people in an organization cannot understand the outcome of analytics, however they do need the proof of analysis and data. Data storytellers incorporate data and analytics in a compelling way as their stories involve real people and organizations” via https://dzone.com/articles/data-scientists-should-be-good-storytellers

  • How to represent data (Graphs/Charts)?
  • Excel Power Pivot/ Power BI (Polybase)
  • Lumira
  • D3.js
Deep Learning
Though it may or may not be important at this point, try to understand what deep learning is. Read this: Deep Learning in a Nutshell: Core Concepts via http://devblogs.nvidia.com/parallelforall/deep-learning-nutshell-core-concepts/
Data Lake
One of my favorite topics, and something I learnt only after burning my hands, is the data lake.

  • Understand what data lakes mean, why you need one, and how to build one on your own.
  • Extract Load and Transform (ELT)
  • ELT vs ETL

Read this: https://azure.microsoft.com/en-in/solutions/data-lake/

Language
Though there are a bunch of things you can do with Python, R, Java, etc., my choice is Scala (I love the way the language allows you to express yourself. Wish someone could afford me as a developer again 🙂).

If you have a good grasp of the above, then it is time for you to figure out when to use what (creating solutions).

 “If all you have is a hammer, everything looks like a nail”

Read this:  The Ethics of Wielding an Analytical Hammer via http://sloanreview.mit.edu/article/the-ethics-of-wielding-an-analytical-hammer/

Data is having an impact on business models and profitability. It’s hard to find a non-trivial application that doesn’t use data in a significant manner ~ Ben Lorica, O’Reilly Media

Ok, this looks like a large list. Where do I start?

  1. Focus on the basics. Get a good overview of the ecosystem
  2. Decide your area of specialization.
  3. Focus on your specialization and build skills.
  4. Iterate and change course as required.
  • If you have more than 10 years of experience, understand the business situation and figure out when to use what. Maybe pick 1 or 2 items and start implementing them in your environment.
  • If you have less than 10 years of experience, pick a scenario, try to implement it and see whether it makes any business sense.

What have I not covered in the list? I haven't gone into the details of:

  1. Hadoop Ecosystem and components (Pig/Hive etc.)
  2. Algorithms
    1. Nearest Neighbour
    2. K-Means Clustering
    3. Linear Regression
    4. Decision Trees etc.
  3. R in detail
  4. Infrastructure
    1. Env Setup
    2. Zookeeper, Yarn, Mesos
    3. Replication
  5. Vertical Industry Solutions
  6. Operational Systems (like Splunk)
  7. Data Governance

I keep hearing/seeing people who have never seen more than 1 GB of data saying that they do Big Data Analytics. Don’t learn or do something for the sake of doing it.

There is no shortcut to a place worth going.

My favorite books on this topic.

If you want to know more about what I am learning, you can follow me on Twitter.

Happy Learning!

How long did it take to build ….?

I had a wonderful opportunity to listen to and interact with Hubert Smits a couple of weeks back. One of the questions he asked during the interaction was: “How long do you think it took to build the Empire State Building (the tallest building in the world when it was built in the 1930s)?” There were answers like 10 years, 7 years, 20 years, etc.

He replied: around 400 days, with 3,400 workers… Immediately the follow-up question was whether that was just to complete the architecture and design. He said “No…”, that was to complete the entire building.

I couldn't believe it at first. I went and searched on Google and found links confirming the same.

http://history1900s.about.com/od/1930s/a/empirefacts.htm
http://answers.yahoo.com/question/index?qid=20070510175905AAyRbD9
http://en.wikipedia.org/wiki/Empire_State_Building

Some of the things which I could take as lessons from this:
1. It requires meticulous planning. I am not talking about creating a plan here… Continuous Planning.

2. The architect had previous experience in building something of this sort. But again, every project is unique in nature. The risk management capabilities of this construction were beyond what we can imagine (identify, monitor, mitigate and track).

3. The architects produced the initial design in 2 weeks, based on their previous experience. But, they would have refined it continuously as they built every floor.

4. Though the architects had previous experience, they would have run into new challenges every time. The design and development would have gone through iterations to address these challenges, rather than being based on a big design upfront.

What this confirms is that even an industry considered to be among the oldest builds things iteratively, not based on one single upfront plan. Hmm… we are now almost at the end of 2013 and we work in a modern industry, but in most places these practices are still at a level where people only talk about them.

Now… I will come back to my favorite place. Maybe 50 years from now, someone might be writing about the Belandur flyover. How long do you think it has taken to complete the still-incomplete Belandur flyover?

Happy Learning!!!

Image courtesy of

vitasamb2001 / FreeDigitalPhotos.net
stockimages/ FreeDigitalPhotos.net

Architecture is always Context Driven

Quite often, you hear people saying: if I implement this newer technology, how do I support older browsers, operating systems, etc.?

An example: I want to implement some real-time capability in my web application. The solution I can think of is to use something like Socket.IO. The good part is that it will bring in a capability my users will love. The bad part is that older browsers may not support it.

Obviously, there is no straightforward answer.

Architecture or Technology decisions are always derived from Business Strategies.

In this case, the questions the business should answer are:
1. Who is my target audience?
2. What percentage of users are on those existing browsers/operating systems?
3. What budget/resources are available to implement it?
4. Will I be able to support these on an ongoing basis?
5. Will I be able to tell my users that my product will support only these browsers?

The solution to the above problem is to bring in alternative technology implementations by detecting browser capabilities for the older browsers. Obviously, implementing an alternate solution means more lines of code, which means more maintenance headaches.

It is the business's responsibility to decide whether the revenue that comes from supporting older versions of browsers matters more than the maintainability of the code. You cannot possibly have both with limited resources and budget.

Architecture is always based on the context, and the context can only come from the business.

Happy Learning!!!

Image courtesy of winnond / FreeDigitalPhotos.net

Maintainable Code: Where did we go wrong?

Every product, when started, has this as a main objective: the code has to be maintainable. But a few years down the line, if you look at the code base, you will really be wondering: where did we go wrong?

Over the last 6 years, the following is what I have seen as a pattern.

  • Most of the applications had a great solution structure: well-thought-out BO, DAO, DTO/VO, services and UI projects.
  • Reasonable implementation guidance (unit tests, documentation, etc.) on how the code works and how to create something new.

Image: dan / FreeDigitalPhotos.net

Great, but what about the user interface?
I guess this is where most projects go wrong. Most of the good technical people I have worked with never considered the UI an important piece. So typically you don't get to see implementation samples, documentation, etc. Obviously, not many unit tests either. The effort put into creating a good solution goes for a toss as you start adding more developers to the team. People start hacking, logic leaks across layers, there is not enough understanding of why things are done in a specific way, and all of this adds up to a big list of technical debt.

And finally come MVC and client-side programming (JavaScript). Everybody talks about MVC, but I am not sure most modern-day developers understand why there are this M, V and C. Then comes this <%=%>. This is not only in the Microsoft community; I have seen the best of iOS developers struggling with this.

What should we start doing differently?
1. Emphasize the importance of clean code in the UI from day one.
2. Unit tests, unit tests, unit tests… Without these, your code will be very hard to maintain after a certain period.
3. Always start any project with sample implementations. Use that as a reference project and keep updating it. New developers have to go through the reference project, understand it and follow it.
4. Review your code base regularly (don't just stop at the services layer). Though this takes significant effort and needs a very passionate person, doing it will definitely help in the long term.
5. Coach your team members on the project architecture and design on a regular basis. Remind them about the technology vision during your iteration planning every time (if at all you have one :)).

Of late, quick turnaround / going to market quickly has become the selling point for most companies. Unless all the 7-ities are taken care of in your project, it will be a nightmare to maintain the product for the next 10 years.

Happy Learning!!!