Getting Started with Serverless Architecture

technology-1587673_960_720

Serverless Architecture is relatively very new. I’ve been exploring Serverless architecture for the new platform architecture off late. Though it is very interesting obviously there is a reasonable learning curve and I don’t see lot of best practices out there yet.

Everything looks green on the other side.. We will learn as we move forward..

Since, we use AWS as our cloud provider, most of the examples you will see are related to AWS Lambda.

Specific Reasons for exploring Serverless Architecture 

  1. No operating systems to choose, secure, patch, or manage.
  2. No servers to right size, monitor, or scale out.
  3. No risk to your cost by over-provisioning.
  4. No risk to your performance by under-provisioning.

https://d0.awsstatic.com/whitepapers/AWS_Serverless_Multi-Tier_Architectures.pdf

One thing i learnt in the last few years about developing distributed applications is that, it is not about learning new things… it is always about unlearning what you have done in the past.

If you are specific about Vendor lock-in then this may not be a choice at all for you…

Following is my reading list on Serverless Architecture.

What is Serverless?
https://auth0.com/blog/what-is-serverless/

Serverless Architectures
http://martinfowler.com/articles/serverless.html

What is Serverless Computing and Why is it Important?
https://www.iron.io/what-is-serverless-computing/

Serverless Architecture in short
https://specify.io/concepts/serverless-architecture

Is “Serverless” architecture just a finely-grained rebranding of PaaS?
http://www.ben-morris.com/is-serverless-architecture-just-a-finely-grained-rebranding-of-paas/

IAAS, PAAS, Serverless.
https://read.acloud.guru/iaas-paas-serverless-the-next-big-deal-in-cloud-computing-34b8198c98a2#.m9us1c5fe

Serverless Delivery: Architecture
https://stelligent.com/2016/03/17/serverless-delivery-architecture-part-1/

Principles of Serverless Architectures
There are five principles of serverless architecture that describe how an ideal serverless system should be built. Use these principles to help guide your decisions when you create serverless architecture.
1. Use a compute service to execute code on demand (no servers)
2. Write single-purpose stateless functions
3. Design push-based, event-driven pipelines
4. Create thicker, more powerful front ends
5. Embrace third-party services
https://dzone.com/articles/serverless-architectures-on-aws

Serverless Architectures – Building a Serverless system to solve a problem
https://serverless.zone/serverless-architectures-9e23af71097a#.j9z60nxw1

Serverless architecture: Driving toward autonomous operations
https://www.slalom.com/thinking/serverless-architecture

Serverless Developers
https://serverless-developers.com/

The essential guide to Serverless technologies and architectures
http://techbeacon.com/essential-guide-serverless-technologies-architectures

Using AWS Lambda and API Gateway to create a serverless schedule
https://www.import.io/post/using-amazon-lambda-and-api-gateway/

Five Reasons to Consider Amazon API Gateway for Your Next Microservices Project
http://thenewstack.io/five-reasons-to-consider-amazon-api-gateway-for-your-next-microservices-project/

AWS Lambda and the Evolution of the Cloud
https://blog.fugue.co/2016-01-31-aws-lambda-and-the-evolution-of-the-cloud.html

SquirrelBin: A Serverless Microservice Using AWS Lambda
https://aws.amazon.com/blogs/compute/the-squirrelbin-architecture-a-serverless-microservice-using-aws-lambda/

A Crash Course in Amazon Serverless Architecture
http://cloudacademy.com/blog/amazon-serverless-api-gateway-lambda-cloudfront-s3/
­
AWS Lambda and Endless Serverless Possibilities
https://abhishek-tiwari.com/post/aws-lambda-and-endless-serverless-possibilities

Awesome Serverless – A Curated List
https://github.com/JustServerless/awesome-serverless

Happy Learning!

Developing a Robust Data Platform : Key Considerations

key-considerations

Developing a robust data platform requires definitely more than HDFS, Hive, Sqoop and Pig. Today there is a real need for bringing data and compute as close as possible. More and more requirements are forcing us to deal with high-throughput/low-latency scenarios. Thanks to in-memory solutions, things definitely seems possible right now.

One of the lesson I have learnt in the last few years is that it is hard to resist developing your own technology infrastructure while developing a platform infrastructure. It is always important to remind ourselves that we are here to build solutions and not technology infrastructure.

Some of the key questions that needs to be considered while embarking on such journey is that

  1. How do we handle the ever growing volume of data (Data Repository)?
  2. How do we deal with the growing variety of data (Polyglot Persistence)?
  3. How do we ingest large volumes of data as we start growing (Ingestion Pipelines/Write Efficient)?
  4. How do we scale in-terms of faster data retrieval so that the Analytics engine can provide something meaningful at a decent pace?
  5. How do we deal with the need for Interactive Analytics with a large dataset?
  6. How do we keep our cost per terabyte low while taking care of our platform growth?
  7. How do we move data securely between on premise infrastructure to cloud infrastructure?
  8. How do we handle data governance, data lineage, data quality?
  9. What kind of monitoring infrastructure that would be required to support distributed processing?
  10. How do we model metadata so that we can address domain specific problems?
  11. How do we test this infrastructure? What kind of automation is required?
  12. How do we create a service delivery platform for build and deployment?

One of the challenges I am seeing right now is that the urge to use multiple technologies to solve similar problems.  Though this gives my developers the edge to do things differently/efficiently, from a platform perspective this would increase the total cost of operations.

  1. How do we support our customers in production?
  2. How can we make the life our operations teams better?
  3. How do we take care of reliability, durability, scalability, extensibility and Maintainability of this platform?

Will talk about the data repository and possible choices in the next post.

Happy Learning!

Scaling data operations with in-memory OLTP

Data has become the center of our universe in modern digital world. Applications are designed to store and collect more and more data. Companies are looking to integrate and analyse the data to generate insights and take actions.

Data is a precious thing and will last longer than the systems themselves ~ Tim Berners-Lee

Can an existing relational database scale with high ingestion rates, improved read performance?Database

In-Memory OLTP seems to be the direction forward. This is considering your existing technology investments. Of course if the company is open to change technology there would be more options.

Found couple of very good articles posts related to SQL Server in-memory OLTP. Looks like SQL Server 2016 has fixes to most of the issues with in-memory OLTP.

I just think it is an amazing technology and if we can use it in the right way, will definitely yield great results for your customers.

Introducing SQL Server In-Memory OLTP
https://msdn.microsoft.com/en-in/library/dn133186.aspx
https://www.simple-talk.com/sql/learn-sql-server/introducing-sql-server-in-memory-oltp/
http://blog.sqlauthority.com/2014/08/08/sql-server-introduction-to-sql-server-2014-in-memory-oltp/

The Use Cases for SQL Server 2014 In-Memory OLTP
http://sqlturbo.com/the-use-cases-for-sql-server-2014-in-memory-oltp/

SQL Server In-Memory OLTP Internals Overview
https://msdn.microsoft.com/en-us/library/dn720242.aspx

The Promise – and the Pitfalls – of In-Memory OLTP
https://www.simple-talk.com/sql/performance/the-promise—and-the-pitfalls—of-in-memory-oltp/
https://msdn.microsoft.com/en-us/library/dn246937.aspx

SQL Server 2016 : In-Memory OLTP Enhancements
http://sqlperformance.com/2015/11/sql-server-2016/in-memory-oltp-enhancements

Speeding up Business Analytics Using In-Memory Technology
https://blogs.technet.microsoft.com/dataplatforminsider/2015/12/08/speeding-up-business-analytics-using-in-memory-technology/

Dynamic Data Masking in SQL Server 2016
http://www.codeproject.com/Articles/1084808/Dynamic-Data-Masking-in-SQL-Server
https://blogs.technet.microsoft.com/dataplatforminsider/2016/01/25/use-dynamic-data-masking-to-obfuscate-your-sensitive-data/

Happy Learning!

“Data is long-term, Applications are temporary.”

Think data first. Data is long-term, applications are temporary. I recently happened to read this in one of the blog post. I couldn’t agree more. Data remains one of the most strategic projects for most of the companies.

Every fifth person you talk to, every other start up you come across and job postings has something or other to mention about data, analytics etc. But, when I speak to the guys whoever I come across in my ecosystem, lot of guys think it is only doing cool stuff in R.

Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it.

If someone is an application developer for the last 10 years, can he/she suddenly become an expert in statistics and become an expert in Algorithms? Suddenly you start calling yourself a Data Scientist? May be… Nothing is impossible. But if that’s what is your passion you wouldn’t be an application developer for the last 10 years. Right?

Is there anything else one can learn and contribute in the data world? Thought of sharing couple of valuable links which can give you a very good idea on the various aspects and where one can fit in.

#1 Will Balkanization of Data Science led to one Empire or many Republics? Via http://www.kdnuggets.com/2015/11/balkanization-data-science.html
#2 Becoming a Data Scientist via http://nirvacana.com/thoughts/becoming-a-data-scientist/
#3 Difference between Data Engineering and Data Science via http://www.galvanize.com/blog/difference-between-data-engineering-and-data-science/
#4 The world of data science: Who does what in the data world? Via http://cloudtweaks.com/2015/11/booming-world-data-science/matrix-1013612_640

Data is one of the hottest stack right now and it is growing at a crazy speed. It would be extremely difficult for any single individual to cope up with this change unless one’s basics are right.

Once you have the basics right, it is about Meta learning and evolving from there.

Working with various large scale data related projects for the last 15 months, following is my high level list of items one need to know to have a reasonable understanding of data (Big/Small). This list is no specific order. 😦

General A Basic overview of what is Descriptive, Diagnostic, Prescriptive, Predictive and Cognitive Analytics? Understanding of the concepts and difference
Data Warehouses
  • OLAP VS OLTP
  • Dimensional Modelling (Star Schemas, Snowflake Schemas)
  • Difference between Multi-Dimensional, Relational, Hybrid
  • In-Memory OLAP
No SQL Databases
  • CAP Theorem
  • If you are from application development, this is where the most important change would be. So far, you would have dealt primarily with Key-Value stores and Document Stores. For Analytics purpose (Write Efficient), it is important to start understanding column databases (E.g.: Cassandra) and Graph (E.g.:Neo4J). This is again a big shift from what you would have done as an application developer. Spend some time on it.
  • In-Memory databases in general.
  • Apart from Cassandra and Neo4J, get an understanding of what MemSQL offers. Yes, it is MemSQL and not MySQL J seems very impressive.
Outside EDWs
  • MPPs/PDWs – Difference between traditional EDWs and MPPs?
  • DWH on cloud AWS Redshift, Azure SQL Data Warehouse
Data Mining
  • What does it mean?
  • Data Mining Algorithms
Hadoop
  • Hadoop and Various Hadoop Components
  • When to use Hadoop?
  • Parallelization and Map Reduce Fundamentals
Outside Hadoop
  • Difference between Hadoop, Spark and Storm (I personally prefer SPARK. RDDs give me the same comfort what I had with ADO.NET)
  • When to use Hadoop/Spark/Storm over MPP?
ETL
  • Data Munging/Wrangling
  • Scrubbing
  • Transforming
  • Reading and Loading Data
  • Exception Handling
  • Jobs/Tasks
Real time Analytics Working with Stream: Real time Analytics is something everyone talks about. But without understanding what it means by Stream processing you will never be able to figure out this.
From an application background

  • Reactive Architecture (Responsive, Resilient, Elastic and Message driven)
  • Understand the difference between an Event and a Transaction.
  • Event Processing(CQRS, Actor Model[Akka], Complex Event Processing)

If you don’t understand the above, then it would be difficult to move forward. Spend time on these before moving forward to other items
Messaging/Data bus

  • Kafka

Processing Streams

  • Spark/Storm

Lambda Architecture

Machine Learning Machine Learning

  • Difference between Data Mining and Machine Learning
  • ML Algorithms

Couple of very good posts to read in this
Machine Learning for Programmers: Leap from developer to machine learning practitioner via http://machinelearningmastery.com/machine-learning-for-programmers/
What Every Manager Should Know About Machine Learning via https://hbr.org/2015/07/what-every-manager-should-know-about-machine-learning
Most of what we are doing can be achieved at some level using Excel Analytics Data Pack. In fact, I would say Excel is the most powerful tool out there.

Recommendation Engines
  • Collaborative Filtering
  • Content-based Filtering
  • Hybrid

Once you are clear with the concepts start implementing using Apache Mahout

Communication Protocols
  • JSON, AVRO, Protocol Buffer, and Thrift: If you are from application development – you would have used JSON extensively. It is time to understand the other ones as well. I keep arguing this with my friend Sendhil (IMO, AVRO seems to be the way to go – where things are evolving and need for self-documentation – Cowboys Friendly).
Time Series
  • Modelling
  • Databases (OpenTSDB)
  • Forecasting
  • Trend Analysis
Modern day HOLAP Engines
  • Apache Kylin (My favourite at this point)
Data Visualization Self-Service is the Mantra here. Read this article: Data Scientists Should be Good Storytellers

Most of the people in an organization cannot understand the outcome of analytics, however they do need the proof of analysis and data. Data storytellers incorporate data and analytics in a compelling way as their stories involve real people and organizations” via https://dzone.com/articles/data-scientists-should-be-good-storytellers

  • How to represent data (Graphs/Charts)?
  • Excel Power Pivot/ Power BI (Polybase)
  • Lumira
  • D3.js
Deep Learning Though it may or may not be important at this point, try to understand what is deep learning. Read this : Deep Learning in a Nutshell: Core Concepts via http://devblogs.nvidia.com/parallelforall/deep-learning-nutshell-core-concepts/
Data Lake One of my favorite topic and something I learnt after burning my hands is with data lake

  • Understand what Data Lakes mean? Why do you need one? How to build a data lake on your own?
  • Extract Load and Transform (ELT)
  • ELT vs ETL

Read this: https://azure.microsoft.com/en-in/solutions/data-lake/

Language Though there is a bunch of things to do with Python, R, Java etc. My choice is Scala (I love the way the language allows you to express. Wish someone can afford me as a developer again J)

If you have a good grasp on above, then it is time for you to figure our when to use what (Creating Solutions).

 “If all you have is a hammer, everything looks like a nail”

Read this:  The Ethics of Wielding an Analytical Hammer via http://sloanreview.mit.edu/article/the-ethics-of-wielding-an-analytical-hammer/

Data is having an impact on business models and profitability. It’s hard to find a non-trivial application that doesn’t use data in a significant manner ~ Ben Lorica, O’Reilly Media

Ok, this looks like a large list. Where do I start?

  1. Focus on the basics. Get a good overview of the ecosystem
  2. Decide your area of specialization.
  3. Focus on your specialization and build skills.
  4. Iterate and change course as required.
  • If you are more than 10 years of experience, understand the business situation and figure out when to use what. May be pick 1 or 2 items and start implementing in your environment.
  • If you are less than 10 years of experience, pick up a scenario and try to implement this and see if it makes any business sense.

What I have not covered in the list? I haven’t gone into the details of

  1. Hadoop Ecosystem and components (Pig/Hive etc.)
  2. Algorithms
    1. Nearest Neighbour
    2. K-Means Clustering
    3. Linear Regression
    4. Decision Trees etc.
  3. R in detail
  4. Infrastructure
    1. Env Setup
    2. Zookeeper, Yarn, Mesos
    3. Replication
  5. Vertical Industry Solutions
  6. Operational Systems (like Splunk)
  7. Data Governance

I keep hearing/seeing people who have never seen more than 1 GB of data saying that they do Big Data Analytics. Don’t learn or do something for the sake of doing it.

There is no short cut to a place worth going.

My favorite books on this topic.

If you want to know more about what I am learning, you can follow me in Twitter

Happy Learning!

Microservices : Reading List

Modern day businesses requires agility to survive and to be a leader. If you translate this business requirement into technology requirement, this means X Deploys a day (Time to market).

The big bloated, complex applications that we have built over a period of time is not allowing us to meet this X Deploys a day without compromising quality. If there is a way to decompose the big bloated monolith application blocks into smaller chunks it will help the business to extend, manage and deploy and eventually the X Deploys a day could become a reality.

How do we get there? Is there a way to achieve this? Microservices (lots of small applications) is one of the ways that could help in achieving this.

Microservices means developing a single, small, meaningful functional feature as single service, each service has its own process and communicate with lightweight mechanism, deployed in single or multiple servers.
Source

Additional Reading List
The Twelve-Factor App
http://12factor.net/

Microservices Reading List
http://www.mattstine.com/microservices

Understanding Microservices
http://kpbird.com/2014/11/Monolithic-vs-MicroService-Architecture/
http://shakayumi.tumblr.com/post/95688359079/whats-the-big-idea-with-microservices
http://kpbird.com/2014/06/Microservice-Architecture-A-Quick-Guide/
http://www.infoq.com/articles/microservices-intro
http://www.slideshare.net/mstine/microservices-cf-summit
http://java.dzone.com/articles/microservice-architecture
http://tech.gilt.com/post/35711763311/how-gilt-com-give-came-to-be

Microservices Architecture and Scalability
http://www.pst.ifi.lmu.de/Lehre/wise-14-15/mse/microservice-architectures.pdf
http://technologyconversations.com/2015/01/26/microservices-development-with-scala-spray-mongodb-docker-and-ansible/

Microservices Patterns
http://blog.arungupta.me/microservice-design-patterns/
http://microservices.io/patterns/index.html

Simon Brown’s Video : Software Architecture & Balance with Agility
https://vimeo.com/user22258446/review/79382531/91467930a4

Books
Building Microservices
Software Architecture for Developers

Frameworks
http://gilliam.github.io/concepts.html
http://projects.spring.io/spring-boot/
http://fabric8.io/
http://azure.microsoft.com/en-us/campaigns/service-fabric/

Technology Ecosystem for the Modern Day Business Application Developer

Technology is changing at a rapid pace. Everyday you see something new to be learnt, which did not exist few months back. If you are like me, who has come from an application development background, what does this change means to you?

For sure, this is not for gyan. Tried depicting this in a form, which i could use as a reference.  I purposefully, hace not included Desktop applications in this. If you are working in some of them, you may have include it for yourself. Obviously, this may change when we revisit this in couple of months.

Similarly, things like Programming Languages (Java, C#, Ruby), OOPS Concepts, TDD, SOLID Principles are foundations.

Technology Ecosystem

Is this Perfect? Not Necessarily. This is my version and you may have a different way of visualizing this. If you create one, please do share it with me 🙂

Did I cover all aspects? Not really. Take Analytics as an example. If you take Descriptive Analytics, you start looking at traditional Business Intelligence, Data Warehousing, Data Visualization etc. Each one is a separate block diagram on its own. Hence, i have stopped it at a very high level for this.

Can i be a master of all this? May not be possible. But if we have to call ourselves as techies, then we at least need to know what these are, where we can use them and may be pick and choose couple of items that could be of our interest and master it.

Happy Learning!!!

How long did it take to build ….?

I had a wonderful opportunity to listen to and Interact with Hubert Smits couple of weeks back. One of the questions he asked during the interaction is “how long do you think it took to build empire state building (the tallest one when it was built in 1930s)”? There were answers like 10 years, 7 years, 20 years etc.

Empire State
He replied saying 400 odd days with 3400 workers… Immediately the follow up question was to complete the architecture and design? He said “No…” to complete the entire building.

I couldnt believe it immediately. Went and searched in Google and found links confirming the same.

http://history1900s.about.com/od/1930s/a/empirefacts.htm
http://answers.yahoo.com/question/index?qid=20070510175905AAyRbD9
http://en.wikipedia.org/wiki/Empire_State_Building

Some of the things which i could take as a lesson from this
1. It requires meticulous planning. I am not talking about creating a plan here… Continuous Planning.

2. The architect had previous experience in building something of this sort. But again, every project is unique in nature.. The risk management capabilities of this construction was beyond what we can imagine(identify, monitor, mitigate and track).
Women

3. The architects produced the initial design in 2 weeks, based on their previous experience. But, they would have refined it continuously as they built every floor.

4. Though the architects had previous experience, they would have ended up in new challenges every time. The design and development would have gone through iterations to address these challenges and not based on a big design upfront.

What this confirms is that even an industry which is considered to be the oldest, builds things iteratively and not based on one single plan. Hmm… now we are almost near end of 2013 and we work in a modern industry. But these practices are still @ a level where people only talk about it in most of the places.

Now… i will come back to my favorite place. May be after 50 years, someone might be writing about Belandur flyover. How long do you think it has taken to complete the incomplete Belandur flyover?

Happy Learning!!!

Image courtesy of

vitasamb2001 / FreeDigitalPhotos.net
stockimages/ FreeDigitalPhotos.net