Tuesday, March 7, 2017

Tales from the Serverless landscape

Having worked with the AWS ecosystem for nine years, I have witnessed many interesting leaps in cloud computing: the initial days of load balancers and clusters, the refreshing idea of Docker containers, and highly coarse-grained application services like the Elastic Transcoder Service (ETS). But never have I been more amazed and excited than with AWS Lambda and the now-growing ecosystem of serverless computing and functions as a service (FaaS). I think it is worthwhile sharing the story of how I discovered it and became a convert.

It was early 2015, and we had just started exploring ETS for a product and its associated service. The system was a custom 4K camera which would capture and save 4K video in real time but upload it to the cloud (read: S3) in non-real time due to its sheer size. We had to transcode the video in the cloud into a form that users could view and interact with. I was really excited to use ETS for the transcoding aspect of the system, as it replaced a huge blob of custom development and the associated operational burden of managing servers in the cloud and scaling them on demand.

However, very soon I was irked by the fact that we needed a way to trigger ETS jobs after our 4K videos were uploaded to S3, and a way to do some record keeping of what was done, what was in progress, and so on. The very first idea that came to the table was: let's run a Tomcat server on EC2 to do the record keeping in MySQL and trigger the ETS jobs. The server would expose a REST API which our uploader application running on the custom camera would call. We were already running other Tomcat servers for similar record-keeping needs.

Yikes, we were back to square one… the same need to maintain scalable infrastructure in the cloud. What if that Tomcat server went down? In terms of functionality the Tomcat server represented an iota, but it would be a critical link in the chain of reliable infrastructure. This is the type of node that can be run on micro instances, yet it sits idle most of the time and can easily become a bottleneck during bursts of application usage.

Wishfully thinking, I said: "It would have been nice if the ETS jobs could start all by themselves in response to an 'upload completed' event in an S3 bucket!" With this aspiration we started digging around, and we found that we could trigger Lambda functions when an upload to S3 completed, and run some custom Node.js code in them to start our ETS jobs. OK, and what do we do when the ETS job is done? Well, ETS posts to an SNS topic both on completion and on error, and guess what? That can trigger Lambda functions too. These two finds got us thinking: hmm, that is so nice, we won't have to bother running that server anymore. We don't care about the load or multiplicity either, as it is a service after all: AWS has to deliver it whenever we ask for it. Infinite scale! I am sure the skeptics are asking: OK, but at what price?
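To make the S3-to-ETS trigger concrete, here is a minimal sketch of the kind of Lambda logic involved. This is not our actual code: the pipeline ID, preset ID and output naming scheme are all hypothetical placeholders, and in a real handler the returned parameters would be passed to the AWS SDK's `elastictranscoder.createJob()`.

```javascript
// Hypothetical pipeline and preset IDs -- placeholders, not real values.
const PIPELINE_ID = '1111111111111-abcde1';
const PRESET_ID = '1351620000001-000010'; // assumed generic web preset

// Build an Elastic Transcoder job request from an S3 "upload completed" event.
// S3 event keys arrive URL-encoded with '+' for spaces, so decode first.
function buildEtsJob(s3Event) {
  const record = s3Event.Records[0];
  const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, ' '));
  return {
    PipelineId: PIPELINE_ID,
    Input: { Key: key },
    Output: {
      // e.g. "videos/clip.mp4" -> "videos/clip-web.mp4" (illustrative scheme)
      Key: key.replace(/\.[^.]+$/, '') + '-web.mp4',
      PresetId: PRESET_ID
    }
  };
}

// In the real Lambda handler this would look roughly like:
// exports.handler = (event, context) =>
//   new AWS.ElasticTranscoder().createJob(buildEtsJob(event), context.done);
```

The SNS-triggered "job completed" Lambda on the other side would be symmetric: parse the SNS message and update the record-keeping store.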

Do make it a point to read the AWS Lambda pricing page. IT IS REAL! Think: when was the last time you saw prices in microcents? It costs literally nothing to run these functions. They give you 1 million requests and 400,000 GB-seconds free every month. That is 3.2 million seconds of a 128 MB Node.js virtual machine. So not only do you get infinite scale, you get it at dirt-cheap rates.
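The arithmetic behind that 3.2 million seconds claim is simple: the GB-second allowance divided by the memory allocation.

```javascript
// Lambda free tier: 400,000 GB-seconds of compute per month.
const freeGbSeconds = 400000;
const memoryGb = 128 / 1024;                    // a 128 MB function = 1/8 GB
const freeSeconds = freeGbSeconds / memoryGb;   // 3,200,000 seconds
console.log(freeSeconds, 'seconds, ~' + Math.floor(freeSeconds / 86400) + ' days');
```

That is roughly 37 days of continuous execution of a 128 MB function, free, every month.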

No need to watch any server cluster in the cloud, and yet you have a fully working, highly scalable system. Sleep peacefully at night!

Over the next 18 months we changed our architectural thinking to get rid of any EC2 nodes whose job was record keeping or coordinating between various AWS services. Instead, we started architecting with the following stack:

  1. API Gateway
  2. Lambda Functions
  3. DynamoDB/SimpleDB and/or S3

Clients call API Gateway, which triggers Lambda functions, which manipulate data in DynamoDB/SimpleDB or S3.
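A minimal sketch of the middle of that stack, under assumed names: API Gateway invokes a Lambda like this one, which turns the request into a DynamoDB write. The table name and attributes are illustrative, not from our actual system.

```javascript
const TABLE = 'TranscodeJobs'; // hypothetical table name

// Turn an API Gateway proxy event into a DynamoDB PutItem request.
// DynamoDB's low-level API wants typed attributes: S = string, N = number.
function buildPutItem(event) {
  const body = JSON.parse(event.body);
  return {
    TableName: TABLE,
    Item: {
      videoId:   { S: body.videoId },
      status:    { S: 'UPLOADED' },
      updatedAt: { N: String(body.timestamp) }
    }
  };
}

// Real handler sketch:
// exports.handler = (event, context) =>
//   new AWS.DynamoDB().putItem(buildPutItem(event), context.done);
```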

This is the infrastructure stack which

  1. Is not running 24x7, hence does not fail due to disk outages or similar mishaps.
  2. Has no cost at rest! Well, other than the data storage itself.
  3. Scales to any load demand.

In the next tale I'll cover how we converted mission-critical, bossy, massive GPU nodes into servants of meek Lambda functions.

Wednesday, September 9, 2015

Node.js, Promise and DelayedResponse

Over the last couple of days I have been trying to put together a sample which shows node.js rendering a webpage progressively. The idea is that the node server runs a long sequence of tasks on the user's behalf, and I wanted to show the progress of these tasks to the user in real time on the web page. I was also trying to learn the tricks of asynchronous programming.

I learned multiple interesting things and hence this post. The trimmed down sample is here: https://github.com/kumaakh/Samples/blob/master/Sample1

app.js is the main script, which shows the flow against a simple GET /test request.

It can be run with a plain "node app.js".
The "pre", "f" and "post" functions are the supposedly long-running tasks which output their logs directly to the web page. Here they don't take long, but in real code each can take seconds.

DelayedResponse helps in holding the HTTP response open until the last function has run.
The callbacks are implemented as promises and are chained together into a pipeline.

The promise approach makes it easy to handle a long sequence of callbacks; otherwise the code becomes cluttered and difficult to read.
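The chaining idea can be sketched as below. This is not the exact sample code from the repo; it is a trimmed illustration of running "pre", "f" and "post" sequentially as promises, each appending its progress to a log that would be streamed into the held-open HTTP response.

```javascript
// One long-running task: logs its start, does async work, logs completion.
function step(name, log) {
  return new Promise(resolve => {
    log.push(name + ' started');
    // setImmediate stands in for real async work (seconds in real code).
    setImmediate(() => {
      log.push(name + ' done');
      resolve(log);
    });
  });
}

// Chain the tasks sequentially: each step starts only after the
// previous one's promise resolves.
function pipeline(names) {
  const log = [];
  return names.reduce(
    (p, name) => p.then(() => step(name, log)),
    Promise.resolve()
  );
}

pipeline(['pre', 'f', 'post']).then(log => console.log(log.join('\n')));
```

Compare this to the equivalent nested-callback version: three levels of indentation for three steps, and it only gets worse as steps are added.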

Along the way I also tried Nodeclipse, which made many things easy with node.js; however, I found step debugging to be a pain. It worked much better directly from the command line with "node debug app.js". Nodeclipse could never attach to sources from node_modules.

Friday, July 16, 2010

On Software Architecture and Agile

Recently, someone asked me a question which had both Agile and Software Architecture in the same sentence. Here is what I feel and what I replied; comments from readers are welcome.

A. Distinguish Architecture from Software Design-Build-Test-Deploy cycles.
- We need to understand that agile methods apply to the design-build-test-deploy cycles much more than they do to defining ground-up architectures. In other words, when new projects are proposed or started there may or may not be clear guidance or agreement on the development model; worse, who, when and where the development takes place is sometimes not even known.
- An early definition of an architecture is needed to create early sketches of what is required; here one looks at the customer needs and system context to come up with an architectural overview. Further refinements and elaborations on this lead to definitions of "components", some of which may be software components (other examples being hardware or networking components). The software components may further be evaluated on build-vs-buy merits: e.g. a database or OS will always be bought, while customer-specific business logic may at times be available in packaged software or be custom built. These early architectural decisions lead to a definition of what needs to be built, and as you can see, some of them drive how components communicate among themselves, though at times there may be a choice (e.g. whether a Java program uses JDBC or client API calls).
- Notice that nothing has been detailed about the knowledge, information or software content of this project, while there is an architecture defined as early guidance.

B. Refactoring keeps evolving the software design and improves its "-bilities".
From here on, things work dramatically differently for an agile software development team vs. a waterfall/UP team.
An agile software development team starts building the software by looking at the user-stated needs, within the architectural sketch (which only bounds them in choice of platform, language, packages, etc.). They do not spend time on elaborate requirements gathering, documentation, sign-offs or "designing" the whole system. Instead they focus on user needs and turn them into testable software in small cycles. Within each cycle they create small engineering tasks. Every task is implemented with the "simplest design which satisfies the cycle's goal". As they move along, every cycle they need to keep refactoring their programs for various reasons: a) in pursuit of the DRY (don't repeat yourself) principle; b) to keep the code free of code smells; c) many times a new task cannot be completed without refactoring the existing code (e.g. changing a 1-to-1 relation to 1-to-N).
I look at refactoring as the task of organizing your house before you welcome new members (features), so that the house does not become a mess once they are in.

Now, clearly this leads to an "evolutionary" design of the software. Refactoring, at times, also leads to the creation of new metaphors or architectural templates in the system, which make it easy to add new features (no extra emphasis is given to future features). However, there are no hard lines: a future cycle may deem it appropriate to refactor them again to meet that cycle's goal.

C. In-flight architectural evolution is possible with agile.
From the text so far it may seem that an architecture defined up front is sacrosanct and should not change. Not true for agile teams: they "embrace change", which could just as well be a change in architecture. An example: "we found the abc database to be too slow for the current needs of this application and we need to move to xyz". It is an architectural change, and an agile team reacts to it in the same manner as it would to any change: a) make sure you have unit tests for the impacted area; b) refactor the code; c) implement the tests for the new feature; d) implement the feature.
In this particular example, refactoring the code may mean bringing in a database-adaptation layer to make sure that the code can sustain the change in database. However, the basic approach remains the same: unit tests help you survive the refactoring, and refactoring makes way for new features. So agile practices (refactoring and test-first, to name a few) do help in architectural spikes. They also help in evaluating new platforms, frameworks and third-party software quickly; e.g. the unit tests come in handy when we want to quickly check whether all our features would work in a different operating environment (Windows vs. Linux, Oracle vs. DB2, to name a few).

On the lighter side: if I focus on the two words individually, I see an oxymoron, a paradox: Software Architecture!

Wednesday, July 22, 2009

Cruising in control

In the last two months, I have made another leap towards agile practices in our development environment for a non-Java suite of projects. We have started using CruiseControl to do continuous integration, so we now have build robots which build, test and publish our projects.
CruiseControl is a beauty to work with. Clearly the authors didn't have non-Java environments as their focus, but at the same time it wasn't too difficult to adapt it to our needs… without making source code changes to CC itself.

The development environment for these projects is MS Visual Studio, and we use TUT for unit testing. I was curious to try CC and figure out how much I would need to adapt it. I knew that CC assumes Java + JUnit + Ant to be the development environment. So I started with a simple first step of defining Ant targets which can do the basic steps:
a) checkout b) build c) test d) publish
This took me ~2 days for 7 projects and we were on with CI. Of course, that was 1.5 days for the first project and half a day for replicating it to the other projects. I had to scan the CC forums to find out how other folks parse the build output from Visual Studio to report errors and warnings. (I'll follow up this entry with credits to the community sources I used and links to other useful information.)

It was awesome to see the builder notify our team of failed and passed builds. This initial implementation could take care of project interdependencies and could report build (compile, link) errors. It wasn't so good at reporting test results: it would only say whether the unit tests passed or failed as a whole (not individual cases), since the default output from TUT is quite different from JUnit's.

Once this had been running for a month (~200 builds), I spent another day hacking up a new test reporter for TUT which could generate test results in XML. I also had to tweak the XSL which chews the test results, and wow, now we are able to see failed tests in our build mails.
We can also see a list of successful tests and their timings in the CC console.

Cruise Control Rocks!!

Friday, September 12, 2008

Agile all the way

I have been applying agile techniques in various shapes and degrees to a variety of environments for a number of years, and I have not been able to find a context that could not benefit from them. I have found that most of the time it's the resistance to change and the fear of failure which make a team stick to the old ways: "we know our old text-editor way works and it is too late to learn new things".

I have been moving between quite diverse programming environments, some of which are generally believed to be very different from each other:
  • Server-side C++ to fat GUI clients
  • Client-server to three-tier
  • J2EE to MS .NET
  • Embedded Java (J2ME) to DSP assembly
  • C/C++ to Java and back
If you are a project manager and you hear your team say any of the following, you should know that you need to seriously consider agile techniques:
  • IDE is too heavy, anyways we are using textpad plus plus plus
  • The only way to test this beast is by deploying it there.
  • Let's add some print messages to see how far it runs before a crash.
  • We have divided the functionality amongst the team and we'll integrate by end of this week.
  • We have been merging code for the last two days.
  • Deployment environment will be ready 3 months later, 1 month before acceptance testing.
Every time I hear the above, I know the answer lies in agile techniques. However, on most occasions the non-believers say that agile techniques (TDD, continuous integration, refactoring, pair programming) are ideas for a different world and cannot be applied here. I have found that all it needs is a willing soul: a coach, mentor, volunteer or leader who dares to make the road. Of course, that "one" also has to decide which techniques make maximum sense for a given project context.
The most interesting and controversial of these, I think, is the case of "Embedded Agile", which I'll cover in another post.