My Data Driven Applications Stack

The migration from functional applications and architectures to data-centric / flow / reactive architectures reminds me of the industrial revolution. Back then, after the invention of electricity, factories simply replaced steam with electricity, and the results were underwhelming. This changed once the manufacturing space was redesigned around the processes (which was now possible, as energy could be transported to the machines). In this article I try to clarify for myself what an architecture looks like when we focus fully on data (which is the key resource of today’s companies).

First, an overview of all components, with links to more details and, over time, links to the sample application in which I will build this.

Work in Progress

Components:

Data Flow Oriented (Reactive) Interfaces / Sensors / Actors:

Types: Web / Mobile / IoT (Car / Wearables / Robots / Devices)
Technologies:
– Internal: React, Redux, RxJS
– Integration: Kafka Client, REST (best practices), JSON
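On the interface side, the Redux pattern named above keeps the UI itself data-flow oriented: state is a pure function of the previous state and each incoming event. A minimal sketch in plain Node (the action names and state shape are illustrative assumptions, not part of the stack):

```javascript
// Redux-style reducer: UI state is recomputed per event, never mutated.
// "SENSOR_READING" and the state shape are illustrative assumptions.
function reducer(state = { readings: [] }, action) {
  switch (action.type) {
    case "SENSOR_READING":
      // Return a fresh state object for every event instead of mutating.
      return { readings: [...state.readings, action.value] };
    default:
      return state;
  }
}

let state = reducer(undefined, { type: "@@init" });
state = reducer(state, { type: "SENSOR_READING", value: 21.5 });
state = reducer(state, { type: "SENSOR_READING", value: 22.0 });
console.log(state.readings); // [ 21.5, 22 ]
```

In a real app the store dispatches these actions and the React components re-render from the new state; the reducer stays a plain, testable function.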

Services / Cognitive Functions / Backends:

Technologies:
– Internal: RxJS/Java, NodeJS, Scala/Java, Akka, Play, Python, Spark, Tensorflow
– Integration: Kafka (Messages), DFS / Apache Nifi (Payload), Formats(JSON, Avro)
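For the JSON format on the message bus, a small self-describing envelope keeps producers and consumers loosely coupled. A hedged sketch (the field names `eventType`, `occurredAt`, `payload` are my own illustrative choice, not a fixed schema):

```javascript
// Illustrative JSON event envelope for messages on the bus.
// Field names are assumptions; Avro would pin these down in a schema.
function makeEvent(eventType, payload) {
  return {
    eventType,                          // e.g. "order.created"
    occurredAt: new Date().toISOString(),
    payload,                            // domain data, kept small; large payloads go via NiFi/DFS
  };
}

// Serialize for the wire, parse again on the consumer side.
const wire = JSON.stringify(makeEvent("order.created", { orderId: 42 }));
const event = JSON.parse(wire);
console.log(event.eventType); // "order.created"
```

With Avro the same envelope would be schema-checked at produce time instead of trusted at consume time.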

Build Processes

DevOps:
– Everything in containers (including DB, Analytics …)
Containers: Docker/Kubernetes/Helm
Continuous Integration: Jenkins, JUnit, A/B Testing, Code Coverage

Collaboration

  • internal/external open sourcing (owner / pull requests / reviews! / forks -> no central components)
  • requirement analysis / inter-team / business / IT collaboration (consumer-driven contracts)
  • business UIs (Pega needs that too … business people now need a Pega course …)
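A consumer-driven contract can be very lightweight: each consumer declares the fields it relies on, and the provider's CI checks a sample response against every consumer's contract before release. A minimal sketch, where all names (`checkoutContract`, `sampleResponse`) are illustrative assumptions:

```javascript
// Consumer-driven contract sketch: the consumer states its expectations,
// the provider verifies them in CI. Names here are illustrative only.
const checkoutContract = {
  consumer: "checkout-ui",
  requiredFields: ["id", "price", "currency"],
};

function fulfills(contract, response) {
  return contract.requiredFields.every((field) => field in response);
}

// The provider may return extra fields; only the declared ones matter.
const sampleResponse = { id: "p-1", price: 9.99, currency: "EUR", stock: 3 };
console.log(fulfills(checkoutContract, sampleResponse)); // true
```

Tools like Pact formalize exactly this idea; the point is that the provider can evolve freely as long as every consumer's declared expectations still hold.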

Security

  • security by design (part of the automated dev pipeline (check for licenses, check for container vulnerabilities) -> warnings, not deal breakers) (JWT …)

Architecture Guidelines:

  • Teams are free inside their application, but limited in the communication between applications
  • Services should be structured according to their bounded context (domain driven design)
  • Migrate the old world by isolating/proxying and abstracting the interfaces
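The last guideline is essentially an anti-corruption layer: new services never touch legacy shapes directly, only a proxy that translates them into the domain model. A minimal sketch (the legacy field names and the `customerProxy` helper are illustrative assumptions):

```javascript
// "Isolate and proxy the old world": an anti-corruption layer translates
// the legacy response shape into the new domain model. All names are
// illustrative; imagine getCust wrapping a SOAP or direct-DB call.
const legacySystem = {
  getCust: (id) => ({ CUST_ID: id, CUST_NM: "Ada", ADDR_TXT: "Main St 1" }),
};

function customerProxy(legacy) {
  return {
    getCustomer(id) {
      const raw = legacy.getCust(id);
      // New services only ever see the clean domain shape.
      return { id: raw.CUST_ID, name: raw.CUST_NM, address: raw.ADDR_TXT };
    },
  };
}

const customers = customerProxy(legacySystem);
console.log(customers.getCustomer(7)); // { id: 7, name: 'Ada', address: 'Main St 1' }
```

When the legacy system is eventually replaced, only the proxy changes; the bounded contexts behind it stay untouched.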

Lessons Learned from 7 years of Twitter: How to manage Twitter for a better experience … #bettertwitter

  • Forget about follower numbers: Don’t follow anybody back you don’t want to collaborate with.
  • Use lists: There are interesting/informative accounts out there – don’t follow them; put them on a themed list.
  • Follow: only people I want to work/collaborate with, or who are from my regional ecosystem.

Why am I not following you? You are … #improvetwitter

  • Spammy: tweeting more than 5 times a day.
  • Trivial: I don’t care about the menu on your flight.
  • Selling: 80% of your stuff is just about how great you are.
  • Non-interactive: you are not interested in conversation.
  • Non-doing: you talk too much but do too little.

Be more:

  • Interesting: Work on something interesting and talk about it.
  • Collaborative: Work on this with others (me?).
  • Fun: From time to time it is ok to be trivial and tweet about fun things.

I did not follow this myself, but I will hold myself to this standard and hope to contribute to a better experience for everyone. The first thing I did: I deleted my automated Industry 4.0 summary on paper.li that spammed my timeline, unfollowed accounts that were spamming me, and now I’m hoping to have more meaningful conversations.

Feedback always welcome!

Central Component for #datadrivenarchitecture: Real-time Insights powered by Reactive Programming #netflix

Can help with:

  • testing
  • debugging
  • security

Solution:

  • log everything

Problems:

  • so much data
  • so many devices
  • not feasible to save to Elasticsearch first (real time!)

Stream analysis with reactive programming

In a data-driven architecture, the processors for the high-performance message bus benefit from being written in Rx.
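The Rx style means each stage subscribes to the previous one and reacts per event, instead of persisting first and querying later. A toy push-based sketch in plain Node to show the shape (real code would use RxJS operators; the log-event fields are illustrative):

```javascript
// Toy push-based stream to illustrate the Rx processor style:
// stages subscribe to upstream and react per event. Event shape is an
// assumption; RxJS would provide these operators for real.
function fromArray(events) {
  return { subscribe: (next) => events.forEach(next) };
}
const filter = (pred) => (source) => ({
  subscribe: (next) => source.subscribe((e) => pred(e) && next(e)),
});
const map = (fn) => (source) => ({
  subscribe: (next) => source.subscribe((e) => next(fn(e))),
});

// Count error events as they come off the bus — no store-first step.
let errors = 0;
const stream = map((e) => e.service)(
  filter((e) => e.level === "error")(
    fromArray([
      { level: "info", service: "api" },
      { level: "error", service: "playback" },
      { level: "error", service: "api" },
    ])
  )
);
stream.subscribe(() => errors++);
console.log(errors); // 2
```

This is why the "log everything" firehose stays tractable: filtering and aggregation happen in flight, and only results need to be stored.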

Who uses it?

Users of RxJS: Netflix, Google, Facebook, and more

Existing Solutions:

Differences

  • Mantis is designed for operational use cases where message guarantee levels vary by job. So, some jobs can choose at-most-once guarantees while others choose at-least-once guarantees via Kafka. We are able to saturate the NIC on the servers for operational use cases with very little CPU usage.
  • Built-in back pressure that allows Mantis to seamlessly switch between push, pull, or mixed modes based on the type of data sources
  • Support for a mix of long-running perpetual analysis jobs along with user-triggered short-lived queries in a common cluster
  • Since the volume of data to be processed at Netflix varies tremendously by time of day, being able to autoscale workers in a job based on resource consumption, and to scale the cluster as a whole, was a key requirement. None of the existing streaming frameworks provided such support.
  • We wanted more control over how we schedule the resources so we can do smarter allocations like bin packing etc. (that also allows us to scale the jobs)
  • Deep integration with the Netflix ecosystem that allows filtering the event stream at the source of the data, among others
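The pull half of that push/pull back-pressure mix can be illustrated with a plain generator: the producer yields an event only when the consumer asks for one, so a slow consumer is never flooded. A sketch (the event shape is an assumption; Mantis itself switches between this and push mode):

```javascript
// Pull-mode back pressure in miniature: a generator produces the next
// event only when the consumer requests it. Event shape is illustrative.
function* source() {
  for (let i = 1; i <= 3; i++) {
    yield { id: i }; // produced lazily, one per consumer request
  }
}

const seen = [];
for (const event of source()) {
  seen.push(event.id); // the consumer controls the pace here
}
console.log(seen); // [ 1, 2, 3 ]
```

Push mode is the opposite (the source emits as fast as it can, as in the Rx sketch above), and a mixed mode buffers between a pushing source and a pulling consumer.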

Sources: