how to create a data flow diagram (to threat model)

This article describes how to create a data flow diagram. A data flow diagram describes a system: how data flows between processes and external entities, and where the system stores its data. It is often used for threat modeling. As part of threat modeling you also document trust boundaries.1 You can model the diagram either graphically2 or as text.3 I prefer text because it is simpler to maintain, especially for large data flow diagrams.
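To make the idea of elements, flows, and trust boundaries concrete, here is a minimal sketch in plain Python. It does not use pytm; the `Element` and `Flow` classes, the `crossing_flows` helper, and the three-element system are all hypothetical and exist only for illustration. The sketch lists the flows that cross a trust boundary, which are usually the first ones to look at in a threat model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """A process, external entity, or data store in the diagram."""
    name: str
    boundary: str  # name of the trust boundary the element lives in


@dataclass(frozen=True)
class Flow:
    """A directed data flow between two elements."""
    source: Element
    sink: Element
    label: str


def crossing_flows(flows):
    """Return the flows whose source and sink sit in different trust boundaries."""
    return [f for f in flows if f.source.boundary != f.sink.boundary]


# Hypothetical three-element system, for illustration only.
browser = Element("browser", boundary="Internet")
api = Element("api service", boundary="VPC")
db = Element("config db", boundary="VPC")

flows = [
    Flow(browser, api, "submit form"),
    Flow(api, db, "look up config"),
    Flow(api, browser, "return result"),
]

for flow in crossing_flows(flows):
    print(f"{flow.source.name} -> {flow.sink.name}: {flow.label}")
```

The flow between the API service and the database stays inside one boundary, so it is not reported.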

Prerequisites

Understand the threat modeling fundamentals. Install Python 3, e.g. brew install python. Create a repository, e.g. mkdir dataflow-diagrams. Create a Python file, e.g. sample-df.py. Install the data flow library as a dependency of the project: pip3 install pytm.
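If you want to verify the setup before writing the model, a small check like the following may help. It is a sketch, not part of pytm: the `check_prerequisites` function is a hypothetical helper, and it also looks for the `dot` command because the diagram step at the end of this article pipes into it (`dot` ships with Graphviz).

```python
import importlib.util
import shutil
import sys


def check_prerequisites():
    """Report which prerequisites are available in the current environment."""
    return {
        # The interpreter running this script must be Python 3.
        "python3": sys.version_info >= (3, 0),
        # pytm is importable if pip3 install pytm succeeded.
        "pytm": importlib.util.find_spec("pytm") is not None,
        # dot must be on the PATH to render the diagram as an image.
        "dot": shutil.which("dot") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```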

Write / Code the Diagram

You can find other examples in the pytm repo.3 First write down the general setup: start by importing the relevant objects (e.g. actors, boundaries, data flows, data stores) into sample-df.py.

#!/usr/bin/env python3

# Import the objects you require: the generic TM object, boundaries (e.g. a VPC),
# Dataflow, Datastore, Server, ExternalEntity, Actor.
from pytm import (
    TM,
    Actor,
    Boundary,
    Classification,
    Data,
    Dataflow,
    Datastore,
    ExternalEntity,
    Lambda,
    Server,
)

Next, initialize the threat model and write the process backwards from the external customer. Write down the external entities and actors, then your system/processes (the servers and data stores). Then add the data and data flows. Finally, identify which boundaries the different objects belong to.


# initialize the threat model
# you can find the attributes for tm in the wiki: https://github.com/izar/pytm/wiki/Object-Model-(v1.x)-(WIP)#tm
tm = TM("sample system integration")
tm.description = "This describes the dataflow of a web component integration of the tool on the XY cloud"
tm.isOrdered = True
tm.mergeResponses = True
tm.assumptions = [
"The customer uses our building block.",
"We host the tool in our cloud and the customer uses our tool there."
]

# Boundary
internet = Boundary("Internet")
vpc = Boundary("AWS VPC")
webapp = Boundary("web application", inBoundary=internet)

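# externals
external_backend = ExternalEntity("external backend", inBoundary=internet)
drug_bank = ExternalEntity("drug provider", inBoundary=internet)

# NOTE: app, bb and bff are assumed placeholder definitions so that the data flows
# below resolve; their names, types and boundaries are guesses, adjust them to your system.
app = ExternalEntity("customer application", inBoundary=webapp)
bb = Server("building block (web component)", inBoundary=webapp)
bff = Server("backend for frontend", inBoundary=internet)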

# API services and their data stores
uw_service = Server("underwriting service", inBoundary=vpc)
drug_service = Server("drug service", inBoundary=vpc)
uw_config_db = Datastore("underwriting config db", inBoundary=vpc)
drug_service_cache = Datastore("drug service cache", inBoundary=vpc)

# IDP
idp = Server("identity provider", inBoundary=vpc)

# authenticate in the bff to proxy all requests
Dataflow(bff, idp, "authenticate with clientId/clientSecret")
Dataflow(idp, bff, "return access token", isResponse=True)

# do the underwriting
pass_input_data = Dataflow(app, bb, "pass selection on previous screens (e.g. product, general info) and access token for the BFF")
get_diagnoses = Dataflow(bb, bff, "get diagnoses")
proxy_get_diagnoses = Dataflow(bff, uw_service, "forward request & add our credentials")
diagnosis_to_bff = Dataflow(uw_service, bff, "return diagnoses", responseTo=proxy_get_diagnoses)
Dataflow(bff, bb, "return diagnoses", responseTo=get_diagnoses)
search_med_bff = Dataflow(bb, bff, "search medications")
proxy_search_med = Dataflow(bff, drug_service, "forward search medications and add our credentials")
check_cache = Dataflow(drug_service, drug_service_cache, "check for result")
Dataflow(drug_service_cache, drug_service, "return result", responseTo=check_cache)
med_ds_db = Dataflow(drug_service, drug_bank, "search for medication")
med_db_ds = Dataflow(drug_bank, drug_service, "return result", isResponse=True)
Dataflow(drug_service, bff, "return result", responseTo=proxy_search_med)
Dataflow(bff, bb, "return result", responseTo=search_med_bff)
get_dynamic_questions = Dataflow(bb, bff, "get dynamic questions")
proxy_get_dynamic_questions = Dataflow(bff, uw_service, "forward request & add our credentials")
Dataflow(uw_service, bff, "return dynamic questions", responseTo=proxy_get_dynamic_questions)
Dataflow(bff, bb, "return dynamic questions", responseTo=get_dynamic_questions)
trigger_risk_assessment = Dataflow(bb, bff, "calculate risk")
proxy_trigger_risk_assessment = Dataflow(bff, uw_service, "proxy request & add our credentials")
lookup_risk_assessment = Dataflow(uw_service, uw_config_db, "look up risk")
Dataflow(uw_config_db, uw_service, "return risk", responseTo=lookup_risk_assessment)
Dataflow(uw_service, bff, "return risk", responseTo=proxy_trigger_risk_assessment)
save_result_to_abs = Dataflow(bff, external_backend, "save result to backend")
Dataflow(external_backend, bff, "confirmation", responseTo=save_result_to_abs)
Dataflow(bff, bb, "return risk", responseTo=trigger_risk_assessment)
Dataflow(bb, app, "return uw result", responseTo=pass_input_data)

#load_the_web_component = Dataflow(app, bff, "load the web component")
#proxy_web_component = Dataflow(bff, uw_service, "get web component")
#Dataflow(uw_service, bff, "return web component", responseTo=proxy_web_component)
#Dataflow(bff, app, "return web component", responseTo=load_the_web_component)



if __name__ == "__main__":
    tm.process()

Generate the Diagram

Run the Python code to generate the diagram. The dot command is part of Graphviz, which must be installed:

python3 ./sample-df.py --dfd | dot -Tpng -o frontend.png
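If you prefer to drive this step from Python, for example in a build script, the same pipeline can be sketched with the standard library. The `build_commands` and `render_dfd` functions are hypothetical helpers, not part of pytm; they assume the script name and output file used above and that `dot` is on the PATH.

```python
import subprocess
import sys


def build_commands(script, output_png):
    """Return the two argument lists behind `python3 script --dfd | dot -Tpng -o output`."""
    generate = [sys.executable, script, "--dfd"]
    render = ["dot", "-Tpng", "-o", output_png]
    return generate, render


def render_dfd(script="sample-df.py", output_png="frontend.png"):
    """Run the model script and feed its DOT output to Graphviz."""
    generate, render = build_commands(script, output_png)
    # Capture the DOT source that the model script prints to stdout.
    dot_source = subprocess.run(
        generate, check=True, capture_output=True, text=True
    ).stdout
    # Pass the DOT source to dot on stdin; dot writes the PNG itself.
    subprocess.run(render, input=dot_source, check=True, text=True)
    return output_png
```

Calling `render_dfd()` from the repository directory should then produce frontend.png; `check=True` makes either step raise an exception if it fails instead of silently writing an empty image.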

Footnotes & Resources