A Quick Demo of Apache Beam with Docker

Deploy Flink & Beam with Docker

git clone https://github.com/ecesena/docker-beam-flink.git
cd docker-beam-flink
docker-compose up -d
docker-compose scale taskmanager=2
docker ps
CONTAINER ID IMAGE      ... NAMES
3d59d952d152 beam-flink ... dockerbeamflink_taskmanager_2
4cce6219be80 beam-flink ... dockerbeamflink_taskmanager_1
3b7b6b32b4de beam-flink ... dockerbeamflink_jobmanager_1
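
To confirm that scaling worked without reading the table by eye, the container names can be counted. This is a minimal sketch, assuming the names follow the dockerbeamflink_taskmanager_N pattern shown above; count_taskmanagers is a hypothetical helper, not something shipped with the repository.

```shell
# Hypothetical helper: read container names on stdin (one per line)
# and print how many of them look like task managers.
count_taskmanagers() {
  grep -c '_taskmanager_'
}

# Usage (requires a running Docker daemon):
#   docker ps --format '{{.Names}}' | count_taskmanagers
```

With the three containers listed above, the helper should print 2.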

Run HelloWo — ehm, WordCount

open http://$(docker-machine ip default):48080
  1. Click “Submit new Job” in the left menu; beam-starter-0.1.jar is already uploaded.
  2. Tick the checkbox next to beam-starter-0.1.jar.
  3. Click “Submit” (or “Show Plan”). No additional parameters are needed.
docker exec -it dockerbeamflink_taskmanager_1 /bin/bash
cat /tmp/output.txt*
...
live: 13
long: 15
look: 14
lord: 90
lose: 6
...
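
For readers new to WordCount, the output above is a list of "word: count" pairs. The shell function below is a rough sketch of the same computation on plain text, useful only for building intuition. It is not the Beam pipeline, and its tokenization (splitting on any non-letter character, no case folding) is an assumption that may differ from what the job actually does; wordcount is a hypothetical name.

```shell
# Hypothetical helper: count word occurrences on stdin and print
# "word: count" lines, mimicking the format of /tmp/output.txt*.
wordcount() {
  tr -cs '[:alpha:]' '\n' |     # replace runs of non-letters with a newline
    grep -v '^$' |              # drop the empty line a leading separator can produce
    sort |                      # group identical words together
    uniq -c |                   # prefix each distinct word with its count
    awk '{ print $2 ": " $1 }'  # reorder "count word" into "word: count"
}

# Usage:
#   printf 'the lord and the king\n' | wordcount
```

Where the shell pipeline sorts and counts on a single machine, the Beam job expresses the same steps as transforms that Flink distributes across the task managers.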

Build a Beam Pipeline

git clone https://github.com/ecesena/beam-starter
cd beam-starter
mvn clean package
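
After the build, the jar to upload through the Flink web UI should be under target/, which is Maven's default output directory. The snippet below is a small sketch for locating it; find_built_jar is a hypothetical helper, and the exact jar name depends on the project's pom.xml.

```shell
# Hypothetical helper: list the jars directly under DIR/target,
# where DIR defaults to the current directory.
find_built_jar() {
  find "${1:-.}/target" -maxdepth 1 -name '*.jar' 2>/dev/null | sort
}

# Usage, from inside the beam-starter checkout:
#   find_built_jar
```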

Forging the Everdragons2 NFT. Former security at Pinterest.

Emanuele Cesena
