MovieBuzz System Design: Coding End To End System From Scratch!
Problem Statement: Design a horizontally scalable and highly concurrent Movie Ticket Booking Platform with the following features.
- Show active movies running in theatres in the user’s city.
- Provide users an option to add Reviews and Ratings for each movie.
- Show the Average Rating and Reviews for each movie.
- Show the nearest theaters from the user’s location for selected movie booking for the user’s city.
Update: If you like this blog then have a look at Part 2 of this same blog with some more features: https://bhushan-gosavi.medium.com/moviebuzz-system-design-coding-end-to-end-system-from-scratch-part-2-576e96e74440
Requirements
- Around 10 Million Users
- Around 5 Million Movie Details
- Around 1k Bookings/sec
- Around 10k Events/sec (Rating, Reviews)
Platform Features
- Horizontally scalable
- Highly Concurrent
- Microservice Architecture
- Containerized Application
- Production Grade Code with Integration Tests using Docker Containers
Database Choice
We want to store 10 million user details and 5 million movie details. We are looking for a highly available database. We can compromise with the Consistency of user details and movie details. The best choice for storing this big data is Cassandra.
When the user opens the application we want to show a list of running movies in the user’s city. Once a user selects a movie, we want to show the user a list of the nearest Theatres in the user’s city running the shows for given movie. We can’t achieve this using Cassandra. We need a search engine for that. We can use ElasticSearch to solve these use-cases.
Cassandra is designed for heavy writes. As any write operation is just adding data to Memtable in RAM and appending data to commit log in the target node. So we can store all the movies and theatres details in Cassandra. The read operation in Cassandra is performance intensive. As reads have to go through n number of SSTables on disk via multiple caches in memory and disk. So we should try to avoid reading from Cassandra as much as we can.
Whereas for ElasticSearch, write operation is costly as every time we insert a document in ElasticSearch, we are indexing that document. So we should only store Movies and Theatres fields on which we want searchability. We are not allowing users to search movies by Actor names, so we should not store Actors associated with movies in ElasticSearch.
We can’t compromise on consistency in case of which seats are available and we don’t want multiple users to book the same seats in the same theatre. So we need a highly consistent relational database even if at the cost of availability. For this use-case, we can go with the sharded SQL database.
Technologies
- Cassandra
- ElasticSearch
- Docker
- Apache Kafka
- Spring Cloud
- Hashicorp Consul
- Zookeeper
The Architecture
- As this is a Containerized microservice architecture, this architecture can be deployed on the Kubernetes cluster easily. Leveraging the Kubernetes cluster moviebuzz-services can be scaled up or scaled down based on the incoming traffic automatically.
- MovieBuzz Gateway: Containerized Application for Incoming user API requests Authentication and Routing
- Load Balancer Service / Ingress Service: Kubernetes service through which users can access moviebuzz-APIs.
- MovieBuzz API: All the user-facing and back-office APIs will be implemented here.
- Apache Kafka: Once a user adds any movie review, it can be added to a moviebuzz-user-reviews topic. This topic message can be processed by multiple processors. eg. One processor can update the average rating after users add a review. Another processor can leverage this Kafka topic to apply Movie Recommendation Model to user reviews. Other Kafka use-case can be after the booking is complete Booking details can be added to moviebuzz-booking-confirmed topic, this topic messages can be processed to send booking confirmation emails, messages to users.
- MovieBuzz Kafka Processor: This service is used to process kafka messages. Just increasing the processor replicas won’t increasing the kafka topic message processing parallelism. We can increase the processing parallelism by increasing the kafka partition count and increasing processor replicas or increasing topic consumer threads per replica.
- Hashicorp Consul: The microservice architecture is developed using Spring Cloud. Consul is used for storing distributed configuration for all the microservices running in a single place.
Database Schemas
Cassandra
We can create Moviebuzz keyspace in Cassandra multi-datacenter cluster using NetworkTopologyStrategy with at least 2 replicas in each datacenter.
- We can enable KEY caching for all the tables.
- We can enable ROW caching for movies table with rows_per_partition cache set to 1 (there can be at max 1 movie per partition), as Reads on movies table will be 90% more than writes.
- We can enable ROW caching for movie_reviews and movie_bookings table with rows_per_partition cache set to 10, because if user clicks on Bookings tab we will show him 10 recent Bookings only and if user clicks on a movie, we will show him 10 recent reviews only. (User can opt to see more bookings and reviews by clicking on more button)
- There will be heavy writes on movie_ratings table. So we will leverage counter columns for movie_ratings table and we will create in memory LoadingCache with expire_after_write=30mins for 5000 most frequenly accessed movie ratings.
1)moviebuzz.movies: Table to store movie details like description, actors, crews, release date, genres, etc. with movie UUID as the partition key. Movie UUID is generated from movie name and movie release date combined.
2)moviebuzz.theaters: Table to store Theater details like name, city, location, list of running movies etc with theatre UUID as the partition key. Theatre UUID is generated from theatre name and city name combined.
3)moviebuzz.users: Table to store user details with user UUID as the partition key. User UUID is generated from user email.
4)moviebuzz.user_bookings: Table to store user movie bookings history with unique bookingId per booking. User UUID is used as the partition key and bookingId is used as clustering column.
5)moviebuzz.movie_ratings: This table is used to fetch the average rating for each movie. This table used two counter columns, one counter column is used to store the count of users rated the movie and the other counter column is used to store the total rating for the movie.
6)moviebuzz.movie_reviews: Table to store movie reviews added by all users for a given movie.
ElasticSearch
- moviebuzz_movies index: When a user opens the application we want to show the user list of all the running movies in his city. Also, we want to give users the ability to search movies by name. It can be achieved by querying the moviebuzz_movies index.
2. moviebuzz_theatres index: When the user clicks on a movie, we want to show users the list of nearest theatres running the selected movie show. This can be achieved by storing theatre's location as geo_point in the moviebuzz_theatres index and querying the index by movie name and user location.
The Code
Repository: https://github.com/bhushan-gosavi/moviebuzz-parent
Platform Features
- Horizontally scalable
- Highly Concurrent
- Microservice Architecture
- Containerized Application
- Production Grade Code with Integration Tests using Docker Containers
Build Steps
- map mkafka and mzookeeper hostnames to localhost using Hosts file
- Run maven build on moviebuzz-parent module to generate JAR files
mvn clean install
- Run maven build using ‘integrate’ profile to launch docker containers and run Integration Tests
mvn clean install -Pintegrate.
- After running maven build with Integrate profile, docker images will be created, all docker stack will be up on local docker host machine including Cassandra, ElasticSearch, Kafka, Consul, Processor and API. If all the integration tests pass, then moviebuzz-integration module test will be successful.
Pending Tasks:
- Integrating SQL database for real time Bookings
- SQL sharded database schema