Re-designing the backend of an existing application
Redesigning the backend of a legacy Twitch application
During my time in the software engineering immersive at Hack Reactor, we were tasked with redesigning and optimizing the backend of a legacy application created by another group of students.
Here I will share my thought process behind the decisions I made in each area of the redesign.
Choice of database for Twitch Service application
Why I chose to use an SQL database over a NoSQL database for this project
NoSQL databases are aimed at unstructured, non-relational data (like a newsfeed stream on a social media site).
SQL databases are designed for relational data.
SQL supports table JOINs, which retrieve related data from multiple tables in a single command.
Because the data in the legacy app's backend is relational, it made more sense to use a database that supports table joins.
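To make the join point concrete, here is a minimal sketch of how one query could pull related rows from two tables. The table and column names (streams, users, user_id) are hypothetical rather than the legacy app's actual schema, and the { text, values } shape assumes the node-postgres (pg) client, which this post does not name.

```javascript
// Hypothetical schema: streams(id, title, user_id) and users(id, username).
// Builds a parameterized query that fetches a stream and its owner in one round trip.
function buildStreamWithOwnerQuery(streamId) {
  return {
    text: `
      SELECT s.id, s.title, u.username
      FROM streams AS s
      JOIN users AS u ON u.id = s.user_id
      WHERE s.id = $1`,
    values: [streamId],
  };
}

// With node-postgres, this object could be passed to pool.query(...).
const query = buildStreamWithOwnerQuery(42);
console.log(query.text.trim());
```

Without a join, the same lookup would take two separate queries, or the owner's data would have to be duplicated inside every stream record.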
The data for my service also does not change, so the data integrity a SQL database maintains factored into my decision. Postgres also caches recently read data in memory, so repeated queries over the same data tend to have a faster response time.
Either type of database would have been fine to use, but the SQL database seemed likely to be more efficient in a fully functional, scalable version of the application.
Generating 10m records to load into database
Initially, we tried to generate 10m records and load them into our database without using promises. The function would eventually error out because it was trying to generate and insert large batches of data all at once.
I refactored my function to use async and await. An async function runs asynchronously via the event loop and implicitly returns a promise.
Each await pauses the async function, waits for the promise to resolve, then resumes the function with the resolved value.
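The pattern described above can be sketched roughly as follows. This is not my original seeding script: insertBatch is a stand-in that only simulates a database call with a timer, and the record shape and batch sizes are made up for illustration.

```javascript
// Stand-in for a real database insert; it only resolves after a short delay.
// In a real seeding script this would be a call to the database client.
function insertBatch(rows) {
  return new Promise((resolve) => setTimeout(() => resolve(rows.length), 1));
}

// Hypothetical record shape, for illustration only.
function makeRecord(id) {
  return { id, title: `stream-${id}` };
}

// Generates and inserts records one batch at a time. Each `await` pauses this
// function until the current batch is inserted, so only one batch is held in
// memory and only one insert is in flight at any moment.
async function seed(total, batchSize) {
  let inserted = 0;
  for (let start = 0; start < total; start += batchSize) {
    const size = Math.min(batchSize, total - start);
    const batch = Array.from({ length: size }, (_, i) => makeRecord(start + i + 1));
    inserted += await insertBatch(batch);
  }
  return inserted;
}

seed(2500, 1000).then((count) => console.log(`inserted ${count} records`));
```

Because the loop waits on each batch before building the next one, the script never tries to generate and insert everything at the same time, which is what caused the original failures.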
I was able to seed 10m records this way. However, I later refactored my code to generate CSV files instead of inserting data directly into my database while the function ran. It was much faster to write the generated data to a CSV file and then import that file into my database after all the data had been generated.
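A minimal sketch of the CSV approach, assuming Node's built-in fs streams, might look like this. The column names, row contents, and output path are hypothetical. The one detail that matters at this scale is waiting for the stream's drain event whenever write() returns false, so that rows do not pile up in memory.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');
const { once } = require('node:events');

// Hypothetical row shape; the real columns depend on the service's schema.
function csvRow(id) {
  return `${id},stream-${id},${id % 100}\n`;
}

// Streams `total` rows to a CSV file, waiting for 'drain' whenever the
// write buffer is full so memory use stays flat.
async function writeCsv(filePath, total) {
  const out = fs.createWriteStream(filePath);
  out.write('id,title,category_id\n');
  for (let id = 1; id <= total; id += 1) {
    if (!out.write(csvRow(id))) {
      await once(out, 'drain');
    }
  }
  out.end();
  await once(out, 'finish');
  return filePath;
}

const target = path.join(os.tmpdir(), 'seed-sample.csv');
writeCsv(target, 1000).then((file) => console.log(`wrote ${file}`));
```

One way to import the finished file is PostgreSQL's COPY command, for example COPY streams FROM '/path/to/seed-sample.csv' WITH (FORMAT csv, HEADER true); where the table name and path are again placeholders.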
DBMS Server Deployment
Deploying my database on an EC2 instance proved to be quite the challenge, mostly because I had no prior experience with or knowledge of how to do so.
After doing some research online, I found resources that explained the process. Using AWS, I installed Postgres on a Linux-based instance.
The next challenge was figuring out what to do with this instance. I learned to SSH into my instance from my local computer. Once connected, I was able to run sudo yum commands to install and initialize my Postgres database.
After installing the database, I adjusted the configuration settings to enable remote connections to the database from any IP address.
Back in AWS, I had to ensure that EC2 allowed remote connections to the PostgreSQL port. I did this by customizing the security settings for the instance.
Deploying the service and proxy
Deploying my service also proved to be a challenge. Previously, deploying with Elastic Beanstalk had made the entire process smooth and easy.
Deploying the service on an EC2 instance required some extra work. My first challenge was keeping my app running without having to manually log in to the instance and start the server. I solved this with a process manager called PM2.
I also installed nginx on my instance. Nginx works as a reverse proxy and also has a load balancing feature. Although I did not fully learn how to use nginx's load balancing feature, I was able to use Amazon's load balancing service instead.
Load balancing and autoscaling
When scaling my instance with auto scaling, nginx’s reverse proxy allowed the new instances to be spun up on port 80 (nginx’s port) without conflict.
Configuring AWS auto scaling, including the launch configuration, load balancer, and target groups, allowed the load balancer to spread the workload across multiple instances once CPU utilization reached a certain threshold. New instances were launched from an AMI created from the original EC2 instance, so each one contained a snapshot of the service's code, and that is what made the service scalable.
With this setup, I was able to scale my service and reach 10k rps on both my service and my service within the proxy. However, it took about 13-14 instances to handle that load.
After doing some more research, we found that Nginx also has a caching feature. By creating and altering Nginx config files, I was able to cache data in a directory I created, where it would stay valid for a short amount of time. Caching stores data temporarily on the instance's disk so that future requests for that data can be served faster. The trade-off is that the cached data takes up more disk space.
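The idea of a short-lived cache can be illustrated in a few lines of application code. To be clear, this is not how Nginx implements its cache, and it is not my Nginx configuration: it keeps entries in memory rather than on disk, and the class and function names are made up for this sketch.

```javascript
// Minimal time-limited cache: entries are served until they are older than ttlMs.
class TtlCache {
  constructor(ttlMs, now = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, which makes expiry easy to test
    this.entries = new Map();
  }

  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() - entry.storedAt >= this.ttlMs) {
      this.entries.delete(key); // expired: free the space and report a miss
      return undefined;
    }
    return entry.value;
  }

  set(key, value) {
    this.entries.set(key, { value, storedAt: this.now() });
  }
}

// Wraps a slow lookup so repeat requests within the TTL skip the backend.
async function cachedFetch(cache, key, fetchFromBackend) {
  const hit = cache.get(key);
  if (hit !== undefined) return hit;
  const value = await fetchFromBackend(key);
  cache.set(key, value);
  return value;
}
```

The same trade-off shows up here: every cached entry costs storage until it expires, in exchange for skipping the backend on repeat requests.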