Curious Problem Solver | Data Analyst | Data Science Enthusiast | Python Lover
Project Description:
This project retrieves country-level COVID-19 data across various measures from highly inconsistent HTML spread over multiple archived web pages.
Click here to view the .py or .ipynb files relating to this project
This script looks at the links archived for this website, which are available as a CSV listing every snapshot. We download the CSV from the web, filter out some records and then build a DataFrame.
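As a rough sketch of this step, the snippet below loads a CSV of snapshot links into a DataFrame and filters it. The column names (`timestamp`, `original`, `statuscode`), the sample rows, the filter rules and the function name `load_snapshot_links` are all assumptions made for illustration; the real CSV and filters may look different.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the downloaded CSV of snapshot links.
# The column names and values are assumptions, not the project's real data.
SAMPLE_CSV = """timestamp,original,statuscode
20200301120000,https://example.com/covid,200
20200301180000,https://example.com/covid,200
20200302090000,https://example.com/covid,302
20200303100000,https://example.com/covid,200
"""


def load_snapshot_links(csv_text: str) -> pd.DataFrame:
    """Read the links CSV, drop unusable records, keep one snapshot per day."""
    links = pd.read_csv(io.StringIO(csv_text), dtype={"timestamp": str})
    # Assumed filter: keep only snapshots that were archived successfully.
    links = links[links["statuscode"] == 200].copy()
    links["date"] = pd.to_datetime(
        links["timestamp"], format="%Y%m%d%H%M%S"
    ).dt.date
    # Assumed filter: keep the latest snapshot of each day.
    links = links.sort_values("timestamp").drop_duplicates("date", keep="last")
    return links.reset_index(drop=True)


snapshots = load_snapshot_links(SAMPLE_CSV)
print(snapshots[["date", "original"]])
```

In practice the CSV text would come from the web, for example via `requests.get(...).text`, rather than from an inline string.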
Run through the HTML to get the column names and column count for use in the next step. This helps cope with the number of variables changing over time.
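A minimal sketch of reading the column names with Beautiful Soup, assuming the data sits in the first `<table>` on the page and the headers are `<th>` cells. The sample HTML and the function name `get_column_names` are hypothetical; the archived pages are far less tidy than this.

```python
from bs4 import BeautifulSoup

# Hypothetical snapshot HTML; real archived pages vary from day to day.
SAMPLE_HTML = """
<table id="stats">
  <thead><tr><th>Country</th><th>Total Cases</th><th>Total Deaths</th></tr></thead>
  <tbody>
    <tr><td>Italy</td><td>1,694</td><td>34</td></tr>
    <tr><td>Spain</td><td>84</td><td></td></tr>
  </tbody>
</table>
"""


def get_column_names(html: str) -> list[str]:
    """Return the header cell texts of the first table, or [] if none exists."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        return []
    return [th.get_text(strip=True) for th in table.find_all("th")]


columns = get_column_names(SAMPLE_HTML)
print(len(columns), columns)
```

Running this over every snapshot and recording the names and count per day gives a view of when columns were added or removed.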
Run through the HTML for each day to extract the values and store them in a database.
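One possible way to do this, sketched with Beautiful Soup and the standard-library `sqlite3` module. The long table layout (one row per date, country and measure), the table name `covid_stats` and the helper `parse_rows` are assumptions chosen here because a long layout tolerates a changing set of columns; the project's actual schema may differ.

```python
import sqlite3

from bs4 import BeautifulSoup

# Hypothetical HTML for one day's snapshot.
SAMPLE_HTML = """<table>
<tr><th>Country</th><th>Total Cases</th><th>Total Deaths</th></tr>
<tr><td>Italy</td><td>1,694</td><td>34</td></tr>
<tr><td>Spain</td><td>84</td><td></td></tr>
</table>"""


def parse_rows(html, snapshot_date):
    """Turn one snapshot into (date, country, measure, value) tuples."""
    soup = BeautifulSoup(html, "html.parser")
    rows = soup.find_all("tr")
    if not rows:
        return []
    header = [th.get_text(strip=True) for th in rows[0].find_all("th")]
    records = []
    for tr in rows[1:]:
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if not cells:
            continue
        country = cells[0]
        for measure, raw in zip(header[1:], cells[1:]):
            cleaned = raw.replace(",", "")
            # Empty or non-numeric cells are stored as NULL.
            value = int(cleaned) if cleaned.isdigit() else None
            records.append((snapshot_date, country, measure, value))
    return records


conn = sqlite3.connect(":memory:")  # use a file path to persist the data
conn.execute(
    "CREATE TABLE IF NOT EXISTS covid_stats "
    "(snapshot_date TEXT, country TEXT, measure TEXT, value INTEGER)"
)
records = parse_rows(SAMPLE_HTML, "2020-03-01")
conn.executemany("INSERT INTO covid_stats VALUES (?, ?, ?, ?)", records)
conn.commit()

italy_cases = conn.execute(
    "SELECT value FROM covid_stats WHERE country = ? AND measure = ?",
    ("Italy", "Total Cases"),
).fetchone()[0]
print(italy_cases)
```

Looping `parse_rows` over each day's HTML and inserting into the same table builds up the full history in one place.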
Use Seaborn to display the data.
**Packages used: Beautiful Soup | Pandas | Requests | SQLite | Seaborn | Matplotlib**