Healthcare AIHW ETL & Visualisation
July 2025 · Web Scraping · Streamlit · Data Visualisation
Project Overview
This project focused on automating the extraction, transformation, and visualisation of healthcare-related statistics published by the Australian Institute of Health and Welfare (AIHW). The goal was to enable quick, dynamic insights into key health metrics using an interactive Streamlit app.
Methodology
- Web Scraping: Used requests and BeautifulSoup to scrape Excel data from the AIHW site, filtering out outdated or broken links.
- Data Transformation: Parsed and standardised health indicators from Excel sheets using pandas, handling non-tabular formats and inconsistent headers.
- Data Cleaning: Merged time-series datasets across various categories (hospital, mental health, chronic disease), cleaned nulls, and renamed confusing column labels.
- Visualisation: Created a Streamlit app with dynamic dropdowns, trend charts (line/bar), and data export capabilities. Implemented responsive layout with st.columns().
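The link-discovery part of the scraping step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it uses the standard library's html.parser in place of BeautifulSoup so that it runs without dependencies, and the names find_excel_links, LinkCollector, and EXCEL_URL, along with the example.org URLs in the usage note, are hypothetical.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumed rule: keep only http(s) URLs that end in .xls or .xlsx.
EXCEL_URL = re.compile(r"^https?://\S+\.xlsx?$", re.IGNORECASE)


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def find_excel_links(html, base_url):
    """Return absolute, de-duplicated Excel URLs in page order."""
    parser = LinkCollector()
    parser.feed(html)
    links = []
    for href in parser.hrefs:
        url = urljoin(base_url, href.strip())  # resolve relative links
        if EXCEL_URL.match(url) and url not in links:
            links.append(url)
    return links
```

Given a page fetched with requests, calling find_excel_links(response.text, "https://example.org/reports/") would return only the workbook URLs. Each one could then be downloaded in a try/except block so that a dead link is skipped instead of stopping the run.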
Features
- Automatic scraping of latest health reports from AIHW
- Consolidation of multiple Excel sources into unified tables
- Interactive data exploration by year, topic, or region
- Built-in download option for cleaned data
Errors & Fixes
- Inconsistent Excel layouts: fixed by detecting header rows dynamically with openpyxl fallbacks.
- Streamlit caching bugs with file download: resolved using @st.cache_data and unique filenames for downloads.
- Broken URLs due to website update: added regex validation and fallback logic to skip invalid links.
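Dynamic header detection, as in the first fix above, might look something like the sketch below. The heuristic is an assumption and not necessarily the rule the project used: the header is taken to be the first row with at least three non-empty cells, all of them text. The function names are hypothetical. The input is a list of row tuples, the shape that openpyxl's iter_rows(values_only=True) yields once wrapped in list().

```python
def detect_header_row(rows, min_text_cells=3, max_scan=20):
    """Return the index of the first header-like row, or None.

    Assumed heuristic: a header row has at least `min_text_cells`
    non-empty cells and every non-empty cell is a string.
    """
    for i, row in enumerate(rows[:max_scan]):
        cells = [c for c in row if c is not None and str(c).strip() != ""]
        text_cells = [c for c in cells if isinstance(c, str)]
        if len(text_cells) >= min_text_cells and len(text_cells) == len(cells):
            return i
    return None


def to_records(rows):
    """Convert raw sheet rows into dicts keyed by the detected header."""
    header_idx = detect_header_row(rows)
    if header_idx is None:
        return []  # fallback: skip sheets with no recognisable header
    header = [
        str(c).strip() if c is not None else f"column_{j}"
        for j, c in enumerate(rows[header_idx])
    ]
    records = []
    for row in rows[header_idx + 1:]:
        if all(c is None for c in row):
            continue  # drop blank spacer rows
        records.append(dict(zip(header, row)))
    return records
```

With pandas, the detected index could instead be passed as the header argument of pd.read_excel, so that title and note rows above the table are ignored.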
Key Takeaways
- Automating data sourcing can drastically speed up reporting workflows.
- Handling real-world healthcare data requires flexible and robust preprocessing logic.
- Streamlit enables rapid visual insight development without frontend overhead.
Repository
GitHub Repo: Healthcare AIHW ETL