๐งพ Blockchain ETL Pipeline
March 2025 ยท PostgreSQL ยท Bitquery API ยท Python
๐ Project Overview
This project automates the extraction, transformation, and loading (ETL) of blockchain transaction data using the Bitquery API. It focuses on crypto trade data and stores cleaned results into a PostgreSQL database for further analysis. The goal was to build a reusable data pipeline for historical and real-time blockchain analytics.
โ๏ธ Methodology
- Extraction: Queried the Bitquery GraphQL API using Python to fetch token trade data for selected time windows.
- Transformation: Normalised JSON responses using
pandas, parsed timestamps, selected relevant fields (e.g. token name, amount, value, sender/receiver), and handled missing/nulls.
- Loading: Connected to a local PostgreSQL instance using
SQLAlchemy and inserted transformed data into custom schema tables.
- Automation: Designed for both batch loading (historical backfill) and near real-time streaming for future cron jobs.
๐ก Features
- Supports flexible API date range queries and token filters
- Modular ETL functions for easy reuse and testing
- Connection pooling for efficient PostgreSQL insertions
- Simple CLI interface for custom token loading
๐ Errors & Fixes
- Rate-limiting on Bitquery API โ implemented exponential backoff and retry logic.
- PostgreSQL schema mismatches โ enforced data type casting and schema validation pre-load.
- Timestamp inconsistency โ used UTC conversions and ISO format validation for ingestion consistency.
๐ Key Takeaways
- APIs with rate limits require robust retry strategies for production pipelines.
- ETL structure benefits greatly from modularisation and logging at each stage.
- This pipeline can be expanded to support multiple blockchains and used for crypto dashboards or alerts.
๐ Repository
GitHub Repo: Blockchain ETL Pipeline
โ Back to Blog