
Closed
Posted
Paid on delivery
We are looking for an experienced Python developer to build a robust, local pipeline that processes Binance Futures historical data into an ML-ready dataset. The goal is to ingest public data from Binance Vision (aggTrades, all klines, and bookDepth) and output clean, normalized, lookahead-bias-free features stored in Parquet format or DuckDB.

Scope of Work & Deliverables

1. Ingestion & Database Setup (Core Foundation)
Data Source: Programmatic downloading of historical daily/monthly ZIP files from public [login to view URL] (specifically aggTrades, all klines [1m], and bookDepth for BTCUSDT, ETHUSDT, SOLUSDT, XRPUSDT, BNBUSDT).
Storage Architecture: Set up a local storage solution using DuckDB or Parquet to handle millions of rows without memory issues.
Alignment: Parse and align different frequencies (tick-by-tick trades, order book snapshots, and 1m klines) to a unified timestamp sequence.

2. Core Microstructure Feature Extraction
Implement Python/Polars (or Pandas) scripts to compute the features on the aligned data.

3. Advanced Optimization & ML Readiness
Strict Lookahead Bias Prevention: Ensure all rolling features (e.g., rolling z-scores, Parkinson volatility) are calculated using t−1 parameters to prevent data leakage.
Normalization: Implement rolling z-scores or min-max normalization per symbol to keep features stationary.
Labeling: Implement a basic Triple Barrier Method or directional label generator.
Output: Save clean Parquet files per symbol, free of NaNs and infinite values, structured for immediate model training.
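To illustrate the lookahead-bias requirement, here is a minimal sketch of a rolling z-score whose mean and standard deviation at row t come only from the rows before t. It is written in plain Python so it runs without dependencies. The function name `lagged_rolling_zscore` and the choice of population standard deviation are assumptions for illustration, not part of the project spec. In a Pandas pipeline the same idea is usually expressed by applying `shift(1)` before `rolling(window)`.

```python
from collections import deque
from statistics import mean, pstdev
from typing import List, Optional, Sequence


def lagged_rolling_zscore(
    values: Sequence[float], window: int
) -> List[Optional[float]]:
    """Z-score each value against the `window` values that precede it.

    The statistics used at index t come from values[t - window : t] only,
    so the value at t (and anything after it) never influences its own
    normalization.
    """
    history: deque = deque(maxlen=window)
    out: List[Optional[float]] = []
    for x in values:
        if len(history) < window:
            # Warm-up: not enough past observations yet.
            out.append(None)
        else:
            mu = mean(history)
            sigma = pstdev(history)
            # A flat window has zero std; emit None instead of inf/NaN.
            out.append((x - mu) / sigma if sigma > 0 else None)
        # Only now does x become visible to later rows.
        history.append(x)
    return out


if __name__ == "__main__":
    closes = [100.0, 101.0, 99.0, 102.0, 110.0]
    print(lagged_rolling_zscore(closes, window=3))
```

The warm-up rows come back as `None`, so they would need to be dropped before writing the final per-symbol files if the output must be free of NaNs and infinite values.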
Project ID: 40488324
3 proposals
Remote project
Active 7 days ago
3 freelancers are bidding an average of ₹758 INR for this job

Hi, Your description mentions "robust" and "local pipeline" — two requirements that usually get sacrificed first when developers race to meet a tight budget, so I want to address both directly. For a local Binance data pipeline, I'd structure it around the `python-binance` SDK with async WebSocket streams for real-time data and REST polling as fallback, writing to a local SQLite or DuckDB store depending on your query patterns. The key to actual robustness is a dead-letter queue for failed fetches and automatic reconnect logic with exponential backoff — without those, any network hiccup leaves silent gaps in your data. Before I quote a final number, I'd like to clarify two things in the next 24 hours: what data (spot, futures, order book depth?) and at what frequency, and whether "local" means a single machine or needs to survive reboots with systemd/launchd. Those answers determine whether $600 covers the full scope or just an MVP. Can you share the complete spec? Best regards, Val
₹600 INR in 7 days
2.3

The challenge lies in efficiently processing and aligning disparate frequencies of Binance Futures data while ensuring scalability and integrity for machine learning readiness. A robust solution requires programmatic ingestion of historical ZIP files, employing DuckDB or Parquet to manage large datasets without memory pitfalls. My approach incorporates Python/Polars for feature extraction, implementing strict lookahead bias prevention and advanced normalization techniques. The initial deliverable can be ready in 14 days. What's your deadline? When do you need this live?
₹925 INR in 7 days
0.0

Lucknow, India
Member since Sep 25, 2021