
Closed
Posted
Paid on delivery
We are looking for an experienced Python developer to build a robust, local pipeline that processes Binance Futures historical data into an ML-ready dataset. The goal is to ingest public data from Binance Vision (aggTrades, all klines, and bookDepth) and output clean, normalized, lookahead-bias-free features stored in Parquet format or DuckDB.

Scope of Work & Deliverables

1. Ingestion & Database Setup (Core Foundation)
Data Source: Programmatic downloading of historical daily/monthly ZIP files from public [login to view URL] (specifically aggTrades, all klines [1m], and bookDepth for BTCUSDT, ETHUSDT, SOLUSDT, XRPUSDT, BNBUSDT).
Storage Architecture: Set up a local storage solution using DuckDB or Parquet that handles millions of rows without memory issues.
Alignment: Parse and align the different frequencies (tick-by-tick trades, order book snapshots, and 1m klines) to a unified timestamp sequence.

2. Core Microstructure Feature Extraction
Implement Python/Polars (or Pandas) scripts to compute the features on the aligned data.

3. Advanced Optimization & ML Readiness
Strict Lookahead Bias Prevention: Ensure all rolling features (e.g., rolling z-scores, Parkinson volatility) are calculated with parameters estimated only from data up to t−1, to prevent data leakage.
Normalization: Implement rolling z-scores or min-max normalization per symbol to keep features stationary.
Labeling: Implement a basic Triple Barrier Method or directional label generator.
Output: Save clean Parquet files per symbol, free of NaNs and infinite values, structured for immediate model training.
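To make the "t−1 parameters" requirement concrete, here is a minimal sketch of a lagged rolling z-score, one possible reading of the requirement rather than a prescribed implementation. It uses only the Python standard library for clarity; a Polars or Pandas version would typically shift the series by one step before computing the rolling statistics. The function name `lagged_rolling_zscore`, the window length, the use of the population standard deviation, and the sample prices are all assumptions made for illustration.

```python
from collections import deque
from statistics import fmean, pstdev


def lagged_rolling_zscore(values, window):
    """Z-score each value against the mean/std of the `window` values before it.

    The statistics at index t use only values[t-window:t], so the current
    observation never contributes to its own normalization parameters.
    Returns None where fewer than `window` past values exist or where the
    past window has zero variance (assumed policy; such rows can be dropped
    before writing the final dataset).
    """
    history = deque(maxlen=window)  # holds strictly past observations
    out = []
    for x in values:
        if len(history) == window:
            mu = fmean(history)
            sigma = pstdev(history)  # population std: an illustrative choice
            out.append((x - mu) / sigma if sigma > 0 else None)
        else:
            out.append(None)
        history.append(x)  # x becomes visible only to later timestamps
    return out


# Hypothetical close prices, for illustration only
prices = [100.0, 101.0, 102.0, 101.0, 110.0]
z = lagged_rolling_zscore(prices, window=3)
```

Because each output depends only on earlier inputs, changing a later price leaves every earlier z-score unchanged, which is a simple property to assert in a leakage test.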
Project ID: 40488333
4 proposals
Remote project
Active 7 days ago
4 freelancers are bidding an average of ₹944 INR for this job

Hi there, You’re absolutely in the RIGHT PLACE. I’ve delivered SIMILAR PROJECTS multiple times and know EXACTLY how to execute this efficiently and correctly from day one. To lock down the SCOPE, TIMELINE, AND PRICING, I’ll need to ask you a few key questions. Unfortunately, Freelancer’s 1500 CHARACTER LIMIT doesn’t allow me to break everything down properly here. Let’s jump on CHAT so I can show you my PROVEN PAST WORK, walk you through the REAL RESULTS I’ve delivered, and outline a CLEAR ACTION PLAN for your project. You’ll immediately see why my approach is DIFFERENT and EFFECTIVE. If you’re serious about getting this done RIGHT, I’m ready to move forward. Looking forward to CONNECTING and WINNING TOGETHER. Cheers, Mayank Sahu
₹1,050 INR in 7 days

Building a robust local pipeline for processing Binance Futures historical data requires precise handling of ingestion and feature extraction to mitigate data leakage risks. By utilizing DuckDB for local storage, you can efficiently manage millions of rows while ensuring seamless alignment of tick-by-tick trades, order book snapshots, and 1m klines into a unified timestamp sequence. The implementation of Python/Polars allows for effective computation of rolling features with strict lookahead bias prevention. I can deliver an initial proof of concept within 15 days. When can we start? I can have something to show you within 24 hours.
₹925 INR in 9 days

Lucknow, India
Member since Sept 25, 2021