
Closed
We are looking for an experienced Python developer to build a robust, local pipeline that processes Binance Futures historical data into an ML-ready dataset. The goal is to ingest public data from Binance Vision (aggTrades, all klines, and bookDepth) and output clean, normalized, lookahead-bias-free features stored in Parquet format or DuckDB.

Scope of Work & Deliverables

1. Ingestion & Database Setup (Core Foundation)
Data Source: Programmatic downloading of historical daily/monthly ZIP files from public [login to view URL] (specifically aggTrades, all klines [1m], and bookDepth for BTCUSDT, ETHUSDT, SOLUSDT, XRPUSDT, BNBUSDT).
Storage Architecture: Set up a local storage solution using DuckDB or Parquet to handle millions of rows without memory issues.
Alignment: Parse and align different frequencies (tick-by-tick trades, order book snapshots, and 1m klines) to a unified timestamp sequence.

2. Core Microstructure Feature Extraction
Implement Python/Polars (or Pandas) scripts to compute the features on the aligned data.

3. Advanced Optimization & ML Readiness
Strict Lookahead Bias Prevention: Ensure all rolling features (e.g., rolling z-scores, Parkinson volatility) at time t are calculated only from data available up to t−1, to prevent data leakage.
Normalization: Implement rolling z-scores or min-max normalization per symbol to keep features stationary.
Labeling: Implement a basic Triple Barrier Method or directional label generator.
Output: Save clean Parquet files per symbol, free of NaNs and infinite values, structured for immediate model training.
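As a rough illustration of the lookahead rule in item 3, the sketch below computes a rolling z-score whose mean and standard deviation at time t come only from rows up to t−1. It uses pandas; the function name `lagged_rolling_zscore`, the window length, and the sample prices are all hypothetical and not part of the posting.

```python
import numpy as np
import pandas as pd


def lagged_rolling_zscore(values: pd.Series, window: int) -> pd.Series:
    """Z-score each value against the mean/std of the previous `window` rows."""
    # shift(1) removes the current row from the window, so the statistics
    # used at time t are built from rows t-window .. t-1 only.
    past = values.shift(1)
    mean = past.rolling(window, min_periods=window).mean()
    std = past.rolling(window, min_periods=window).std(ddof=0)
    z = (values - mean) / std
    # A flat window gives std == 0, which produces inf or NaN; mark both as NaN
    # so they can be dropped before the dataset is written out.
    return z.replace([np.inf, -np.inf], np.nan)


# Hypothetical close prices for one symbol.
close = pd.Series([100.0, 101.0, 102.0, 103.0, 110.0])
z = lagged_rolling_zscore(close, window=3)

# Changing a later price must not change any earlier z-score.
changed = close.copy()
changed.iloc[4] = 1000.0
z_changed = lagged_rolling_zscore(changed, window=3)
```

The first `window` rows come out as NaN because no full history exists yet; under the "free of NaNs" output requirement they would be dropped per symbol before saving.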
Project ID: 40488721
7 proposals
Remote project
Active 4 hours ago
7 freelancers are bidding an average of ₹1,000 INR/hour for this job

Hello, I trust you're doing well. I have nearly a decade of hands-on experience with machine learning algorithms. My expertise lies in developing artificial intelligence algorithms, including the kind you require, using Matlab, Python, and similar tools. I hold a doctorate from Tohoku University and have a number of publications on the subject. My portfolio, which showcases my past work, is available for your review. Your project piqued my interest, and I would be delighted to be part of it. Let's connect to discuss the details. Warm regards. Please check my portfolio: https://www.freelancer.com/u/sajjadtaghvaeifr
₹1,500 INR in 40 days
5.7

As an experienced data analyst and data scientist with a deep understanding of Python, I am well equipped to deliver a robust and efficient pipeline for processing Binance data according to your requirements. My 8+ years in the field have given me experience with complex datasets and the advanced analytics techniques this project needs. My expertise covers core data handling (setting up local storage and aligning data of different frequencies), microstructure feature extraction, and optimization for ML readiness. I am skilled at preventing lookahead bias, normalizing features, labeling, and producing clean, reliable Parquet files that are ready for ML model training. Having delivered solutions in finance and other industries that depend on data management and process optimization, I am used to producing accurate results under tight deadlines. With me leading your project, you can be confident in its quality and efficiency. Let's get started on your Binance data together.
₹1,000 INR in 40 days
3.8

Processing and normalizing large datasets efficiently without introducing lookahead bias begins with a robust ingestion and database setup. Binance's historical data can be downloaded programmatically with standard Python libraries and stored in DuckDB, which handles large tables efficiently. Aligning the different data frequencies to a unified timestamp sequence is crucial for accurate feature extraction. I will implement the statistical features in Polars and ensure the extracted features are immediately ML-ready. The initial deliverable will be ready in 14 days. I'm ready to kick this off. What's the best way to get started?
₹800 INR in 40 days
0.0
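The alignment step mentioned in this bid could look roughly like the sketch below. It uses pandas `merge_asof` instead of the DuckDB/Polars stack the bid names, and it assumes each 1m bar is keyed by its close time so that only snapshots at or before that time are attached. The column names (`ts`, `depth_imbalance`), the helper `align_to_bars`, and the sample rows are hypothetical.

```python
import pandas as pd


def align_to_bars(bars: pd.DataFrame, snapshots: pd.DataFrame, on: str = "ts") -> pd.DataFrame:
    """Attach to each bar the latest snapshot observed at or before the bar time."""
    # merge_asof requires both frames to be sorted on the join key.
    # direction="backward" never looks past the bar timestamp.
    return pd.merge_asof(
        bars.sort_values(on),
        snapshots.sort_values(on),
        on=on,
        direction="backward",
    )


# Hypothetical 1m bars, keyed by bar close time.
bars = pd.DataFrame({
    "ts": pd.to_datetime([
        "2024-01-01 00:01:00",
        "2024-01-01 00:02:00",
        "2024-01-01 00:03:00",
    ]),
    "close": [100.0, 101.0, 102.0],
})

# Hypothetical order book snapshots at irregular times.
depth = pd.DataFrame({
    "ts": pd.to_datetime([
        "2024-01-01 00:00:30",
        "2024-01-01 00:01:45",
        "2024-01-01 00:03:10",
    ]),
    "depth_imbalance": [0.1, -0.2, 0.3],
})

aligned = align_to_bars(bars, depth)
```

The 00:03:10 snapshot is not used for the 00:03:00 bar because it arrives after the bar closes, which is the behaviour the lookahead requirement asks for.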

I see you need a robust Python pipeline to process Binance Futures historical data into an ML-ready format. I'd build this using Python scripts to ingest and process data from Binance Vision, ensuring clean, normalized features stored in Parquet format. This will allow you to efficiently analyze and model data for better decision-making. I've worked with similar data pipelines for finance and trading industries, ensuring accurate results. Quick question: How soon do you need this pipeline up and running? Regards, Collen Jr Liebenberg
₹750 INR in 7 days
0.0

⚡️ONLY PAY IF YOU’RE IMPRESSED⚡️
I have extensive experience building data pipelines for financial and ML applications, including ingesting and processing time-series data like Binance Futures historical data. I can help by delivering a robust, efficient pipeline that produces clean, normalized, lookahead-bias-free datasets ready for modeling.

Core Deliverables:
- Programmatic data ingestion from Binance Vision
- Storage setup using DuckDB/Parquet for scalability
- Multi-frequency data alignment with unified timestamps
- Feature extraction with bias prevention
- Normalized, labeled outputs in Parquet format

Approach:
- Use Python and Polars for efficient processing
- Strict validation to avoid data leakage
- Modular, reproducible code for transparency

Committed to delivering a high-quality product aligned with your goals. Looking forward to discussing this further.
Kind regards, Aaron Roberts
₹950 INR in 30 days
0.0
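For the "labeled outputs" item, the project description asks for a basic Triple Barrier Method. A minimal sketch of one common simplified form follows, assuming symmetric-style percentage barriers and a fixed bar horizon; the function name `triple_barrier_labels`, the barrier sizes, and the sample prices are hypothetical. Unlike features, labels intentionally look forward, so the last `horizon` rows have no label and would be dropped.

```python
import numpy as np


def triple_barrier_labels(close, up: float, down: float, horizon: int) -> np.ndarray:
    """Label each row 1, -1 or 0 by which barrier the future path touches first.

    1  : return reaches +up before -down within `horizon` bars
    -1 : return reaches -down before +up within `horizon` bars
    0  : neither barrier is touched (the time barrier ends the trade)
    NaN: fewer than `horizon` future bars exist
    """
    close = np.asarray(close, dtype=float)
    n = len(close)
    labels = np.full(n, np.nan)
    for t in range(n - horizon):
        # Returns of the next `horizon` bars relative to the entry price.
        path = close[t + 1 : t + 1 + horizon] / close[t] - 1.0
        hit_up = np.flatnonzero(path >= up)
        hit_down = np.flatnonzero(path <= -down)
        # Use `horizon` as a sentinel meaning "never touched".
        first_up = hit_up[0] if hit_up.size else horizon
        first_down = hit_down[0] if hit_down.size else horizon
        if first_up == horizon and first_down == horizon:
            labels[t] = 0.0
        elif first_up < first_down:
            labels[t] = 1.0
        else:
            labels[t] = -1.0
    return labels


# Hypothetical close prices with 2% barriers and a 3-bar horizon.
prices = [100.0, 101.0, 103.0, 102.0, 99.0, 98.0, 100.0]
labels = triple_barrier_labels(prices, up=0.02, down=0.02, horizon=3)
```

This version works on bar closes only; a fuller implementation would typically scale the barriers by recent volatility and check intrabar highs and lows.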

Hisar, India
Member since Jun 3, 2026