Given a CSV file in this format:
IP,name,Port,age,year,city,state,zip
IP,name,Port,age,year,city,state,zip
IP,name,Port,age,year,city,state,zip
(about 100,000 rows in total)
I need a multi-threaded Python script that goes through each CSV line, grabs the IP and port on that line, and scrapes the TITLE of the corresponding webpage. (Each IP address and port pair points to a website.)
After it grabs the title, it needs to write the results to a new CSV file like this:
IP,name,Port,age,year,city,state,zip,TITLE
IP,name,Port,age,year,city,state,zip,TITLE
IP,name,Port,age,year,city,state,zip,TITLE
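To make the expected input and output concrete, here is a rough sketch of the read, look up, and write loop. It is only an illustration, not a required design. The names `append_titles`, `fetch_title`, `IP_COL` and `PORT_COL`, and the worker count of 50, are placeholders I made up. It assumes the file has no header row and that the IP is the first column and the port the third, as in the sample lines above.

```python
import csv
from concurrent.futures import ThreadPoolExecutor

# Assumed column positions, taken from the sample rows: IP first, Port third.
IP_COL, PORT_COL = 0, 2


def append_titles(in_path, out_path, fetch_title, workers=50):
    """Copy every row of in_path to out_path with a TITLE column appended.

    fetch_title(ip, port) is a hypothetical callable that returns the page
    title, or None/"" when no title could be obtained.
    """
    with open(in_path, newline="") as src:
        rows = [row for row in csv.reader(src) if row]  # skip blank lines

    def lookup(row):
        return fetch_title(row[IP_COL], row[PORT_COL]) or ""

    # pool.map returns results in input order, so rows and titles stay aligned.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        titles = list(pool.map(lookup, rows))

    with open(out_path, "w", newline="") as dst:
        writer = csv.writer(dst)  # quotes titles that contain commas
        for row, title in zip(rows, titles):
            writer.writerow(row + [title])
    return len(rows)
```

Something like `append_titles("input.csv", "output.csv", fetch_title)` would then produce the second file, with `fetch_title` being whatever does the actual scraping.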
There are around 100,000 IPs in total, hence the multi-threaded code. The next issue is that some of the websites load JavaScript that redirects to another directory on the site. In those cases you would need to use headless Selenium or something similar to load the page, let it complete all its redirects, and then grab the final page TITLE. Please don't rely on 302 responses to detect redirects: some sites return a 200 with JavaScript that performs the redirect, which a check for 302 would not catch. If you know how to scrape with Selenium, then you know what I'm talking about.
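Roughly what I have in mind for the JavaScript redirect case, as a sketch only: load the page in headless Chrome, keep polling the browser's current URL until it has stopped changing for a short while, and only then read the title. The function names, the 1-second "settle" window and the polling interval are my own guesses, not requirements, and `--headless=new` assumes a recent Chrome build.

```python
import time


def make_headless_driver(page_timeout=12):
    """Start headless Chrome via Selenium 4 (imported lazily on purpose)."""
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    opts = Options()
    opts.add_argument("--headless=new")  # assumption: recent Chrome version
    driver = webdriver.Chrome(options=opts)
    driver.set_page_load_timeout(page_timeout)  # driver.get() raises past this
    return driver


def final_title(driver, url, settle=1.0, deadline=12.0, poll=0.25):
    """Load url, wait for client-side redirects to finish, return the title.

    The page counts as settled once driver.current_url has stayed the same
    for `settle` seconds; the wait gives up after `deadline` seconds overall
    and returns whatever title is showing at that point.
    """
    start = time.monotonic()
    driver.get(url)
    last_url = driver.current_url
    stable_since = time.monotonic()
    while time.monotonic() - start < deadline:
        time.sleep(poll)
        current = driver.current_url
        if current != last_url:
            # A redirect happened: restart the settle timer.
            last_url = current
            stable_since = time.monotonic()
        elif time.monotonic() - stable_since >= settle:
            break
    return driver.title
```

Usage would be along the lines of `driver = make_headless_driver()`, then `final_title(driver, "http://1.2.3.4:8080/")`, then `driver.quit()`. A redirect that fires later than the settle window would be missed, so the numbers may need tuning.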
To prevent the script from running for hours, we'll need to set up a timeout: if a website doesn't respond within, say, 12 seconds, write that IP and port to a separate file.
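One possible way to enforce the 12-second limit, sketched with the standard library only. `call_with_deadline` and `log_unresponsive` are made-up names, and the "IP,port" line format for the second file is an assumption. Note that Python cannot kill a thread, so a call that runs past the deadline is abandoned rather than stopped. With Selenium, `set_page_load_timeout` is probably the more natural place for the limit.

```python
import threading


def call_with_deadline(fn, *args, seconds=12.0):
    """Run fn(*args) in a helper thread and wait at most `seconds`.

    Returns (finished, value). finished is False when fn is still running
    at the deadline or when it raised an exception.
    """
    box = {}

    def runner():
        try:
            box["value"] = fn(*args)
        except Exception as exc:
            box["error"] = exc

    worker = threading.Thread(target=runner, daemon=True)
    worker.start()
    worker.join(seconds)
    if worker.is_alive() or "error" in box:
        return False, None
    return True, box["value"]


_log_lock = threading.Lock()


def log_unresponsive(path, ip, port):
    """Append one "ip,port" line to the file of hosts that did not respond."""
    with _log_lock, open(path, "a", newline="") as fh:
        fh.write(f"{ip},{port}\n")
```

A worker would call `call_with_deadline(fetch, url, seconds=12)` and, when `finished` comes back False for every attempt on that host, call `log_unresponsive("timeouts.csv", ip, port)`.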
Also, for each IP, I'll need to check both HTTP and HTTPS. If HTTP doesn't return a title or times out, check HTTPS, and vice versa.
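The HTTP/HTTPS fallback could look something like the sketch below. `title_any_scheme` and `fetch` are hypothetical names: `fetch(url)` stands for whatever function loads the page and returns its title, and it may raise on a timeout or connection error. Passing `schemes=("https", "http")` gives the "vice versa" order.

```python
def title_any_scheme(ip, port, fetch, schemes=("http", "https")):
    """Try each scheme in order and return (scheme, title) for the first hit.

    Returns (None, None) when no scheme produced a non-empty title.
    """
    for scheme in schemes:
        url = f"{scheme}://{ip}:{port}/"
        try:
            title = fetch(url)
        except Exception:
            continue  # timeout or connection error: try the next scheme
        if title and title.strip():
            return scheme, title.strip()
    return None, None
```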
Please only bid if you are capable of completing the project fully.