Data Manipulation, Pandas and Distance Metrics -- 2
$10-35 USD
Cancelled
Posted more than 7 years ago
Paid on delivery
§ Write the Python code to load the CSV file from SpatialKey directly into a Pandas dataframe.
• You may need to familiarize yourself with the IO Tools functions of Pandas.
• Turn in the fragment of code that does this.
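A minimal sketch of the loading step, assuming `pandas.read_csv` is the IO Tools function in question. The posting does not give the file's address, so `CSV_URL` is a hypothetical placeholder, and the column names in the offline sample are assumptions about the file's layout.

```python
import io

import pandas as pd

# Hypothetical placeholder: the posting does not state the real URL.
CSV_URL = "https://example.com/path/to/transactions.csv"


def load_transactions(source=CSV_URL):
    """Read a CSV from a URL, local path, or file-like object into a DataFrame."""
    return pd.read_csv(source)


# Offline demonstration with two made-up rows in the assumed column layout.
sample_csv = io.StringIO(
    "street,city,zip,state,beds,baths,sq__ft,type,sale_date,price,latitude,longitude\n"
    "1 EXAMPLE ST,SACRAMENTO,95838,CA,2,1,836,Residential,2008-05-21,59222,38.63,-121.43\n"
    "2 EXAMPLE AVE,SACRAMENTO,95823,CA,3,1,1167,Residential,2008-05-21,68212,38.47,-121.43\n"
)
df = load_transactions(sample_csv)
print(df.shape)
```

Because `read_csv` accepts a URL string directly, calling `load_transactions()` with the real address in place of the placeholder would skip any manual download step.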
§ Compute the price per square foot for all properties. Add this data back into the dataframe.
• See the documentation on concatenating objects.
• Turn in the code that does this, and show the results of head() and tail() on the new dataframe.
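One way this step might look, sketched on made-up rows. The column names `price` and `sq__ft` and the new name `price_per_sqft` are assumptions, and treating an area of 0 as missing is a choice made here, not something the brief requires.

```python
import numpy as np
import pandas as pd

# Made-up rows; "price" and "sq__ft" are assumed column names.
df = pd.DataFrame(
    {
        "street": ["1 EXAMPLE ST", "2 EXAMPLE AVE", "3 EXAMPLE CT"],
        "sq__ft": [1000, 0, 2000],
        "price": [100000, 150000, 300000],
    }
)

# Treat a recorded area of 0 as missing so the division yields NaN, not inf.
area = df["sq__ft"].replace(0, np.nan)
price_per_sqft = (df["price"] / area).rename("price_per_sqft")

# Attach the new Series as a column; pd.concat aligns on the row index.
df = pd.concat([df, price_per_sqft], axis=1)

print(df.head())
print(df.tail())
```

A plain assignment such as `df["price_per_sqft"] = ...` would give the same column; `pd.concat` is used only because the brief points at the concatenation documentation.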
§ Provide three data points from the original dataset that give you reason to be concerned about some of the properties with unusual price per square foot data.
• Learn about summarizing your data and sorting.
• Turn in the rows of those three data points and 1 to 3 sentences discussing your concerns.
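A possible starting point for spotting suspicious rows, using `describe()` for the summary and `sort_values()` for the ordering. The rows are invented for illustration, and the column names are assumed.

```python
import pandas as pd

# Invented rows: one with no recorded area, one with an implausibly low price.
df = pd.DataFrame(
    {
        "street": ["A ST", "B ST", "C ST", "D ST"],
        "sq__ft": [1200, 0, 1500, 900],
        "price": [240000, 180000, 4500, 135000],
    }
)

# Summary statistics expose implausible minimums such as 0 square feet.
print(df[["sq__ft", "price"]].describe())

# Rows with no recorded area cannot have a meaningful price per square foot.
zero_area = df[df["sq__ft"] == 0]

# Among rows with a valid area, sort by price per square foot to see extremes.
valid = df[df["sq__ft"] > 0].copy()
valid["price_per_sqft"] = valid["price"] / valid["sq__ft"]
ranked = valid.sort_values("price_per_sqft")

print(zero_area)
print(ranked.head(3))
```

Looking at both ends of `ranked` (with `head()` and `tail()`) is usually enough to pick out candidate rows to discuss.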
§ Use a scatter plot to plot sale price against square feet.
§ Do the same for number of beds against sale price.
• You will need to learn about scatter plots to complete this.
• For both, turn in the scatter plot; if you're using Jupyter, just leave the data inline.
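Both scatter plots could be drawn with `DataFrame.plot.scatter`, as in the sketch below. The data is made up, the column names are assumed, and the `Agg` backend line is only there so the snippet runs outside a notebook; it can be dropped in Jupyter.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; not needed inside Jupyter

import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows; "beds", "sq__ft" and "price" are assumed column names.
df = pd.DataFrame(
    {
        "beds": [2, 3, 4, 3],
        "sq__ft": [836, 1167, 1800, 1400],
        "price": [59222, 68212, 210000, 150000],
    }
)

# One figure with two panels: area vs. price and beds vs. price.
fig, (ax_area, ax_beds) = plt.subplots(1, 2, figsize=(10, 4))
df.plot.scatter(x="sq__ft", y="price", ax=ax_area, title="Sale price vs. square feet")
df.plot.scatter(x="beds", y="price", ax=ax_beds, title="Sale price vs. beds")
fig.tight_layout()
```

In a notebook the figure renders inline after the cell runs; elsewhere, `fig.savefig(...)` would write it to a file.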
§ Explore the distribution of properties by number of beds. Plot this using plot() (learn about that function here).
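As a rough sketch, the distribution can be obtained by counting rows per bedroom value and passing the counts to `plot()`. The sample values are invented, the `beds` column name is assumed, and a bar chart is just one reasonable choice of plot kind.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; not needed inside Jupyter

import pandas as pd

# Invented bedroom counts for seven properties.
df = pd.DataFrame({"beds": [2, 3, 3, 4, 3, 2, 0]})

# Count properties per bedroom value, ordered by number of beds.
beds_counts = df["beds"].value_counts().sort_index()

# Series.plot() draws the counts; kind="bar" gives one bar per bedroom value.
ax = beds_counts.plot(kind="bar")
ax.set_xlabel("beds")
ax.set_ylabel("number of properties")

print(beds_counts)
```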
PART 2 / Distance metrics and k-Nearest Neighbor
§ Implement the k-Nearest Neighbor algorithm in Python using the Euclidean distance metric.
• Turn in the code for this implementation in Python.
• Your implementation will take two vectors and return a single number. Use the template:
def my_knn(vec1, vec2):
    # your implementation
    return d  # the_euclidean_distance of vec1 and vec2
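One way the template might be filled in, using only the standard library. Note that, as specified, the function returns the Euclidean distance between two vectors; a full k-Nearest Neighbor search would then rank other rows by this distance. The length check is an addition made here, not part of the brief.

```python
import math


def my_knn(vec1, vec2):
    """Return the Euclidean distance between two equal-length numeric vectors."""
    if len(vec1) != len(vec2):
        raise ValueError("vectors must have the same length")
    # Square root of the sum of squared component-wise differences.
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(vec1, vec2)))
    return d


print(my_knn([0, 0], [3, 4]))  # 5.0
```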
§ Produce the distance table using your version of kNN for all properties.
• To make things easier, please reduce the data vector to just beds, baths, square footage, price, latitude and longitude.
• You can use the street as the index. Your final output will look something like this:
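The expected output itself is not reproduced in this posting. As a hedged sketch of how such a table could be built, the code below applies a Euclidean `my_knn` to every pair of rows and labels the result by street. The three properties are invented, and the column names (`beds`, `baths`, `sq__ft`, `price`, `latitude`, `longitude`) are assumptions about the dataset.

```python
import numpy as np
import pandas as pd

FEATURES = ["beds", "baths", "sq__ft", "price", "latitude", "longitude"]

# Three invented properties, indexed by street.
df = pd.DataFrame(
    {
        "street": ["A ST", "B ST", "C ST"],
        "beds": [2, 2, 4],
        "baths": [1, 1, 2],
        "sq__ft": [1000, 1003, 1800],
        "price": [100000, 100004, 250000],
        "latitude": [38.0, 38.0, 38.5],
        "longitude": [-121.0, -121.0, -121.4],
    }
).set_index("street")


def my_knn(vec1, vec2):
    """Euclidean distance between two numeric vectors."""
    diff = np.asarray(vec1, dtype=float) - np.asarray(vec2, dtype=float)
    return float(np.sqrt(np.sum(diff ** 2)))


# Pairwise distances between all rows, labeled by street on both axes.
values = df[FEATURES].to_numpy(dtype=float)
distance_table = pd.DataFrame(
    [[my_knn(u, v) for v in values] for u in values],
    index=df.index,
    columns=df.index,
)

print(distance_table.round(2))
```

The nested loop is quadratic in the number of rows, which is fine for a small sample. Because the features are not rescaled, price differences dominate the distances; standardizing the columns first would be a separate design decision.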