Write Python commands to Select a subset of the dataset and clean it a) [10 points] Create boolean mask and

Question:

1. Select a subset of the dataset and clean it

Write Python commands to do the following:

a) [10 points] Create a boolean mask and apply it to the CarsData dataframe to select the data of all sport cars (i.e., the ‘Sports Car?’ column has the value 1) that have horsepower >= 350. Name the resulting dataframe SportCars.

b) [10 points] There are missing values in the ‘City Miles Per Gallon’ and ‘Highway Miles Per Gallon’ columns of the dataframe SportCars (the result of task 1.a). Write code to identify the location of these missing values, then replace the missing values with the minimum value in their corresponding columns. That is, replace the missing value(s) in the ‘City Miles Per Gallon’ column with the minimum value of that column in SportCars (not in the entire dataset). Similarly, the missing value(s) in ‘Highway Miles Per Gallon’ are to be replaced by the minimum value in the ‘Highway Miles Per Gallon’ column.

c) [5 points] From the SportCars dataframe, remove the column ‘Sports Car?’ and every column that has 0’s (zeros) in ALL of its values. [2 extra points] for compact code to identify the columns with 0’s.

d) [10 points] Add a new column to SportCars labeled ScaledCityMPG, which is calculated by normalizing (scaling) the ‘City Miles Per Gallon’ column to values in the range [0, 1].

2. Extract some statistical data

Write Python commands to statistically describe the selected SportCars data using the following columns: [‘Suggested Retail Price’, ‘Engine Size’, ‘Number of Cylinders’, ‘Horsepower’, ‘City Miles Per Gallon’, ‘Highway Miles Per Gallon’, ‘Weight’]

a) [5 points] Show the descriptive statistics table of the data using all ordinal columns above.

b) [5 points] Show and plot the correlation matrix of the selected data in SportCars.

c) [5 points] Plot the scatter matrix of the selected data.

d) [5 points] From the stats and the two plots in (b) and (c) above, describe the following:
   - Relationships (i.e., what happens to the others when one increases or decreases) between MPG (city or highway) and each of the following: Number of Cylinders, Suggested Retail Price, Horsepower, Weight, and Engine Size.
   - Pairwise correlation (negative, positive, or no correlation) between Engine Size, Number of Cylinders, Horsepower, and Weight.

3. Make a bar chart using a subset of the data

Using the SportCars dataframe created in task 1, write Python commands for each of the following:

a) [5 points] Create an array named mpgColors of 40 colors using the Greens colormap and map the colors to the values of the ScaledCityMPG column.

b) [35 points] Make a bar plot using the column ‘City Miles Per Gallon’ to create a visual comparison between the selected set of cars in SportCars as follows [5 points each]:
   i) Set the plot figure size to 10 x 4 and use the .bar() method to make the plot.
   ii) Plot car names as the x-axis vs. their City MPG as the y-axis. Hint: use a list/array of integers as the x-axis instead of directly using the car name column.
   iii) Let the space between the columns be 30% of the space of each bar (hint: what would be the bar width?).
   iv) Set the edge color of the bars to green and the inside color from the mpgColors array you created in subtask (3.a) above.
   v) Make the x-axis tick labels the car names (Vehicle Name), place them in the middle of the bars, and rotate the tick labels 90 degrees.
   vi) Add the x-axis label “Sport car make and model” and the y-axis label “City MPG”.
   vii) Add the plot title “City MPG: Sport cars with 350 or more Horsepower”.

c) [5 points] Save the plot in an image file named “CityMPG-Hp350plus-SportCars.png”.

Expert Answer:

Answer rating: 100% (QA)

Certainly! Here are Python commands to perform the tasks you've described.

a) Create a boolean mask and apply it to select sport cars with horsepower >= 350
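The visible part of the answer ends at this point. As an illustration only, and not the expert's own code, here is a minimal sketch of task 1 (a to d) in pandas. It assumes the column names quoted in the question; the small CarsData frame, the ‘SUV?’ column, and the helper names mask, mpg_cols, missing_locations, and all_zero_cols are made up so that the example runs on its own.

```python
import numpy as np
import pandas as pd

# ASSUMPTION: a tiny made-up stand-in for CarsData; the real dataset is not
# shown in the question. "SUV?" is a hypothetical extra column.
CarsData = pd.DataFrame({
    "Vehicle Name": ["Car A", "Car B", "Car C", "Car D", "Car E"],
    "Sports Car?": [1, 1, 0, 1, 1],
    "SUV?": [0, 0, 1, 0, 0],
    "Horsepower": [400, 350, 500, 300, 450],
    "City Miles Per Gallon": [12.0, np.nan, 15.0, 20.0, 16.0],
    "Highway Miles Per Gallon": [np.nan, 24.0, 20.0, 28.0, 22.0],
})

# 1.a) Boolean mask: sport cars with horsepower >= 350.
mask = (CarsData["Sports Car?"] == 1) & (CarsData["Horsepower"] >= 350)
SportCars = CarsData[mask].copy()  # copy so later edits do not touch CarsData

# 1.b) Locate the missing MPG values as (row label, column name) pairs ...
mpg_cols = ["City Miles Per Gallon", "Highway Miles Per Gallon"]
missing_locations = [
    (idx, col)
    for col in mpg_cols
    for idx in SportCars.index[SportCars[col].isna()]
]
# ... then fill each column with its own minimum within SportCars.
SportCars[mpg_cols] = SportCars[mpg_cols].fillna(SportCars[mpg_cols].min())

# 1.c) Drop 'Sports Car?' and every numeric column whose values are all 0.
numeric = SportCars.select_dtypes("number")
all_zero_cols = numeric.columns[(numeric == 0).all()]
SportCars = SportCars.drop(columns=["Sports Car?", *all_zero_cols])

# 1.d) Min-max scaling of city MPG to the range [0, 1].
city = SportCars["City Miles Per Gallon"]
SportCars["ScaledCityMPG"] = (city - city.min()) / (city.max() - city.min())

print(missing_locations)
print(SportCars)
```

Taking the minimum from SportCars rather than CarsData follows the wording of task 1.b. The min-max formula in 1.d divides by zero if every car has the same city MPG, so a real dataset may need a guard for that case.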
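For task 3, the bar chart could be sketched with matplotlib as follows. This is an illustration under stated assumptions, not the expert's answer: the SportCars frame here is made-up stand-in data, bar_colors and color_idx are hypothetical helper names, and two points in the question are read in one particular way. First, mpgColors is taken to be a 40-color palette from which each bar's color is picked by its ScaledCityMPG value. Second, "space between the columns is 30% of each bar" is read as gap = 0.3 × bar width with bars one unit apart, which gives a width of 1/1.3; the other plausible reading (gap is 30% of each slot) would give a width of 0.7.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# ASSUMPTION: made-up stand-in for the cleaned SportCars frame from task 1.
SportCars = pd.DataFrame({
    "Vehicle Name": ["Car A", "Car B", "Car C", "Car D"],
    "City Miles Per Gallon": [12.0, 14.0, 16.0, 18.0],
})
city = SportCars["City Miles Per Gallon"]
SportCars["ScaledCityMPG"] = (city - city.min()) / (city.max() - city.min())

# 3.a) 40 colors sampled evenly from the Greens colormap (RGBA rows).
mpgColors = plt.get_cmap("Greens")(np.linspace(0, 1, 40))
# Map each scaled value in [0, 1] to a palette index in 0..39.
color_idx = (SportCars["ScaledCityMPG"] * 39).round().astype(int).to_numpy()
bar_colors = mpgColors[color_idx]

# 3.b) Bar plot with integer x positions instead of the car names.
x = np.arange(len(SportCars))
bar_width = 1 / 1.3  # width + 0.3 * width = 1 unit between bar centers

fig, ax = plt.subplots(figsize=(10, 4))
ax.bar(x, city, width=bar_width, color=bar_colors, edgecolor="green")

# Bars are center-aligned by default, so ticks at x sit mid-bar.
ax.set_xticks(x)
ax.set_xticklabels(SportCars["Vehicle Name"].tolist(), rotation=90)
ax.set_xlabel("Sport car make and model")
ax.set_ylabel("City MPG")
ax.set_title("City MPG: Sport cars with 350 or more Horsepower")

# 3.c) Save the figure; tight_layout keeps the rotated labels inside it.
fig.tight_layout()
fig.savefig("CityMPG-Hp350plus-SportCars.png")
```

Darker greens then mark the cars with higher city MPG, because the Greens colormap runs from light to dark as its input goes from 0 to 1.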