This assignment guides you through the analysis of a data set using the techniques and tools covered in the course. This part of the assignment tests your understanding of the lecture material. Follow the assignment in the given order, since the results of some questions may depend on the answers to previous steps. The questions are detailed in the provided Jupyter notebook.
Dataset: Use the provided data set “population_density.csv”. The data set has the following attributes:
Before starting the assignment, you must complete several preprocessing steps, which are described in the Jupyter notebook. After completing them, export your final dataset as ‘population_density_categorical.csv’ and use it for the subsequent stages of the assignment.
Your submission should include:
o A Jupyter notebook that presents your results and contains the Python code used to obtain them. Alongside this notebook, upload a zip file containing all requested data sets, including the extracted dataset (.csv) based on your student number (see the Jupyter notebook).
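One way to bundle the requested data sets is Python's standard `zipfile` module. The sketch below is only an illustration under assumptions, not a required workflow: the helper name `build_datasets_zip` is hypothetical, and because the name of your extracted dataset depends on your student number, it is left as a placeholder.

```python
import zipfile
from pathlib import Path


def build_datasets_zip(csv_paths, zip_path="datasets.zip"):
    """Bundle the requested data sets into one zip archive (hypothetical helper).

    Every file is stored under its base name, so the archive has a flat layout.
    All inputs are checked first, so no archive is written if a file is missing.
    """
    paths = [Path(p) for p in csv_paths]
    missing = [str(p) for p in paths if not p.is_file()]
    if missing:
        raise FileNotFoundError("missing data set(s): " + ", ".join(missing))

    zip_path = Path(zip_path)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for p in paths:
            zf.write(p, arcname=p.name)  # drop directory components inside the archive
    return zip_path


# Example usage (replace the placeholder with the name of your extracted dataset):
# build_datasets_zip(["population_density_categorical.csv", "<your_extracted_dataset>.csv"])
```

Checking for missing files before opening the archive avoids uploading a zip that silently lacks one of the requested data sets.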
Report requirements:
You may upload three separate items via Blackboard:
[1]. Jupyter notebook.
[2]. datasets.zip including all the requested data sets.
[3]. If the result of an algorithm is a file (e.g., PDF or JPG), attach it alongside the notebook and refer to it in the notebook text.
This assignment counts for 5% of the final grade. In this first part of the assignment, 100 points are available: 90 points for the seven main sections and 10 points for report style.
For a data scientist, presenting results well is just as important as the work itself, which is why report style carries 10 points. Note that the correctness of your code and its results, as well as the accuracy of your explanations, are also assessed.