CSCI 333 Final Project

100 points + 10 bonus points

Note: This is an individual assignment. Each student MUST complete the work on his/her own.
Any code sharing/plagiarism is not tolerated.


This project consists of three tasks. The goal is to apply what we have learned to solve real problems
in Data Science and Machine Learning. Glance at “What to Submit” when you start working on a
task so that you know what information to provide from each task.

Submission Example






What to Submit

1. One doc file “csci333-project-XX.doc” containing the source code as text and screenshots of the
outputs of all programs. Replace XX with your first name and last name. You can
copy and paste the source code from PyCharm or another IDE into the doc file. The
screenshots of the output should show that your programs passed their tests
and ran correctly.

2. Python files for all programs. Well-written programs need proper comments; programs
without comments will lose a significant number of points.

3. Note that if any program or code does not work, you can explain the status of the program or
code and then attach your explanation and description in a file “README.txt”.

4. Optional: anything you want to bring to the instructor's attention during grading.

Task 1 (20 points): (Intro to Data Structure and Data Science: Survey Response Statistics) Write
a program that creates, calculates, and displays survey responses. Five hundred (500) people
were asked to quantify their pain using a numerical rating scale (NRS) from 0 to 10. Zero
means no pain; one to three (inclusive) means mild pain; four to six means moderate pain;
seven to nine means severe pain; and ten means the worst pain, as shown in Fig. 1.
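
The scale above can be written as a small lookup function. This is only a sketch: the function name pain_level is hypothetical, and the five labels are borrowed from the “Level” values listed in Task 2(f).

```python
def pain_level(value: int) -> str:
    """Map an NRS pain value (0-10) to the category described above."""
    if not 0 <= value <= 10:
        raise ValueError(f"pain value must be between 0 and 10, got {value}")
    if value == 0:
        return "No pain"
    if value <= 3:
        return "Mild"      # 1-3
    if value <= 6:
        return "Moderate"  # 4-6
    if value <= 9:
        return "Severe"    # 7-9
    return "Worst"         # 10


print(pain_level(9))  # Severe
```

Checking the boundaries in ascending order keeps each branch to a single comparison, and rejecting out-of-range input early makes bad data visible instead of silently labeling it.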

Based on the pain scale, write a program that performs the following subtasks:

(a) Create a patientlist variable holding 500 names selected sequentially from the file “namelist.txt”,
starting at the line given by the last four digits of your CWID. For example, if the last four digits
of your CWID are 5678, the first 5 names are “Suzy” (on line 5678), “Suzannah”, “Sully”, “Sulema”, and “Sueann”.

Figure 1: Pain Scale

(b) Use random.randint() or numpy.random.randint() to generate 500 responses for 500 patients
in a list.
(c) Create a function that performs (a) and (b) and creates a file “patientList.txt”, saving each
patient from patientlist with the matching response from responselist, one “patient response”
pair per line. For instance, “Suzy 9”.

Suzy 9
Suzannah 6
Sully 2
… …

(d) Determine and display the frequency of each pain value i from 0 to 10.

(e) Use the built-in functions, statistics module functions, and NumPy or pandas functions
covered in the course materials to display the following response statistics: minimum, maximum,
range, mean, median, variance, and standard deviation.

(f) Display a bar chart showing the response frequencies and their percentages of the total
responses. The x-axis should show 11 pain values while the y-axis should show each pain
value’s relative frequency in %.

(g) Test your function and display each pain value with its relative frequency.
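
Subtasks (b), (d), and (e) can be sketched with the standard library alone. The seed value and the variable names below are arbitrary choices made for illustration, and the NumPy or pandas calls that subtask (e) also asks for are left out.

```python
import random
import statistics
from collections import Counter

random.seed(333)  # arbitrary fixed seed so that reruns produce the same data

# Subtask (b): 500 simulated responses; random.randint includes both endpoints.
responses = [random.randint(0, 10) for _ in range(500)]

# Subtask (d): frequency of each pain value, including values that never occur.
counts = Counter(responses)
frequencies = {value: counts.get(value, 0) for value in range(11)}

# Relative frequency in percent, the quantity plotted in subtask (f).
percentages = {value: 100 * n / len(responses) for value, n in frequencies.items()}

for value in range(11):
    print(f"{value:>2}: {frequencies[value]:>3} responses ({percentages[value]:.1f}%)")

# Subtask (e): summary statistics from built-ins and the statistics module.
print("min:", min(responses), "max:", max(responses))
print("range:", max(responses) - min(responses))
print("mean:", statistics.mean(responses))
print("median:", statistics.median(responses))
print("variance:", statistics.pvariance(responses))
print("std dev:", statistics.pstdev(responses))
```

The sketch uses the population forms pvariance and pstdev; statistics.variance and statistics.stdev give the sample forms instead. If numpy.random.randint is used for subtask (b), its upper bound is exclusive, so the call needs 11 rather than 10 as the upper limit.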

Grading Rubric

– 5 points for defining functions.

– 5 points for finishing Task1(a)-(g).

– 5 points for a runnable python program with correct data visualization.

– 5 points for appropriate comments and screenshots of the program

Task 2 (30 points): (Intro to Data Science: pandas DataFrames) Write a program that does
the following tasks with pandas DataFrames (as shown in the slides “09-02-Data Science.pdf”):

(a) Create a dictionary “patients” by reading all patients from the file “patientList.txt” created
by Task 1.
(b) Create a DataFrame named patientData from the dictionary “patients”.
(c) Recreate the DataFrame patientData from Part (b) with custom indices using the ‘Name’ keyword
argument and ‘Pain’.
(d) Select from patientData the column of pain readings for ‘Name’.
(e) Select from patientData the row of ‘Pain’ readings.

(f) Based on the pain values, insert a new column “Level” with five possible values: “No pain”, “Mild”,
“Moderate”, “Severe”, or “Worst”, as defined in Figure 1. For each patient, set the level
based on the pain value. For instance, if the pain value is 9, the level should be “Severe”.
(g) Use the describe() method to produce patientData’s descriptive statistics.
(h) Transpose patientData (One example can be found at
(i) Display a bar chart showing the pain level frequencies and their percentages of the total
number of records. The x-axis should show 5 levels while the y-axis should show each pain
level’s relative frequency in %.
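
For subtask (a), a minimal sketch of turning “name value” lines into a dictionary might look as follows. The helper name read_patients is hypothetical, and an in-memory io.StringIO stands in for the real “patientList.txt” so that the example runs on its own.

```python
import io


def read_patients(lines):
    """Build a {name: pain value} dictionary from "name value" lines."""
    patients = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        name, pain = parts
        patients[name] = int(pain)
    return patients


# Stand-in for open("patientList.txt"); the three rows come from the Task 1 example.
sample = io.StringIO("Suzy 9\nSuzannah 6\nSully 2\n")
patients = read_patients(sample)
print(patients)  # {'Suzy': 9, 'Suzannah': 6, 'Sully': 2}
```

With a dictionary keyed by name, a repeated name would overwrite the earlier entry, so it is worth checking whether the selected names are unique. When such a dictionary is passed to pandas.DataFrame, an index argument is required because every value is a scalar.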

Grading Rubric

– 10 points for defining functions.

– 5 points for finishing Task2(a)-(i).

– 5 points for appropriate comments and necessary screenshots of the program.

– 10 points for a runnable python program with correct data visualization.

Task 3 (50 points): (Classification with k-Nearest Neighbors and the Digits Dataset) Read the file
“09-02-MachineLearning-Long.pdf” and the python program “” to learn the
algorithm of k-Nearest Neighbors with the Digits dataset for recognizing handwritten digits.

Re-write the python program by doing the following subtasks:

(a) Write code to display the two-dimensional array representing the sample image at index
XY (where XY are the last two digits of your TAMUC CWID) and the numeric value of the
digit the image represents.

(b) Write code to display the image for the sample image at index XY of the Digits dataset.

(c) For the Digits dataset, how many samples would the following statement reserve
for training and for testing?

X_train, X_test, y_train, y_test = train_test_split(,, random_state=11, test_size=(XY%10)/100)

(d) Write code to get and display the number of training examples and the number of testing examples.

(e) Using the predicted and expected arrays, calculate and display the prediction accuracy

(f) Display and explain the Xth row of the confusion matrix presented in the example we have
studied in “Intro-to-MachineLearning-Part-II.mp4”, where X is the last digit of your CWID.

(g) Rewrite the list comprehension in snippet [34] using a for loop. Hint: create an empty list
and then use the list method “append”.

# In[34]:
names = [str(digit) for digit in names]
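
For subtask (c), the arithmetic behind a fractional test_size can be checked by hand. The sketch below assumes that the test count is rounded up and the remaining samples go to training, which should be confirmed against the output of subtask (d). The function name split_sizes and the CWID digits 78 are hypothetical.

```python
import math

DIGITS_SAMPLES = 1797  # number of samples in scikit-learn's bundled Digits dataset


def split_sizes(n_samples: int, test_fraction: float) -> tuple[int, int]:
    """Return (n_train, n_test) for a fractional test_size.

    Assumption: the test count is rounded up and the rest is used for training.
    """
    n_test = math.ceil(n_samples * test_fraction)
    return n_samples - n_test, n_test


xy = 78  # hypothetical last two digits of a CWID
test_fraction = (xy % 10) / 100  # 8 / 100 = 0.08
n_train, n_test = split_sizes(DIGITS_SAMPLES, test_fraction)
print(n_train, n_test)  # 1653 144
```

Comparing these numbers with the shapes of X_train and X_test from the real split is a quick way to verify the answer.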

Grading Rubric

– 15 points for finishing Task3(a)-(g).

– 5 points for appropriate comments.

– 20 points for a runnable rewritten python program

– 10 points for screenshots of the program.

Challenges in This Project

1. For 10% extra credit, you are welcome to explore the design of each task. Note: You still have
to finish all tasks required by this project.

2. You should configure your machine and PyCharm properly to facilitate project development.

—————x———— Good Luck ————x————–
