Problem 1

Consider the following training dataset with fifteen entries. Each entry records a participant's answers to a series of questions asking whether they like a certain type of food; the participant answered 1 for yes or 0 for no. The last column (“midwest?”) is the target column: once the decision tree is built, this is the classification we are trying to predict, i.e., whether a person is from the Midwest.

Create the entire decision tree for this dataset using Information Gain as the attribute selection measure. Provide the entropy and the information gain for each attribute at each partitioning step, and highlight the attribute and value you chose to partition the dataset at each step, as I did in the example I provided.
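As a rough aid for checking hand calculations, the entropy and information gain at a single partitioning step can be sketched as below. This is a minimal sketch, not the example referred to above: the function names, the attribute names (`likes_cheese`, `likes_sushi`), and the four toy rows are hypothetical and are not the assignment's dataset. It assumes entropy is measured in bits (log base 2).

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on `attribute`."""
    base = entropy([row[target] for row in rows])
    total = len(rows)
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        remainder += (len(subset) / total) * entropy(subset)
    return base - remainder


# Hypothetical toy rows for illustration only, NOT the assignment's dataset.
toy_rows = [
    {"likes_cheese": 1, "likes_sushi": 0, "midwest?": 1},
    {"likes_cheese": 1, "likes_sushi": 1, "midwest?": 1},
    {"likes_cheese": 0, "likes_sushi": 1, "midwest?": 0},
    {"likes_cheese": 0, "likes_sushi": 0, "midwest?": 0},
]

# Compute the gain of every candidate attribute, then pick the largest.
gains = {a: information_gain(toy_rows, a, "midwest?") for a in ("likes_cheese", "likes_sushi")}
best = max(gains, key=gains.get)
print(gains, best)
```

At each node of the tree, the same two functions would be applied to the subset of rows that reached that node, using only the attributes not yet chosen on the path from the root.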

Problem 2

Consider the dataset below.

age      income   student   credit_rating   buys_computer
<=30     high     no        fair            no
<=30     high     no        excellent       no
31…40    high     no        fair            yes
>40      medium   no        fair            yes
>40      low      yes       fair            yes
>40      low      yes       excellent       no
31…40    low      yes       excellent       yes
<=30     medium   no        fair            no
<=30     low      yes       fair            yes
>40      medium   yes       fair            yes
<=30     medium   yes       excellent       yes
31…40    medium   no        excellent       yes
31…40    high     yes       fair            yes
>40      medium   no        excellent       no

Classify the following data point using the Naïve Bayesian method. Show all relevant calculations, similar to the example given in the slide.

X = (age > 40, income = medium, student = no, credit_rating = fair)
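One way to check the hand calculation is to count the conditional frequencies directly from the fourteen rows. The sketch below is an illustration under stated assumptions, not the method from the slide: the names `rows`, `score`, and `prediction` are hypothetical, the probabilities are plain relative frequencies with no Laplace smoothing, and each class is scored as P(class) multiplied by the product of P(attribute value | class).

```python
# The fourteen rows of the dataset: (age, income, student, credit_rating, buys_computer).
rows = [
    ("<=30", "high", "no", "fair", "no"),
    ("<=30", "high", "no", "excellent", "no"),
    ("31…40", "high", "no", "fair", "yes"),
    (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),
    (">40", "low", "yes", "excellent", "no"),
    ("31…40", "low", "yes", "excellent", "yes"),
    ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),
    (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"),
    ("31…40", "medium", "no", "excellent", "yes"),
    ("31…40", "high", "yes", "fair", "yes"),
    (">40", "medium", "no", "excellent", "no"),
]


def score(x, label):
    """Unnormalized posterior: P(label) * product of P(x_i | label), no smoothing."""
    in_class = [r for r in rows if r[-1] == label]
    result = len(in_class) / len(rows)  # prior P(label)
    for i, value in enumerate(x):
        matches = sum(1 for r in in_class if r[i] == value)
        result *= matches / len(in_class)  # likelihood P(x_i | label)
    return result


# The data point X, in the same column order as the dataset.
x = (">40", "medium", "no", "fair")
scores = {label: score(x, label) for label in ("yes", "no")}
prediction = max(scores, key=scores.get)
print(scores, prediction)
```

The two scores are not normalized probabilities; only their comparison matters for the classification, since both share the same denominator P(X).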

Problem 3

Consider the following confusion matrix:

Actual Class \ Predicted Class   Cancer = yes   Cancer = no   Total
Cancer = yes                     90             210           300
Cancer = no                      140            9560          9700
Total                            230            9770          10000

Find the following:

a. Accuracy

b. Sensitivity

c. Specificity

d. Precision

e. Recall

f. F1 measure
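The six measures can be cross-checked with a few lines of arithmetic. The sketch below assumes the rows of the confusion matrix are the actual classes and the columns are the predicted classes, with "Cancer = yes" as the positive class, so that TP = 90, FN = 210, FP = 140, and TN = 9560. If the matrix is read the other way round, FN and FP swap. The variable names are illustrative.

```python
# Cell counts, assuming rows = actual class, columns = predicted class,
# and "Cancer = yes" as the positive class.
tp, fn = 90, 210     # actual yes: predicted yes / predicted no
fp, tn = 140, 9560   # actual no:  predicted yes / predicted no

total = tp + fn + fp + tn

accuracy = (tp + tn) / total      # fraction of all cases classified correctly
sensitivity = tp / (tp + fn)      # true positive rate
specificity = tn / (tn + fp)      # true negative rate
precision = tp / (tp + fp)        # fraction of predicted positives that are correct
recall = sensitivity              # recall is another name for sensitivity
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

for name, value in [
    ("accuracy", accuracy),
    ("sensitivity", sensitivity),
    ("specificity", specificity),
    ("precision", precision),
    ("recall", recall),
    ("f1", f1),
]:
    print(f"{name}: {value:.4f}")
```

Note that sensitivity and recall are the same quantity under these definitions, so parts b and e should agree.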

