# Adv. Data Science Using R | Assignment – 3

## Quiz – 3

Q1. Which of the following languages is used in data science?
R
C
C++
Ruby

Q2. What is the primary file type of R?
Vector
Text file
RScripts
Statistical file

Q3. Which one of the following R packages is used for data products?
haven
igraph
slidify
forecast

Q4. Which of the following is valid for checking a categorical variable?
Level
Table
Unique
All of the above

Q5. Suppose ABC is a matrix with 3 rows and 4 columns. Choose the correct option(s) to rename the rows:
row_names(ABC) = c("row1", "row2", "row3")
rownames(ABC) = c("row1", "row2")
row(ABC) = c("row1", "row2")
rownames(ABC) = c("row", "row2", "row3")

Q6. Arrange the data types in the proper order:
Logical, integer, numeric, character
Integer, numeric, character, logical
Character, logical, integer, numeric
Numeric, integer, character, logical

Q7. What is the output of the code below?
A=10
B=20
print(A,B)

10 20
Error
(10, 20)
None of the above

Q8. A return statement is compulsory when writing a function in R.
True
False

Q9. By default, the last variable in a function is its return value in R.
True
False

Q10. Which package needs to be installed for reading?

Q11. What is the output of the code below?
logic1=c(T,F,F,T,F,T)
print(which(logic1))

1 4 6
2 3 6
6 4 1
1 2 3

Q12. If A = c(1, 13, 42, 13, 4), what is A after A = A[-4]?
1, 13, 42, 4
1, 13, 42, 13
13
1, 42, 13, 4

Q13. Which function can be used to split the string?
Output will be: "Navin"      "Mr. Naresh J"

strsplit(name, "[.]")
charsplit(name, "[,]")
stringsplit(name)
strsplit(name, "[,]")

Q14. Given i = 100, how do you find the data type of i?
Option 1
type(i)
class(i)
none of the above

Q15. dt = "01-12-2020" is stored as a character string. Which option converts it into a date in the form "MM-DD-YYYY"?
To_date(dt, "MM – DD – YYYY")
date(x = dt, format = "%m / %d / %Y")
Date(x = dt, format = "%m / %d / %Y")
none of the above

## Assignment – 3

1. What is the KNN algorithm? What are its features? How does the KNN algorithm work? Write pseudocode for the KNN algorithm and a practical implementation of it in R.

The K-Nearest Neighbors (KNN) algorithm is a non-parametric, instance-based method for classification and regression. It is a supervised learning algorithm that stores all available cases and classifies new cases based on a similarity measure, such as Euclidean distance.
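As a small illustration of the similarity measure mentioned above, the Euclidean distance between two numeric feature vectors can be computed in base R. The helper name `euclidean_distance` is hypothetical and chosen only for this sketch.

```r
# Hypothetical helper: Euclidean distance between two numeric vectors of equal length
euclidean_distance <- function(a, b) {
  stopifnot(length(a) == length(b))
  sqrt(sum((a - b)^2))
}

p <- c(0, 0)
q <- c(3, 4)
euclidean_distance(p, q)  # 5, since sqrt(3^2 + 4^2) = 5
```

A smaller distance means the two points are more similar, which is what KNN uses to pick neighbors.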

Features of KNN Algorithm:

1. Simple to understand and implement
2. No assumptions about the distribution of the data
3. Can be used for both classification and regression problems

The algorithm takes a new data point and finds the k closest points in the training set. For classification, the new point is assigned the majority class of those k nearest neighbors; for regression, it is assigned the average of their values.

Pseudocode for the KNN algorithm:

1. Initialize the number of nearest neighbors (k)
2. For each point in the dataset:
   a. Calculate the distance between the point and the new data point
   b. Add the distance and the point to a list
3. Sort the list by distance
4. Take the first k elements from the sorted list
5. Determine the majority class among the k elements
6. Classify the new data point as the majority class
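The pseudocode can be sketched directly in base R. This is a minimal illustration under stated assumptions, not a replacement for a library implementation: the function name `knn_classify` and the toy data are made up for this example, Euclidean distance is assumed, and ties in the vote are broken by taking the first class in alphabetical order.

```r
# Hypothetical helper: classify one new point by majority vote among its k nearest neighbors.
# train: numeric matrix (one row per point); labels: class labels; new_point: numeric vector
knn_classify <- function(train, labels, new_point, k = 3) {
  # Step 2: Euclidean distance from every training point to the new point
  distances <- sqrt(rowSums(sweep(train, 2, new_point)^2))
  # Steps 3-4: sort by distance and keep the k closest points
  nearest <- order(distances)[seq_len(k)]
  # Steps 5-6: majority class among the k neighbors
  votes <- table(as.character(labels[nearest]))
  names(votes)[which.max(votes)]
}

# Toy data, made up for illustration: two well-separated groups
train <- rbind(c(1, 1), c(1, 2), c(2, 1), c(8, 8), c(8, 9), c(9, 8))
labels <- c("A", "A", "A", "B", "B", "B")

knn_classify(train, labels, c(1.5, 1.5), k = 3)  # "A"
knn_classify(train, labels, c(8.5, 8.5), k = 3)  # "B"
```

Choosing an odd k for a two-class problem avoids tied votes.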

Practical Implementation of KNN Algorithm in R:

``````
# Load the library that provides knn()
library(class)

# Create a sample data set: 50 points, two features, two classes
set.seed(42)  # make the random data reproducible
x <- cbind(rnorm(50), rnorm(50))
y <- gl(2, 25, labels = c("A", "B"))

# knn() has no separate fitting step and returns no model object,
# so predict() cannot be used; it classifies the test points directly
newdata <- rbind(c(1, 2), c(3, 4))
predicted_class <- knn(train = x, test = newdata, cl = y, k = 3)
predicted_class
``````

This code creates a sample dataset of 50 points with two features and two classes (A and B), then uses KNN with k = 3 to predict the class of two new data points.
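Because the sample data above is random noise, the predictions there carry no real meaning. As a hedged sketch of a more informative run, the built-in `iris` data set (used here only as stand-in data) can be split into training and held-out rows so that accuracy can be measured. The split size and seed are arbitrary choices, and the features are scaled using all rows purely for brevity.

```r
library(class)

set.seed(1)                            # make the random split reproducible
features <- scale(iris[, 1:4])         # put all features on a comparable scale
train_idx <- sample(nrow(iris), 100)   # 100 rows for training, 50 held out

pred <- knn(train = features[train_idx, ],
            test  = features[-train_idx, ],
            cl    = iris$Species[train_idx],
            k     = 3)

actual <- iris$Species[-train_idx]
table(Predicted = pred, Actual = actual)  # confusion table on the held-out rows
accuracy <- mean(pred == actual)          # share of correct predictions
accuracy
```

Repeating the run with different values of k is a simple way to see how the choice of k affects accuracy.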

2. Develop a machine learning model using SVM in R to solve a business problem. Add screenshots of the graphs and code to validate your answer.

Applying SVM to solve a business use case

Read the data and check the structure of both train and test.

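``````
library(lubridate)
library(caret)
library(dplyr)
library(DMwR)
library(ROSE)
library(ggplot2)
library(randomForest)
library(rpart)
library(rpart.plot)
library(data.table)
library(e1071)
library(gridExtra)

# NOTE: the start of the train-loading call (including the file path) is truncated
# in the source; train_path is a hypothetical placeholder to be set by the reader.
train <- fread(train_path, stringsAsFactors = FALSE, data.table = FALSE)
test <- fread('../input/test.csv', stringsAsFactors = FALSE, data.table = FALSE)

# Inspect the structure of both data sets
str(train)
str(test)
``````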

There is no difference between the train and test data, except that we need to
predict the target (is_attributed) in test and attributed_time (Time taken
Missing value checking and estimation

``colSums(is.na(train))``

There are no missing values at all; the data is very clean.

``colSums(train=='')``
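The two checks above count different things: `is.na()` finds true missing values, while the comparison with `''` finds empty strings. A minimal sketch on a made-up data frame (the values are invented for illustration and are not the assignment's data) shows the difference:

```r
# Toy data frame, made up for illustration (not the assignment's data)
toy <- data.frame(
  ip              = c(101, 102, NA, 104),
  attributed_time = c("", "2017-11-07 09:30:38", "", ""),
  stringsAsFactors = FALSE
)

na_counts <- colSums(is.na(toy))                   # NA values per column
blank_counts <- colSums(toy == "", na.rm = TRUE)   # empty strings per column

na_counts
blank_counts
```

Here `ip` has one NA and no blanks, while `attributed_time` has no NAs but three blank entries, so a column can look complete under one check and still be mostly empty under the other.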

is logically correct
Let's check the target variable to see how many are not downloaded in the train data.

``table(train$is_attributed)``

Our assumption is correct since blank entries in attributed_time is
As it's logically correct, we don't need to take any further action on this.
Also notice that this variable is not present in the test data, so there is no point
in keeping it in the train data either.

``train$attributed_time=NULL``

Applying the SVM on the data
Linear Support Vector Machine

Before building the model, let's tune the cost parameter.

``````
set.seed(1234)
liner.tune=tune.svm(is_attributed~.,data=smote_train,kernel="linear",
                    cost=c(0.1,0.5,1,5,10,50))
liner.tune
``````
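Since `smote_train` is not defined in this write-up, the same tuning pattern can be tried on stand-in data. The sketch below assumes a two-class subset of the built-in `iris` data and a shorter, arbitrary list of cost values; `tune.svm()` from `e1071` evaluates each candidate with 10-fold cross-validation by default.

```r
library(e1071)

# Two-class subset of iris, used here only as stand-in data
two_class <- droplevels(subset(iris, Species != "setosa"))

set.seed(1234)  # cross-validation folds are random
tuned <- tune.svm(Species ~ ., data = two_class, kernel = "linear",
                  cost = c(0.1, 1, 10))

tuned$best.parameters    # cost value with the lowest cross-validated error
tuned$best.performance   # that error
best_model <- tuned$best.model  # SVM refit on all rows with the best cost
```

A larger cost penalizes misclassified training points more heavily, so tuning it trades a wider margin against fewer training errors.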

cross validation method

Let's see how our linear model works.

Let's get the best linear model.

``best.linear=liner.tune$best.model``

``````
# Predict on the validation data
best.test=predict(best.linear,newdata=test_val,type="class")
confusionMatrix(best.test,test_val$is_attributed)
``````
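To make clear what a confusion matrix reports, the core numbers can be reproduced in base R with `table()`. The labels below are made up for illustration and are not model output from this assignment; class `1` is treated as the positive class.

```r
# Made-up predicted and actual labels, for illustration only
actual    <- factor(c(0, 0, 1, 1, 1, 0, 1, 0), levels = c(0, 1))
predicted <- factor(c(0, 1, 1, 1, 0, 0, 1, 0), levels = c(0, 1))

cm <- table(Predicted = predicted, Actual = actual)
cm

accuracy  <- sum(diag(cm)) / sum(cm)         # share of correct predictions
precision <- cm["1", "1"] / sum(cm["1", ])   # of predicted 1s, how many are truly 1
recall    <- cm["1", "1"] / sum(cm[, "1"])   # of actual 1s, how many were found
c(accuracy = accuracy, precision = precision, recall = recall)
```

With these labels, six of eight predictions are correct, and three of the four predicted and actual positives match, so all three metrics equal 0.75.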

The Kernel Trick – Radial Support Vector Machine

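``````
set.seed(1234)
# NOTE: the start of this call is truncated in the source; it is reconstructed here
# by analogy with the linear tuning call. Tuning gamma is an assumption.
rd.poly=tune.svm(is_attributed~.,data=smote_train,kernel="radial",gamma=seq(0.1,5))
summary(rd.poly)
``````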

Let's predict the test data.

``````
best.rd=rd.poly$best.model
pre.rd=predict(best.rd,newdata = test_val)
confusionMatrix(pre.rd,test_val$is_attributed)
``````

3. Write down the step-by-step procedure for Naive Bayes classification in R.

The steps for Naive Bayes classification in R are as follows:

1. Load the necessary libraries, such as the “e1071” library for the Naive Bayes classifier.
2. Prepare the data for the model. This includes splitting the data into a training and test set, and converting any categorical variables into factors.
3. Train the model by fitting the training data to the Naive Bayes classifier.
4. Use the trained model to predict the class of the test data.
5. Evaluate the performance of the model by comparing the predicted class to the actual class of the test data. This can be done using metrics such as accuracy, precision, and recall.
6. Repeat steps 3-5 for different input data and/or different model parameters to find the best model for the given data.
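The steps above can be sketched with `naiveBayes()` from `e1071`. This is an illustration under stated assumptions: the built-in `iris` data stands in for real data, the 70/30 split and the seed are arbitrary, and accuracy is the only metric shown.

```r
library(e1071)  # Step 1: provides naiveBayes()

# Step 2: split into training and test sets (Species is already a factor)
set.seed(42)
train_idx <- sample(nrow(iris), 105)
train_set <- iris[train_idx, ]
test_set  <- iris[-train_idx, ]

# Step 3: train the classifier on the training set
nb_model <- naiveBayes(Species ~ ., data = train_set)

# Step 4: predict the class of the test rows
nb_pred <- predict(nb_model, newdata = test_set)

# Step 5: compare predictions with the actual classes
table(Predicted = nb_pred, Actual = test_set$Species)
nb_accuracy <- mean(nb_pred == test_set$Species)
nb_accuracy
```

For step 6, the same code can be rerun with a different split or with the `laplace` argument of `naiveBayes()` set to a positive value to apply Laplace smoothing.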
