Adv. Data Science Using R | Assignment – 3
Quiz – 3
Q1. Which of the following languages is used in data science?
R
C
C++
Ruby
Q2. What is the primary file type of R?
Vector
Text file
RScripts
Statistical file
Q3. Which one of the following R packages is used for data products?
haven
igraph
slidify
forecast
Q4. Which of the following is valid for checking categorical variable?
Level
Table
Unique
All of the above
Q5. Suppose ABC is a matrix with 3 rows and 4 columns. Choose the correct option(s) to rename the rows:
row_names(ABC) = c("row1", "row2", "row3")
rownames(ABC) = c("row1", "row2")
row(ABC) = c("row1", "row2")
rownames(ABC) = c("row", "row2", "row3")
Q6. Arrange in proper order of data type:
Logical, integer, numeric, character
Integer, numeric, character, logical
Character, logical, integer, numeric
Numeric, integer, character, logical
Q7. What is the output of the code below?
A=10
B=20
print(A,B)
10 20
Error
(10, 20)
None of the above
Q8. A return statement is compulsory when writing a function in R
True
False
Q9. The last expression evaluated in a function is the return value by default in R
True
False
Q10. Which package needs to be installed for reading?
Read_excel
Readxl
Readcsv
read_csv
Q11. What is the output of the code below?
logic1 = c(T, F, F, T, F, T)
print(which(logic1))
1 4 6
2 3 6
6 4 1
1 2 3
Q12. If A = c(1, 13, 42, 13, 4), what is A after A = A[-4]?
1, 13, 42, 4
1, 13, 42, 13
13
1, 42, 13, 4
Q13. Which function call can be used to split the string?
The output will be: "Navin" "Mr. Naresh J"
strsplit(name, "[.]")
charsplit(name, "[,]")
stringsplit(name)
strsplit(name, "[,]")
Q14. Given i = 100, how do you find the data type of i?
Option 1
type(i)
class(i)
none of the above
Q15. dt = "01-12-2020" is stored as a character string. Which option converts it to a date in "MM-DD-YYYY" format?
To_date(dt, "MM - DD - YYYY")
date(x = dt, format = "%m / %d / %Y")
Date(x = dt, format = "%m / %d / %Y")
none of the above
Assignment – 3
1. What Is KNN Algorithm? Features Of KNN Algorithm. How Does KNN Algorithm Work? Write KNN algorithm pseudocode and Practical Implementation Of KNN Algorithm In R.
The K-Nearest Neighbors (KNN) algorithm is a non-parametric, instance-based method for classification and regression. It is a supervised learning algorithm that stores all available cases and classifies new cases based on a similarity measure, such as Euclidean distance.
Features of KNN Algorithm:
- Simple to understand and implement
- No assumptions about the distribution of the data
- Can be used for both classification and regression problems
The algorithm works by taking a new data point and finding the k closest points in the training set. For classification, the new point is assigned the majority class of those k neighbors; for regression, it is assigned the average of their values.
Pseudocode for the KNN algorithm:
- Initialize the number of nearest neighbors (k)
- For each point in the dataset:
  a. Calculate the distance between the point and the new data point
  b. Add the distance and the point to a list
- Sort the list by distance
- Take the first k elements from the sorted list
- Determine the majority class among the k elements
- Classify the new data point as the majority class
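The pseudocode above can be sketched in base R as follows. This is only an illustrative sketch: the function name knn_classify and the toy data are made up for this example, Euclidean distance is assumed, and ties in the vote simply go to the first class level.

```r
# Hypothetical helper: classify one query point by k-nearest neighbours.
knn_classify <- function(train_x, train_y, query, k = 3) {
  # Euclidean distance from the query to every training row
  diffs <- sweep(train_x, 2, query)   # subtract the query from each row
  dists <- sqrt(rowSums(diffs^2))
  # Sort by distance and keep the indices of the k closest points
  nearest <- order(dists)[seq_len(k)]
  # Majority vote among the labels of those k points
  votes <- table(train_y[nearest])
  names(votes)[which.max(votes)]
}

# Toy data (made up for illustration): two well-separated clusters
train_x <- rbind(c(1, 1), c(1, 2), c(2, 1),
                 c(6, 6), c(6, 7), c(7, 6))
train_y <- factor(c("A", "A", "A", "B", "B", "B"))

pred_a <- knn_classify(train_x, train_y, query = c(1.5, 1.5), k = 3)
pred_b <- knn_classify(train_x, train_y, query = c(6.5, 6.5), k = 3)
print(c(pred_a, pred_b))
```

Because every prediction scans the whole training set, this approach becomes slow on large data; packaged implementations are usually preferable in practice.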
Practical Implementation of KNN Algorithm in R:
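# Load the library
library(class)
# Create a sample data set
set.seed(42)
x <- cbind(rnorm(50), rnorm(50))
y <- gl(2, 25, labels = c("A", "B"))
# New data points to classify
newdata <- rbind(c(1, 2), c(3, 4))
# knn() has no separate fit step: it takes the training data, the new data
# and the training labels, and returns the predicted classes directly
predicted_class <- knn(train = x, test = newdata, cl = y, k = 3)
print(predicted_class)
This code creates a sample dataset of 50 points, with two features and two classes (A and B). It then calls knn() with k = 3 to predict the class of two new data points. The knn() function from the class package does not return a model object, so there is nothing to pass to predict().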
2. Develop a Machine Learning Model using SVM in R to solve A Business Problem. Add Screenshots of the graphs and code to validate your answer.
Applying SVM for solving a Business use Case
The data source is https://www.kaggle.com/c/talkingdata-adtracking-fraud-detection
Read the data and check the structure of both train and test
library(lubridate)
library(caret)
library(dplyr)
library(DMwR)
library(ROSE)
library(ggplot2)
library(randomForest)
library(rpart)
library(rpart.plot)
library(data.table)
library(e1071)
library(gridExtra)
train <- fread('../input/train_sample.csv', stringsAsFactors = FALSE, data.table = FALSE)
test <- fread('../input/test.csv', stringsAsFactors = FALSE, data.table = FALSE)
str(train)

str(test)

The train and test data have the same structure, with two exceptions: the target (is_attributed) has to be predicted for the test data, and attributed_time (the time of the application download) is not given in the test data.
Missing value checking and estimation
colSums(is.na(train))

There are no NA values; the data is clean. Next, check for blank strings:
colSums(train=='')

attributed_time has blank entries, which is logically correct: a click that did not lead to a download has no download time.
Let's check the target variable to see how many clicks in the train data did not lead to a download:
table(train$is_attributed)

Our assumption is correct, since the number of blank entries in attributed_time matches the number of applications not downloaded in the train data.
As this is logically correct, no further action is needed on these blanks.
Also note that this variable is not present in the test data, so there is no point in keeping it in the train data either:
train$attributed_time=NULL
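The modelling code that follows uses two objects, smote_train and test_val, whose creation is not shown in this write-up. The sketch below shows one way such objects might be produced. It is an assumption, not the original preparation: it uses made-up toy data, a stratified 70/30 split, and plain random oversampling in place of SMOTE. Here balanced_train plays the role of smote_train. Note that the target is stored as a factor, which tune.svm() needs in order to treat the task as classification.

```r
set.seed(1234)

# Toy stand-in for the click data (all values are made up): rare positive class
toy <- data.frame(
  app = sample(1:20, 1000, replace = TRUE),
  channel = sample(1:50, 1000, replace = TRUE),
  is_attributed = factor(rep(c(0, 1), times = c(950, 50)))
)

# Stratified 70/30 split: sample 70% of the row indices within each class
train_idx <- unlist(lapply(
  split(seq_len(nrow(toy)), toy$is_attributed),
  function(idx) sample(idx, floor(0.7 * length(idx)))
))
train_part <- toy[train_idx, ]
test_val <- toy[-train_idx, ]

# Random oversampling (swapped in for SMOTE): resample the minority class
# with replacement until it matches the majority class in size
minority <- train_part[train_part$is_attributed == "1", ]
majority <- train_part[train_part$is_attributed == "0", ]
extra <- minority[sample(nrow(minority), nrow(majority), replace = TRUE), ]
balanced_train <- rbind(majority, extra)

print(table(balanced_train$is_attributed))
print(table(test_val$is_attributed))
```

Only the training part is rebalanced; the validation part keeps the original class ratio so that the confusion matrix reflects realistic conditions.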
Applying the SVM on the data
Linear Support Vector Machine
Before fitting the model, let's tune the cost parameter
set.seed(1234)
liner.tune=tune.svm(is_attributed~.,data=smote_train,kernel="linear",cost=c(0.1,0.5,1,5,10,50))
liner.tune
This gives the best cost value for the linear-kernel SVM. tune.svm() selects it by cross-validation (10-fold by default).

Let's see how the linear model performs.
First, extract the best linear model:
best.linear=liner.tune$best.model
#Predict data
best.test=predict(best.linear,newdata=test_val,type="class")
confusionMatrix(best.test,test_val$is_attributed)

The Kernel Trick – Radial Support vector Machine
set.seed(1234)
rd.poly=tune.svm(is_attributed~.,data=smote_train,kernel="radial",gamma=seq(0.1,5))
summary(rd.poly)

Let's predict the test data
best.rd=rd.poly$best.model
pre.rd=predict(best.rd,newdata = test_val)
confusionMatrix(pre.rd,test_val$is_attributed)

3. Write down the step by step classification of naïve bayes classification in R.
The steps for Naive Bayes classification in R are as follows:
- Load the necessary libraries, such as the “e1071” library for the Naive Bayes classifier.
- Prepare the data for the model. This includes splitting the data into a training and test set, and converting any categorical variables into factors.
- Train the model by fitting the training data to the Naive Bayes classifier.
- Use the trained model to predict the class of the test data.
- Evaluate the performance of the model by comparing the predicted class to the actual class of the test data. This can be done using metrics such as accuracy, precision, and recall.
- Repeat the training, prediction, and evaluation steps for different input data and/or different model parameters to find the best model for the given data.
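The steps above can be sketched in R as follows. This is a minimal illustration, assuming the built-in iris dataset and a simple random 70/30 split; neither is prescribed by the assignment, and the variable names are chosen only for this example.

```r
# Step 1: load the library that provides naiveBayes()
library(e1071)

# Step 2: prepare the data (Species is already a factor in iris)
set.seed(42)
data(iris)
train_idx <- sample(nrow(iris), size = floor(0.7 * nrow(iris)))
train_set <- iris[train_idx, ]
test_set <- iris[-train_idx, ]

# Step 3: train the Naive Bayes classifier on the training set
nb_model <- naiveBayes(Species ~ ., data = train_set)

# Step 4: predict the class of the test data
nb_pred <- predict(nb_model, newdata = test_set)

# Step 5: evaluate with a confusion matrix and derived metrics
conf_mat <- table(Predicted = nb_pred, Actual = test_set$Species)
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
precision <- diag(conf_mat) / rowSums(conf_mat)  # per class
recall <- diag(conf_mat) / colSums(conf_mat)     # per class

print(conf_mat)
print(accuracy)
```

Precision and recall are computed per class here because iris has three classes; for a binary problem, the values for the positive class are the ones usually reported.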