All Jupyter notebook (Python) files for Week 4 of Andrew Ng's Coursera Machine Learning Specialization, Machine Learning: Advanced Learning Algorithms.
This assignment: decision trees.
Exercise 1
# UNQ_C1
# GRADED FUNCTION: compute_entropy

import numpy as np

def compute_entropy(y):
    """
    Computes the entropy for the examples at a node.

    Args:
        y (ndarray): Numpy array indicating whether each example at a node is
            edible (`1`) or poisonous (`0`)

    Returns:
        entropy (float): Entropy at that node
    """
    # You need to return the following variable correctly
    entropy = 0.

    ### START CODE HERE ###
    if len(y) != 0:
        # Fraction of edible (label 1) examples at this node
        p1 = sum(y) / len(y)
        if p1 == 0 or p1 == 1:
            # A pure node has zero entropy (and log2(0) is undefined)
            entropy = 0.
        else:
            entropy = -p1 * np.log2(p1) - (1 - p1) * np.log2(1 - p1)
    ### END CODE HERE ###

    return entropy
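As a quick sanity check, binary entropy is H(p1) = -p1*log2(p1) - (1-p1)*log2(1-p1), where p1 is the fraction of edible examples, so a 50/50 node should give 1.0 and a pure node 0.0. The arrays below are illustrative values, not the notebook's dataset:

import numpy as np

# A 50/50 node is maximally uncertain: H(0.5) = 1.0
print(compute_entropy(np.array([1, 1, 0, 0])))   # 1.0

# A pure node has no uncertainty: H(1) = 0.0
print(compute_entropy(np.array([1, 1, 1, 1])))   # 0.0

# An uneven node: H(0.25) = -(0.25*log2(0.25) + 0.75*log2(0.75)) ≈ 0.811
print(compute_entropy(np.array([1, 0, 0, 0])))   # ~0.811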
Exercise 2
# UNQ_C2
# GRADED FUNCTION: split_dataset

def split_dataset(X, node_indices, feature):
    """
    Splits the data at the given node into
    left and right branches.

    Args:
        X (ndarray):         Data matrix of shape (n_samples, n_features)
        node_indices (list): List containing the active indices, i.e., the samples being considered at this step
        feature (int):       Index of the feature to split on

    Returns:
        left_indices (list):  Indices with feature value == 1
        right_indices (list): Indices with feature value == 0
    """
    # You need to return the following variables correctly
    left_indices = []
    right_indices = []

    ### START CODE HERE ###
    # Send an active sample left if its feature value is 1, right otherwise
    for i in node_indices:
        if X[i][feature] == 1:
            left_indices.append(i)
        else:
            right_indices.append(i)
    ### END CODE HERE ###

    return left_indices, right_indices
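To see what split_dataset returns, here is a quick check on a hypothetical 4-sample matrix with two binary features (illustrative values, not the notebook's mushroom data):

import numpy as np

X_toy = np.array([[1, 0],
                  [1, 1],
                  [0, 1],
                  [0, 0]])
root_indices = [0, 1, 2, 3]

# On feature 0, samples 0 and 1 carry value 1 and go left; samples 2 and 3 go right
left, right = split_dataset(X_toy, root_indices, feature=0)
print(left, right)   # [0, 1] [2, 3]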
Exercise 3
# UNQ_C3
# GRADED FUNCTION: compute_information_gain

def compute_information_gain(X, y, node_indices, feature):
    """
    Compute the information gain of splitting the node on a given feature.

    Args:
        X (ndarray):         Data matrix of shape (n_samples, n_features)
        y (array like):      List or ndarray with n_samples containing the target variable
        node_indices (list): List containing the active indices, i.e., the samples being considered in this step
        feature (int):       Index of the feature to split on

    Returns:
        information_gain (float): Information gain of the split
    """
    # Split dataset
    left_indices, right_indices = split_dataset(X, node_indices, feature)

    # Some useful variables
    X_node, y_node = X[node_indices], y[node_indices]
    X_left, y_left = X[left_indices], y[left_indices]
    X_right, y_right = X[right_indices], y[right_indices]

    # You need to return the following variable correctly
    information_gain = 0

    ### START CODE HERE ###
    # Weights: fraction of the node's samples sent to each branch
    w_left = len(left_indices) / len(node_indices)
    w_right = len(right_indices) / len(node_indices)

    # Weighted entropy of the two branches
    weighted_entropy = w_left * compute_entropy(y_left) + w_right * compute_entropy(y_right)

    # Information gain: node entropy minus weighted branch entropy
    information_gain = compute_entropy(y_node) - weighted_entropy
    ### END CODE HERE ###

    return information_gain
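Reusing the hypothetical toy data from above, a feature that separates the classes perfectly yields a gain equal to the node entropy, while a feature that leaves both branches 50/50 yields no gain:

import numpy as np

X_toy = np.array([[1, 0], [1, 1], [0, 1], [0, 0]])
y_toy = np.array([1, 1, 0, 0])
root_indices = [0, 1, 2, 3]

# Feature 0 separates the classes perfectly: gain = 1.0 - (0.5*0 + 0.5*0) = 1.0
print(compute_information_gain(X_toy, y_toy, root_indices, feature=0))   # 1.0

# Feature 1 leaves both branches 50/50: gain = 1.0 - (0.5*1 + 0.5*1) = 0.0
print(compute_information_gain(X_toy, y_toy, root_indices, feature=1))   # 0.0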
Exercise 4
# UNQ_C4
# GRADED FUNCTION: get_best_split

def get_best_split(X, y, node_indices):
    """
    Returns the best feature on which to split the node data.

    Args:
        X (ndarray):         Data matrix of shape (n_samples, n_features)
        y (array like):      List or ndarray with n_samples containing the target variable
        node_indices (list): List containing the active indices, i.e., the samples being considered in this step

    Returns:
        best_feature (int): The index of the best feature to split on, or -1 if no split helps
    """
    # Some useful variables
    num_features = X.shape[1]

    # You need to return the following variable correctly
    best_feature = -1

    ### START CODE HERE ###
    # Purity must be checked on the samples at this node, not on all of y
    y_node = y[node_indices]
    if 0 < sum(y_node) < len(y_node):
        gains = [compute_information_gain(X, y, node_indices, f)
                 for f in range(num_features)]
        # Only split if some feature actually reduces entropy
        if max(gains) > 0:
            best_feature = gains.index(max(gains))
    ### END CODE HERE ###

    return best_feature
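Putting it together on the toy data, get_best_split picks feature 0, whose gain of 1.0 beats feature 1's 0.0. Below that is a minimal recursive driver, a sketch only (the graded notebook provides its own tree-building helper); build_tree and its printed layout are illustrative, not part of the assignment:

import numpy as np

X_toy = np.array([[1, 0], [1, 1], [0, 1], [0, 0]])
y_toy = np.array([1, 1, 0, 0])

print(get_best_split(X_toy, y_toy, [0, 1, 2, 3]))   # 0

# Illustrative recursive driver: stops at max_depth, on pure nodes,
# or when no feature yields positive information gain (best == -1)
def build_tree(X, y, node_indices, max_depth, depth=0):
    best = get_best_split(X, y, node_indices)
    if depth >= max_depth or best == -1:
        print("  " * depth + f"leaf: samples {node_indices}")
        return
    print("  " * depth + f"depth {depth}: split on feature {best}")
    left, right = split_dataset(X, node_indices, best)
    build_tree(X, y, left, max_depth, depth + 1)
    build_tree(X, y, right, max_depth, depth + 1)

build_tree(X_toy, y_toy, [0, 1, 2, 3], max_depth=2)
# depth 0: split on feature 0
#   leaf: samples [0, 1]
#   leaf: samples [2, 3]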
Author of this article: 楚千羽
The copyright of this article belongs to the author. Reposting is welcome, but without the author's consent a link to the original article must be given on the article page; otherwise the author reserves the right to pursue legal liability.