ARTS 2.0 Week 3

Complete one ARTS each week:
Algorithm: solve at least one LeetCode algorithm problem per week
Review: read and comment on at least one English technical article
Tips: learn at least one technical tip or trick
Share: share a technical article that contains your own opinions and thinking


Algorithm

108. Convert Sorted Array to Binary Search Tree

Problem: 108. Convert Sorted Array to Binary Search Tree

Difficulty: Easy

Problem statement:

Given an array where elements are sorted in ascending order, convert it to a height balanced BST.

For this problem, a height-balanced binary tree is defined as a binary tree in which the depth of the two subtrees of every node never differ by more than 1.

Example:

Given the sorted array: [-10,-3,0,5,9],

One possible answer is: [0,-3,9,-10,null,5], which represents the following height balanced BST:

      0
     / \
   -3   9
   /   /
 -10  5

Solution:

The problem asks us to convert a sorted array into a height-balanced BST. Since the array is already sorted, the basic idea is to split it in the middle: take the middle element as the root, and use the left and right halves to build the left and right subtrees. Then recursively apply the same operation to the two sub-arrays.

Code:

from typing import List

# Definition for a binary tree node.
# class TreeNode:
#     def __init__(self, val=0, left=None, right=None):
#         self.val = val
#         self.left = left
#         self.right = right
class Solution:
    def sortedArrayToBST(self, nums: List[int]) -> TreeNode:
        if len(nums) == 0:
            return None

        # The middle element becomes the root, so the two halves end up
        # in the left and right subtrees with sizes differing by at most one.
        mid = len(nums) // 2
        root = TreeNode(nums[mid])
        root.left = self.sortedArrayToBST(nums[:mid])
        root.right = self.sortedArrayToBST(nums[mid+1:])

        return root
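
As a quick local sanity check (a sketch only, assuming the commented-out TreeNode definition above is uncommented so the code runs outside LeetCode), an in-order traversal of the returned tree should reproduce the sorted input:

def inorder(node: TreeNode) -> List[int]:
    # In-order traversal of a BST visits keys in ascending order,
    # so the result should equal the original sorted array.
    if node is None:
        return []
    return inorder(node.left) + [node.val] + inorder(node.right)

nums = [-10, -3, 0, 5, 9]
root = Solution().sortedArrayToBST(nums)
print(inorder(root))  # [-10, -3, 0, 5, 9]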

Review

6 Types of Neural Networks Every Data Scientist Must Know

This week's recommended article is "6 Types of Neural Networks Every Data Scientist Must Know".

The article walks through six common neural network models, from the basic perceptron to the currently popular generative adversarial network, covering the basic principles and typical application scenarios of each.

Tips

Using tree to print a text-based tree diagram of a directory structure

On macOS or Linux, you can use the tree command-line tool to generate a text-based tree diagram of a file structure. For example:

The directory structure of my GitHub Pages site

You can install it with Homebrew, or with apt/yum/pacman on Linux:

# macOS
brew install tree
# Ubuntu
sudo apt install tree

A few commonly used commands are listed below:
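
For example, the following invocations use a few standard tree options (depth limiting, directories only, hidden files, and pattern exclusion); pick whichever fits your use case:

# Limit the output to two directory levels
tree -L 2
# Show directories only
tree -d
# Include hidden files and directories
tree -a
# Exclude entries matching a pattern, e.g. node_modules
tree -I node_modules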

If Chinese directory and file names show up garbled in the output, add the -N option, which prints non-printable characters as-is: tree -N -d

Share

Why plot learning curves with the cost function that has no regularization term?

A few days ago I saw a question on Stack Overflow:

For a given regularization parameter and set of features, create differently sized subsets of your training data. For each training data subset, train the model using regularization.

Once that’s done, plot the unregularized cost for both training and validation sets as a function of the size of the training data subset. The idea is that, if the training error and validation error remain very different at large training set sample sizes then the model has high variance. If training error and validation error converge too quickly then the model has high bias. My question is… Since the models we’re testing were calculated using a regularization constant, why aren’t we plotting regularized cost as a function of the training data size?

To see how the training error and the validation error change as the training set grows, we plot learning curves; yet when we compute those two errors, we use the cost function without the regularization term. So the question is: why should learning curves be plotted with the unregularized cost function?

The reason is that the regularization term exists only to penalize the model's parameters during training. Once a set of parameters has been learned, measuring how well the algorithm actually performs should use the true cost, i.e. the error without any penalty term. That is why the regularization term is left out when plotting learning curves.
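
A minimal sketch of this workflow, assuming a plain squared-error (ridge-style) linear model rather than whatever model the original question had in mind: the parameters are fitted with the regularization term, but the values plotted for the learning curve are computed without it.

import numpy as np

def fit_ridge(X, y, lam):
    # Minimize the REGULARIZED cost (1/2m)*||X@theta - y||^2 + (lam/2m)*||theta||^2,
    # which has the closed-form solution theta = (X^T X + lam*I)^{-1} X^T y.
    n = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)

def unregularized_cost(X, y, theta):
    # The "true" error used for the learning curve: no penalty term.
    m = len(y)
    r = X @ theta - y
    return r @ r / (2 * m)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + rng.normal(scale=0.1, size=200)
X_train, y_train = X[:150], y[:150]
X_val, y_val = X[150:], y[150:]

lam = 1.0
for m in range(10, 151, 20):
    theta = fit_ridge(X_train[:m], y_train[:m], lam)                 # trained WITH regularization
    train_err = unregularized_cost(X_train[:m], y_train[:m], theta)  # plotted WITHOUT it
    val_err = unregularized_cost(X_val, y_val, theta)
    print(f"m = {m:3d}  train error = {train_err:.4f}  validation error = {val_err:.4f}")

As the training subset grows, the two unregularized errors converging indicates low variance, while a persistent gap indicates high variance, matching the heuristic quoted above.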
