ARTS 第 37 周

每周完成一个 ARTS:
Algorithm: 每周至少做一个 LeetCode 的算法题
Review: 阅读并点评至少一篇英文技术文章
Tips: 学习至少一个技术技巧
Share: 分享一篇有观点和思考的技术文章


Contents


Algorithm

136. Single Number

题目:136. Single Number

难度:Easy

题意:Given a non-empty array of integers, every element appears twice except for one. Find that single one.

Note:

Your algorithm should have a linear runtime complexity. Could you implement it without using extra memory?

示例 1:

Input: [2,2,1]
Output: 1

示例 2:

Input: [4,1,2,1,2]
Output: 4

解法:

本题有多种解法,现给大家讲一种非常容易的思路。

因为本题前提限制了这个数组中每个数字都出现了 2 次,除了那个我们要找的数字。那么我们可以有以下数学方法:

2 * (a + b + c) - (a + a + b + b + c) = c

这样我们就可以找出只出现一次的那个 c 了。但这种方式中要使用的 set() 函数要求使用额外的空间,与 note 中要求的只使用 O(1) 的空间复杂度不符。代码就不写了。

还有一种方法就是使用字典,对每一个元素尝试进行 pop 操作,如果不行,就存入字典中,最后剩下的那个就是我们要找的那个数字了。

代码:

class Solution:
    def singleNumber(self, nums: List[int]) -> int:
        hash_table = {}

        for i in nums:
            try:
                hash_table.pop(i)
            except:
                hash_table[i] = 1

        return hash_table.popitem()[0]

但这种方式也同样要求使用额外存储空间。

最后一种方法非常简单,但也很难想到,只能用巧妙来形容。本题的题意是找出只出现了一次的那个数字。如果你学过数字电路,那么有一个概念你一定非常熟悉,就是「异或」。一个数字与它本身异或,结果为 0,一个数字与 0 异或,结果为它本身。根据这个思路,答案显而易见。

代码:

class Solution:
    def singleNumber(self, nums: List[int]) -> int:
        a = 0

        for i in nums:
            a ^= i

        return a

由于并没有额外申请空间,所以空间复杂度为 O(1),时间复杂度为线性 O(n)。


Review

Remember QR Codes? They’re More Powerful Than You Think

本文作者是一位老外,他来到中国,感叹于二维码在国内的使用之广泛,于是写了一篇文章列出了 16 个二维码的使用场景。在我看来,也许二维码在我们的日常生活中已经司空见惯,但由一个外国人来介绍这些使用场景,其实能从另一个不同的角度来观察我们的生活,也是一件非常有趣的事。


Tips

R 金融时间序列分析中 ARMA 模型定阶辅助工具

时间序列分析中,ARMA 模型是非常常用的一种时序建模模型,它理论简单,计算不复杂,拟合效果也很好,所以使用比较广泛。但 ARMA 模型的定阶相对比较麻烦,虽然常用的有 AIC、BIC 等信息准则方式,但手动计算还是略微复杂。为了解决这一问题,1984 年,Tsay 和 Tiao 提出了对 ARMA 定阶的辅助工具 EACF,其结果可以用与 (p, q) 有关的二维表格表示,结果包含由字母「O」组成的三角形,锐角顶点在位置。如下所示:

\[\begin{matrix} & MA \\ AR & 0 & 1 & 2 & 3 & 4 &5 & 6 & 7 \\ \hline 0 & X & X & X & X & X & X & X & X \\ 1 & X & O & O & O & O & O & O & O \\ 2 & * & X & O & O & O & O & O & O \\ 3 & * & * & X & O & O & O & O & O \\ 4 & * & * & * & X & O & O & O & O \\ 5 & * & * & * & * & X & O & O & O \end{matrix}\]

在 R 中的 TSA 包提供了 eacf(z, ar.max = 7, ma.max = 13) 函数,只需要提供待辨识的时序数据以及最大的阶数范围,即可按照上述表格确定相应阶数。但有一点要注意,本函数默认检验系数非零的检验水平是 0.05,有时不是特别容易看出锐角顶点。


Share

Time series analysis & data mining 工具(科研类)

MATLAB

System Identification Toolbox - Time Series Analysis: Analyze time series data by identifying linear and nonlinear models, including AR, ARMA, and state-space models; forecast values.

Econometrics Toolbox: Econometrics Toolbox™ provides functions for modeling economic data. You can select and estimate economic models for simulation and forecasting. For time series modeling and analysis, the toolbox includes univariate Bayesian linear regression, univariate ARIMAX/GARCH composite models with several GARCH variants, multivariate VARX models, and cointegration analysis. It also provides methods for modeling economic systems using state-space models and for estimating using the Kalman filter. You can use a variety of diagnostics for model selection, including hypothesis tests, unit root, stationarity, and structural change.

R

CRAN Task View: Time Series Analysis: Base R ships with a lot of functionality useful for time series, in particular in the stats package. This is complemented by many packages on CRAN, which are briefly summarized below. There is also a considerable overlap between the tools for time series and those in the Econometrics and Finance task views.

ggplot2: ggplot2 is a system for declaratively creating graphics, based on The Grammar of Graphics. You provide the data, tell ggplot2 how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.

Python

Statistics: This module provides functions for calculating mathematical statistics of numeric (Real-valued) data. The module is not intended to be a competitor to third-party libraries such as NumPy, SciPy, or proprietary full-featured statistics packages aimed at professional statisticians such as Minitab, SAS and Matlab. It is aimed at the level of graphing and scientific calculators.

SciPy: SciPy (pronounced “Sigh Pie”) is a Python-based ecosystem of open-source software for mathematics, science, and engineering.

NumPy: NumPy is the fundamental package for scientific computing with Python. It contains among other things:

Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data-types can be defined. This allows NumPy to seamlessly and speedily integrate with a wide variety of databases.

Pandas: Pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

Matplotlib: Matplotlib is a Python 2D plotting library which produces publication quality figures in a variety of hardcopy formats and interactive environments across platforms. Matplotlib can be used in Python scripts, the Python and IPython shells, the Jupyter notebook, web application servers, and four graphical user interface toolkits.

Seaborn: Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.

StatsModels: StatsModels is a Python module that provides classes and functions for the estimation of many different statistical models, as well as for conducting statistical tests, and statistical data exploration. An extensive list of result statistics are available for each estimator. The results are tested against existing statistical packages to ensure that they are correct.

StatsModels - Time Series analysis tsa: statsmodels.tsa contains model classes and functions that are useful for time series analysis. Basic models include univariate autoregressive models (AR), vector autoregressive models (VAR) and univariate autoregressive moving average models (ARMA). Non-linear models include Markov switching dynamic regression and autoregression. It also includes descriptive statistics for time series, for example autocorrelation, partial autocorrelation function and periodogram, as well as the corresponding theoretical properties of ARMA or related processes. It also includes methods to work with autoregressive and moving average lag-polynomials. Additionally, related statistical tests and some useful helper functions are available.

Scikit-learn: Scikit-learn is a free software machine learning library for the Python programming language. It features various classification, regression and clustering algorithms including support vector machines, random forests, gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy.

Keras: Keras is an open-source neural-network library written in Python. It is capable of running on top of TensorFlow, Microsoft CNTK or Theano. Designed to enable fast experimentation with deep neural networks, it focuses on being user-friendly, modular, and extensible.

Excel

  1. Sort: You can sort your Excel data on one column or multiple columns. You can sort in ascending or descending order.
  2. Filter: Filter your Excel data if you only want to display records that meet certain criteria.
  3. Conditional Formatting: Conditional formatting in Excel enables you to highlight cells with a certain color, depending on the cell’s value.
  4. Charts: A simple Excel chart can say more than a sheet full of numbers. As you’ll see, creating charts is very easy.
  5. Pivot Tables: Pivot tables are one of Excel’s most powerful features. A pivot table allows you to extract the significance from a large, detailed data set.
  6. Tables: Tables allow you to analyze your data in Excel quickly and easily.
  7. What-If Analysis: What-If Analysis in Excel allows you to try out different values (scenarios) for formulas.
  8. Solver: Excel includes a tool called solver that uses techniques from the operations research to find optimal solutions for all kind of decision problems.
  9. Analysis ToolPak: The Analysis ToolPak is an Excel add-in program that provides data analysis tools for financial, statistical and engineering data analysis.