
Daily Practice | Data Scientist & Business Analyst Interview Questions 141

2017-07-26 Big Data Applications (大数据应用)


Day 41

DS Interview Questions

How to define the number of clusters?

BA Interview Questions

R language:

Write a nested for-loop which prints 30 numbers (1:10, 2:11, 3:12).

Those are three clusters of ten numbers each.

The first loop determines the number of clusters (3) via its length.

Each cluster starts one number higher than the previous one.

The outcome is expected as shown below:

 [1]  1  2  3  4  5  6  7  8  9 10

 [1]  2  3  4  5  6  7  8  9 10 11

 [1]  3  4  5  6  7  8  9 10 11 12


Day 40 Answers Revealed

DS Interview Questions

How to choose the parameter in gradient descent?

In the gradient descent algorithm, the standard update rule is:

W_j := W_j - λ * ∂F/∂W_j

W_j are the weights we need to estimate, F is the cost function, and λ is the learning rate, the parameter we need to tune for the algorithm. This parameter determines how fast we move toward the local optimum. If λ is too large, we will overshoot the optimal point; if λ is too small, convergence will be slow and inefficient.
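The effect of the learning rate can be seen in a minimal sketch. The cost function F(w) = (w - 3)^2, the starting point, and the three values of λ below are all assumptions chosen for illustration, not part of the original answer.

```r
# Hypothetical one-dimensional cost F(w) = (w - 3)^2 and its gradient
cost <- function(w) (w - 3)^2
grad <- function(w) 2 * (w - 3)

# Plain gradient descent with a fixed learning rate lambda
gradient_descent <- function(w0, lambda, n_iter = 50) {
  w <- w0
  for (k in seq_len(n_iter)) {
    w <- w - lambda * grad(w)  # w := w - lambda * dF/dw
  }
  w
}

w_small <- gradient_descent(w0 = 0, lambda = 0.001)  # too small: barely moves
w_good  <- gradient_descent(w0 = 0, lambda = 0.1)    # ends close to the optimum w = 3
w_large <- gradient_descent(w0 = 0, lambda = 1.1)    # too large: overshoots and diverges

# Final cost for each choice of lambda
print(sapply(c(small = w_small, good = w_good, large = w_large), cost))
```

With the same number of iterations, only the middle value ends near the optimum; the small one is still far away and the large one moves further from it on every step.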

  • Adapt the value of the learning rate for different data sizes

If the cost function we choose is significantly affected by dataset size, we need to adjust the learning rate with respect to the data size N.
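As a rough illustration of why data size matters: when the cost is a sum over observations, its gradient grows with N, so a learning rate that works on a small dataset can diverge on a large one. The sketch below assumes one common remedy, dividing λ by N. That scaling, the helper `fit_weight`, and the simulated data are assumptions for illustration only.

```r
# Hypothetical helper: gradient descent on a *summed* squared-error cost,
# F(w) = sum((y - w * x)^2), whose gradient grows with the data size N.
fit_weight <- function(x, y, lambda, scale_by_n, n_iter = 50) {
  N <- length(x)
  step <- if (scale_by_n) lambda / N else lambda  # assumed scaling: lambda / N
  w <- 0
  for (k in seq_len(n_iter)) {
    gradient <- -2 * sum(x * (y - w * x))
    w <- w - step * gradient
  }
  w
}

# Simulated, noise-free data with true weight 3
set.seed(42)
x_small <- runif(10)
y_small <- 3 * x_small
x_large <- runif(10000)
y_large <- 3 * x_large

w_small_scaled <- fit_weight(x_small, y_small, lambda = 0.5, scale_by_n = TRUE)
w_large_scaled <- fit_weight(x_large, y_large, lambda = 0.5, scale_by_n = TRUE)
w_large_raw    <- fit_weight(x_large, y_large, lambda = 0.5, scale_by_n = FALSE)
```

With the scaled step, the same base λ recovers a weight close to 3 on both datasets, while the unscaled step on the large dataset diverges.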

  • Adapt the learning rate in each iteration

We could check the value of the cost function using the estimated weights vector in each iteration: if the error has decreased since the last iteration, try increasing the learning rate by 5%; if the error has increased (meaning we have skipped the optimal point), reset the weights vector to the values of the previous iteration and decrease the learning rate by 50%.
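The per-iteration rule above can be sketched as follows. The simulated data, the one-weight linear model, and the function name `adaptive_gd` are assumptions for illustration; the 5% increase and 50% decrease follow the description. A step that leaves the error unchanged is treated here as a failed step, which is a choice the text does not specify.

```r
# Hypothetical data: y is roughly 2 * x; we estimate a single weight w
set.seed(1)
x <- runif(100)
y <- 2 * x + rnorm(100, sd = 0.1)

cost <- function(w) mean((y - w * x)^2)         # mean squared error
grad <- function(w) -2 * mean(x * (y - w * x))  # derivative of the cost

adaptive_gd <- function(w0, lambda, n_iter = 200) {
  w <- w0
  err <- cost(w)
  for (k in seq_len(n_iter)) {
    w_new <- w - lambda * grad(w)
    err_new <- cost(w_new)
    if (err_new < err) {
      # Error decreased: accept the step and raise the learning rate by 5%
      w <- w_new
      err <- err_new
      lambda <- lambda * 1.05
    } else {
      # Error did not decrease: keep the previous weights, cut lambda by 50%
      lambda <- lambda * 0.5
    }
  }
  list(w = w, lambda = lambda, error = err)
}

# Start with a deliberately oversized learning rate
fit <- adaptive_gd(w0 = 0, lambda = 5)
print(fit$w)
```

Because rejected steps never overwrite the weights, the error cannot increase from one accepted iteration to the next, even when the initial learning rate is far too large.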

Reference: http://blog.datumbox.com/tuning-the-learning-rate-in-gradient-descent/

BA Interview Questions

R language: add up every element in the matrix:

a <- matrix(c(0,2,0,2,1,2,2,0,2))

# This blue-colored text is the modified question and the answers

# QUESTION: Build a modified 3x3 matrix and calculate its sum

a <- matrix(data = c(0,2,0,2,1,2,2,0,2), nrow = 3, ncol = 3)

# ANSWER Version #1: Adding by using nested for-loops

add <- function(a) {

    x <- 0  # running total

    for (i in seq_len(nrow(a))) {

       for (j in seq_len(ncol(a))) {

         x <- x + a[i, j]

       }

    }

    x  # return the total

}

# ANSWER Version #2: Adding by R function (Optimal)

sum(a)


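Data Application Lab (数据应用学院) is the first organization in North America to offer a one-stop pipeline for professional data talent, combining training, project internships, career coaching, and internal referrals. It provides big data and data science training as well as project solutions for companies. Co-founded by senior data scientists and data engineers from Southern California and Silicon Valley, it is dedicated to sharing the latest applications and knowledge in the data industry and to training and placing outstanding big data talent, in order to fill the talent gap and unlock the power of big data in business. In 2016 it was named a Top Data Camp by the well-known North American technology magazine Tech Beacon.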






