
Daily Practice | Data Scientist & Business Analyst Interview Questions 141

2017-07-26 · 大数据应用 (Big Data Applications)

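Starting June 15, Data Application Lab will review common interview questions in data science (DS) and business analytics (BA) with you. If you are actively looking for a job in these fields, follow our question each day and think it through with us; we post the answer the following day.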

Day 41

DS Interview Questions

How do you determine the number of clusters?

BA Interview Questions

R language:

Write a nested for-loop that prints 30 numbers (1:10, 2:11, 3:12).

These form three clusters of ten numbers each.

The outer loop determines the number of clusters (3) through its length.

Each cluster starts one number higher than the previous one.


The expected output is:

[1]  1  2  3  4  5  6  7  8  9 10
[1]  2  3  4  5  6  7  8  9 10 11
[1]  3  4  5  6  7  8  9 10 11 12
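The loop structure described above can be sketched as follows. This is one possible approach under our own assumptions, not the official answer; the names `n_clusters`, `cluster_size`, and `clusters` are hypothetical. The outer loop's length (3) sets the number of clusters, and the inner loop builds each cluster of ten consecutive integers starting at the outer index.

```r
n_clusters   <- 3   # length of the outer loop = number of clusters
cluster_size <- 10  # numbers per cluster
clusters <- vector("list", n_clusters)

for (i in seq_len(n_clusters)) {
  current <- integer(0)
  for (k in seq_len(cluster_size)) {
    current <- c(current, i + k - 1L)  # cluster i starts at i
  }
  clusters[[i]] <- current
  print(current)
}
```

Collecting each cluster in `clusters` is optional; it only makes the result easy to inspect after the loops finish.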

Want to know the answers? See the next issue!

Day 40 Answers

DS Interview Questions

How do you choose the learning-rate parameter in gradient descent?

In the gradient descent algorithm, the weights are updated as:

$$W_j := W_j - \lambda \frac{\partial F}{\partial W_j}$$

$W_j$ is the vector of weights we need to estimate, $F$ is the cost function, and $\lambda$ is the learning rate, the parameter we need to tune for the algorithm. This parameter determines how large a step we take toward the local optimum. If λ is too large, the updates overshoot and can miss the optimal point; if λ is too small, convergence is slow and inefficient.
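To see how λ governs convergence, here is a minimal sketch of fixed-step gradient descent on a hypothetical one-dimensional cost F(w) = (w - 3)^2, whose minimum is at w = 3. The function name `gradient_descent` and the toy cost are our own assumptions for illustration. For this cost, each update multiplies the distance to the optimum by (1 - 2λ), so any λ between 0 and 1 converges, a tiny λ crawls, and λ > 1 diverges.

```r
# Fixed-learning-rate gradient descent (toy example, not tied to any library)
gradient_descent <- function(grad, w0, lambda, n_iter = 100) {
  w <- w0
  for (iter in seq_len(n_iter)) {
    w <- w - lambda * grad(w)  # W_j := W_j - lambda * dF/dW_j
  }
  w
}

# Hypothetical cost F(w) = (w - 3)^2, so dF/dw = 2 * (w - 3)
grad_F <- function(w) 2 * (w - 3)

w_good  <- gradient_descent(grad_F, w0 = 0, lambda = 0.1)    # converges to about 3
w_small <- gradient_descent(grad_F, w0 = 0, lambda = 0.001)  # still far from 3 after 100 steps
w_big   <- gradient_descent(grad_F, w0 = 0, lambda = 1.1)    # overshoots further each step
```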


  • Adapt the learning rate to the size of the dataset


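If the cost function we choose grows with the size of the dataset (for example, a sum of squared errors over all observations), its gradient grows with the data size as well, so we need to scale the learning rate accordingly:

$$W_j := W_j - \frac{\lambda}{N} \frac{\partial F}{\partial W_j}$$

N is the number of observations in our data.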
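A minimal sketch of why dividing by N helps, on a hypothetical problem: fitting the slope w in y = w·x by minimizing the summed squared error over N evenly spaced points in (0, 1]. The helper `fit_slope` and its arguments are our own names. Because the cost is a sum over all points, its gradient grows roughly in proportion to N, so a λ that works for N = 10 blows up for N = 1000, while λ/N behaves similarly for both sizes.

```r
# Hypothetical toy problem: fit y = w * x by minimizing the *summed* squared error
fit_slope <- function(N, lambda, scale_by_n, n_iter = 100) {
  x <- seq_len(N) / N                     # N points in (0, 1]
  y <- 2 * x                              # true slope is 2
  step <- if (scale_by_n) lambda / N else lambda
  w <- 0
  for (iter in seq_len(n_iter)) {
    grad <- -2 * sum(x * (y - w * x))     # dF/dw for F(w) = sum((y - w * x)^2)
    w <- w - step * grad
  }
  w
}

fit_slope(10,   lambda = 0.2, scale_by_n = FALSE)  # converges to about 2
fit_slope(1000, lambda = 0.2, scale_by_n = FALSE)  # same lambda, diverges
fit_slope(1000, lambda = 0.2, scale_by_n = TRUE)   # lambda / N converges again
```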


  • Adapt the learning rate in each iteration


In each iteration, evaluate the cost function using the current estimate of the weight vector: if the error has decreased since the last iteration, increase the learning rate by 5%; if the error has increased (meaning we overshot the optimal point), reset the weight vector to its values from the previous iteration and decrease the learning rate by 50%.
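The adjustment rule above, a scheme commonly known as the "bold driver" heuristic, can be sketched as follows. This is a minimal illustration under our own assumptions: the function `bold_driver`, the toy quadratic cost with its minimum at (3, -1), and the deliberately too-large starting rate λ = 1.5 are all hypothetical. Instead of stepping and then undoing the step, the sketch computes a tentative update and keeps the previous weights whenever the cost does not drop, which has the same effect as resetting them.

```r
# "Bold driver" learning-rate adaptation (illustrative sketch)
bold_driver <- function(cost, grad, w0, lambda, n_iter = 100) {
  w <- w0
  err <- cost(w)
  for (iter in seq_len(n_iter)) {
    w_new <- w - lambda * grad(w)  # tentative gradient step
    err_new <- cost(w_new)
    if (err_new < err) {
      # Error decreased: accept the step and raise the rate by 5%
      w <- w_new
      err <- err_new
      lambda <- lambda * 1.05
    } else {
      # Error did not decrease: keep the previous weights and halve the rate
      lambda <- lambda * 0.5
    }
  }
  list(w = w, lambda = lambda, cost = err)
}

# Hypothetical 2-D quadratic cost with its minimum at (3, -1)
target <- c(3, -1)
cost_quad <- function(w) sum((w - target)^2)
grad_quad <- function(w) 2 * (w - target)

res <- bold_driver(cost_quad, grad_quad, w0 = c(0, 0), lambda = 1.5)
res$w
```

Even though the first step with λ = 1.5 overshoots, the rate is halved right away and then grows only while the error keeps falling, so the weights still settle at the minimum.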


Reference: http://blog.datumbox.com/tuning-the-learning-rate-in-gradient-descent/

BA Interview Questions

R language: add up all the elements in the matrix:

a <- matrix(c(0,2,0,2,1,2,2,0,2))

# The modified question and the answers follow

# QUESTION: Build a 3x3 matrix and calculate the sum of its elements


a <- matrix(data = c(0,2,0,2,1,2,2,0,2), nrow = 3, ncol = 3)


# ANSWER Version #1: Adding with nested for-loops

add <- function(a) {
  x <- 0                      # running total
  for (i in 1:nrow(a)) {      # visit every row...
    for (j in 1:ncol(a)) {    # ...and every column
      x <- x + a[i, j]
    }
  }
  return(x)
}

add(a)
# [1] 11

# ANSWER Version #2: Adding with R's built-in sum() function (preferred)

sum(a)
# [1] 11

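数据应用学院 (Data Application Lab)

Data Application Lab is North America's first one-stop professional data-talent organization, offering training, project internships, career coaching, and job referrals. It provides big data and data science training as well as project solutions for companies. Co-founded by senior data scientists and data engineers from Southern California and Silicon Valley, it is dedicated to spreading the latest applications and knowledge in the data industry and to training and placing outstanding big data talent, in order to fill the talent gap and unlock the full power of big data in business. In 2016 it was named a Top Data Camp by the well-known North American tech magazine Tech Beacon.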



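 Ongoing Recruitment 

TECHNICAL WRITER / Translation Volunteers

  1. Responsibilities:

    1. In-depth discussion of data applications

    2. Research on industry developments

  2. Requirements:

    1. Strong interest in data applications

    2. Foundation in data analysis

    3. Some BUSINESS INSIGHT

    4. Strong writing skills

If you are interested, send your résumé and a writing sample to hr@dataapplab.com with the subject line "申请翻译/Technical Writer" (Application: Translator/Technical Writer).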
