Daily Practice | Data Scientist & Business Analyst & Leetcode Interview Questions 304


Original 2018-03-02 数据应用学院 大数据应用

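Since June 15, 2017, 数据应用学院 has been reviewing common interview questions in data science (DS) and business analytics (BA) together with you. Since October 4, 2017, we have also shared one Leetcode algorithm problem every day.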

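If you are actively looking for work in these fields, we hope you will follow our daily questions and think them through with us. We will post the answers the following day.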

Day 204

DS Interview Questions

Is rotation necessary in PCA? If yes, why? What will happen if you don't rotate the components?

BA Interview Questions

R language:

Using the following variable:

i=10

x=10

# Write a while() or repeat() loop that decrements i and computes x = x/i until i = 0.

LeetCode Questions

Description:

Given a 2D board and a word, find if the word exists in the grid.
The word can be constructed from the letters of sequentially adjacent cells, where "adjacent" cells are those horizontally or vertically neighboring. The same letter cell may not be used more than once.

Input:
board = [
  ['A','B','C','E'],
  ['S','F','C','S'],
  ['A','D','E','E']
]
word = "ABCCED"
Output: true

Want to know the answers? See the next installment!

Day 203 Answers Revealed

DS Interview Questions

You are given a training data set with 1000 columns and 1 million rows. The data set is for a classification problem. Your manager has asked you to reduce the dimension of this data so that model computation time can be reduced. Your machine has memory constraints. What would you do?

Processing high-dimensional data on a machine with limited memory is a strenuous task, and your interviewer will be fully aware of that. The following methods can be used to tackle such a situation:

Since RAM is limited, we should close all other applications on the machine, including the web browser, so that most of the memory can be put to use.
We can randomly sample the data set. This means we can create a smaller data set with, say, 1000 variables and 300000 rows, and do the computations on that.
To reduce dimensionality, we can separate the numerical and categorical variables and remove the correlated ones. For numerical variables, we’ll use correlation. For categorical variables, we’ll use the chi-square test.
Also, we can use PCA and pick the components which can explain the maximum variance in the data set.
Using online learning algorithms like Vowpal Wabbit (available in Python) is a possible option.
Building a linear model using Stochastic Gradient Descent is also helpful.
We can also apply our business understanding to estimate which predictors are likely to affect the response variable. This is an intuitive approach, however, and failing to identify useful predictors might result in a significant loss of information.
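As a rough illustration of two of the ideas above (random row sampling and dropping correlated numerical variables), here is a minimal pure-Python sketch. The function names `sample_rows`, `pearson` and `drop_correlated`, the 0.9 correlation threshold, and the toy columns are all assumptions made for this example and are not part of the original answer. On real data of this size you would more likely reach for a numerical library.

```python
import math
import random


def sample_rows(rows, fraction, seed=0):
    """Yield each row with probability `fraction`, one row at a time.

    Streaming over an iterable means the full data set never has to
    sit in memory at once.
    """
    rng = random.Random(seed)
    for row in rows:
        if rng.random() < fraction:
            yield row


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column has no defined correlation
    return cov / math.sqrt(var_x * var_y)


def drop_correlated(columns, threshold=0.9):
    """Return the names of columns to keep.

    A column is kept only if its absolute correlation with every
    column kept so far is at most `threshold` (hypothetical default).
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Toy example: two columns measure the same thing in different units.
columns = {
    "height_cm": [150, 160, 170, 180, 190],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "age": [30, 22, 41, 25, 37],
}
print(drop_correlated(columns))  # ['height_cm', 'age']
```

Here `height_in` is dropped because it is almost perfectly correlated with `height_cm`, while `age` is kept. Calling `sample_rows(rows, 0.3)` keeps roughly 30% of the rows, in the spirit of the 300000-row sample suggested above.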

BA Interview Questions

R language:

Using the following variable:

a=1:10

# Write a while() loop that computes the vector x = 1 3 6 10 15 21 28 36 45 55

# such that:

# x[1]=a[1]

# x[2]=a[1]+a[2]

# x[3]=a[1]+a[2]+a[3]

# .

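a=1:10
i=1                       # start at 1: R vectors are 1-indexed
x<-NULL
while(i<=length(a)){
if(i==1){x[i]=a[i]}       # the first element has no predecessor
else {x[i]=a[i]+x[i-1]}   # running sum: add a[i] to the previous total
i=i+1
}
x                         # 1 3 6 10 15 21 28 36 45 55, same as cumsum(a)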

Leetcode Questions

Description:

Given a set of distinct integers, nums, return all possible subsets.

Input: [1,2,3]
Output: [[],[1],[1,2],[1,2,3],[1,3],[2],[2,3],[3]]
Assumptions:

The solution set must not contain duplicate subsets.

The subsets problem is a depth-first search traversal of an implicit graph. Because the integers are distinct, no deduplication is needed.

Time Complexity: O(2 ^ n)

Space Complexity: O(n)
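
The depth-first search described above can be sketched in Python as follows. This is one possible implementation rather than the original author's code, and the names `subsets`, `dfs`, `path` and `start` are chosen for this example only.

```python
def subsets(nums):
    """Return all subsets of a list of distinct integers via DFS."""
    result = []
    path = []  # the subset being built along the current DFS branch

    def dfs(start):
        # Every node of the implicit search tree is itself a valid subset.
        result.append(path[:])
        # Only extend with elements at or after `start`, so each subset
        # is generated exactly once and no deduplication is needed.
        for i in range(start, len(nums)):
            path.append(nums[i])
            dfs(i + 1)
            path.pop()  # backtrack

    dfs(0)
    return result


print(subsets([1, 2, 3]))
# [[], [1], [1, 2], [1, 2, 3], [1, 3], [2], [2, 3], [3]]
```

The O(n) space bound refers to the recursion depth and the `path` list, not to the output itself, which holds all 2^n subsets. If the cost of copying each subset into `result` is counted, the total work is closer to O(n * 2^n).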
