帮我得到第一份数据分析师工作的9个SQL核心概念
今日份知识你摄入了么?
来自Shutterstock
如果我必须重新开始我的数据科学之旅,我会先学习SQL。
我不算SQL的拥趸,但你确实很需要SQL去通过数据科学/分析师面试,无论你的职位是什么,你都会经常使用SQL创建查询并与公司的数据库进行交互。
下面是帮助我通过第一次数据分析师面试的9个SQL核心概念。
// 数据
”
为了展示这些概念,我们将使用一个包含100多名顶级足球运动员信息的表。你可以在这里(https://drive.google.com/file/d/1ZpBC71J-SFbA0tgJjWiNy58QA57YT1I8/view?usp=sharing)下载此数据并将其导入你的数据库。
下面是我们将使用的表格,我将其命名为players_description。
Initial table: players_description
+----+-------------------+------------+-----+-----+---------------+
| ID | FirstName | LastName | Age | ... | Team_Position |
+----+-------------------+------------+-----+-----+---------------+
| 1 | Lionel Andrés | Messi | 32 | ... | RW |
| 2 | Cristiano Ronaldo | dos Santos | 34 | ... | LW |
| 3 | Neymar | da Silva | 27 | ... | CAM |
| 4 | Jan | Oblak | 26 | ... | GK |
| 5 | Eden | Hazard | 28 | ... | LW |
| 6 | Kevin | De Bruyne | 28 | ... | CAM |
| 7 | Marc-André | ter Stegen | 27 | ... | GK |
| 8 | Virgil | van Dijk | 27 | ... | LF |
| 9 | Luka | Modrić | 33 | ... | CM |
| 10 | Mohamed | Salah | 27 | ... | RW |
| .. | ... | ... | ... | ... | ... |
+----+-------------------+------------+-----+-----+---------------+
// 1、SELECT语句
”
SELECT语句允许我们从数据库中选择数据,是SQL中最常见的语句,因此请记住下面的语法。
让我们从players_description表中选择列FirstName、LastName和Height。
Query:SELECT FirstName, LastName, Height
FROM players_description;Output:+---------------------+--------------+--------+
| FirstName | LastName | Height |
+---------------------+--------------+--------+
| "Lionel Andrés" | Messi | 170 |
| "Cristiano Ronaldo" | "dos Santos" | 187 |
| Neymar | "da Silva" | 175 |
| Jan | Oblak | 188 |
| Eden | Hazard | 175 |
| Kevin | "De Bruyne" | 181 |
| Marc-André | "ter Stegen" | 187 |
| Virgil | "van Dijk" | 193 |
| Luka | Modrić | 172 |
| Mohamed | Salah | 175 |
+---------------------+--------------+--------+
// 2、SELECT DISTINCT语句
”
每当要从列中选择唯一元素时,我们都会使用SELECT DISTINCT语句。
让我们看看players_description表中唯一的Team_Position。
Query:SELECT DISTINCT Team_Position
FROM players_description;Output:+---------------+
| Team_Position |
+---------------+
| RW |
| LW |
| CAM |
| GK |
| LF |
| CM |
| ST |
| CDM |
| LM |
| RM |
| RB |
| CF |
| CB |
| RAM |
| LAM |
+---------------+
// 3、WHERE子句
”
WHERE子句允许我们向“查询”添加条件,只有那些满足条件的值才会在输出表中显示。
让我们创建一个查询,过滤掉身高Height超过180厘米的球员。
Query:SELECT FirstName, LastName, Height
FROM players_description
WHERE Height>180;Output:+---------------------+--------------+--------+
| FirstName | LastName | Height |
+---------------------+--------------+--------+
| "Cristiano Ronaldo" | "dos Santos" | 187 |
| Jan | Oblak | 188 |
| Kevin | "De Bruyne" | 181 |
| Marc-André | "ter Stegen" | 187 |
| Virgil | "van Dijk" | 193 |
| Kalidou | Koulibaly | 187 |
| Harry | Kane | 188 |
| Alisson | Becker | 191 |
| David | "De Gea" | 192 |
| Giorgio | Chiellini | 187 |
+---------------------+--------------+--------+
// 4、COUNT()、AVG()和SUM()函数
”
SQL中最常用的函数是:count“COUNT()”、average“AVG()”和sum“SUM()”。
用上我们刚学过的WHERE子句,下面的就是COUNT()、AVG()和SUM()的语法。
让我们看看我们可以从FC Barcelona俱乐部获得的见解。
Query 1: Let's see how many Barcelona players are in our tableSELECT COUNT(Playerid)
FROM players_description
WHERE Club="FC Barcelona";Output 1:+-----------------+
| COUNT(Playerid) |
+-----------------+
| 15 |
+-----------------+Query 2: Calculate the average height of Barcelona playersSELECT AVG(Height)
FROM players_description
WHERE Club="FC Barcelona";Output 2:+-------------+
| AVG(Height) |
+-------------+
| 181.2667 |
+-------------+Query 3: Sum the wage of Barcelona playersSELECT SUM(Wage_eur)
FROM players_description
WHERE Club="FC Barcelona";Output 3:+---------------+
| SUM(Wage_eur) |
+---------------+
| 3945 |
+---------------+
// 5、ORDER BY
”
每当我们想要按升序或降序对输出表进行排序时,我们都会使用ORDER BY关键字。
让我们看看谁是我们表中最高的运动员。
Query:SELECT FirstName, LastName, Height
FROM players_description
ORDER BY Height DESC;Output:+-----------+-------------+--------+
| FirstName | LastName | Height |
+-----------+-------------+--------+
| Thibaut | Courtois | 199 |
| Gianluigi | Donnarumma | 196 |
| Jiří | Pavlenka | 196 |
| Wojciech | Szczęsny | 195 |
| Niklas | Süle | 195 |
| Zlatan | Ibrahimović | 195 |
| Gerard | Piqué | 194 |
| Virgil | "van Dijk" | 193 |
| Samir | Handanovič | 193 |
| Manuel | Neuer | 193 |
+-----------+-------------+--------+
// 6、AND/OR运算符
”
我们可以将WHERE子句与不同的运算符(如AND/OR)组合在一起。
下面是要使用的语法。
它们之间的区别是:
如果所有条件都为真,则是AND
如果任何一个条件为真,则是OR
让我们看一些AND/OR的示例。
Query: Show players from France (FRA) with height greater than 180cm.SELECT FirstName, LastName, Height, Nationality
FROM players_description
WHERE Height>180 AND Nationality="FRA";Output:+-----------+-----------+--------+-------------+
| FirstName | LastName | Height | Nationality |
+-----------+-----------+--------+-------------+
| Paul | Pogba | 191 | FRA |
| Hugo | Lloris | 188 | FRA |
| Aymeric | Laporte | 189 | FRA |
| Karim | Benzema | 185 | FRA |
| Samuel | Umtiti | 182 | FRA |
| Raphaël | Varane | 191 | FRA |
| Clément | Lenglet | 186 | FRA |
| Lucas | Hernández | 182 | FRA |
| Stéphane | Ruffier | 188 | FRA |
+-----------+-----------+--------+-------------+Query: Show players that are from France (FRA) or that have a height greater than 180cm.SELECT FirstName, LastName, Height, Nationality
FROM players_description
WHERE Height>180 OR Nationality="FRA";Output:+---------------------+--------------+--------+-------------+
| FirstName | LastName | Height | Nationality |
+---------------------+--------------+--------+-------------+
| "Cristiano Ronaldo" | "dos Santos" | 187 | PRT |
| Jan | Oblak | 188 | SGP |
| Kevin | "De Bruyne" | 181 | BLX |
| Marc-André | "ter Stegen" | 187 | DEU |
| Virgil | "van Dijk" | 193 | NPL |
| Kylian | Mbappé | 178 | FRA |
| Kalidou | Koulibaly | 187 | SEN |
| Harry | Kane | 188 | UK |
| Alisson | Becker | 191 | BRA |
| David | "De Gea" | 192 | ESP |
+---------------------+--------------+--------+-------------+
// 7、BETWEEN运算符
”
BETWEEN运算符可以选择给定范围内的值(如数字、文本或日期)。此运算符是包含性的,从开始值到结束值都将包含在内。
让我们来看看哪些运动员的年龄在20到30岁之间。
Query:SELECT FirstName, LastName, Age
FROM players_description
WHERE Age BETWEEN 20 AND 30;Output:+------------+--------------+-----+
| FirstName | LastName | Age |
+------------+--------------+-----+
| Neymar | "da Silva" | 27 |
| Jan | Oblak | 26 |
| Eden | Hazard | 28 |
| Kevin | "De Bruyne" | 28 |
| Marc-André | "ter Stegen" | 27 |
| Virgil | "van Dijk" | 27 |
| Mohamed | Salah | 27 |
| Kylian | Mbappé | 20 |
| Kalidou | Koulibaly | 28 |
| Harry | Kane | 25 |
+------------+--------------+-----+
// 8、IN/NOT IN运算符
”
当我们想在WHERE子句中指定多个值时,我们使用IN运算符。
让我们列出一份效力于FC Barcelona俱乐部或Real Madrid俱乐部的运动员名单。
Query:SELECT FirstName, LastName, Age, Club
FROM players_description
WHERE Club IN ('FC Barcelona', 'Real Madrid');Output:+-----------------+--------------+-----+----------------+
| FirstName | LastName | Age | Club |
+-----------------+--------------+-----+----------------+
| "Lionel Andrés" | Messi | 32 | "FC Barcelona" |
| Eden | Hazard | 28 | "Real Madrid" |
| Marc-André | "ter Stegen" | 27 | "FC Barcelona" |
| Luka | Modrić | 33 | "Real Madrid" |
| Sergio | Ramos | 33 | "Real Madrid" |
| Luis | Alberto | 32 | "FC Barcelona" |
| Sergio | Busquets | 30 | "FC Barcelona" |
| Antoine | Griezmann | 28 | "FC Barcelona" |
| Thibaut | Courtois | 27 | "Real Madrid" |
| Gerard | Piqué | 32 | "FC Barcelona" |
+-----------------+--------------+-----+----------------+
// 9、LIKE-NOT LIKE运算符
”
当我们想要在列中搜索特定模式时,我们使用LIKE运算符。
我们可以在模式中使用不同的通配符。下面是两个最常见的通配符:
%:表示零个、一个或多个字符
_:表示单个字符
让我们过滤名字以字母“S”开头的球员。
Query:SELECT FirstName, LastName
FROM players_description
WHERE FirstName LIKE 's%';Output:+-----------+------------------+
| FirstName | LastName |
+-----------+------------------+
| Sergio | Agüero |
| Sergio | Ramos |
| Sergio | Busquets |
| Samir | Handanovič |
| Sadio | Mané |
| Samuel | Umtiti |
| Sergej | Milinković-Savić |
| Saúl | Ñíguez |
| Serge | Gnabry |
| Stefan | "de Vrij" |
| Stéphane | Ruffier |
| Salvatore | Sirigu |
| Sokratis | Papastathopoulos |
| Sergio | Canales |
+-----------+------------------+
利用好%和_,你可以在各种模式里驰骋(详见:https://www.w3schools.com/sql/sql_like.asp)。
原文作者:Frank Andrade
翻译作者:高佑兮
美工编辑:过儿
校对审稿:Chuang
原文链接:https://towardsdatascience.com/9-sql-core-concepts-that-helped-me-get-my-first-data-analyst-job-a582f892276f
本周公开课预告
往期精彩回顾
Consumer Journeys?你需要的是Persuasion Journey
点击“阅读原文”查看数据应用学院核心课程