Why SQL Statements So Often Run to Hundreds of Lines and Are Measured in Kilobytes
sales_amount | Sales performance table |
sales | Salesperson's name, assumed to be unique |
product | Product sold |
amount | This salesperson's sales amount for this product |
Now we want the list of salespeople who rank in the top 10 by sales amount for both air conditioners and TVs.
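1. Sort by air conditioner sales amount and take the top 10;
2. Sort by TV sales amount and take the top 10;
3. Intersect the two result sets to get the answer.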
select * from
( select top 10 sales from sales_amount where product='AC' order by amount desc )
intersect
( select top 10 sales from sales_amount where product='TV' order by amount desc )
with A as
( select top 10 sales from sales_amount where product='AC' order by amount desc ),
B as
( select top 10 sales from sales_amount where product='TV' order by amount desc )
select * from A intersect select * from B
The statement is no shorter, but splitting it into steps does make the reasoning clearer.
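The stepwise query can be tried end to end with a small script. The sketch below is only an illustration: it assumes SQLite through Python's built-in `sqlite3` module, so `top 10` is written as `limit`, and the function name `top_in_both` and the sample rows are made up for the demonstration.

```python
import sqlite3

def top_in_both(rows, n=10):
    """Return salespeople ranked in the top n for both 'AC' and 'TV'."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table sales_amount (sales text, product text, amount real)"
    )
    conn.executemany("insert into sales_amount values (?, ?, ?)", rows)
    query = """
        with A as (select sales from sales_amount
                   where product = 'AC' order by amount desc limit :n),
             B as (select sales from sales_amount
                   where product = 'TV' order by amount desc limit :n)
        select sales from A intersect select sales from B
        order by sales
    """
    names = [row[0] for row in conn.execute(query, {"n": n})]
    conn.close()
    return names

# Hypothetical sample data: (salesperson, product, amount)
sample_rows = [
    ("Ann", "AC", 900), ("Bob", "AC", 800), ("Cid", "AC", 700),
    ("Ann", "TV", 500), ("Cid", "TV", 600), ("Bob", "TV", 100),
]
print(top_in_both(sample_rows, n=2))
```

With these sample rows and `n=2`, only Ann appears in both top lists. The two named steps A and B map directly onto steps 1 and 2 of the plan, which is what makes the `with` form easier to follow.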
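1. List all products;
2. Take the top 10 of each product and save each list separately;
1. Group the data by product, sort each group, and take the top 10 of each group;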
select sales
from ( select sales
from ( select sales,
rank() over (partition by product order by amount desc ) ranking
from sales_amount)
where ranking <=10 )
group by sales
having count(*)=(select count(distinct product) from sales_amount)
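It can be written, but how many people are able to write SQL this complex?
Although SQL has the concept of sets, it does not offer the set as a basic data type: a variable or a field cannot take a set as its value, and apart from the table there is no other set-like data type. As a result, a great many set operations require a detour both in thinking and in writing.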
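To make the contrast concrete, here is a minimal sketch in Python, a language where a set is an ordinary value that can be stored in a variable or a list. The function name `top_in_every_product` and the sample data are hypothetical. The top n is cut by position after sorting, so ties at the cutoff are handled differently from SQL's `rank()`.

```python
from collections import defaultdict

def top_in_every_product(rows, n=10):
    """Return the set of salespeople in the top n of every product."""
    by_product = defaultdict(list)
    for sales, product, amount in rows:
        by_product[product].append((amount, sales))

    top_sets = []  # one set of names per product, kept as plain values
    for records in by_product.values():
        records.sort(reverse=True)  # highest amount first
        top_sets.append({sales for _, sales in records[:n]})

    # Intersect however many sets there are; no products means no result.
    return set.intersection(*top_sets) if top_sets else set()

# Hypothetical sample data: (salesperson, product, amount)
rows_by_product = [
    ("Ann", "AC", 900), ("Bob", "AC", 800), ("Cid", "AC", 700),
    ("Ann", "TV", 500), ("Cid", "TV", 600), ("Bob", "TV", 100),
    ("Ann", "Fridge", 300), ("Bob", "Fridge", 400), ("Cid", "Fridge", 200),
]
print(top_in_every_product(rows_by_product, n=2))
```

Because each group's top list is itself a value, the "save each list, then intersect them all" plan can be written exactly as stated, without rephrasing it as a count of group memberships.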
select sales
from ( select A.sales sales, A.product product,
(select count(*)+1 from sales_amount
where A.product=product AND A.amount<amount) ranking
from sales_amount A )
where product='AC' AND ranking<=10
select sales
from ( select A.sales sales, A.product product, count(*) ranking
from sales_amount A, sales_amount B
where A.product=B.product AND A.amount<=B.amount
group by A.sales,A.product )
where product='AC' AND ranking<=10
Even a professional programmer may not find SQL like this easy to write! And all it does is compute a top 10.
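The ranking-by-counting idea can be checked on a small table. This sketch assumes SQLite via Python's `sqlite3` module. It counts the strictly larger amounts within the same product and adds 1, which numbers ties the same way `rank()` does. The function name `top_n_without_window` and the sample rows are hypothetical.

```python
import sqlite3

# A row's rank is 1 plus the number of rows of the same product
# that have a strictly larger amount.
RANK_QUERY = """
    select sales from (
        select A.sales as sales, A.product as product,
               (select count(*) + 1 from sales_amount B
                where B.product = A.product
                  and B.amount > A.amount) as ranking
        from sales_amount A)
    where product = :product and ranking <= :n
    order by ranking, sales
"""

def top_n_without_window(rows, product, n=10):
    """Return names ranked n or better for one product, ties included."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table sales_amount (sales text, product text, amount real)"
    )
    conn.executemany("insert into sales_amount values (?, ?, ?)", rows)
    params = {"product": product, "n": n}
    names = [row[0] for row in conn.execute(RANK_QUERY, params)]
    conn.close()
    return names

# Hypothetical sample data with a tie for second place
tied_rows = [
    ("Ann", "AC", 900), ("Bob", "AC", 800), ("Cid", "AC", 800),
    ("Dee", "AC", 100), ("Eve", "TV", 999),
]
print(top_n_without_window(tied_rows, "AC", n=2))
```

Bob and Cid share rank 2 here, so asking for the top 2 returns three names. That is the behavior of a rank, as opposed to a positional cut after sorting.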
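employee | Employee table |
name | Employee's name, assumed to be unique |
gender | Employee's gender |
We have already computed the list of "good" salespeople. The natural approach is to look each name up in the employee roster to find the gender, and then count. In SQL, however, fetching information across tables requires a join, so, continuing from the initial result, the SQL becomes: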
select employee.gender,count(*)
from employee,
( ( select top 10 sales from sales_amount where product='AC' order by amount desc )
intersect
( select top 10 sales from sales_amount where product='TV' order by amount desc ) ) A
where A.sales=employee.name
group by employee.gender
select sales.gender,count(*)
from (…) // … is the earlier SQL that computes the "good" salespeople
group by sales.gender
Clearly, this statement is not only easier to read, it would also compute more efficiently, because no join is needed.
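The idea of a field whose value is the referenced record itself can be sketched in Python, where the "good" list holds employee objects instead of names. The `Employee` class, the function `gender_counts`, and the sample people are all hypothetical and only illustrate the access pattern.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Employee:
    name: str
    gender: str

def gender_counts(good_salespeople):
    """Count genders by following each reference straight to its record."""
    return Counter(person.gender for person in good_salespeople)

# Hypothetical data: the list stores the records, not just the names
ann = Employee("Ann", "female")
bob = Employee("Bob", "male")
cid = Employee("Cid", "male")
good_list = [ann, cid]
print(gender_counts(good_list))
```

No lookup by name happens here: `person.gender` reads the attribute directly, which is the behavior the `sales.gender` notation expresses.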
mov ax,5    ; ax = 5
mov bx,7    ; bx = 7
mul bx      ; ax = ax * bx = 35
add ax,3    ; ax = 35 + 3 = 38
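Code like this is far harder to write and to read than 3+5*7 (and it gets much worse once decimals are involved). For a skilled programmer it is not much trouble, but for most people this style is too obscure. In that sense, FORTRAN really was a great invention.
None of these problems is particularly complex, and they all come up frequently in everyday data analysis, yet they are already hard for SQL.
Sets are unordered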
select name, birthday
from (select name, birthday, row_number() over (order by birthday) ranking
from employee )
where ranking=(select floor((count(*)+1)/2) from employee)
select max (consecutive_day)
from (select count(*)-1 consecutive_day
from (select sum(rise_mark) over(order by trade_date) days_no_gain
from (select trade_date,
case when
closing_price>lag(closing_price) over(order by trade_date)
then 0 else 1 END rise_mark
from stock_price) )
group by days_no_gain)
Set orientation is incomplete
select * from employee
where to_char(birthday, 'MMDD') in
( select to_char(birthday, 'MMDD') from employee
group by to_char(birthday, 'MMDD')
having count(*)>1 )
select name
from (select name
from (select name,
rank() over(partition by subject order by score DESC) ranking
from score_table)
where ranking<=10)
group by name
having count(*)=(select count(distinct subject) from score_table)
Lack of object references
select A.*
from employee A, department B, employee C
where A.department=B.department and B.manager=C.name and
A.gender='male' and C.gender='female'
select * from employee
where gender='male' and department in
(select department from department
where manager in
(select name from employee where gender='female'))
where gender='male' and department.manager.gender='female'
select name, company first_company
from (select employee.name name, resume.company company,
row_number() over(partition by resume.name
order by resume.start_date) work_seq
from employee, resume where employee.name = resume.name)
where work_seq=1
select name,
(select company from resume
where name=A.name and
start_date=(select min(start_date) from resume
where name=A.name)) first_company
from employee A
Task 1
A | |
1 | =employee.sort(birthday) |
2 | =A1((A1.len()+1)/2) |
For SPL, which is built on ordered sets, fetching a value by position is a very simple task.
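The same "sort, then take the member in the middle position" reasoning can be sketched in Python, where a sorted list is also an ordered collection addressable by position. The function name `middle_by_birthday` and the sample employees are hypothetical, and the sketch assumes a non-empty list.

```python
from datetime import date

def middle_by_birthday(employees):
    """Return the (name, birthday) pair in the middle position by birthday."""
    ordered = sorted(employees, key=lambda employee: employee[1])
    position = (len(ordered) + 1) // 2  # 1-based position, rounded down
    return ordered[position - 1]        # Python lists are 0-based

# Hypothetical sample data: (name, birthday)
people = [
    ("Ann", date(1990, 3, 5)),
    ("Bob", date(1985, 7, 1)),
    ("Cid", date(1992, 11, 20)),
]
print(middle_by_birthday(people))
```

With three employees the middle position is 2, so the result is Ann, the second oldest.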
Task 2
A | |
1 | =stock_price.sort(trade_date) |
2 | =0 |
3 | =A1.max(A2=if(closing_price>closing_price[-1],A2+1,0)) |
In SPL, you simply write the computation by following the natural line of reasoning.
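That natural line of reasoning is a running counter: add 1 when the price rises over the previous day, reset to 0 otherwise, and remember the largest value seen. Below is a minimal Python sketch of it. The function name `max_consecutive_rises` is hypothetical, and the prices are assumed to be already sorted by trade date.

```python
def max_consecutive_rises(closing_prices):
    """Return the longest run of consecutive day-over-day rises."""
    longest = current = 0
    # Pair each day with the previous one; the first day has no predecessor.
    for previous, today in zip(closing_prices, closing_prices[1:]):
        current = current + 1 if today > previous else 0
        longest = max(longest, current)
    return longest

# Hypothetical closing prices in trade-date order
prices = [10, 11, 12, 11, 12, 13, 14, 13]
print(max_consecutive_rises(prices))
```

For the sample prices the longest run is the three rises from 11 up to 14.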
Task 3
A | |
1 | =employee.group(month(birthday),day(birthday)) |
2 | =A1.select(~.len()>1).conj() |
SPL can keep the grouped result sets, and further processing them is just like working with ordinary sets.
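Keeping the groups themselves, instead of immediately aggregating them, can be sketched in Python with a dictionary of lists. The function name `shared_birthdays` and the sample staff are hypothetical.

```python
from collections import defaultdict
from datetime import date

def shared_birthdays(employees):
    """Return names of employees who share a month and day of birth."""
    groups = defaultdict(list)  # (month, day) -> names in that group
    for name, birthday in employees:
        groups[(birthday.month, birthday.day)].append(name)
    # Keep only groups with more than one member, then concatenate them.
    return [
        name
        for members in groups.values()
        if len(members) > 1
        for name in members
    ]

# Hypothetical sample data: (name, birthday)
staff_birthdays = [
    ("Ann", date(1990, 3, 5)),
    ("Bob", date(1985, 7, 1)),
    ("Cid", date(1992, 3, 5)),
    ("Dee", date(1988, 12, 25)),
]
print(shared_birthdays(staff_birthdays))
```

The filter on group size and the final concatenation correspond to the two operations in the second cell of the cellset.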
Task 4
A | |
1 | =score_table.group(subject) |
2 | =A1.(~.rank(score).pselect@a(~<=10)) |
3 | =A1.(~(A2(#)).(name)).isect() |
With SPL, you just write out the computation by following the reasoning process.
Task 5
A | |
1 | =employee.select(gender=="male" && department.manager.gender=="female") |
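Because SPL supports object references, the fields of the record a foreign key points to can simply be accessed as if they were the record's own attributes.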
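The same two-level access, employee to department to manager, can be sketched with Python objects that hold references to one another. The classes `Department` and `Employee`, the function `men_with_female_manager`, and the sample data are hypothetical and only illustrate the access pattern.

```python
from dataclasses import dataclass
from typing import Optional

# eq and repr are disabled because the records reference each other in a cycle.
@dataclass(eq=False, repr=False)
class Department:
    name: str
    manager: Optional["Employee"] = None

@dataclass(eq=False, repr=False)
class Employee:
    name: str
    gender: str
    department: Department

def men_with_female_manager(employees):
    """Return names of male employees whose department manager is female."""
    return [
        e.name
        for e in employees
        if e.gender == "male"
        and e.department.manager is not None
        and e.department.manager.gender == "female"
    ]

# Hypothetical sample data
sales_dept = Department("Sales")
ops_dept = Department("Ops")
eve = Employee("Eve", "female", sales_dept)
dan = Employee("Dan", "male", ops_dept)
bob = Employee("Bob", "male", sales_dept)
amy = Employee("Amy", "female", ops_dept)
sales_dept.manager = eve
ops_dept.manager = dan
staff = [eve, dan, bob, amy]
print(men_with_female_manager(staff))
```

Only Bob qualifies in the sample, since he is male and his department is managed by Eve. The condition reads almost word for word like the question being asked, with no join or nested subquery.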
Task 6
A | |
1 | =employee.new(name,resume.minp(start_date).company:first_company) |
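SPL supports using the set of sub-table records as a field of the main table. It is accessed just like any other field, and the sub-table does not have to be computed repeatedly.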
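A sub-table held as a field of its parent record can be sketched in Python as a list attribute. The classes `Job` and `Employee`, the function `first_companies`, and the sample data are hypothetical, and every résumé is assumed to have at least one entry.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Job:
    company: str
    start_date: date

@dataclass
class Employee:
    name: str
    resume: list = field(default_factory=list)  # the sub-table as a field

def first_companies(employees):
    """Return (name, first_company) pairs using each employee's own resume."""
    return [
        (e.name, min(e.resume, key=lambda job: job.start_date).company)
        for e in employees
    ]

# Hypothetical sample data
workers = [
    Employee("Ann", [Job("Beta", date(2015, 6, 1)), Job("Alpha", date(2010, 1, 15))]),
    Employee("Bob", [Job("Gamma", date(2018, 9, 1))]),
]
print(first_companies(workers))
```

Taking the record with the minimum `start_date` and then reading its `company` mirrors the `minp(start_date).company` expression: the earliest record is located once, and its fields are read from it directly.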
…
Class.forName("com.esproc.jdbc.InternalDriver");
Connection conn = DriverManager.getConnection("jdbc:esproc:local://");
// Call the SPL script by name, in the same way as a stored procedure
CallableStatement st = conn.prepareCall("{call xxxx(?,?)}");
st.setObject(1, 3000);
st.setObject(2, 5000);
ResultSet result = st.executeQuery();
...