GBase有奖征文优选文章9 | GBase 8a 集群实战演练
GBase 8a 集群实战演练
文 | 崔凤仪
活动背景
本人参加GBase 8a MPP集群实战演练活动,具体实战演练内容详见下文。
安 装
演练环境
数据库版本:GBase 8a MPP Cluster V95
服务器:10.206.16.106(主安装节点)、10.206.16.108
SSH工具:FinalShell
客户端工具:GBase Data Studio
安装步骤
环境准备
106、108执行如下命令:
$ #root 用户登录
$ #创建gbase用户(密码:gbase)
$ useradd gbase
$ passwd gbase
$ cd /opt/
$ mkdir -p /opt/gbase
$ #授权
$ chown gbase:gbase /opt/gbase/
集群安装
解压软件包
106:
$ tar xjf GBase8a_MPP_Cluster-License-9.5.2.39-redhat7.3-x86_64.tar.bz2
$ cd gcinstall/
$ cp SetSysEnv.py /opt/
执行环境配置脚本
108:
$ #复制环境配置脚本
$ scp root@10.206.16.106:/opt/gcinstall/SetSysEnv.py /opt/
106、108:
$ #运行SetSysEnv.py配置安装环境
$ python SetSysEnv.py --dbaUser=gbase --installPrefix=/opt/gbase --cgroup
数据库安装
106:
$ #DBA用户在主安装节点执行
$ su - gbase
$ cd /opt/gcinstall/
$ vi demp.options
修改demo.options如下(两个节点对称部署):
installPrefix= /opt/gbase
coordinateHost = 10.206.16.106,10.206.16.108
coordinateHostNodeID = 106,108
dataHost = 10.206.16.106,10.206.16.108
#existCoordinateHost =
#existDataHost =
dbaUser = gbase
dbaGroup = gbase
dbaPwd = ‘gbase’
rootPwd = ‘Gbase_0312!@’
#rootPwdFile = rootPwd.json
$ ./gcinstall.py --silent=demo.options
# 安装完成后显示如下日志
10.206.16.108 install cluster on host 10.206.16.108 successfully.
10.206.16.106 install cluster on host 10.206.16.106 successfully.
Starting all gcluster nodes...
start service failed on host 10.206.16.108.
start service failed on host 10.206.16.106.
adding new datanodes to gcware...
InstallCluster Successfully.
$ exit
$ su - gbase
$ gcadmin
CLUSTER STATE: ACTIVE
================================================================
| GBASE COORDINATOR CLUSTER INFORMATION |
================================================================
| NodeName | IpAddress | gcware | gcluster | DataState |
----------------------------------------------------------------
| coordinator1 | 10.206.16.106 | OPEN | CLOSE | 0 |
----------------------------------------------------------------
| coordinator2 | 10.206.16.108 | OPEN | CLOSE | 0 |
----------------------------------------------------------------
==============================================================
| GBASE CLUSTER FREE DATA NODE INFORMATION |
==============================================================
| NodeName | IpAddress | gnode | syncserver | DataState |
--------------------------------------------------------------
| FreeNode1 | 10.206.16.106 | CLOSE | OPEN | 0 |
--------------------------------------------------------------
| FreeNode2 | 10.206.16.108 | CLOSE | OPEN | 0 |
--------------------------------------------------------------
0 virtual cluster
2 coordinator node
2 free data node
导入License
106:
$ #查看license状态
$ ./chkLicense -n 10.206.16.106,10.206.16.108 -u gbase -p gbase
======================================================================
10.206.16.108
is_exist:no
======================================================================
10.206.16.106
is_exist:no
$ #导入license文件
$ ./License -n 10.206.16.106,10.206.16.108 -f /opt/20220315-02.lic -u gbase -p gbase
======================================================================
Successful node nums: 2
======================================================================
$ #再次查看license状态,已经配置
$ ./chkLicense -n 10.206.16.106,10.206.16.108 -u gbase -p gbase
======================================================================
10.206.16.108
is_exist:yes
version:trial
expire_time:20220615
is_valid:yes
======================================================================
10.206.16.106
is_exist:yes
version:trial
expire_time:20220615
is_valid:yes
106,108:
$ #重启集群服务
$ gcluster_services all stop
$ gcluster_services all start
$ gcadmin
CLUSTER STATE: ACTIVE
================================================================
| GBASE COORDINATOR CLUSTER INFORMATION |
================================================================
| NodeName | IpAddress | gcware | gcluster | DataState |
----------------------------------------------------------------
| coordinator1 | 10.206.16.106 | OPEN | OPEN | 0 |
----------------------------------------------------------------
| coordinator2 | 10.206.16.108 | OPEN | OPEN | 0 |
----------------------------------------------------------------
==============================================================
| GBASE CLUSTER FREE DATA NODE INFORMATION |
==============================================================
| NodeName | IpAddress | gnode | syncserver | DataState |
--------------------------------------------------------------
| FreeNode1 | 10.206.16.106 | OPEN | OPEN | 0 |
--------------------------------------------------------------
| FreeNode2 | 10.206.16.108 | OPEN | OPEN | 0 |
--------------------------------------------------------------
0 virtual cluster
2 coordinator node
2 free data node
生成 distribution
#一个分片,一个备份,负载均衡模式
$ gcadmin distribution gcChangeInfo.xml p 1 d 1 pattern 1
gcadmin generate distribution ...
NOTE: node [10.206.16.106] is coordinator node, it shall be data node too
NOTE: node [10.206.16.108] is coordinator node, it shall be data node too
gcadmin generate distribution successful
$ gcadmin showdistribution node
Distribution ID: 1 | State: new | Total segment num: 2
============================================================================================
| nodes | 10.206.16.106 | 10.206.16.108 |
--------------------------------------------------------------------------------------------
| primary | 1 | 2 |
| segments | | |
--------------------------------------------------------------------------------------------
|duplicate | 2 | 1 |
|segments 1| | |
============================================================================================
数据库初始化
$ gccli -u root -p
Enter password:
GBase client 9.5.2.39.126761. Copyright (c) 2004-2022, GBase. All Rights Reserved.
gbase> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| performance_schema |
| gbase |
| gctmpdb |
+--------------------+
4 rows in set (Elapsed: 00:00:00.01)
gbase> initnodedatamap;
Query OK, 0 rows affected (Elapsed: 00:00:00.31)
数据准备
建库
gbase> create database GBase8a_cuifengyi;
Query OK, 1 row affected (Elapsed: 00:00:00.01)
gbase> use GBase8a_cuifengyi;
Query OK, 0 rows affected (Elapsed: 00:00:00.01)
建表
创建员工、部门、薪水三张表结构
gbase> CREATE TABLE Employee (
-> id INT COMMENT '主键',
-> NAME VARCHAR ( 50 ) COMMENT '员工姓名',
-> departmentId INT COMMENT 'Department表中ID的外键',
-> workingAge TINYINT COMMENT '工龄',
-> PRIMARY KEY ( id )
-> ) REPLICATED;
Query OK, 0 rows affected (Elapsed: 00:00:00.26)
gbase> CREATE TABLE Department (
-> id INT COMMENT '主键',
-> Ename VARCHAR ( 100 ) COMMENT '部门英文名称',
-> Cname VARCHAR ( 50 ) COMMENT '部门中文名称',
-> PRIMARY KEY ( id )
-> ) REPLICATED;
Query OK, 0 rows affected (Elapsed: 00:00:00.05)
gbase> CREATE TABLE Salary (
-> employeeID INT COMMENT 'Employee.id',
-> salary INT COMMENT '工资',
-> yearmonth VARCHAR ( 6 ) COMMENT '年月(YYYYMM)'
-> ) DISTRIBUTED BY ( 'employeeID' );
Query OK, 0 rows affected (Elapsed: 00:00:00.07)
gbase> show tables;
+-----------------------------+
| Tables_in_gbase8a_cuifengyi |
+-----------------------------+
| department |
| employee |
| salary |
+-----------------------------+
3 rows in set (Elapsed: 00:00:00.00)
文件入库
文件上传
将《南大通用2022数据库大赛造数.xls 》拆分为employee.csv、salary.csv、department.csv三个文件,使用finalshell工具,通过拖拽的方式,将文件上传至服务器。
loadfile
employ.csv 的“name”字段值包含 ‘,’(英文逗号),与csv默认分隔符冲突,故先在csv文件中做处理,将’,’(英文逗号)替换为"@",入库后再更新回初始数据。
gbase> LOAD DATA INFILE 'file://10.206.16.106/sampledata/employee.csv' INTO TABLE gbase8a_cuifengyi.employee data_format 3 fields terminated by ',' table_fields 'id,name,departmentId,workingAge' trace 1 trace_path '/home/gbase/' ignore 1 lines;
Query OK, 740 rows affected (Elapsed: 00:00:00.06)
Task 29 finished, Loaded 740 records, Skipped 0 records
gbase>
gbase>
gbase> LOAD DATA INFILE 'file://10.206.16.106/sampledata/department.csv' INTO TABLE gbase8a_cuifengyi.department data_format 3 fields terminated by ',' table_fields 'id,Ename,Cname' trace 1 trace_path '/home/gbase/' ignore 1 lines;
Query OK, 39 rows affected (Elapsed: 00:00:00.07)
Task 30 finished, Loaded 39 records, Skipped 0 records
gbase>
gbase> LOAD DATA INFILE 'file://10.206.16.106/sampledata/salary.csv' INTO TABLE gbase8a_cuifengyi.salary data_format 3 fields terminated by ',' table_fields 'employeeID,salary,yearmonth' trace 1 trace_path '/home/gbase/' ignore 1 lines;
Query OK, 4440 rows affected (Elapsed: 00:00:00.11)
Task 31 finished, Loaded 4440 records, Skipped 0 records
#更新回初始数据
gbase> update gbase8a_cuifengyi.employee set name = replace(name,'@',',');
Query OK, 740 rows affected (Elapsed: 00:00:00.04)
Rows matched: 740 Changed: 740 Warnings: 0
SQL编码题
编码题在GBase Data Studio完成。
T1
将Department.Ename分解为两列,一列是部门简称EnameAbbr,一列是部门全称EnameDetail,分隔符是冒号。分隔后的名称不能有空格。请将结果存入新表department_new中。
create table department_new as
select replace(if(instr(ename,':',1)=0,ename,substr(ename,0,instr(ename,':',1)-1)),' ','') as EnameAbbr,
replace(substr(ename,instr(ename,':',1)+1,length(ename)-instr(ename,':',1)),' ','') as EnameDetail
from department;
select * from department_new;
T2
列出各部门员工工龄段。分别统计各部门员工工龄(1~5年)、工龄(6~10年)、工龄(大于10年)的数量。
-- 1.列出各部门员工工龄段
select t2.cname,
t1.name,
t1.workingAge,
case when t1.workingAge < 6 then '1~5年'
when t1.workingAge < 11 then '6~10年'
else '10年以上'
end as 工龄段
from employee t1
left join department t2
on t1.departmentId = t2.id
order by t2.cname,t1.workingAge;
-- 2.分别统计各部门员工工龄
select t2.cname,
sum(case when t1.workingAge > 0 and t1.workingAge < 6 then 1 else 0 end) as "工龄1~5年",
sum(case when t1.workingAge > 5 and t1.workingAge < 11 then 1 else 0 end) as "工龄6~10年",
sum(case when t1.workingAge > 10 then 1 else 0 end) as "工龄大于10年"
from employee t1
left join department t2
on t1.departmentId = t2.id
group by t2.cname
order by t2.cname;
T3
列出2021年06月每个部门收入排名前三的员工。结果集为部门名、员工姓名、工资。注意:某部门工资前三名员工数量可能大于或小于3;特例:没有第三高的工资。
select cname,
name,
salary
from (
select t3.cname,
t2.name,
t1.salary,
dense_rank() over(partition by cname order by salary desc) as rk
from salary t1
left join employee t2
on t1.employeeID = t2.id
left join department t3
on t2.departmentid = t3.id
where t1.yearmonth = '202106'
)t4
where rk < 4
order by cname,rk;
T4
统计每月各部门内员工收入在本部门的占比。结果集为:年月、部门、员工姓名、收入在本部门的占比。
select t1.yearmonth,
t3.cname,
t2.name,
round(t1.salary/sum(salary) over(partition by cname,yearmonth)*100,2)||'%' as 部门占比
from salary t1
left join employee t2
on t1.employeeID = t2.id
left join department t3
on t2.departmentid = t3.id;
T5
统计每个部门总收入每月的环比增长率( (本月-上月)*100%/上月 )。注意202101没有上月的比对数据,所以环比为空。
select cname,
yearmonth,
cur_salary as 本月工资总额,
round((cur_salary-bef_salary)*100/bef_salary,2)||'%' as '环比增长率'
from (
select t4.*,
lag(cur_salary) over(partition by cname order by yearmonth) as bef_salary
from(
select t3.cname,
t1.yearmonth,
sum(salary) cur_salary
from salary t1
left join employee t2
on t1.employeeID = t2.id
left join department t3
on t2.departmentid = t3.id
group by cname,yearmonth
)t4
)t5;
活动感受
很荣幸有这次免费的学习机会,先学习了GDCA课程,后参加本次GBase8a集群实战演练活动,课程中知识点讲解非常细致,再结合实践,对GBase8aMPP整体有了更加深入的认识。感谢GBase给予这样的学习平台,以及老师们的耐心指导。收获颇丰。
THE END
往期文章
新闻资讯
南大通用GBase数据库在国家石油天然气管网集团有限公司的应用
生态合作
南大通用GBase 8c数据库与泛微软件完成互认证 共同搭建统一数字化办公平台
培训活动
培训|“老”程序员的“新”学堂 —— 优秀学员谈GBase初体验
认证培训 | 欢迎参加GBase 8a MPP CLuster数据库3月训练营
技术干货