当前位置: 首页 > news >正文

Snowflake基础知识

  1. Snowflake架构
  • 知识点:Snowflake架构有三个主要层次,分别是存储数据的数据库存储层、执行查询和数据处理任务的计算层,以及管理元数据、查询解析、优化和用户认证的云服务层;虚拟仓库是执行查询的计算资源集群,可按需伸缩。
  • 翻译:Snowflake Architecture: The Snowflake architecture has three main layers, namely the Database Storage Layer for storing data, the Compute Layer for performing query and data processing tasks, and the Cloud Services Layer for managing metadata, query parsing, optimization, and user authentication. A virtual warehouse is a cluster of compute resources that executes queries and can scale up or down as needed.
  1. 数据加载与卸载
  • 知识点:COPY INTO命令用于从外部文件加载数据到Snowflake表;Snowflake有internal stage(用于临时存储数据文件)和external stage(位于外部云存储服务)。
  • 翻译:Data Loading and Unloading: The COPY INTO command is used to load data from external files into Snowflake tables. Snowflake has an internal stage (for temporarily storing data files) and an external stage (located in external cloud storage services).
  1. 数据共享
  • 知识点:Snowflake的数据共享功能可在账户间安全共享数据,无需物理复制移动,接收方可直接查询;安全数据共享保障传输中数据的访问控制与加密。
  • 翻译:Data Sharing: Snowflake’s Data Sharing feature enables secure sharing of data between accounts without physically copying or moving data. Recipients can query the shared data directly, and a Secure Data Share ensures access control and encryption during transit.
  1. 安全与认证
  • 知识点:Snowflake支持用户名密码、单点登录、外部OAuth、多因素认证等认证方式;基于角色的访问控制用于定义管理用户对数据和资源的访问权限。
  • 翻译:Security and Authentication: Snowflake supports authentication methods such as username and password, Single Sign-On, External OAuth, and Multi-factor Authentication. Role-based access control (RBAC) is used to define and manage user access to data and resources.
  1. 性能优化
  • 知识点:Clustering key用于定义表中数据物理排列提升查询性能;有自动和手动聚类两种方式。
  • 翻译:Performance Optimization: Clustering keys are used to define the physical arrangement of data in a table to improve query performance. There are two methods: automatic clustering and manual clustering.
  1. Snowflake SQL
  • 知识点:UPPER()函数将字符串转为大写;COUNT()函数用于计算表的行数。
  • 翻译:Snowflake SQL: The UPPER() function converts a string to uppercase, and the COUNT() function is used to count the number of rows in a table.
  1. Time Travel和Fail-safe
  • 知识点:Time Travel可查询、克隆或恢复先前状态的数据,保留期最长90天;Fail-safe是最后手段的数据恢复机制,有7天恢复期但用户不可查询。
  • 翻译:Time Travel and Fail-safe: Time Travel allows you to query, clone, or restore data from a previous state, with a maximum retention period of 90 days. Fail-safe is a last-resort data recovery mechanism with a 7-day recovery period, but the data is not user-accessible for querying.
  1. Snowflake Marketplace
  • 知识点:Snowflake Marketplace是用户查找访问第三方数据、应用、服务的平台,数据可购买、共享并在Snowflake环境内查询。
  • 翻译:Snowflake Marketplace: The Snowflake Marketplace is a platform where users can find and access third-party data, applications, and services. Data can be purchased or shared and immediately queried within the Snowflake environment.
  1. Snowflake数据类型
  • 知识点:Snowflake支持VARIANT、OBJECT、ARRAY等半结构化数据类型。
  • 翻译:Snowflake Data Types: Snowflake supports semi-structured data types such as VARIANT, OBJECT, and ARRAY.
  1. Snowflake克隆
  • 知识点:克隆可创建数据库、模式、表的副本,不复制底层数据,克隆对象可独立修改。
  • 翻译:Snowflake Cloning: Cloning allows you to create copies of databases, schemas, or tables without duplicating the underlying data. Cloned objects can be modified independently.
  1. 虚拟仓库相关补充
  • 知识点:虚拟仓库暂停时,正在运行的查询自动暂停且不能提交新查询;缩放虚拟仓库时,正在运行的查询不受干扰,只影响新查询。
  • 翻译:Additional Information about Virtual Warehouses: When a virtual warehouse is suspended, the running queries are automatically paused and no new queries can be submitted. When scaling a virtual warehouse, the running queries are not disrupted, and only new queries are affected.
  1. 数据存储与管理补充
  • 知识点:微分区按数据特征将数据组织成小且优化的分区;Snowflake阶段类型有internal stage、external stage ,没有external file stage这一类型。
  • 翻译:Additional Information about Data Storage and Management: Micro-partitioning organizes data into small, optimized partitions based on data characteristics. Snowflake has stage types like internal stage and external stage, but there is no such type as external file stage.
  1. Snowflake SQL与查询补充
  • 知识点:用CREATE TABLE AS SELECT * FROM ;语法从查询结果创建表;DROP TABLE ;用于删除表。
  • 翻译:Additional Information about Snowflake SQL and Querying: Use the syntax CREATE TABLE AS SELECT * FROM to create a table from a query result. DROP TABLE is used to delete a table.
  1. 安全与访问控制补充
  • 知识点:角色授予用户时默认无权限,需显式授予;SELECT权限可在对象级别授予。
  • 翻译:Additional Information about Security and Access Control: By default, a role has no permissions when granted to a user and needs to be explicitly granted. The SELECT privilege can be granted at the object level.
  1. 性能调优补充
  • 知识点:对频繁过滤的列使用clustering key、合适场景用物化视图能提升性能。
  • 翻译:Additional Information about Performance Tuning: Using clustering keys for frequently filtered columns and materialized views in appropriate scenarios can improve performance.
  1. Time Travel与数据恢复补充
  • 知识点:Time Travel最长保留90天;Fail-safe用于防止数据意外删除或损坏丢失。
  • 翻译:Additional Information about Time Travel and Data Recovery: Time Travel can retain data for up to 90 days. Fail-safe is used to prevent data loss in case of accidental deletion or corruption.
  1. 数据共享与市场补充
  • 知识点:数据共享无需复制数据;ACCOUNTADMIN角色可执行创建管理虚拟仓库、查询数据、修改角色权限等操作。
  • 翻译:Additional Information about Data Sharing and Marketplace: Data sharing doesn’t require copying data. The ACCOUNTADMIN role can perform actions such as creating and managing virtual warehouses, querying data, and modifying user roles and access privileges.
  1. Snowflake使用与最佳实践补充
  • 知识点:临时表在会话结束时自动删除;Snowflake通过分配虚拟仓库处理并发查询。
  • 翻译:Additional Information about Snowflake Usage and Best Practices: Temporary tables are automatically dropped at the end of the session. Snowflake handles concurrent queries by allocating virtual warehouses.
  1. 半结构化数据处理补充
  • 知识点:VARIANT用于存储半结构化数据,优势是单列存多种结构数据;ARRAY用于存半结构化数据中的列表。
  • 翻译:Additional Information about Semi-Structured Data Handling: The VARIANT data type is used to store semi-structured data, with the advantage of storing structured, semi-structured, and unstructured data in a single column. The ARRAY data type is used to store lists in semi-structured data.
  1. 数据治理补充
  • 知识点:访问历史用于维护数据沿袭与审计。
  • 翻译:Additional Information about Data Governance: Access History is used to maintain data lineage and auditing.
  1. 数据加载与卸载补充
  • 知识点:从external stage加载数据不支持Excel格式;用COPY INTO命令将数据从表卸载到外部存储。
  • 翻译:Additional Information about Data Loading and Unloading: Loading data from an external stage does not support the Excel format. Use the COPY INTO command to unload data from a table to an external storage.
  1. Snowflake角色与访问控制补充
  • 知识点:创建新用户需SECURITYADMIN权限;CREATE TABLE权限在模式级别授予而非数据库级别。
  • 翻译:Additional Information about Snowflake Roles and Access Control: The SECURITYADMIN privilege is required to create a new user. The CREATE TABLE privilege is granted at the schema level, not the database level.
  1. Snowflake性能优化补充
  • 知识点:Clustering key有诸多性能好处且不会增加存储成本;自动聚类对基于范围过滤的大数据集查询最有益。
  • 翻译:Additional Information about Snowflake Performance Optimization: Clustering keys have many performance benefits and do not increase storage costs. Automatic clustering is most beneficial for queries with range-based filtering on large datasets.
  1. 数据共享补充
  • 知识点:数据共享中消费者接收数据副本用于查询不能修改;可共享数据库 。
  • 翻译:Additional Information about Data Sharing: In data sharing, the consumer receives a copy of the data for querying and cannot modify it. Databases can be shared.
  1. Snowflake监控与管理
  • 知识点:Query Profile、Query History、Information Schema Views可用于监控查询性能与优化。
  • 翻译:Snowflake Monitoring and Management: Query Profile, Query History, and Information Schema Views can be used to monitor query performance and optimize it.

http://www.mrgr.cn/news/81909.html

相关文章:

  • [.闲于修.]Autosar_UDS_笔记篇_ISO14229-1
  • 118.【C语言】数据结构之排序(堆排序和冒泡排序)
  • 我在广州学 Mysql 系列——插入、更新与删除数据详解以及实例
  • #Vue3篇: 无感刷新token的原理JSESSIONID无感刷新和JWT接口刷新
  • 0101java面经
  • 【机器学习篇】穿越数字迷雾:机器深度学习的智慧领航
  • WPF 绘制过顶点的圆滑曲线 (样条,贝塞尔)
  • Qt之QtConcurrent
  • 【服务器项目部署】⭐️将本地项目部署到服务器!
  • 数仓建模:如何进行实体建模?
  • 大模型在自动驾驶领域的应用和存在的问题
  • MySQL数据库的备份与恢复你会了吗?
  • ubuntu2204 gpu 没接显示器,如何连接vnc
  • 3.2、SDH帧结构
  • Rust : tokio中select!
  • 【机器学习】【朴素贝叶斯分类器】从理论到实践:朴素贝叶斯分类器在垃圾短信过滤中的应用
  • Elasticsearch名词解释
  • C++ 设计模式:中介者模式(Mediator Pattern)
  • gesp(二级)(16)洛谷:B4037:[GESP202409 二级] 小杨的 N 字矩阵
  • 自定义 Element Plus 树状表格图标
  • ArcGIS Pro地形图四至角图经纬度标注与格网标注
  • html+css+js网页设计 美食 家美食1个页面
  • 【Rust自学】8.3. String类型 Pt.1:字符串的创建、更新与拼接
  • 被裁20240927 --- 嵌入式硬件开发 STM32篇
  • SonarQube相关的maven配置及使用
  • 【Rust自学】8.2. Vector + Enum的应用