JSON 系列之3:导入JSON标准示例数据
关于JSON标准示例数据
JSON标准示例数据 PurchaseOrders.dmp 包含在Oracle 数据库Sample Schema中,如果你安装了github db-sample-schemas项目,那就包含在其中了。当然,你也可以单独下载这个文件。
文件总共有1000行,也就是1000个JSON文档,文档间是用换行分隔的:
$ wc -l *dmp
9999 PurchaseOrders.dmp$ grep PONumber *dmp|wc -l
10000
看下第一条JSON记录:
$ jq -n 'input' < PurchaseOrders.dmp
{"PONumber": 1,"Reference": "MSULLIVA-20141102","Requestor": "Martha Sullivan","User": "MSULLIVA","CostCenter": "A50","ShippingInstructions": {"name": "Martha Sullivan","Address": {"street": "200 Sporting Green","city": "South San Francisco","state": "CA","zipCode": 99236,"country": "United States of America"},"Phone": [{"type": "Office","number": "979-555-6598"}]},"Special Instructions": "Surface Mail","LineItems": [{"ItemNumber": 1,"Part": {"Description": "Run Lola Run","UnitPrice": 19.95,"UPCCode": 43396040144},"Quantity": 7},{"ItemNumber": 2,"Part": {"Description": "Felicia's Journey","UnitPrice": 19.95,"UPCCode": 12236101345},"Quantity": 1},{"ItemNumber": 3,"Part": {"Description": "Lost and Found","UnitPrice": 19.95,"UPCCode": 85391756323},"Quantity": 8},{"ItemNumber": 4,"Part": {"Description": "Karaoke: Rock & Roll Hits of 80's & 90's 8","UnitPrice": 19.95,"UPCCode": 13023009592},"Quantity": 8},{"ItemNumber": 5,"Part": {"Description": "Theremin: An Electronic Odyssey","UnitPrice": 19.95,"UPCCode": 27616864451},"Quantity": 8}]
}
19c和23ai的导入方法不同,分别说一下。
19c JSON标准示例数据的加载
创建directory:
CREATE OR REPLACE DIRECTORY order_entry_dirAS '/home/oracle/db-sample-schemas/order_entry';
创建外部表,然后将JSON文档导入到普通表中的一列:
CREATE TABLE json_dump_file_contents (json_document BLOB)ORGANIZATION EXTERNAL (TYPE ORACLE_LOADER DEFAULT DIRECTORY order_entry_dirACCESS PARAMETERS(RECORDS DELIMITED BY 0x'0A'DISABLE_DIRECTORY_LINK_CHECKFIELDS (json_document CHAR(5000)))LOCATION (order_entry_dir:'PurchaseOrders.dmp'))PARALLELREJECT LIMIT UNLIMITED;DROP TABLE j_purchaseorder;CREATE TABLE j_purchaseorder(id VARCHAR2 (32) NOT NULL PRIMARY KEY,date_loaded TIMESTAMP (6) WITH TIME ZONE,po_document BLOBCONSTRAINT ensure_json CHECK (po_document IS JSON))LOB (po_document) STORE AS (CACHE);INSERT INTO j_purchaseorder (id, date_loaded, po_document)SELECT SYS_GUID(), SYSTIMESTAMP, json_document FROM json_dump_file_contentsWHERE json_document IS JSON;
在以上SQL中,0x'0A'
为记录分隔符。从ASCII码表中可知,其实就是LF(linefeed),即换行。
$ ascii
Usage: ascii [-adxohv] [-t] [char-alias...]-t = one-line output -a = vertical format-d = decimal table -o = octal table -x = hex table -b binary table-h = this help screen -v = version information
Prints all aliases of an ASCII character. Args may be chars, C \-escapes,
English names, ^-escapes, ASCII mnemonics, or numerics in decimal/octal/hex.Dec Hex Dec Hex Dec Hex Dec Hex Dec Hex Dec Hex Dec Hex Dec Hex0 00 NUL 16 10 DLE 32 20 48 30 0 64 40 @ 80 50 P 96 60 ` 112 70 p1 01 SOH 17 11 DC1 33 21 ! 49 31 1 65 41 A 81 51 Q 97 61 a 113 71 q2 02 STX 18 12 DC2 34 22 " 50 32 2 66 42 B 82 52 R 98 62 b 114 72 r3 03 ETX 19 13 DC3 35 23 # 51 33 3 67 43 C 83 53 S 99 63 c 115 73 s4 04 EOT 20 14 DC4 36 24 $ 52 34 4 68 44 D 84 54 T 100 64 d 116 74 t5 05 ENQ 21 15 NAK 37 25 % 53 35 5 69 45 E 85 55 U 101 65 e 117 75 u6 06 ACK 22 16 SYN 38 26 & 54 36 6 70 46 F 86 56 V 102 66 f 118 76 v7 07 BEL 23 17 ETB 39 27 ' 55 37 7 71 47 G 87 57 W 103 67 g 119 77 w8 08 BS 24 18 CAN 40 28 ( 56 38 8 72 48 H 88 58 X 104 68 h 120 78 x9 09 HT 25 19 EM 41 29 ) 57 39 9 73 49 I 89 59 Y 105 69 i 121 79 y10 0A LF 26 1A SUB 42 2A * 58 3A : 74 4A J 90 5A Z 106 6A j 122 7A z11 0B VT 27 1B ESC 43 2B + 59 3B ; 75 4B K 91 5B [ 107 6B k 123 7B {12 0C FF 28 1C FS 44 2C , 60 3C < 76 4C L 92 5C \ 108 6C l 124 7C |13 0D CR 29 1D GS 45 2D - 61 3D = 77 4D M 93 5D ] 109 6D m 125 7D }14 0E SO 30 1E RS 46 2E . 62 3E > 78 4E N 94 5E ^ 110 6E n 126 7E ~15 0F SI 31 1F US 47 2F / 63 3F ? 79 4F O 95 5F _ 111 6F o 127 7F DEL
单行JSON文档转换为多行
我们的运气较好,正好JSON文档是单行的。但如果JSON文件是多行的,由于19c并不能识别JSON的语义,因此必须将其转换成单行。
我们通过jq来将单行JSON文档转换为多行:
$ jq . PurchaseOrders.dmp > PurchaseOrders-multilines.dmp
$ wc -l PurchaseOrders-multilines.dmp
666647 PurchaseOrders-multilines.dmp
如果仍用之前方法导入,虽执行成功,但结果是无意义的:
SQL> select count(*) from json_dump_file_contents_multilines;COUNT(*)
----------666647
对于19c,JSON文件中的文档必须是单行。如果不是,可以用jq来转换。
可以用以下任意一种方法
jq . -c PurchaseOrders-multilines.dmp > PurchaseOrders-singleline.dmpjq -c < PurchaseOrders-multilines.dmp > PurchaseOrders-singleline.dmp
单行转换为多行如下:
jq . PurchaseOrders.dmp > PurchaseOrders-multilines.dmp
23ai JSON标准示例数据的加载
和19c不同,23ai可以识别JSON文档,因此外部文件中JSON文档是单行还是多行都可以支持。
CREATE TABLE json_file_contents (data JSON)ORGANIZATION EXTERNAL(TYPE ORACLE_BIGDATAACCESS PARAMETERS (com.oracle.bigdata.fileformat = jsondoc)LOCATION (order_entry_dir:'PurchaseOrders.json'))PARALLELREJECT LIMIT UNLIMITED;
以上命令中,采用了新的外部表驱动ORACLE_BIGDATA,而非像19c采用ORACLE_LOADER的RECORDS DELIMITED
。
接下来,将JSON文档导入某表中的一个字段:
DROP TABLE j_purchaseorder;CREATE TABLE j_purchaseorder(id number generated always as identity,date_loaded TIMESTAMP (6) WITH TIME ZONE,po_document JSON);INSERT INTO j_purchaseorder (date_loaded, po_document)SELECT SYSTIMESTAMP, dataFROM json_file_contents;10000 rows created.
除作为传统表中的JSON列,也支持导入到23ai新支持的JSON集合表。
CREATE JSON COLLECTION TABLE purchaseorders;INSERT INTO purchaseorders SELECT * FROM json_file_contents;
JSON集合表只有一个名为DATA的JSON列:
SQL> desc purchaseordersName Null? Type----------------------------------------- -------- ----------------------------DATA JSON
如何Pretty-Printed
查看第一条数据,格式不好,显示不全:
SQL> select po_document from j_purchaseorder where id =1;PO_DOCUMENT
--------------------------------------------------------------------------------
{"PONumber":1,"Reference":"MSULLIVA-20141102","Requestor":"Martha Sullivan","Use
要返回pretty的结果,可以使用以下方法之一:
set pages 9999
select JSON_QUERY(po_document, '$' RETURNING VARCHAR2(4000) PRETTY) from j_purchaseorder where id =1;
-- 或者
select json_serialize (po_document returning varchar2 pretty) from j_purchaseorder where id =1;JSON_QUERY(PO_DOCUMENT,'$'RETURNINGVARCHAR2(4000)PRETTY)
--------------------------------------------------------------------------------
{"PONumber" : 1,"Reference" : "MSULLIVA-20141102","Requestor" : "Martha Sullivan","User" : "MSULLIVA","CostCenter" : "A50","ShippingInstructions" :{"name" : "Martha Sullivan","Address" :{"street" : "200 Sporting Green","city" : "South San Francisco","state" : "CA","zipCode" : 99236,"country" : "United States of America"},"Phone" :[{"type" : "Office","number" : "979-555-6598"}]},"Special Instructions" : "Surface Mail","LineItems" :[{"ItemNumber" : 1,"Part" :{"Description" : "Run Lola Run","UnitPrice" : 19.95,"UPCCode" : 43396040144},"Quantity" : 7},{"ItemNumber" : 2,"Part" :{"Description" : "Felicia's Journey","UnitPrice" : 19.95,"UPCCode" : 12236101345},"Quantity" : 1},{"ItemNumber" : 3,"Part" :{"Description" : "Lost and Found","UnitPrice" : 19.95,"UPCCode" : 85391756323},"Quantity" : 8},{"ItemNumber" : 4,"Part" :{"Description" : "Karaoke: Rock & Roll Hits of 80's & 90's 8","UnitPrice" : 19.95,"UPCCode" : 13023009592},"Quantity" : 8},{"ItemNumber" : 5,"Part" :{"Description" : "Theremin: An Electronic Odyssey","UnitPrice" : 19.95,"UPCCode" : 27616864451},"Quantity" : 8}]
}
参考
- 19c: Loading External JSON Data
- 23ai: Loading External JSON Data