亞太技術年會 ( DevDays Asia 2024 ) 微軟專家針對 T-SQL 未來將支援 Regular Expressions 的測試與練習
早在今年五月的時候,就有看到一些影片介紹未來 T-SQL 的語法將要支援 Regular Expressions ( 正規表示式 ) ,當下因為只能看影片也就大概知道有那樣的一回事,也就看看就算了。而在八月份在台北的亞太技術年會中,也看到微軟的專家有現場的展示一下對於 Regular Expressions 的展示,但是整場下來幾乎就是看投影片,也沒有辦法真正去知道到底可以做甚麼用。
前一陣子因為剛好工作上有需要做到一些類似的處理,原本想要自己撰寫 CLR UDF 來做支援,但是從微軟介紹的資料中看到,目前可以開放申請測試,也就去做了申請。在經過一兩個星期的等待之後,總算有申請下來可以開始進行測試,但……. 問題就來了。雖然說微軟官方讓我可以測試,但在沒有任何文件的狀況下,就算有開放讓我測試,我也不知道要怎麼來做使用啊 ?! 所以這件事情又被我擱置下來了 ~~
這幾天剛好在測試如何利用 OpenAI 跟產品整合的過程中,忽然想到可以讓 ChatGPT 來幫我整理一下各家資料庫對於 Regular Expressions 的支援程度,果然花錢買帳號是有用的,馬上得到一份整理的表格,看到這個我整個眼睛都亮起來了
從上述的資料中我們可以看到,Oracle 和 MySQL 都有支援 PCRE ( Perl Compatible Regular Expressions ),也就是指「與 Perl 相容的正則表達式」。那既然是這樣的情況,是否也就意味著我可以先查看 Oracle 的文件,然後到 MySQL 上面來做測試,畢竟 Oracle 已經從 Oracle 10g 就開始支援了,那勢必這樣的文件就很容易找得到;但因為 Oracle 的環境架設太麻煩,但我可以直接在 Azure 上面啟用一個 MySQL 的資料庫服務,就可以很便宜的直接來進行測試了。
參考資料
- Oracle Tutorial 網站中的 Oracle Character Functions 的介紹
- W3Resource 網站中的 Oracle Character Functions 的介紹
測試範例
資料表建立
CREATE TABLE employees
(
employee_id INT PRIMARY KEY,
first_name VARCHAR(255) NOT NULL,
last_name VARCHAR(255) NOT NULL,
email VARCHAR(255) NOT NULL,
phone VARCHAR( 50) NOT NULL,
hire_date DATE NOT NULL,
manager_id INT,
job_title VARCHAR(255) NOT NULL,
);
範例資料
INSERT INTO employees (employee_id,first_name,last_name,email,phone,hire_date,manager_id,job_title)
VALUES
(107,'Summer','Payne','summer.payne@example.com','515.123.8181','2016/06/07',106,'Public Accountant'),
(106,'Rose','Stephens','rose.stephens@example.com','515.123.8080','2016/06/07',2,'Accounting Manager'),
(101,'Annabelle','Dunn','annabelle.dunn@example.com','515.123.4444','2016/09/17',2,'Administration Assistant'),
(1,'Tommy','Bailey','tommy.bailey@example.com','515.123.4567','2016/06/17',NULL,'President'),
(3,'Blake','Cooper','blake.cooper@example.com','515.123.4569','2016/01/13',1,'Administration Vice President'),
(2,'Jude','Rivera','jude.rivera@example.com','515.123.4568','2016/09/21',1,'Administration Vice President'),
(11,'Tyler','Ramirez','tyler.ramirez@example.com','515.124.4269','2016/09/28',9,'Accountant'),
(10,'Ryan','Gray','ryan.gray@example.com','515.124.4169','2016/08/16',9,'Accountant'),
(14,'Elliot','Brooks','elliot.brooks@example.com','515.124.4567','2016/12/07',9,'Accountant'),
(12,'Elliott','James','elliott.james@example.com','515.124.4369','2016/09/30',9,'Accountant'),
(13,'Albert','Watson','albert.watson@example.com','515.124.4469','2016/03/07',9,'Accountant'),
(9,'Mohammad','Peterson','mohammad.peterson@example.com','515.124.4569','2016/08/17',2,'Finance Manager'),
(104,'Harper','Spencer','harper.spencer@example.com','515.123.7777','2016/06/07',2,'Human Resources Representative'),
(4,'Louie','Richardson','louie.richardson@example.com','590.423.4567','2016/01/03',3,'Programmer'),
(5,'Nathan','Cox','nathan.cox@example.com','590.423.4568','2016/05/21',4,'Programmer'),
(8,'Bobby','Torres','bobby.torres@example.com','590.423.5567','2016/02/07',4,'Programmer'),
(7,'Charles','Ward','charles.ward@example.com','590.423.4560','2016/02/05',4,'Programmer'),
(6,'Gabriel','Howard','gabriel.howard@example.com','590.423.4569','2016/06/25',4,'Programmer'),
(102,'Emma','Perkins','emma.perkins@example.com','515.123.5555','2016/02/17',1,'Marketing Manager'),
(103,'Amelie','Hudson','amelie.hudson@example.com','603.123.6666','2016/08/17',102,'Marketing Representative'),
(105,'Gracie','Gardner','gracie.gardner@example.com','515.123.8888','2016/06/07',2,'Public Relations Representative'),
(17,'Frederick','Price','frederick.price@example.com','515.127.4563','2016/12/24',15,'Purchasing Clerk'),
(16,'Alex','Sanders','alex.sanders@example.com','515.127.4562','2016/05/18',15,'Purchasing Clerk'),
(18,'Ollie','Bennett','ollie.bennett@example.com','515.127.4564','2016/07/24',15,'Purchasing Clerk'),
(19,'Louis','Wood','louis.wood@example.com','515.127.4565','2016/11/15',15,'Purchasing Clerk'),
(20,'Dexter','Barnes','dexter.barnes@example.com','515.127.4566','2016/08/10',15,'Purchasing Clerk'),
(15,'Rory','Kelly','rory.kelly@example.com','515.127.4561','2016/12/07',1,'Purchasing Manager'),
(49,'Isabella','Cole','isabella.cole@example.com','011.44.1344.619268','2016/10/15',1,'Sales Manager'),
(48,'Jessica','Woods','jessica.woods@example.com','011.44.1344.429278','2016/03/10',1,'Sales Manager'),
(47,'Ella','Wallace','ella.wallace@example.com','011.44.1344.467268','2016/01/05',1,'Sales Manager'),
(46,'Ava','Sullivan','ava.sullivan@example.com','011.44.1344.429268','2016/10/01',1,'Sales Manager'),
(50,'Mia','West','mia.west@example.com','011.44.1344.429018','2016/01/29',1,'Sales Manager'),
(56,'Evie','Harrison','evie.harrison@example.com','011.44.1344.486508','2016/11/23',46,'Sales Representative'),
(57,'Scarlett','Gibson','scarlett.gibson@example.com','011.44.1345.429268','2016/01/30',47,'Sales Representative'),
(58,'Ruby','Mcdonald','ruby.mcdonald@example.com','011.44.1345.929268','2016/03/04',47,'Sales Representative'),
(59,'Chloe','Cruz','chloe.cruz@example.com','011.44.1345.829268','2016/08/01',47,'Sales Representative'),
(60,'Isabelle','Marshall','isabelle.marshall@example.com','011.44.1345.729268','2016/03/10',47,'Sales Representative'),
(61,'Daisy','Ortiz','daisy.ortiz@example.com','011.44.1345.629268','2016/12/15',47,'Sales Representative'),
(62,'Freya','Gomez','freya.gomez@example.com','011.44.1345.529268','2016/11/03',47,'Sales Representative'),
(80,'Elizabeth','Dixon','elizabeth.dixon@example.com','011.44.1644.429262','2016/01/04',50,'Sales Representative'),
(64,'Florence','Freeman','florence.freeman@example.com','011.44.1346.229268','2016/03/19',48,'Sales Representative'),
(65,'Alice','Wells','alice.wells@example.com','011.44.1346.329268','2016/01/24',48,'Sales Representative'),
(66,'Charlotte','Webb','charlotte.webb@example.com','011.44.1346.529268','2016/02/23',48,'Sales Representative'),
(67,'Sienna','Simpson','sienna.simpson@example.com','011.44.1346.629268','2016/03/24',48,'Sales Representative'),
(68,'Matilda','Stevens','matilda.stevens@example.com','011.44.1346.729268','2016/04/21',48,'Sales Representative'),
(69,'Evelyn','Tucker','evelyn.tucker@example.com','011.44.1343.929268','2016/03/11',49,'Sales Representative'),
(70,'Eva','Porter','eva.porter@example.com','011.44.1343.829268','2016/03/23',49,'Sales Representative'),
(71,'Millie','Hunter','millie.hunter@example.com','011.44.1343.729268','2016/01/24',49,'Sales Representative'),
(72,'Sofia','Hicks','sofia.hicks@example.com','011.44.1343.629268','2016/02/23',49,'Sales Representative'),
(73,'Lucy','Crawford','lucy.crawford@example.com','011.44.1343.529268','2016/03/24',49,'Sales Representative'),
(74,'Elsie','Henry','elsie.henry@example.com','011.44.1343.329268','2016/04/21',49,'Sales Representative'),
(75,'Imogen','Boyd','imogen.boyd@example.com','011.44.1644.429267','2016/05/11',50,'Sales Representative'),
(76,'Layla','Mason','layla.mason@example.com','011.44.1644.429266','2016/03/19',50,'Sales Representative'),
(77,'Rosie','Morales','rosie.morales@example.com','011.44.1644.429265','2016/03/24',50,'Sales Representative'),
(78,'Maya','Kennedy','maya.kennedy@example.com','011.44.1644.429264','2016/04/23',50,'Sales Representative'),
(79,'Esme','Warren','esme.warren@example.com','011.44.1644.429263','2016/05/24',50,'Sales Representative'),
(55,'Grace','Ellis','grace.ellis@example.com','011.44.1344.987668','2016/12/09',46,'Sales Representative'),
(54,'Lily','Fisher','lily.fisher@example.com','011.44.1344.498718','2016/03/30',46,'Sales Representative'),
(53,'Sophia','Reynolds','sophia.reynolds@example.com','011.44.1344.478968','2016/08/20',46,'Sales Representative'),
(52,'Sophie','Owens','sophie.owens@example.com','011.44.1344.345268','2016/03/24',46,'Sales Representative'),
(51,'Poppy','Jordan','poppy.jordan@example.com','011.44.1344.129268','2016/01/30',46,'Sales Representative'),
(63,'Phoebe','Murray','phoebe.murray@example.com','011.44.1346.129268','2016/11/11',48,'Sales Representative'),
(85,'Holly','Shaw','holly.shaw@example.com','650.509.1876','2016/01/27',22,'Shipping Clerk'),
(86,'Emilia','Holmes','emilia.holmes@example.com','650.509.2876','2016/02/20',22,'Shipping Clerk'),
(87,'Molly','Rice','molly.rice@example.com','650.509.3876','2016/06/24',22,'Shipping Clerk'),
(88,'Ellie','Robertson','ellie.robertson@example.com','650.509.4876','2016/02/07',22,'Shipping Clerk'),
(89,'Jasmine','Hunt','jasmine.hunt@example.com','650.505.1876','2016/06/14',23,'Shipping Clerk'),
(90,'Eliza','Black','eliza.black@example.com','650.505.2876','2016/08/13',23,'Shipping Clerk'),
(91,'Lilly','Daniels','lilly.daniels@example.com','650.505.3876','2016/07/11',23,'Shipping Clerk'),
(92,'Abigail','Palmer','abigail.palmer@example.com','650.505.4876','2016/12/19',23,'Shipping Clerk'),
(93,'Georgia','Mills','georgia.mills@example.com','650.501.1876','2016/02/04',24,'Shipping Clerk'),
(94,'Maisie','Nichols','maisie.nichols@example.com','650.501.2876','2016/03/03',24,'Shipping Clerk'),
(95,'Eleanor','Grant','eleanor.grant@example.com','650.501.3876','2016/07/01',24,'Shipping Clerk'),
(96,'Hannah','Knight','hannah.knight@example.com','650.501.4876','2016/03/17',24,'Shipping Clerk'),
(97,'Harriet','Ferguson','harriet.ferguson@example.com','650.507.9811','2016/04/24',25,'Shipping Clerk'),
(98,'Amber','Rose','amber.rose@example.com','650.507.9822','2016/05/23',25,'Shipping Clerk'),
(99,'Bella','Stone','bella.stone@example.com','650.507.9833','2016/06/21',25,'Shipping Clerk'),
(100,'Thea','Hawkins','thea.hawkins@example.com','650.507.9844','2016/01/13',25,'Shipping Clerk'),
(81,'Lola','Ramos','lola.ramos@example.com','650.507.9876','2016/01/24',21,'Shipping Clerk'),
(82,'Willow','Reyes','willow.reyes@example.com','650.507.9877','2016/02/23',21,'Shipping Clerk'),
(83,'Ivy','Burns','ivy.burns@example.com','650.507.9878','2016/06/21',21,'Shipping Clerk'),
(84,'Erin','Gordon','erin.gordon@example.com','650.507.9879','2016/02/03',21,'Shipping Clerk'),
(33,'Reggie','Simmons','reggie.simmons@example.com','650.124.8234','2016/04/10',22,'Stock Clerk'),
(44,'Emily','Hamilton','emily.hamilton@example.com','650.121.2874','2016/03/15',25,'Stock Clerk'),
(43,'Olivia','Ford','olivia.ford@example.com','650.121.2994','2016/01/29',25,'Stock Clerk'),
(42,'Amelia','Myers','amelia.myers@example.com','650.121.8009','2016/10/17',25,'Stock Clerk'),
(41,'Connor','Hayes','connor.hayes@example.com','650.121.1834','2016/04/06',24,'Stock Clerk'),
(26,'Leon','Powell','leon.powell@example.com','650.124.1214','2016/07/16',21,'Stock Clerk'),
(27,'Kai','Long','kai.long@example.com','650.124.1224','2016/09/28',21,'Stock Clerk'),
(28,'Aaron','Patterson','aaron.patterson@example.com','650.124.1334','2016/01/14',21,'Stock Clerk'),
(29,'Roman','Hughes','roman.hughes@example.com','650.124.1434','2016/03/08',21,'Stock Clerk'),
(30,'Austin','Flores','austin.flores@example.com','650.124.5234','2016/08/20',22,'Stock Clerk'),
(31,'Ellis','Washington','ellis.washington@example.com','650.124.6234','2016/10/30',22,'Stock Clerk'),
(32,'Jamie','Butler','jamie.butler@example.com','650.124.7234','2016/02/16',22,'Stock Clerk'),
(45,'Isla','Graham','isla.graham@example.com','650.121.2004','2016/07/09',25,'Stock Clerk'),
(34,'Seth','Foster','seth.foster@example.com','650.127.1934','2016/06/14',23,'Stock Clerk'),
(35,'Carter','Gonzales','carter.gonzales@example.com','650.127.1834','2016/08/26',23,'Stock Clerk'),
(36,'Felix','Bryant','felix.bryant@example.com','650.127.1734','2016/12/12',23,'Stock Clerk'),
(37,'Ibrahim','Alexander','ibrahim.alexander@example.com','650.127.1634','2016/02/06',23,'Stock Clerk'),
(38,'Sonny','Russell','sonny.russell@example.com','650.121.1234','2016/07/14',24,'Stock Clerk'),
(39,'Kian','Griffin','kian.griffin@example.com','650.121.2034','2016/10/26',24,'Stock Clerk'),
(40,'Caleb','Diaz','caleb.diaz@example.com','650.121.2019','2016/02/12',24,'Stock Clerk'),
(25,'Ronnie','Perry','ronnie.perry@example.com','650.123.5234','2016/11/16',1,'Stock Manager'),
(24,'Callum','Jenkins','callum.jenkins@example.com','650.123.4234','2016/10/10',1,'Stock Manager'),
(23,'Jackson','Coleman','jackson.coleman@example.com','650.123.3234','2016/05/01',1,'Stock Manager'),
(22,'Liam','Henderson','liam.henderson@example.com','650.123.2234','2016/04/10',1,'Stock Manager'),
(21,'Jaxon','Ross','jaxon.ross@example.com','650.123.1234','2016/07/18',1,'Stock Manager');
語法測試
測試函數 : REGEXP_LIKE
語法結構
REGEXP_LIKE(source_string, search_pattern [, match_parameter]);
測試案例
-- 找出資料內有 c 的紀錄
SELECT first_name
FROM
employees
WHERE
REGEXP_LIKE( first_name, 'c' )
ORDER BY first_name;
-- 找出字首有包含 a 的紀錄 , 不區分大小寫
SELECT last_name
FROM
employees
WHERE
REGEXP_LIKE( last_name, '^a', 'i' );
-- 字尾是 Y or y 的紀錄
SELECT first_name
FROM
employees
WHERE
REGEXP_LIKE( first_name, 'y$', 'i' )
ORDER BY
first_name;
--資料包含 M、N、m、n 在字首的紀錄
SELECT first_name
FROM
employees
WHERE
REGEXP_LIKE(first_name,'^m|^n','i')
ORDER BY first_name;
-- 資料中有同時出現 l 兩次的紀錄
SELECT first_name
FROM
employees
WHERE
REGEXP_LIKE( first_name, 'l{2}' )
ORDER BY
first_name;
測試函數 : REGEXP_REPLACE
語法結構
REGEXP_REPLACE ( source_string, search_pattern
[, replacement_string
[, star_position
[, nth_occurrence
[, match_parameter ]
]
]
]
)
測試案例
-- 將資料中非英文字母、空白和底線的全部取代成為空字串
-- Result : This is a demo of REGEXP_REPLACE function
SELECT
REGEXP_REPLACE('Th♥is∞ is a dem☻o of REGEXP_♫REPLACE function','[^a-z_A-Z ]')
-- 將信用卡號碼分成三段 , 前面三個數字 , 中間和尾部的四個數字, 然後取代成為 前面三個數字 , 中間換成十個 * 再加上尾部四個數字
-- Result : 402**********8590
SELECT
regexp_replace( '4024007187788590',
'(^\d{3})(.*)(\d{4}$)', '\1**********\3' )
credit_card
-- 將資料中間兩個以上的空白取代成為只有一個空白
-- Result : This line contains more than one spacing between words
SELECT
regexp_replace(
'This line contains more than one spacing between words'
, '( ){2,}', ' ' ) regexp_replace
-- 將一個以點號分隔的電話號碼重新格式化為常見的 (xxx) xxx-xxxx 電話格式
-- Result : 515.123.4444 → (515) 123-4444
SELECT
first_name,
last_name,
REGEXP_REPLACE( phone, '(\d{3})\.(\d{3})\.(\d{4})', '(\1) \2-\3' ) phone_number
FROM
employees
ORDER BY
phone_number;
測試函數 : REGEXP_SUBSTR
語法結構
REGEXP_SUBSTR(source_string, pattern
[, start_position
[, occurrence
[, match_parameter
[, subexpr
]
]
]
]
)
測試案例
-- 取出資料內文字第四段出來 ( 會將 _ 也視為是分隔符號 )
-- Result : regexp
SELECT
regexp_substr( 'This is a regexp_substr demo', '[[:alpha:]]+', 1, 4) the_4th_word ;
-- 取出資料內文字第四段出來 ( 不會將 _ 視為是分隔符號 )
-- Result : regexp_substr
SELECT
regexp_substr( 'This is a regexp_substr demo', '[[:alpha:]_]+', 1, 4) the_4th_word
-- 取出資料內有包含開頭是 d , 第三碼是 g 的字段
-- Result : dog
SELECT
REGEXP_SUBSTR('Cats and dogs', 'd.g') ;
-- 取出用逗號分隔的第二個資料
-- Result : banana
SELECT
REGEXP_SUBSTR('apple,banana,orange', '[^,]+', 1, 2) ;
-- 取出 @ 後面的資料
-- Result : tommy.bailey@example.com → example.com
SELECT
email, REGEXP_SUBSTR(email, '[^@]+',1,2 ) domain
FROM
employees;
測試函數 : REGEXP_INSTR
語法結構
REGEXP_INSTR ( source_string, pattern
[, position
[, occurrence
[, return_opt
[, match_param
[, subexpr ]
]
]
]
]
)
測試案例
-- 取出第一個電話號碼的位置
-- Result : 38
SELECT
REGEXP_INSTR(
'If you have any question please call 123-456-7890 or (123)-456-7891',
'(\+?( |-|\.)?\d{1,2}( |-|\.)?)?(\(?\d{3}\)?|\d{3})( |-|\.)?(\d{3}( |-|\.)?\d{4})') First_Phone_No
-- 第一個 t 字母所出現的位置
-- Result : 19
SELECT
REGEXP_INSTR('The web development Tutorial', 't')
-- 取找 Q 開頭和後面有接母音的位置
-- Result : 10
SELECT
REGEXP_INSTR('Database Query', '[Q][aeiou]', 1, 1, 0)
-- 查詢以 s, r, 或 p 開頭,後面緊跟六個字母的單字所在的位置 , 比對時候不區分大小寫:
-- Result : 12
SELECT
REGEXP_INSTR('500 Oracle Parkway, Redwood Shores, CA', '[s|r|p][[:alpha:]]{6}', 1, 1, 0, 'i') ;
-- Result : 37
SELECT
REGEXP_INSTR('500 Oracle Parkway, Redwood Shores, CA', '[^ ]+', 1, 6) ;
測試函數 : REGEXP_COUNT
語法結構
REGEXP_COUNT (source_string, pattern [, position [, match_param]])
測試案例
-- 取出有數值的個數
-- Result : 2
SELECT
REGEXP_COUNT('An apple costs 50 cents, a banana costs 10 cents.','\d+') result
-- 取出單字中有 d 開頭 g 結尾的字數
-- Result : 1
SELECT
REGEXP_COUNT('My dog drinks beer', 'd.g')
在測試完上述相關函數之後 , 真的覺得如果未來在 T-SQL 內就可以支援的話 , 相信可以簡化很多原本在 SQL 內的判斷邏輯 , 但是因為使用這類的處理的時候 , 勢必也沒有辦法配合到索引 , 加上相關正則表示還是有點小複雜 , 可能也會造成學習上的一個小瓶頸 , 看來後續使用上也是要稍微地去注意。