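Java Regular Expression Tutorial with Examples

What Is a Regular Expression?

A regular expression defines a pattern for strings. It can be used to search, edit, or otherwise process text. Regular expressions are not tied to any one language, but their syntax differs slightly from language to language. Java's regular expression syntax is most similar to Perl's.

The Java regular expression classes live in the java.util.regex package, which consists mainly of three classes: Pattern, Matcher, and PatternSyntaxException.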

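Pattern object: the compiled form of a regular expression. It has no public constructor; we create a Pattern object by passing a regular expression to the public static method compile.
Matcher object: the regex engine object that matches an input string against a Pattern object. This class has no public constructor either; we obtain a Matcher by calling the matcher method of a Pattern object with the input string as the argument. We can then call the matches method, whose boolean return value tells us whether the entire input string matches the regular expression.
PatternSyntaxException: thrown when the syntax of a regular expression is invalid.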

Let's see how these classes are used in a simple example:

package com.journaldev.util;
 
import java.util.regex.Matcher;
import java.util.regex.Pattern;
 
public class RegexExamples {
 
    public static void main(String[] args) {
        // using pattern with flags
        Pattern pattern = Pattern.compile("ab", Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher("ABcabdAb");
        // using Matcher find(), group(), start() and end() methods
        while (matcher.find()) {
            System.out.println("Found the text \"" + matcher.group()
                    + "\" starting at " + matcher.start()
                    + " index and ending at index " + matcher.end());
        }
 
        // using Pattern split() method
        pattern = Pattern.compile("\\W");
        String[] words = pattern.split("one@two#three:four$five");
        for (String s : words) {
            System.out.println("Split using Pattern.split(): " + s);
        }
 
        // using Matcher.replaceFirst() and replaceAll() methods
        pattern = Pattern.compile("1*2");
        matcher = pattern.matcher("11234512678");
        System.out.println("Using replaceAll: " + matcher.replaceAll("_"));
        System.out.println("Using replaceFirst: " + matcher.replaceFirst("_"));
    }
 
}

The output of the program above is:

Found the text "AB" starting at 0 index and ending at index 2
Found the text "ab" starting at 3 index and ending at index 5
Found the text "Ab" starting at 6 index and ending at index 8
Split using Pattern.split(): one
Split using Pattern.split(): two
Split using Pattern.split(): three
Split using Pattern.split(): four
Split using Pattern.split(): five
Using replaceAll: _345_678
Using replaceFirst: _34512678
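
The program above does not exercise PatternSyntaxException. The following is a minimal sketch of how an invalid pattern might be handled; the class name RegexSyntaxCheck and the helper isValidRegex are hypothetical names chosen for illustration, not part of java.util.regex.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class RegexSyntaxCheck {

    // Hypothetical helper: returns true if the regex compiles, false otherwise.
    static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            // getDescription() explains the error; getIndex() is the
            // approximate position of the error in the pattern (or -1).
            System.out.println("Invalid regex \"" + e.getPattern() + "\": "
                    + e.getDescription() + " near index " + e.getIndex());
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidRegex("[a-z]+")); // true
        System.out.println(isValidRegex("[a-z"));   // false: the character class is never closed
    }
}
```

PatternSyntaxException is an unchecked exception, so catching it is optional; it is mostly worth doing when the regex comes from user input or configuration.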

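String Matching Methods

Because regular expressions always work on strings, Java 1.4 extended the String class with a matches method that tests the string against a pattern. Internally it uses the Pattern and Matcher classes, but it reduces the amount of code you have to write.

The Pattern class also has a static matches method, which takes a regular expression and an input string as arguments and returns a boolean result.

The following code matches an input string against a regular expression: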

String str = "bbb";
System.out.println("Using String matches method: " + str.matches(".bb"));
System.out.println("Using Pattern matches method: " + Pattern.matches(".bb", str));

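So if all you need is to check whether an input string matches a pattern, calling String's matches method saves you some code. You only need the Pattern and Matcher classes when you want to manipulate the input string or reuse the pattern. Reuse matters because both String.matches and Pattern.matches compile the regular expression again on every call.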
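
As a rough sketch of pattern reuse, the example below compiles ".bb" once and applies it to several inputs. The class name, the constant name, and the helper countMatches are hypothetical and exist only for this illustration.

```java
import java.util.regex.Pattern;

class PrecompiledPatternExample {

    // Compiled once and reused; Pattern instances are immutable and safe to share.
    private static final Pattern THREE_CHARS_ENDING_IN_BB = Pattern.compile(".bb");

    // Hypothetical helper: counts how many inputs match the pattern in full.
    static int countMatches(String[] inputs) {
        int count = 0;
        for (String input : inputs) {
            // A Matcher is created per input; only the Pattern is shared.
            if (THREE_CHARS_ENDING_IN_BB.matcher(input).matches()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String[] inputs = {"bbb", "abb", "bb", "abbb"};
        System.out.println("Matching inputs: " + countMatches(inputs)); // Matching inputs: 2
    }
}
```

Only "bbb" and "abb" match, because matches requires the whole input to fit the pattern: "bb" is too short and "abbb" is too long.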

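How Matching Proceeds

Note that a pattern defined by a regular expression is applied to the input from left to right, and once a source character has been consumed by a match, it is not reused.

For example, the regex 121 matches the string 31212142121 only twice, like this: _121____121.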
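
This behaviour can be checked with Matcher.find(). The sketch below collects the start index of each match; the class name and the helper matchStarts are hypothetical names used only here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NonOverlappingMatches {

    // Hypothetical helper: returns the start index of every match found by find().
    static List<Integer> matchStarts(String regex, String input) {
        List<Integer> starts = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        // Each find() resumes after the end of the previous match,
        // so characters already consumed are never matched again.
        while (matcher.find()) {
            starts.add(matcher.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        System.out.println(matchStarts("121", "31212142121")); // [1, 8]
    }
}
```

The occurrence of 121 that begins at index 3 is skipped, because its first character was already consumed by the match that started at index 1.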

Common Matching Symbols

Regular expression	Description	Examples
`.`	Matches any single character (by default, any character except a line terminator)	`("..", "a%")` – true `("..", ".a")` – true `("..", "a")` – false
`^xxx`	Matches the regex xxx at the beginning of the input	`("^a.c.", "abcd")` – true `("^a", "a")` – true `("^a", "ac")` – false
`xxx$`	Matches the regex xxx at the end of the input	`("..cd$", "abcd")` – true `("a$", "a")` – true `("a$", "aca")` – false
`[abc]`	Matches any one of the letters a, b, or c. [] is known as a character class	`("^[abc]d.", "ad9")` – true `("[ab].d$", "bad")` – true `("[ab]x", "cx")` – false
`[abc][12]`	Matches a, b, or c followed by 1 or 2	`("[ab][12].", "a2#")` – true `("[ab]..[12]", "acd2")` – true `("[ab][12]", "c2")` – false
`[^abc]`	When ^ is the first character inside [], it negates the class: matches any character except a, b, or c	`("[^ab][^12].", "c3#")` – true `("[^ab]..[^12]", "xcd3")` – true `("[^ab][^12]", "c2")` – false
`[a-e1-8]`	Matches any character in the range a to e or 1 to 8	`("[a-e1-3].", "d#")` – true `("[a-e1-3]", "2")` – true `("[a-e1-3]", "f2")` – false
`xx|yy`	Matches either the regex xx or the regex yy	`("x.|y", "xa")` – true `("x.|y", "y")` – true `("x.|y", "yz")` – false
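
The true/false results in the table make sense if each pair is read as the arguments to Pattern.matches(regex, input), which requires the whole input to match. That reading is an assumption, although it is consistent with every row. The sketch below contrasts it with Matcher.find(), which accepts a match anywhere in the input; the class and helper names are hypothetical.

```java
import java.util.regex.Pattern;

class MatchesVersusFind {

    // True only if the whole input matches the regex.
    static boolean wholeMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // True if any part of the input matches the regex.
    static boolean partialMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(wholeMatch("a$", "aca"));    // false, as in the table
        System.out.println(partialMatch("a$", "aca"));  // true: the final "a" matches
        System.out.println(wholeMatch("x.|y", "yz"));   // false, as in the table
        System.out.println(partialMatch("x.|y", "yz")); // true: "y" matches the second alternative
    }
}
```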

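Java Regular Expression Metacharacters

Regular expression	Description
`\d`	Any digit, equivalent to `[0-9]`
`\D`	Any non-digit, equivalent to `[^0-9]`
`\s`	Any whitespace character, equivalent to `[ \t\n\x0B\f\r]`
`\S`	Any non-whitespace character, equivalent to `[^\s]`
`\w`	Any word character, equivalent to `[a-zA-Z_0-9]`
`\W`	Any non-word character, equivalent to `[^\w]`
`\b`	A word boundary
`\B`	A non-word boundary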
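
A few of these metacharacters in use, as a minimal sketch. The class name and the helpers maskDigits and containsWordCat are hypothetical; note that every backslash is doubled inside a Java string literal.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class PredefinedClassesExample {

    // Hypothetical helper: replaces every run of digits with '#'.
    static String maskDigits(String input) {
        return input.replaceAll("\\d+", "#");
    }

    // Hypothetical helper: true if "cat" occurs as a whole word.
    static boolean containsWordCat(String input) {
        return Pattern.compile("\\bcat\\b").matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(maskDigits("room 101, floor 3"));       // room #, floor #
        System.out.println(containsWordCat("a cat sat"));          // true
        System.out.println(containsWordCat("concatenate"));        // false: no word boundary around "cat"
        System.out.println(Arrays.toString("a b\tc".split("\\s"))); // [a, b, c]
    }
}
```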

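There are two ways to use a metacharacter as an ordinary character in a regular expression:

Precede the metacharacter with a backslash (\). In a Java string literal the backslash itself must be written as \\.
Place the metacharacter between \Q (begin quote) and \E (end quote).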
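
Both approaches can be sketched as follows, using the literal text 1+1 as the search target. The class name and the helper found are hypothetical. The last line uses Pattern.quote, which is not mentioned in the text above; it is a standard library method (available since Java 5) that produces a quoted literal pattern for you.

```java
import java.util.regex.Pattern;

class LiteralMetacharacters {

    // Hypothetical helper: true if the regex matches anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "1+1=2";

        // Unescaped, + is a quantifier, so this looks for "11", "111", ...
        System.out.println(found("1+1", input));                // false
        // Escaped with a backslash (doubled inside a Java string literal)
        System.out.println(found("1\\+1", input));              // true
        // Quoted with \Q ... \E
        System.out.println(found("\\Q1+1\\E", input));          // true
        // Pattern.quote returns a literal pattern string for the given text
        System.out.println(found(Pattern.quote("1+1"), input)); // true
    }
}
```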

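Regular Expression Quantifiers

A quantifier specifies how many times the preceding element must occur for a match.

Regular expression	Description
`X?`	X occurs once or not at all
`X*`	X occurs zero or more times
`X+`	X occurs one or more times
`X{n}`	X occurs exactly n times
`X{n,}`	X occurs n or more times
`X{n,m}`	X occurs at least n times but no more than m times

Quantifiers can also be applied to character classes and capturing groups.

For example, [abc]+ means a, b, or c occurring one or more times.
(abc)+ means the group "abc" occurring one or more times. We discuss capturing groups next.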
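
The quantifiers above can be tried out with Pattern.matches, as in this minimal sketch. The class name, the helper m, and the sample inputs are illustrative choices, not taken from the original article.

```java
import java.util.regex.Pattern;

class QuantifierExamples {

    // Hypothetical shorthand for Pattern.matches (whole-input match).
    static boolean m(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(m("colou?r", "color"));   // true:  u occurs zero times
        System.out.println(m("colou?r", "colour"));  // true:  u occurs once
        System.out.println(m("[abc]+", "abcabc"));   // true:  a, b, or c, one or more times
        System.out.println(m("[abc]+", ""));         // false: + needs at least one occurrence
        System.out.println(m("(abc)+", "abcabc"));   // true:  the group abc occurs twice
        System.out.println(m("(abc)+", "abcab"));    // false: trailing "ab" is not a full group
        System.out.println(m("\\d{3}", "123"));      // true:  exactly three digits
        System.out.println(m("\\d{2,}", "12345"));   // true:  two or more digits
        System.out.println(m("\\d{2,4}", "12345"));  // false: more than four digits
    }
}
```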

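Regular Expression Capturing Groups

A capturing group treats multiple characters as a single unit. You create a group by enclosing part of the pattern in parentheses (). The part of the input string that matches a capturing group is saved in memory and can be recalled with a backreference.

You can use the matcher.groupCount method to find the number of capturing groups in a regex pattern. For example, ((a)(bc)) contains 3 capturing groups: ((a)(bc)), (a), and (bc).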
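
A minimal sketch of groupCount and group numbering for that pattern follows; the class name and the helper groupCount are hypothetical. Groups are numbered by the position of their opening parenthesis, from left to right, and group 0 always denotes the whole match and is not included in the count.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupCountExample {

    // Hypothetical helper: number of capturing groups in a regex.
    static int groupCount(String regex) {
        // groupCount depends only on the pattern, so an empty input is enough.
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        System.out.println(groupCount("((a)(bc))")); // 3

        Matcher matcher = Pattern.compile("((a)(bc))").matcher("abc");
        if (matcher.matches()) {
            for (int i = 0; i <= matcher.groupCount(); i++) {
                System.out.println("group " + i + ": " + matcher.group(i));
            }
        }
        // Prints:
        // group 0: abc
        // group 1: abc
        // group 2: a
        // group 3: bc
    }
}
```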

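You can use a backreference in a regular expression by writing a backslash (\) followed by the number of the group to recall; in a Java string literal the first backreference is therefore written as \\1.

Capturing groups and backreferences can be confusing, so let's work through an example: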

System.out.println(Pattern.matches("(\\w\\d)\\1", "a2a2")); //true
System.out.println(Pattern.matches("(\\w\\d)\\1", "a2b2")); //false
System.out.println(Pattern.matches("(AB)(B\\d)\\2\\1", "ABB2B2AB")); //true
System.out.println(Pattern.matches("(AB)(B\\d)\\2\\1", "ABB2B3AB")); //false

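In the first statement, the first capturing group is (\\w\\d). When matched against the input string a2a2, it captures a2 and saves it in memory. \\1 then refers to a2, so the match returns true. The second statement prints false for the same reason: the group captures a2, but the rest of the input is b2.

Try to work out the third and fourth statements yourself.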
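
If you want to check your reasoning afterwards, the sketch below prints what each group captured for the third and fourth statements. The class name and the helper describe are hypothetical names used only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BackreferenceWalkthrough {

    // Hypothetical helper: lists what each group captured, or reports no match.
    static String describe(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        if (!matcher.matches()) {
            return "no match";
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i <= matcher.groupCount(); i++) {
            if (i > 1) {
                sb.append(", ");
            }
            sb.append("group ").append(i).append(" = ").append(matcher.group(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // AB + B2, then \2 (B2) and \1 (AB) must follow: ABB2B2AB
        System.out.println(describe("(AB)(B\\d)\\2\\1", "ABB2B2AB")); // group 1 = AB, group 2 = B2
        // Group 2 captures B2, but the input continues with B3, so \2 fails
        System.out.println(describe("(AB)(B\\d)\\2\\1", "ABB2B3AB")); // no match
    }
}
```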

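Summary of Common Methods

Now let's look at some important methods of the Pattern and Matcher classes.

We can create a Pattern object with flags. For example, Pattern.CASE_INSENSITIVE enables case-insensitive matching.
The Pattern class also provides a split(CharSequence) method, similar to the split method of the String class.
The toString() method of the Pattern class returns the regular expression string from which the pattern was compiled.
The Matcher class has start() and end() index methods, which report exactly where in the input string a match was found.
The Matcher class also provides the string manipulation methods replaceAll(String replacement) and replaceFirst(String replacement).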
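
As a brief sketch of the two methods not shown in the first program, the example below prints the result of toString() and compares Pattern.split with String.split. The class name and the helper joinSplit are hypothetical, and String.join assumes Java 8 or later.

```java
import java.util.regex.Pattern;

class PatternMethodsSummary {

    // Hypothetical helper: splits the input with Pattern.split and joins the parts with commas.
    static String joinSplit(String regex, String input) {
        return String.join(",", Pattern.compile(regex).split(input));
    }

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("\\W", Pattern.CASE_INSENSITIVE);

        // toString() returns the original regex string, without the flags
        System.out.println(pattern.toString());                             // \W

        // Pattern.split and String.split give the same parts for the same regex
        System.out.println(joinSplit("\\W", "one@two#three"));              // one,two,three
        System.out.println(String.join(",", "one@two#three".split("\\W"))); // one,two,three
    }
}
```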

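Original article: journaldev

Note: This article is based on the regular expression support introduced in Java 1.4. The core API (java.util.regex) has remained compatible in later Java versions, so the content applies to Java 1.4 and later.

Article URL: https://1diff.fun/archives/java-zheng-ze-biao-da-shi-jiao-cheng-ji-shi-li.html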
