java.lang.Object
  com.ibm.icu.text.SpoofChecker
public class SpoofChecker
Unicode Security and Spoofing Detection.

These functions are intended to check strings, typically identifiers of some type such as URLs, for the presence of characters that are likely to be visually confusing. They target cases where the displayed form of an identifier may not be what it appears to be.

Unicode Technical Report #36, "Unicode Security Considerations" (http://unicode.org/reports/tr36), and Unicode Technical Standard #39, "Unicode Security Mechanisms" (http://unicode.org/reports/tr39), give more background on security and spoofing issues with Unicode identifiers. The tests and checks provided by this module implement the recommendations from these Unicode documents.

The tests available on identifiers fall into two general categories:

1. Single identifier tests. Check whether an identifier is potentially confusable with any other string, or is suspicious for other reasons.
2. Two identifier tests. Check whether two specific identifiers are confusable. This does not consider whether either of the strings is potentially confusable with any string other than the exact one specified.

The steps to perform confusability testing are:

1. Create a SpoofChecker.Builder.
2. Configure the Builder for the desired set of tests. The tests that will be performed are specified by a set of SpoofCheck flags.
3. Build a SpoofChecker from the Builder.
4. Perform the checks using the pre-configured SpoofChecker. The results indicate which (if any) of the selected tests have identified possible problems with the identifier. Results are reported as a set of SpoofCheck flags; this mirrors the form in which the set of tests to perform was originally specified to the SpoofChecker.

A SpoofChecker may be used repeatedly to perform checks on any number of identifiers.

Thread Safety: The methods on SpoofChecker objects are thread safe.
The test functions for checking a single identifier, or for testing whether two identifiers are potentially confusable, may be called concurrently from multiple threads using the same SpoofChecker instance.

Descriptions of the available checks.

When testing whether pairs of identifiers are confusable, with the areConfusable() family of functions, the relevant tests are:

1. SINGLE_SCRIPT_CONFUSABLE: All of the characters from the two identifiers are from a single script, and the two identifiers are visually confusable.
2. MIXED_SCRIPT_CONFUSABLE: At least one of the identifiers contains characters from more than one script, and the two identifiers are visually confusable.
3. WHOLE_SCRIPT_CONFUSABLE: Each of the two identifiers is of a single script, but the two identifiers are from different scripts, and they are visually confusable.

The safest approach is to enable all three of these checks as a group.

ANY_CASE is a modifier for the above tests. If the identifiers being checked can be of mixed case and are used in a case-sensitive manner, this option should be specified. If the identifiers being checked are used in a case-insensitive manner, and if they are displayed to users in lower-case form only, the ANY_CASE option should not be specified; confusability issues involving upper-case letters will then not be reported.

When performing tests on a single identifier, with the check() family of functions, the relevant tests are:

1. MIXED_SCRIPT_CONFUSABLE: The identifier contains characters from multiple scripts, and there exists an identifier of a single script that is visually confusable.
2. WHOLE_SCRIPT_CONFUSABLE: The identifier consists of characters from a single script, and there exists a visually confusable identifier. The visually confusable identifier also consists of characters from a single script, but not the same script as the identifier being checked.
3. ANY_CASE: Modifies the mixed-script and whole-script confusable tests. If specified, the checks will find confusable characters of any case. If this flag is not set, the test is performed assuming case-folded identifiers.
4. SINGLE_SCRIPT: Check that the identifier contains only characters from a single script. (Characters from the Common and Inherited scripts are ignored.) This is not a test for confusable identifiers.
5. INVISIBLE: Check an identifier for the presence of invisible characters, such as zero-width spaces, or character sequences that are likely not to display, such as multiple occurrences of the same non-spacing mark. This check does not test the input string as a whole for conformance to any particular syntax for identifiers.
6. CHAR_LIMIT: Check that an identifier contains only characters from a specified set of acceptable characters. See Builder.setAllowedChars() and Builder.setAllowedLocales().

Note on Scripts: Characters from the Unicode Scripts "Common" and "Inherited" are ignored when considering the script of an identifier. Common characters include digits and symbols that are normally used with text from many different scripts.
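To make the SINGLE_SCRIPT and INVISIBLE descriptions and the note on scripts concrete, here is a minimal toy sketch in plain Java (no ICU dependency). It is not SpoofChecker's implementation: the class `IdentifierChecks` and its methods are hypothetical, script lookup uses `java.lang.Character.UnicodeScript` rather than ICU data, any format (Cf) character is assumed to count as invisible, and "the same non-spacing mark" is simplified to "repeated after one base character" without normalizing first.

```java
import java.lang.Character.UnicodeScript;
import java.util.EnumSet;
import java.util.HashSet;
import java.util.Set;

/** Toy versions of the SINGLE_SCRIPT and INVISIBLE tests; not SpoofChecker's implementation. */
final class IdentifierChecks {

    /** Scripts used by the identifier, skipping Common and Inherited characters. */
    static Set<UnicodeScript> scriptsOf(String id) {
        Set<UnicodeScript> scripts = EnumSet.noneOf(UnicodeScript.class);
        id.codePoints().forEach(cp -> {
            UnicodeScript s = UnicodeScript.of(cp);
            if (s != UnicodeScript.COMMON && s != UnicodeScript.INHERITED) {
                scripts.add(s);
            }
        });
        return scripts;
    }

    /** SINGLE_SCRIPT-style test: at most one script once Common/Inherited are ignored. */
    static boolean isSingleScript(String id) {
        return scriptsOf(id).size() <= 1;
    }

    /**
     * INVISIBLE-style test: any format (Cf) character, such as ZERO WIDTH SPACE,
     * or the same non-spacing mark twice after one base character.
     */
    static boolean hasInvisible(String id) {
        Set<Integer> marksSinceBase = new HashSet<>();
        for (int i = 0; i < id.length(); ) {
            int cp = id.codePointAt(i);
            int type = Character.getType(cp);
            if (type == Character.FORMAT) {
                return true;
            }
            if (type == Character.NON_SPACING_MARK) {
                if (!marksSinceBase.add(cp)) {
                    return true; // same mark already seen in this combining sequence
                }
            } else {
                marksSinceBase.clear(); // a new base character starts a new sequence
            }
            i += Character.charCount(cp);
        }
        return false;
    }
}
```

For example, `isSingleScript("p\u0430ypal")` (Latin letters plus a Cyrillic а) is false, while `isSingleScript("abc123")` is true because the digits belong to the Common script.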
| Nested Class Summary | |
|---|---|
| static class | SpoofChecker.Builder: SpoofChecker Builder. |
| static class | SpoofChecker.CheckResult: Represents the results of a spoof check operation. |
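The Builder entry above follows the workflow from the class overview: configure a Builder, build a checker, then reuse it for many identifiers. The sketch below imitates that shape in plain Java so it runs without ICU. `MiniSpoofChecker`, its flag values, and its two simplified checks are hypothetical; as an assumption of this sketch, `check` returns `true` when a problem was found.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of the Builder -> build -> check workflow; not the ICU API. */
final class MiniSpoofChecker {
    // Hypothetical flag values chosen only for this sketch.
    static final int CHAR_LIMIT = 1;
    static final int INVISIBLE = 1 << 1;

    private final int checks;
    private final Set<Integer> allowed; // unmodifiable copy, fixed at build() time

    private MiniSpoofChecker(int checks, Set<Integer> allowed) {
        this.checks = checks;
        this.allowed = Set.copyOf(allowed);
    }

    static final class Builder {
        private int checks = CHAR_LIMIT | INVISIBLE;
        private final Set<Integer> allowed = new HashSet<>();

        Builder setChecks(int checks) {
            this.checks = checks;
            return this;
        }

        Builder setAllowedChars(String chars) {
            allowed.clear();
            chars.codePoints().forEach(cp -> allowed.add(cp));
            return this;
        }

        MiniSpoofChecker build() {
            return new MiniSpoofChecker(checks, allowed);
        }
    }

    /** Returns the subset of configured checks that flagged the text (0 means it passed). */
    int failedChecks(String text) {
        int result = 0;
        if ((checks & CHAR_LIMIT) != 0
                && text.codePoints().anyMatch(cp -> !allowed.contains(cp))) {
            result |= CHAR_LIMIT;
        }
        if ((checks & INVISIBLE) != 0
                && text.codePoints().anyMatch(cp -> Character.getType(cp) == Character.FORMAT)) {
            result |= INVISIBLE;
        }
        return result;
    }

    /** Boolean convenience form: true when any configured check failed. */
    boolean check(String text) {
        return failedChecks(text) != 0;
    }
}
```

Because the built object only reads final, unmodifiable state, one instance can be shared across threads, which is the property the thread-safety note claims for SpoofChecker itself.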
| Field Summary | |
|---|---|
| static int | ALL_CHECKS: Enable all spoof checks. |
| static int | ANY_CASE: Any-case modifier for confusable identifier tests. |
| static int | CHAR_LIMIT: Check that an identifier contains only characters from a specified set of acceptable characters. |
| static int | INVISIBLE: Check an identifier for the presence of invisible characters, such as zero-width spaces, or character sequences that are likely not to display, such as multiple occurrences of the same non-spacing mark. |
| static int | MIXED_SCRIPT_CONFUSABLE: Mixed-script confusable test. |
| static int | SINGLE_SCRIPT: Check that an identifier contains only characters from a single script (plus characters from the Common and Inherited scripts). |
| static int | SINGLE_SCRIPT_CONFUSABLE: Single-script confusable test. |
| static int | WHOLE_SCRIPT_CONFUSABLE: Whole-script confusable test. |
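The constants in this table are bit flags: a set of tests is requested by ORing them together, and results are reported in the same form. The sketch below shows that arithmetic using made-up bit values (real code should always use the named SpoofChecker constants, whose values are not stated here). Leaving ANY_CASE out of the toy ALL_CHECKS, because it is a modifier rather than a check, is likewise an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

/** Bit-flag arithmetic with hypothetical values; not SpoofChecker's actual constants. */
final class SpoofFlags {
    static final int SINGLE_SCRIPT_CONFUSABLE = 1;
    static final int MIXED_SCRIPT_CONFUSABLE = 2;
    static final int WHOLE_SCRIPT_CONFUSABLE = 4;
    static final int ANY_CASE = 8;
    static final int SINGLE_SCRIPT = 16;
    static final int INVISIBLE = 32;
    static final int CHAR_LIMIT = 64;

    // This sketch treats ANY_CASE as a modifier, so it is not part of ALL_CHECKS.
    static final int ALL_CHECKS = SINGLE_SCRIPT_CONFUSABLE | MIXED_SCRIPT_CONFUSABLE
            | WHOLE_SCRIPT_CONFUSABLE | SINGLE_SCRIPT | INVISIBLE | CHAR_LIMIT;

    // Names indexed by bit position, matching the values above.
    private static final String[] NAMES = {
        "SINGLE_SCRIPT_CONFUSABLE", "MIXED_SCRIPT_CONFUSABLE", "WHOLE_SCRIPT_CONFUSABLE",
        "ANY_CASE", "SINGLE_SCRIPT", "INVISIBLE", "CHAR_LIMIT"
    };

    /** Decodes a flag set into names, e.g. for logging which tests failed. */
    static List<String> names(int flags) {
        List<String> out = new ArrayList<>();
        for (int bit = 0; bit < NAMES.length; bit++) {
            if ((flags & (1 << bit)) != 0) {
                out.add(NAMES[bit]);
            }
        }
        return out;
    }
}
```

Since several flags can be set at once, a caller would test a result with `(result & MIXED_SCRIPT_CONFUSABLE) != 0` rather than comparing it for equality.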
| Method Summary | |
|---|---|
| int | areConfusable(String s1, String s2): Check whether two specified strings are visually confusable. |
| boolean | check(String text): Check the specified string for possible security issues. |
| boolean | check(String text, SpoofChecker.CheckResult checkResult): Check the specified string for possible security issues. |
| UnicodeSet | getAllowedChars(): Get a UnicodeSet for the characters permitted in an identifier. |
| Set<ULocale> | getAllowedLocales(): Get a list of locales for the scripts that are acceptable in strings to be checked. |
| int | getChecks(): Get the set of checks that this SpoofChecker has been configured to perform. |
| String | getSkeleton(int type, String s): Get the "skeleton" for an identifier string. |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
public static final int SINGLE_SCRIPT_CONFUSABLE
public static final int MIXED_SCRIPT_CONFUSABLE
public static final int WHOLE_SCRIPT_CONFUSABLE
public static final int ANY_CASE
public static final int SINGLE_SCRIPT
public static final int INVISIBLE
public static final int CHAR_LIMIT
public static final int ALL_CHECKS
| Method Detail |
|---|
public int getChecks()
public Set<ULocale> getAllowedLocales()
public UnicodeSet getAllowedChars()
public boolean check(String text,
SpoofChecker.CheckResult checkResult)
text - A String to be checked for possible security issues.
checkResult - Optional caller-provided fill-in parameter. If not null, on return it will be filled in with the results of the check.
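The optional `checkResult` parameter is a caller-owned object that `check` fills in, so one call yields both the boolean answer and the detailed flags. Below is a minimal plain-Java imitation of that out-parameter pattern. `ToyChecker` and `ToyCheckResult` are hypothetical stand-ins that implement only a CHAR_LIMIT-style test; the flag value and the `checks` field name are assumptions of this sketch.

```java
/** Hypothetical stand-in for SpoofChecker.CheckResult: a holder the caller allocates. */
final class ToyCheckResult {
    int checks; // flags of the tests that failed; 0 when the text passed
}

/** Hypothetical checker with a single CHAR_LIMIT-style test. */
final class ToyChecker {
    static final int CHAR_LIMIT = 64; // made-up value for this sketch

    private final String allowed;

    ToyChecker(String allowed) {
        this.allowed = allowed;
    }

    /** Returns true if a problem was found; when checkResult is non-null it is overwritten. */
    boolean check(String text, ToyCheckResult checkResult) {
        boolean outsideSet = text.codePoints().anyMatch(cp -> allowed.indexOf(cp) < 0);
        int failed = outsideSet ? CHAR_LIMIT : 0;
        if (checkResult != null) {
            checkResult.checks = failed; // fill in the caller's object
        }
        return failed != 0;
    }

    /** Convenience overload for callers that only need the boolean. */
    boolean check(String text) {
        return check(text, null);
    }
}
```

Reusing one result object across calls is safe in this sketch because every call overwrites `checks`.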
public boolean check(String text)
text - A String to be checked for possible security issues.
public int areConfusable(String s1,
String s2)
s1 - The first of the two strings to be compared for confusability.
s2 - The second of the two strings to be compared for confusability.
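UTS #39 treats two strings as confusable when their skeletons are equal; the script classification from the class overview then decides which kind of confusability applies. The sketch below models an `areConfusable`-style classifier that way in plain Java. The five-entry Cyrillic-to-Latin table is a hypothetical stand-in for the full confusables data, the flag values are made up, and script lookup uses `java.lang.Character.UnicodeScript` instead of ICU's data.

```java
import java.lang.Character.UnicodeScript;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

final class ConfusableClassifier {
    // Made-up flag values; real code should use SpoofChecker's constants.
    static final int SINGLE_SCRIPT_CONFUSABLE = 1;
    static final int MIXED_SCRIPT_CONFUSABLE = 2;
    static final int WHOLE_SCRIPT_CONFUSABLE = 4;

    // Tiny hypothetical subset of look-alikes: Cyrillic a, ie, o, er, es to Latin a, e, o, p, c.
    private static final Map<Integer, String> PROTOTYPES = Map.of(
            0x0430, "a", 0x0435, "e", 0x043E, "o", 0x0440, "p", 0x0441, "c");

    /** Replaces each mapped code point by its Latin prototype. */
    private static String skeleton(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp ->
                sb.append(PROTOTYPES.getOrDefault(cp, new String(Character.toChars(cp)))));
        return sb.toString();
    }

    /** Scripts in s, ignoring Common and Inherited. */
    private static Set<UnicodeScript> scripts(String s) {
        Set<UnicodeScript> result = EnumSet.noneOf(UnicodeScript.class);
        s.codePoints().forEach(cp -> {
            UnicodeScript sc = UnicodeScript.of(cp);
            if (sc != UnicodeScript.COMMON && sc != UnicodeScript.INHERITED) {
                result.add(sc);
            }
        });
        return result;
    }

    /** Returns 0 if the skeletons differ, otherwise the flag for the kind of confusability. */
    static int areConfusable(String s1, String s2) {
        if (!skeleton(s1).equals(skeleton(s2))) {
            return 0;
        }
        Set<UnicodeScript> a = scripts(s1);
        Set<UnicodeScript> b = scripts(s2);
        if (a.size() > 1 || b.size() > 1) {
            return MIXED_SCRIPT_CONFUSABLE;
        }
        return a.equals(b) ? SINGLE_SCRIPT_CONFUSABLE : WHOLE_SCRIPT_CONFUSABLE;
    }
}
```

Real confusable data also maps multi-character sequences, so this only illustrates how the three flags relate to the scripts of the two inputs.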
public String getSkeleton(int type,
String s)
type - The type of skeleton, corresponding to which of the Unicode confusable data tables to use. The default is Mixed-Script, Lowercase. Allowed options are SINGLE_SCRIPT_CONFUSABLE and ANY_CASE. The two flags may be ORed.
s - The input string whose skeleton will be generated.
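Recent revisions of UTS #39 define the skeleton of a string in three steps: convert to NFD, replace each character by its prototype from the confusables data, then convert to NFD again. The sketch below follows those steps with `java.text.Normalizer`. The five-entry prototype table is a hypothetical stand-in for the Unicode data, and approximating the lowercase (non-ANY_CASE) tables by lowercasing the input first is an assumption of this sketch, not how SpoofChecker selects its tables.

```java
import java.text.Normalizer;
import java.util.Locale;
import java.util.Map;

final class Skeletons {
    // Tiny hypothetical prototype table: Cyrillic look-alikes mapped to Latin letters.
    private static final Map<Integer, String> PROTOTYPES = Map.of(
            0x0430, "a", 0x0435, "e", 0x043E, "o", 0x0440, "p", 0x0441, "c");

    /**
     * NFD, map each code point to its prototype, NFD again.
     * With anyCase == false the input is lowercased first, approximating the lowercase tables.
     */
    static String skeleton(String s, boolean anyCase) {
        String input = anyCase ? s : s.toLowerCase(Locale.ROOT);
        String nfd = Normalizer.normalize(input, Normalizer.Form.NFD);
        StringBuilder mapped = new StringBuilder();
        nfd.codePoints().forEach(cp ->
                mapped.append(PROTOTYPES.getOrDefault(cp, new String(Character.toChars(cp)))));
        return Normalizer.normalize(mapped, Normalizer.Form.NFD);
    }
}
```

Comparing `skeleton(a, false).equals(skeleton(b, false))` then gives the basic confusability test for two identifiers that are only ever displayed in lowercase.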