|
awk has a number of functions that perform string operations:
- length
- Returns an integer that is the length of the current record
(that is, the number of characters in the record, without the newline
on the end). For example, the following program calculates the total
number of characters in a file (except for newline characters):
{ sum = sum + length }
END { print sum }
- length(s)
- Returns an integer that is the length of the string s.
For example, the following program prints the length of the first
field in each record of the file:
{ print length($1) }
The function call length($0) is equivalent
to just length.
- gsub(regexp,replacement)
- Puts the replacement string replacement in place of
every string matching the regular expression regexp in the
current record. For example, the program:
{
gsub(/John/,"Jonathan")
print
}
checks every record in the data file for the regular expression John, replaces matching strings with Jonathan, and
prints the resulting record. As a result, the program's output is
exactly like its input, except that every occurrence of John is
changed to Jonathan. This form of the gsub function
returns an integer telling how many substitutions were made in the
current record. This is 0 if the record has no strings that match regexp.
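The return value can be captured to report how many replacements were made in each record. The following is a minimal sketch run from a POSIX shell; the two input lines and the variable names n and out are made up for illustration.

```sh
# Hypothetical sample input: one record with two matches, one with none.
# Each output line shows the substitution count, then the edited record.
out=$(printf 'John met John\nno match here\n' |
  awk '{ n = gsub(/John/, "Jonathan"); print n ": " $0 }')
echo "$out"
```

The first record reports 2 substitutions and the second reports 0.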
- sub(regexp,replacement)
- Is similar to gsub, except that it replaces only the first occurrence of a string matching regexp in
the current record.
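To illustrate the difference from gsub, this sketch runs sub on a made-up record that contains two matches; only the leftmost one should change. The sample text and the shell variable out are assumptions for the example.

```sh
# Hypothetical input with two occurrences of John; sub replaces only the first.
out=$(printf 'John and John\n' | awk '{ sub(/John/, "Jonathan"); print }')
echo "$out"
```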
- gsub(regexp,replacement,string_var)
- Puts the replacement string replacement in place of
every string matching the regular expression regexp in the
string string_var. For example, the program:
{
gsub(/John/,"Jonathan",$1)
print
}
is similar to the previous program, but the replacement is
made only in the first field of each record. This form of the gsub function
returns an integer telling how many substitutions were made in string_var.
- sub(regexp,replacement,string_var)
- Is similar to the previous version of gsub, except
that it replaces only the first occurrence
of a string matching regexp in the string string_var.
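As a rough illustration, the sketch below applies sub to the first field only. The input line is invented so that the first field contains two matches and the second field contains one; only the first match inside $1 should be replaced.

```sh
# Hypothetical input: $1 is "JohnJohn", $2 is "John".
# sub changes only the first match within $1 and leaves $2 untouched.
out=$(printf 'JohnJohn John\n' | awk '{ sub(/John/, "Jonathan", $1); print }')
echo "$out"
```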
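Note: You must use four backslashes to embed one literal backslash in a gsub() or sub() substitution string. awk reduces the four backslashes to two when it reads the string constant, and the substitution then reduces those two to one. For example,
gsub(/backslash/,"\\\\")
replaces all occurrences of the word backslash with the single character \.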
- index(string,substring)
- Searches string for the first occurrence of substring.
If substring is not found, index returns 0; otherwise, index returns the position
(counting from 1) of the character in string where substring begins.
For example:
index("abcd","cd")
returns the integer 3 because cd is found beginning at
the third character of abcd.
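Both outcomes can be checked side by side. This small sketch, run from a POSIX shell, prints the result for a substring that is present and for one that is not; the second search string "xy" is just an illustrative value.

```sh
# First call finds "cd" at position 3; second call finds nothing and yields 0.
out=$(awk 'BEGIN { print index("abcd", "cd"); print index("abcd", "xy") }')
echo "$out"
```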
- match(string,regexp)
- Determines whether string contains a substring that matches
the regular expression (pattern) regexp. If so, the function
returns the position (counting from 1) where the matching substring begins within string; if not, match returns 0. match also
sets a variable named RSTART to the position where the matching
substring starts, and a variable named RLENGTH to the length
of the matching substring. If there is no match, RSTART is set to 0 and RLENGTH to -1.
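A common use of RSTART and RLENGTH is to pull the matched text out with substr. The following sketch assumes a made-up input line containing a run of digits; it prints the start position, the length, and the matched text.

```sh
# Hypothetical input line; the digits "4821" begin at character 10.
out=$(printf 'order id=4821 shipped\n' |
  awk '{ if (match($0, /[0-9]+/)) print RSTART, RLENGTH, substr($0, RSTART, RLENGTH) }')
echo "$out"
```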
- substr(string,pos)
- Returns the part of string that begins at character position pos
and extends to the end of string. The argument pos is an integer;
characters are numbered starting at 1. For example, the value of:
substr("abcd",3)
is the string cd.
- substr(string,pos,length)
- Returns the part of string that begins at the character
position given by pos and has the length given by length.
For example, the value of:
substr("abcdefg",3,2)
is cd (a string of length 2 beginning at position
3).
- sprintf(format,value1,value2,...)
- Formats its arguments in the same way as the printf action, but returns the
result as a string instead of printing it. The value of sprintf is
the string that would be printed by the action
printf(format,value1,value2,...)
For example:
str = sprintf("%d %d!!!\n",2,3)
assigns the string "2 3!!!\n" to the string variable str.
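Because sprintf returns its result, it is convenient for building formatted values that are used later. The sketch below builds a zero-padded label; the format string and the variable name id are illustrative choices, not taken from the text above.

```sh
# %03d pads the number to three digits with leading zeros.
out=$(awk 'BEGIN { id = sprintf("%03d-%s", 7, "a"); print id }')
echo "$out"
```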
- tolower(string)
- Returns the value of string, but with all the letters
in lowercase. (This function is an extension to standard awk.)
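One plausible use is comparing text without regard to case. This sketch counts records whose first field is "error" in any mix of upper and lower case; the three input lines are invented for the example.

```sh
# Hypothetical log lines; two of the three have "error" as the first field.
out=$(printf 'ERROR disk\nError net\nok fine\n' |
  awk 'tolower($1) == "error" { n++ } END { print n + 0 }')
echo "$out"
```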
- toupper(string)
- Returns the value of string, but with all the letters
in uppercase. (This function is an extension to standard awk.)
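Combined with substr, toupper can capitalize just the first letter of a field. The following is a small sketch on a made-up input line.

```sh
# Uppercase the first character of $1 and append the rest of the field unchanged.
out=$(printf 'alice smith\n' |
  awk '{ print toupper(substr($1, 1, 1)) substr($1, 2), $2 }')
echo "$out"
```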
- ord(string)
- Converts the first character of string into a number.
This number gives the decimal value of the character in the character
set used on the system. (This function is an extension to standard awk.)
|