String manipulation functions
awk has a number of functions that perform string operations:
- length
- Returns an integer that is the length of the current record
(that is, the number of characters in the record, without the newline
on the end). For example, the following program calculates the total
number of characters in a file (except for newline characters):
{ sum = sum + length } END { print sum }
- length(s)
- Returns an integer that is the length of the string s.
For example, the following program prints the length of the first
field in each record of the file:
{ print length($1) }
The function call length($0) is equivalent to just length.
- gsub(regexp,replacement)
- Puts the replacement string replacement in place of
every string matching the regular expression regexp in the
current record. For example, the program:
{ gsub(/John/,"Jonathan"); print }
checks every record in the data file for the regular expression John, replaces matching strings with Jonathan, and prints the resulting record. As a result, the program's output is exactly like its input, except that every occurrence of John is changed to Jonathan. This form of the gsub function returns an integer telling how many substitutions were made in the current record. This is 0 if the record has no strings that match regexp.
- sub(regexp,replacement)
- Is similar to gsub, except that it replaces only the first occurrence of a string matching regexp in the current record.
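The difference between the two functions is easiest to see on one line of input. The following is a minimal sketch, assuming a POSIX-style awk is available on the PATH; the sample text and the shell variable names (input, gsub_out, sub_out) are made up for illustration.

```shell
# Made-up sample record containing two matches.
input='John met John'

# gsub replaces every match in the record and returns the count.
gsub_out=$(printf '%s\n' "$input" | awk '{ n = gsub(/John/, "Jonathan"); print n ": " $0 }')

# sub replaces only the first match in the record.
sub_out=$(printf '%s\n' "$input" | awk '{ n = sub(/John/, "Jonathan"); print n ": " $0 }')

echo "$gsub_out"   # 2: Jonathan met Jonathan
echo "$sub_out"    # 1: Jonathan met John
```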
- gsub(regexp,replacement,string_var)
- Puts the replacement string replacement in place of
every string matching the regular expression regexp in the
string string_var. For example, the program:
{ gsub(/John/,"Jonathan",$1); print }
is similar to the previous program, but the replacement is made only in the first field of each record. This form of the gsub function returns an integer telling how many substitutions were made in string_var.
- sub(regexp,replacement,string_var)
- Is similar to the previous version of gsub, except
that it only replaces the first occurrence
of a string matching regexp in the string string_var.
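When the third argument is a field, only that field is changed and the rest of the record is left alone. The following is a small sketch, assuming a POSIX-style awk and default whitespace field splitting; the input line and the shell variable field_out are hypothetical.

```shell
# Both fields contain John, but only the first field is passed to gsub.
field_out=$(printf 'John John\n' | awk '{ n = gsub(/John/, "Jonathan", $1); print n ": " $0 }')

echo "$field_out"   # 1: Jonathan John
```

The count is 1 because the second field was never examined.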
Note: You must use four backslashes to embed one literal backslash in a gsub() or sub() substitution string. For example,
gsub(/backslash/,"\\\\")
replaces all occurrences of the word backslash with the single character \.
- index(string,substring)
- Searches string for the first appearance of substring.
If it cannot find substring, index returns 0; otherwise, index returns the position
(counting from 1) of the character in string where substring begins.
For example:
index("abcd","cd")
returns the integer 3 because cd is found beginning at the third character of abcd.
- match(string,regexp)
- Determines whether string contains a substring that matches the regular expression (pattern) regexp. If so, the function returns the position (counting from 1) at which the matching substring begins within string; if not, match returns 0. match also sets the variable RSTART to the index where the matching substring starts, and the variable RLENGTH to the length of the matching substring.
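RSTART and RLENGTH are typically handed straight to substr to extract the text that matched. The following is a minimal sketch, assuming a POSIX-style awk; the sample string and the shell variable match_out are made up for illustration.

```shell
match_out=$(awk 'BEGIN {
    s = "order 12345 shipped"      # hypothetical sample string
    if (match(s, /[0-9]+/))        # true when a run of digits is found
        print RSTART, RLENGTH, substr(s, RSTART, RLENGTH)
}')

echo "$match_out"   # 7 5 12345
```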
- substr(string,pos)
- Returns the last part of string, beginning at a particular
character position. The argument pos is an integer, giving
the number of a character. Numbering begins at 1. For example, the
value of:
substr("abcd",3)
is the string cd.
- substr(string,pos,length)
- Returns the part of string that begins at the character
position given by pos and has the length given by length.
For example, the value of:
substr("abcdefg",3,2)
is cd (a string of length 2 beginning at position 3).
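index and substr are often used together to split a string at a known separator. The following is a short sketch, assuming a POSIX-style awk; the string name=value and the variable names (s, i, key, val, pair_out) are hypothetical.

```shell
pair_out=$(awk 'BEGIN {
    s = "name=value"            # hypothetical input string
    i = index(s, "=")           # position of "=", counting from 1
    key = substr(s, 1, i - 1)   # the i-1 characters before "="
    val = substr(s, i + 1)      # everything after "="
    print i, key, val
}')

echo "$pair_out"   # 5 name value
```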
- sprintf(format,value1,value2,...)
- Is based on the printf action. The value of sprintf is
the string that would be printed out by the action
printf(format,value1,value2,...)
For example:
str = sprintf("%d %d!!!\n",2,3)
assigns the string "2 3!!!\n" to the string variable str.
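Because sprintf returns the formatted text instead of printing it, the result can be stored and reused. The following is a minimal sketch, assuming a POSIX-style awk; the format string and the names label and fmt_out are chosen only for illustration.

```shell
fmt_out=$(awk 'BEGIN {
    # Left-justify in 5 columns, zero-pad to 3 digits, round to 2 decimals.
    label = sprintf("%-5s|%03d|%.2f", "ab", 7, 3.14159)
    print label
}')

echo "$fmt_out"   # ab   |007|3.14
```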
- tolower(string)
- Returns the value of string, but with all the letters in lowercase. (This function is an extension to standard awk.)
- toupper(string)
- Returns the value of string, but with all the letters in uppercase. (This function is an extension to standard awk.)
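A quick way to see both case functions at work is to apply each to one field of the same line. The following is a small sketch, assuming an awk that provides toupper and tolower; the input text and the shell variable case_out are made up.

```shell
# Uppercase the first field and lowercase the second.
case_out=$(printf 'Hello World\n' | awk '{ print toupper($1), tolower($2) }')

echo "$case_out"   # HELLO world
```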
- ord(string)
- Converts the first character of string into a number. This number gives the decimal value of the character in the character set used on the system. (This function is an extension to standard awk.)