csplit — Split text files

Format

csplit [-Aaks] [-f prefix] [-n number] file arg arg …

Description

csplit takes a text file as input and breaks its contents into pieces, based on the criteria given by the arg values on the command line. For example, you can use csplit to break a text file into chunks of ten lines each and save each chunk in a separate file. See Splitting criteria for more information. If you specify - as the file argument, csplit reads the standard input (stdin).

The files created by csplit normally have names of the form
xxnumber
where number is a 2-digit decimal number that begins at zero and increments by one for each new file that csplit creates (xx00, xx01, and so on).

csplit also displays the size, in bytes, of each file that it creates.
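
The default naming and the size report can be seen with a small experiment. The following is a minimal sketch, assuming a POSIX shell with mktemp and awk available; sample.txt is a hypothetical 25-line file created only for illustration.

```shell
# Work in a scratch directory so the generated files are easy to find.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a 25-line sample file (hypothetical name, for illustration only).
awk 'BEGIN { for (i = 1; i <= 25; i++) print "line " i }' > sample.txt

# Split before line 11 and before line 21.
# csplit prints the size in bytes of each file it creates.
csplit sample.txt 11 21

# The pieces use the default xx prefix and 2-digit numbering.
ls xx*
```

Here xx00 should hold lines 1 to 10, xx01 lines 11 to 20, and xx02 the remaining five lines.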

Options

-A
Uses uppercase letters in place of numbers in the number portion of created file names. This generates names of the form xxAA, xxAB, and so on.
-a
Uses lowercase letters in place of numbers in the number portion of created file names. This generates names of the form xxaa, xxab, and so on.
-f prefix
Specifies a prefix to use in place of the default xx when naming files. If the prefix would produce a file name longer than NAME_MAX bytes, an error occurs and csplit exits without creating any files.
-k
Leaves all created files intact. Normally, when an error occurs, csplit removes the files that it has created.
-n number
Specifies the number of digits in the number portion of created file names. The default is 2.
-s
Suppresses the display of file sizes.
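
As an illustration of how the naming options combine, here is a hedged sketch that uses only the prefix, digit-count, and silent options. It assumes a POSIX shell with mktemp and awk; sample.txt and the prefix part are hypothetical names. The -A and -a options are left out because they are extensions that other csplit implementations may not accept.

```shell
# Work in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a 12-line sample file (hypothetical name, for illustration only).
awk 'BEGIN { for (i = 1; i <= 12; i++) print "line " i }' > sample.txt

# -f sets the file name prefix, -n sets the number of digits,
# and -s suppresses the size report.
csplit -s -f part -n 3 sample.txt 5 9

# Expect names built from the prefix plus a 3-digit number.
ls part*
```

With these criteria the pieces should be named part000, part001, and part002, holding four lines each.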

Splitting criteria

csplit processes the args on the command line sequentially. The first argument breaks off the first chunk of the file, the second argument breaks off the next chunk (beginning at the first line remaining in the file), and so on. Thus each chunk of the file begins with the first line remaining in the file and goes to the line given by the next arg.

arg values can take any of the following forms:
/regexp/
Takes the chunk as all the lines from the current line up to but not including the next line that contains a string matching the regular expression regexp. After csplit obtains the chunk and writes it to an output file, it sets the current line to the line that matched regexp.
/regexp/offset
Is the same as the previous criterion, except that the chunk goes up to but not including the line that is a given offset from the first line containing a string that matches regexp. The offset can be a positive or negative integer. After csplit has obtained the chunk and written it to an output file, it sets the current line to the line that matched regexp.
Note: This current line is the first one that was not part of the chunk just written out.
%regexp%
Is the same as /regexp/, except that csplit does not write the chunk to an output file. It simply skips over the chunk.
%regexp%offset
Is the same as /regexp/offset, except csplit does not write the chunk to an output file.
linenumber
Obtains a chunk beginning at the current line and going up to but not including line linenumber. After csplit writes the chunk to an output file, it sets the current line to linenumber.
{number}
Repeats the previous criterion number times. If it follows a regular expression criterion, it repeats the regular expression process number more times. If it follows a linenumber criterion, csplit splits the file every linenumber lines, number times, beginning at the current line. For example,
csplit file 10 {10}
obtains a chunk from line 1 to line 9, then every 10 lines after that, up to line 109.
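
The regular expression criteria can be sketched in the same way. The example below is an illustration under stated assumptions, not a definitive recipe: it assumes a POSIX shell with mktemp, and notes.txt, the BEGIN marker lines, and the prefixes sec and tail are all hypothetical.

```shell
# Work in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A small file with three marker lines (hypothetical content).
cat > notes.txt <<'EOF'
preamble
BEGIN alpha
a1
a2
BEGIN beta
b1
BEGIN gamma
g1
g2
EOF

# /regexp/ followed by {2}: break before each of the three BEGIN lines.
# The lines left after the last criterion go into a final file.
csplit -s -f sec notes.txt '/^BEGIN/' '{2}'

# %regexp%: skip everything before the matching line without writing it.
csplit -s -f tail notes.txt '%^BEGIN beta%'

ls sec* tail*
```

The first command should produce sec00 (the preamble) and one file per BEGIN section. The second should produce only tail00, which starts at the BEGIN beta line.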

Errors occur if any criterion tries to "grab" lines beyond the end of the file, if a regular expression does not match any line between the current line and the end of the file, or if an offset refers to a position before the current line or past the end of the file.
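
To see the error behavior and the effect of -k, the following hedged sketch runs the same failing command twice. It assumes a POSIX shell with mktemp and awk; short.txt and the pattern NOMATCH are hypothetical, and exactly which files remain with -k may vary between implementations.

```shell
# Work in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A 5-line file that contains no line matching NOMATCH.
awk 'BEGIN { for (i = 1; i <= 5; i++) print "line " i }' > short.txt

# The first criterion succeeds, the second cannot be satisfied.
# Without -k, csplit removes the files it already created.
csplit -s short.txt 3 '/NOMATCH/' 2>/dev/null || echo "first run failed"
if [ -e xx00 ]; then without_k="kept"; else without_k="removed"; fi

# With -k, files created before the error are left in place.
csplit -s -k short.txt 3 '/NOMATCH/' 2>/dev/null || echo "second run failed"
if [ -e xx00 ]; then with_k="kept"; else with_k="removed"; fi

echo "xx00 without -k: $without_k, with -k: $with_k"
```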

Localization

csplit uses the following localization variables:
  • LANG
  • LC_ALL
  • LC_COLLATE
  • LC_CTYPE
  • LC_MESSAGES
  • LC_SYNTAX
  • NLSPATH

See Localization for more information.

Exit values

0
Successful completion
1
Failure due to any of the following:
  • csplit could not open the input or output files
  • A write error on the output file
2
Failure due to any of the following:
  • Unknown command-line option
  • The prefix name was missing after -f
  • The number of digits was missing after -n
  • The input file was not specified
  • No arg values were specified
  • The command ran out of memory
  • An arg was incorrect
  • The command found end-of-file before it was expected
  • A regular expression in an arg was badly formed
  • A line offset/number in an arg was badly formed
  • A {number} repetition count was misplaced or badly formed
  • Too many file names were generated when using -n
  • Generated file names would be too long
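
A script can branch on these exit values. The sketch below is illustrative only: it assumes a POSIX shell with mktemp, uses a hypothetical file name that does not exist, and maps the statuses as listed above. Other csplit implementations may report different nonzero values, so the catch-all branch is kept.

```shell
# Work in an empty scratch directory so the input file is guaranteed missing.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Capture the exit status without aborting the script on failure.
status=0
csplit -s no-such-file.txt 3 2>/dev/null || status=$?

case $status in
    0) echo "split completed" ;;
    1) echo "could not open or write a file" ;;
    2) echo "usage or criteria error" ;;
    *) echo "unexpected exit status: $status" ;;
esac
```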

Portability

POSIX.2 User Portability Extension, X/Open Portability Guide, UNIX systems.

The -A and -a options are extensions to the POSIX standard.

Related information

awk, sed

For more information about regexp, see Regular expressions (regexp).