Text::Reflow - Perl module for reflowing text files using Knuth's paragraphing algorithm.


        use Text::Reflow qw(reflow_file reflow_string reflow_array);
        reflow_file($infile, $outfile, key => value, ...);
        $output = reflow_string($input, key => value, ...);
        $output = reflow_array(\@input, key => value, ...);


These routines will reflow the paragraphs in the given file, filehandle, string or array using Knuth's paragraphing algorithm (as used in TeX) to pick ``good'' places to break the lines.

Each routine takes ASCII text data with paragraphs separated by blank lines and reflows the paragraphs. If two or more lines in a row are ``indented'' then they are assumed to be a quoted poem and are passed through unchanged (but see the skipindented option below).

The reflow algorithm tries to keep the lines the same length but also tries to break at punctuation, and to avoid breaking within a proper name or after certain connectives (``a'', ``the'', etc.). The result is a file with a more ``ragged'' right margin than is produced by fmt or Text::Wrap, but it is easier to read since fewer phrases are broken across lines.

For reflow_file, if $infile is the empty string, then the input is taken from STDIN, and if $outfile is the empty string, the output is written to STDOUT. Otherwise, $infile and $outfile may each be a filename string, a FileHandle reference or a FileHandle glob.

A typical invocation is:

        reflow_file("myfile", "");

which reflows the whole of myfile and prints the result to STDOUT.


The behaviour of Reflow can be adjusted by setting various keyword options. These can be set globally by assigning to the appropriate variable in the Text::Reflow package, for example:

        $Text::Reflow::maximum = 80;
        $Text::Reflow::optimum = 75;

will set the maximum line length to 80 characters and the optimum line length to 75 characters for all subsequent reflow operations. Alternatively, options can be passed to a reflow_ function as keyword parameters, for example:

        $out = reflow_string($in, maximum => 80, optimum => 75);

in which case the new options only apply to this call.
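
The two ways of setting options can be compared side by side. The following is a minimal sketch, assuming Text::Reflow is installed from CPAN; the sample text and the variable names $in, $narrow and $wide are illustrative only:

```perl
use strict;
use warnings;
use Text::Reflow qw(reflow_string);

# Illustrative input: one paragraph with very uneven line lengths.
my $in = join "\n",
    "This is a short line.",
    "This is a much longer line that continues the same paragraph and will need to be broken somewhere.",
    "";

# Per-call options: these apply to this call only.
my $narrow = reflow_string($in, maximum => 50, optimum => 45);

# Global options: these apply to all subsequent calls
# that do not pass their own values.
$Text::Reflow::maximum = 80;
$Text::Reflow::optimum = 75;
my $wide = reflow_string($in);

print $narrow, "\n", $wide;
```

Per-call options are usually the safer choice in library code, since the global variables affect every later caller in the same program.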

The following options are currently implemented, with their default values:

optimum => [65]
The optimum line length in characters. This can be either a number or a reference to an array of numbers: in the latter case, each optimal line length is tried in turn for each paragraph, and the one which leads to the best overall paragraph is chosen. This results in less ragged paragraphs, but some paragraphs will be wider or narrower overall than others.
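
As a hedged illustration of the array form, the sketch below asks for every optimum width from 60 to 70 to be tried on each paragraph. It assumes Text::Reflow is installed; the sample text and the chosen range are arbitrary:

```perl
use strict;
use warnings;
use Text::Reflow qw(reflow_string);

# Illustrative single-paragraph input.
my $text = join "\n",
    "The quick brown fox jumps over the lazy dog, and then,",
    "having rested for a while in the shade of an old oak tree, it trots back across the field to where it started.",
    "";

# A single number gives one target width for every paragraph.
my $fixed = reflow_string($text, optimum => 65);

# An array reference gives several candidate widths; each is tried
# in turn for each paragraph and the best result is kept.
my $best = reflow_string($text, optimum => [60 .. 70], maximum => 75);

print $fixed, "\n", $best;
```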

maximum => 75
The maximum allowed line length.

indent => ``''
Each line of output has this string prepended. indent => string is equivalent to indent1 => string, indent2 => string.

indent1 => ``''
A string which is used to indent the first line in any paragraph.

indent2 => ``''
A string which is used to indent the second and subsequent lines in any paragraph.

quote => ``''
Characters to strip from the beginning of a line before processing. To reflow a quoted email message and then restore the quotes you might want to use:
        quote => "> ", indent => "> "
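
A complete call using these two options might look like the following sketch. It assumes Text::Reflow is installed; the quoted message in $mail is made up for illustration:

```perl
use strict;
use warnings;
use Text::Reflow qw(reflow_string);

# Illustrative quoted email text with uneven line lengths.
my $mail = join "\n",
    "> This is a quoted reply whose lines have",
    "> become very uneven after several rounds of editing by different people in the thread.",
    "";

# Strip the "> " prefix before reflowing, then put it back
# on every output line via the indent option.
my $out = reflow_string($mail, quote => "> ", indent => "> ");

print $out;
```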

skipto => ``''
Skip to the first line starting with the given pattern before starting to reflow. This is useful for skipping Project Gutenberg headers or contents tables.

skipindented => 2
If skipindented = 0 then all indented lines are flowed in with the surrounding paragraph. If skipindented = 1 then any indented line will not be reflowed. If skipindented = 2 then any two or more adjacent indented lines will not be reflowed. The default value lets poetry pass through unchanged, without letting a paragraph indentation prevent the first line of the paragraph from being reflowed.
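
The difference between the settings can be explored with a small experiment. This is a sketch only, assuming Text::Reflow is installed; the sample text is invented, and no particular output is claimed here:

```perl
use strict;
use warnings;
use Text::Reflow qw(reflow_string);

# Illustrative input: a prose paragraph whose first line is indented,
# followed by two adjacent indented lines of verse.
my $text = join "\n",
    "  An indented first line of an ordinary paragraph,",
    "followed by unindented lines that belong to the same paragraph and should be flowed together.",
    "",
    "    Roses are red,",
    "    Violets are blue,",
    "";

# Default (skipindented => 2): runs of two or more indented lines are skipped.
my $default = reflow_string($text);

# skipindented => 0: indented lines are flowed in like any other text.
my $flowed = reflow_string($text, skipindented => 0);

print "--- default ---\n$default\n--- skipindented => 0 ---\n$flowed";
```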

noreflow => ``''
A pattern to indicate that certain lines should not be reflowed. For example, a table of contents might have a line of dots. The option:
        noreflow => '(\.\s*){4}\.'

will not reflow any lines containing five or more dots in a row, optionally separated by whitespace.
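
To see which lines such a pattern selects, it can be tried on its own with plain Perl. The sketch below tests only the regular expression, not the module's handling of it; the sample lines and the names @lines and @kept are illustrative:

```perl
use strict;
use warnings;

# The pattern from the text: four dots, each optionally followed by
# whitespace, and then a fifth dot.
my $noreflow = qr/(\.\s*){4}\./;

my @lines = (
    "Chapter 1 ..... 3",       # five dots in a row: matches
    "Chapter 2 . . . . . 17",  # dots separated by spaces: also matches
    "Wait... what?",           # only three dots: does not match
);

# Lines matching the pattern are the ones that would be left alone.
my @kept = grep { $_ =~ $noreflow } @lines;

print "not reflowed: $_\n" for @kept;
```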

frenchspacing => 'n'
Normally two spaces are put at the end of a sentence or a clause. Setting the frenchspacing option (named after the TeX macro) to 'y' disables this feature.

oneparagraph => 'n'
Set this to 'y' if you want the whole input to be flowed into a single paragraph, ignoring blank lines in the input.

semantic => 30
This parameter indicates the extent to which semantic factors matter (breaking on punctuation, avoiding a break within a clause etc.). Set this to zero to minimise the raggedness of the right margin, at the expense of readability.

namebreak => 10
Penalty for splitting up a name.

sentence => 20
Penalty for sentence widows and orphans (i.e. splitting a line immediately after the first word in a sentence, or before the last word in a sentence).

independent => 10
Penalty for independent clause widows and orphans.

dependent => 6
Penalty for dependent clause widows and orphans.

shortlast => 5
Penalty for a short last line in a paragraph (one or two words).

connpenalty => 1
Multiplier for the ``negative penalty'' for breaking at a connective. In other words, increasing this value makes connectives an even more attractive place to break a line.


Nothing is exported by default. Import reflow_file, reflow_string and reflow_array explicitly as required.


Original reflow Perl script written by Michael Larsen, larsen@edu.upenn.math.

Modified, enhanced and converted to a Perl module with XSUB by Martin Ward, Martin.Ward@durham.ac.uk



See ``TeX: The Program'' by Donald Knuth for a description of the algorithm used.