Combining Text

As I wrote before, I started the second draft of Shapes on Monday.

I’ve been working in text files for the most part, with the original draft text in one file with labelled lines and the outline form of that text in another. That outline has been changed quite a lot, with lines re-ordered, added to, and generally modified to match the new vision of how this story works.

So I have two files, but I want a single file to work on: one where each line holds the original text followed by the matching outline content.

As I usually do in this situation, I wrote a script for it (included at the end). It's called combine-outline and it imposes some requirements on the files:

  1. the text file can have labelled lines. The labels look like <label>:, but if they’re omitted then the script will use the line number.
  2. the outline defines the order of the combined content. An outline line consists of an optional label and an optional piece of outline text.

The script walks the outline line by line. Where a line has a label, the script looks up the corresponding line in the text file, then writes out that text followed by any outline text enclosed in square brackets. An outline line with no label produces just the bracketed outline text.

For example, if we combine this text file –

1: There once was a frog.
2: Billy the Frog lived in a house made of mud and leaves.
3: The frog was named Billy.
4: Next door to Billy's house was an earwig nightclub.
5: Yellow wasps invaded Billy's dreams, like sharp-fingered thieves.
6: The nightclub was no trouble, though, because earwigs dance to smell.

… and this outline file –

1: introduce the frog.
3: give the frog's name first
2:
4: talk about the nightclub next door.
The earwigs danced through the day.
6: explain why the nightclub was no bother

… then this script call –

combine-outline text outline combination

… produces this combined output –

There once was a frog. [introduce the frog.]
The frog was named Billy. [give the frog's name first]
Billy the Frog lived in a house made of mud and leaves.
Next door to Billy's house was an earwig nightclub. [talk about the nightclub next door.]
[The earwigs danced through the day.]
The nightclub was no trouble, though, because earwigs dance to smell. [explain why the nightclub was no bother]
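
The walk described above can also be sketched in memory, without touching any files. This is only a minimal sketch: combine_lines and LINE_PATTERN are names made up for illustration and are not part of combine-outline, though the pattern is the same one the script uses to split a label from the rest of a line.

```ruby
# Hypothetical helper, not part of combine-outline: the same label matching,
# applied to arrays of lines instead of files.
LINE_PATTERN = /^([^:]+:)?\s*(.*)$/

def combine_lines(text_lines, outline_lines)
  # Map each text line by its label, falling back to the 1-based line number.
  text_by_label = {}
  text_lines.each_with_index do |line, index|
    m = line.match(LINE_PATTERN)
    label = m[1] || "#{index + 1}:"
    text_by_label[label] = m[2]
  end

  # Walk the outline in order, pairing each label's text with its note.
  outline_lines.map do |line|
    m = line.match(LINE_PATTERN)
    text = m[1] && text_by_label[m[1]]
    note = m[2].empty? ? nil : "[#{m[2]}]"
    [text, note].compact.reject(&:empty?).join(" ")
  end
end

text = [
  "1: There once was a frog.",
  "2: Billy the Frog lived in a house made of mud and leaves.",
  "3: The frog was named Billy."
]
outline = [
  "1: introduce the frog.",
  "3: give the frog's name first",
  "2:",
  "A note with no label."
]

puts combine_lines(text, outline)
```

Labels keep their trailing colon on both sides, so a text line labelled "3:" is found by an outline line that starts with "3:".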

To track progress I need a word count for the actual text, without the bracketed outline notes, and I get that on the command line thus:

sed 's/\[[^]]*\]//g' second-draft-text | wc -w
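
The same count can be sketched in Ruby, should it ever be handier to keep it next to the script. This is a rough equivalent rather than anything the script provides: draft_word_count is a made-up name, and it assumes the bracketed notes never nest.

```ruby
# Hypothetical helper: counts whitespace-separated words, ignoring anything
# inside [square brackets]. Assumes the brackets are never nested.
def draft_word_count(content)
  content.gsub(/\[[^\]]*\]/, "").split.size
end

sample = "There once was a frog. [introduce the frog.]\n" \
         "[a note on its own]\n" \
         "The frog was named Billy.\n"
puts draft_word_count(sample)
```

For a real draft, something like draft_word_count(File.read("second-draft-text")) would do the same job as the pipeline.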

Anyway, back to making words.

Here is the script –

#!/usr/bin/ruby

# Combines the content of outline and text files into a single output file.
# Outline lines may start with a label ("3:") that references a text line.
# Text lines may carry their own labels; unlabelled text lines are labelled
# by line number. The outline's order determines the order of the output.

help_text = <<END_HELP
Usage:
    combine-outline <text> <outline> <target>
                --force

... where:
    <text> is the original text whose lines are used as labels
    <outline> is the outline referencing those labels
    <target> is the file where the combined output should be
        written. This will abort if the file already exists.
    --force will overwrite an existing <target>
END_HELP

show_help = false
source_file = nil
outline_file = nil
target_file = nil
force_target_write = false

# Plan:
# - read the text file and map its lines by label
# - for each outline line, find the text line its label refers to (if any)
# - write the text followed by the outline text in square brackets
#
# Parse command-line arguments

ARGV.each do |arg|
  if (arg == "--force")
    force_target_write = true
  elsif arg == "-?" || arg == "-h" || arg == "--help"
    show_help = true
  elsif source_file.nil?
    source_file = arg
  elsif outline_file.nil?
    outline_file = arg
  elsif target_file.nil?
    target_file = arg
  else
    $stderr.puts "*** Unrecognised arg '#{arg}'"
    show_help = true
  end
end

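# Validate
# All three file arguments are required; without them File.foreach would fail.
if source_file.nil? || outline_file.nil? || target_file.nil?
  puts "Expected text, outline, and target file arguments." unless show_help
  show_help = true
end
# File.exist? replaces File.exists?, which was removed in Ruby 3.2.
if !source_file.nil? && !File.exist?(source_file)
  puts "Cannot find text source file #{source_file}"
  show_help = true
end
if !outline_file.nil? && !File.exist?(outline_file)
  puts "Cannot find outline file #{outline_file}"
  show_help = true
end
if !target_file.nil? && File.exist?(target_file) && !force_target_write
  puts "Target text file #{target_file} is already present; aborting."
  show_help = true
end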

if show_help
  puts help_text
  exit 1
end

# Read the text file and map each line's text by its label.
# Labels keep their trailing colon ("3:") so they match the outline's labels.
text_by_label = {}
File.foreach(source_file).with_index do |line, line_num|
  m = line.chomp.match(/^([^:]+:)?\s*(.*)$/)
  label = m[1]
  # Unlabelled lines fall back to their 1-based line number, colon included,
  # so that an outline label such as "3:" can find them.
  label = "#{line_num + 1}:" if label.nil? || label.empty?
  text_by_label[label] = m[2]
end

# Open target file
File.open(target_file, 'w') do |f|
  # Walk the outline in order.
  # For each outline line:
  # - if there is a label, find the corresponding text line
  # - write the text (if any), then the outline text in square brackets
  File.foreach(outline_file).with_index do |line, line_num|
    if !(matches = line.scan(/^([^:]+:)?\s*(.*)$/)).empty?
      m = matches[0]
      label = m[0]
      outline = m[1]
      text = if !label.nil? && !label.empty?
        text_by_label[label]
      end
      result = []
      result << text unless text.nil? || text.empty?
      result << "[#{outline}]" unless outline.nil? || outline.empty?
      f.puts result.join(" ")
    end
  end
end
