NAME

cdif - word context diff

VERSION

Version 4.44

SYNOPSIS

cdif [option] file1 file2

cdif [option] [diff-data]

Options:

    -c, -Cn         context diff
    -u, -Un         unified diff
    -i              ignore case
    -b              ignore space change
    -w              ignore whitespace
    -t              expand tabs

    --diff=command      specify diff command
    --subdiff=command   specify backend diff command
    --stat              show statistical information
    --colormap=s        specify color map
    --sdif              sdif friendly option
    --[no]color         color or not            (default true)
    --[no]256           ANSI 256 color mode     (default true)
    --[no]cc            color command line      (default true)
    --[no]mc            color diff mark         (default true)
    --[no]tc            color normal text       (default true)
    --[no]uc            color unknown text      (default true)
    --[no]old           print old text          (default true)
    --[no]new           print new text          (default true)
    --[no]mrg           print merged text       (default true)
    --[no]command       print diff command line (default true)
    --[no]unknown       print unknown line      (default true)
    --[no]mark          print mark or not       (default true)
    --[no]prefix        read git --graph output (default true)
    --unit=s            word/letter/char/mecab  (default word)
    --[no]mecab         use mecab tokenizer     (default false)
    --prefix-pattern    prefix pattern
    --visible char=?    set visible attributes
    --[no]lenience      suppress unexpected input warning (default true)
    --lxl               compare input data line-by-line
    --style=diff        print --lxl output in diff style
    --version           show version

DESCRIPTION

cdif is a post-processor for the Unix diff command. It highlights deleted, changed and added words based on word context (--unit=word by default). If you want to compare text character-by-character, use the option --unit=char. The option --unit=mecab tells cdif to use the external mecab command as a tokenizer for Japanese text.

If a single file or no file is specified, cdif reads that file or STDIN as output from the diff command. In addition to normal diff, context diff, and unified (combined) diff, the git(1)-compatible conflict marker format is supported as an input format.

Lines that don’t look like diff output are not compared; they are simply printed.
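
As an illustration of the difference between --unit=word and --unit=char, the following is a minimal Python sketch, not cdif's implementation. It assumes a simplified tokenizer (the regular expression and the names tokenize and mark_changes are hypothetical) and uses Python's difflib in place of the diff command. Changed units are wrapped in wdiff-like [-...-] and {+...+} marks instead of colors.

```python
import difflib
import re


def tokenize(line, unit="word"):
    """Split a line into comparison units (a simplified, assumed tokenizer)."""
    if unit == "char":
        return list(line)
    # Runs of word characters, runs of whitespace, or single other characters.
    return re.findall(r"\w+|\s+|[^\w\s]", line)


def mark_changes(old, new, unit="word"):
    """Return (old_marked, new_marked) with changed units wrapped in marks."""
    a, b = tokenize(old, unit), tokenize(new, unit)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    old_out, new_out = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        old_seg, new_seg = "".join(a[i1:i2]), "".join(b[j1:j2])
        if tag == "equal":
            old_out.append(old_seg)
            new_out.append(new_seg)
        else:
            # 'delete', 'insert' and 'replace' all end up here.
            if old_seg:
                old_out.append(f"[-{old_seg}-]")
            if new_seg:
                new_out.append(f"{{+{new_seg}+}}")
    return "".join(old_out), "".join(new_out)


if __name__ == "__main__":
    print(mark_changes("color", "colour"))          # whole word is marked
    print(mark_changes("color", "colour", "char"))  # only the letter u is marked
```

With the word unit the whole word is reported as changed, while with the char unit only the inserted letter is marked, which is the trade-off between the two units.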

STARTUP and MODULE

cdif uses the Perl Getopt::EX module and reads the ~/.cdifrc file, if available, at startup. You can define your own options and default options there. The next line enables the --mecab option and adds a crossed-out effect to deleted words.

    option default --mecab --cm DELETE=+X

Modules under App::cdif can be loaded with the -M option without the prefix. The next command loads the App::cdif::colors module.

    $ cdif -Mcolors

You can also define options in a module file. Read `perldoc Getopt::EX::Module` for details.

COLOR

Each kind of line is displayed in a different color. Each text segment has its own label, and its color can be specified with the --colormap option. Read `perldoc Getopt::EX::Colormap` for details.

The standard module -Mcolors is loaded by default and defines several color maps for light and dark screens. If you want to use CMY colors on a dark screen, place the next line in your ~/.cdifrc.

    option default --dark-cmy

Option --autocolor is defined in the default module to call the Getopt::EX::termcolor module. It sets the --light or --dark option according to the brightness of the terminal screen. You can set your preferred colors in your ~/.cdifrc like this:

    option --light --cmy
    option --dark  --dark-cmy

Automatic setting is done by the Getopt::EX::termcolor module, and it works with macOS Terminal.app and iTerm.app, and other XTerm-compatible terminals. This module accepts the environment variable TERM_BGCOLOR as the terminal background color. For example, use 000 or #000000 for black and 555 or #FFFFFF for white.

Option --autocolor is set by default, so to disable it, override it to do nothing.

    option --autocolor --nop
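
To make the idea of choosing --light or --dark from a background color concrete, here is a rough Python sketch. It is not the Getopt::EX::termcolor implementation: the function names are hypothetical, the linear scaling of the 0-5 digits, the Rec. 601 brightness weights, and the 50% threshold are all assumptions, and only the two TERM_BGCOLOR forms mentioned above are handled.

```python
def parse_bgcolor(spec):
    """Convert a TERM_BGCOLOR-style value to an (r, g, b) tuple in 0..255."""
    if len(spec) == 7 and spec.startswith("#"):
        # 24-bit hex form such as #FFFFFF
        return tuple(int(spec[i:i + 2], 16) for i in (1, 3, 5))
    if len(spec) == 3 and all(c in "012345" for c in spec):
        # 6x6x6 form such as 555; linear scaling (5 -> 255) is an assumption
        return tuple(int(c) * 51 for c in spec)
    raise ValueError(f"unsupported color spec: {spec!r}")


def brightness(rgb):
    """Approximate perceived brightness in percent (0..100)."""
    r, g, b = rgb
    return (299 * r + 587 * g + 114 * b) / 2550


def screen_option(spec, threshold=50):
    """Pick the option name for a background color (assumed threshold)."""
    return "--light" if brightness(parse_bgcolor(spec)) > threshold else "--dark"


if __name__ == "__main__":
    for spec in ("000", "555", "#000000", "#FFFFFF"):
        print(spec, screen_option(spec))
```

Under these assumptions, 000 and #000000 select --dark, and 555 and #FFFFFF select --light.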

EXIT STATUS

cdif always exits with status zero unless an error occurs.

OPTIONS

  • -[cCuUibwtT]

    Almost the same as for the diff command.

  • --unit=[word,letter,char,mecab,0,'']
  • --by=[word,letter,char,mecab,0,'']

    Specify the unit of comparison. The default is word, which compares each line word-by-word. Specify char if you want to compare them character-by-character. Unit letter is almost the same as word but does not include the underscore.

    When mecab is given as a unit, the mecab command is called as a tokenizer for non-ASCII text. ASCII text is compared word-by-word. The external mecab command has to be installed.

    If you give an empty string, like --unit=, or 0, cdif does not compare text in any way. You’ll still get the colorization effect.

  • --mecab

    Shortcut for --unit=mecab.

  • --diff=command

    Specify the diff command to use.

  • --subdiff=command

    Specify the backend diff command used to get word differences. Normal and unified diff formats are accepted.

    If you want to use the git diff command, don’t forget to set the -U0 option.

      --subdiff="git diff -U0 --no-index --histogram"
    
  • --[no-]color

    Use ANSI color escape sequences for output.

  • --colormap=colormap, --cm=colormap

    The basic colormap format is:

      FIELD=COLOR
    

    where FIELD is one of the following:

      COMMAND  Command line
      OMARK    Old mark
      NMARK    New mark
      UTEXT    Same text
      OTEXT    Old text
      NTEXT    New text
      OCHANGE  Old change part
      NCHANGE  New change part
      APPEND   Appended part
      DELETE   Deleted part
    

    and the additional Common and Merged fields for the git-diff combined format:

      CMARK    Common mark
      CTEXT    Common text
      MMARK    Merged mark
      MTEXT    Merged text
    

    You can give multiple fields the same color by joining them with =:

      FIELD1=FIELD2=...=COLOR
    

    A wildcard can also be used in the field name:

      *CHANGE=BDw
    

    Multiple fields can be specified by repeating the option:

      --cm FIELD1=COLOR1 --cm FIELD2=COLOR2 ...
    

    or by combining them with commas (,):

      --cm FIELD1=COLOR1,FIELD2=COLOR2, ...
    

    A color specification is a combination of single uppercase characters representing 8 colors:

      R  Red
      G  Green
      B  Blue
      C  Cyan
      M  Magenta
      Y  Yellow
      K  Black
      W  White
    

    and alternative (usually brighter) colors in lowercase:

      r, g, b, c, m, y, k, w
    

    or RGB values and 24 grey levels if you are using an ANSI 256-color or full-color terminal:

      (255,255,255)      : 24bit decimal RGB colors
      #000000 .. #FFFFFF : 24bit hex RGB colors
      #000    .. #FFF    : 12bit hex RGB 4096 colors
      000 .. 555         : 6x6x6 RGB 216 colors
      L00 .. L25         : Black (L00), 24 grey levels, White (L25)
    

    or color names enclosed in angle brackets:

      <red> <blue> <green> <cyan> <magenta> <yellow>
      <aliceblue> <honeydew> <hotpink> <moccasin>
      <medium_aqua_marine>
    

    with other special effects:

      D  Double-struck (boldface)
      I  Italic
      U  Underline
      S  Stand-out (reverse video)
    

    The color specification above is a simplified summary; for complete information, read Getopt::EX::Colormap.

    The defaults are:

      COMMAND => "555/222E"
      OMARK   => "CS"
      NMARK   => "MS"
      UTEXT   => ""
      OTEXT   => "C"
      NTEXT   => "M"
      OCHANGE => "K/445"
      NCHANGE => "K/445"
      DELETE  => "K/544"
      APPEND  => "K/544"
    
      CMARK   => "GS"
      MMARK   => "YS"
      CTEXT   => "G"
      MTEXT   => "Y"
    

    This is equivalent to:

      cdif --cm 'COMMAND=555/222E,OMARK=CS,NMARK=MS' \
           --cm 'UTEXT=,OTEXT=C,NTEXT=M,*CHANGE=K/445,DELETE=APPEND=K/544' \
           --cm 'CMARK=GS,MMARK=YS,CTEXT=G,MTEXT=Y'
    
  • --colormap=&func
  • --colormap=sub{...}

    You can also give the name or the definition of a Perl subroutine to be called to handle matched words. The target word is passed in the variable $_, and the return value of the subroutine is displayed.

    The following options produce wdiff-like formatted output.

      --cm '*'= \
      --cm DELETE=OCHANGE='sub{"[-$_-]"}' \
      --cm APPEND=NCHANGE='sub{"{+$_+}"}'
    

    See “FUNCTION SPEC” in Getopt::EX::Colormap for detail.

  • --[no-]cc, --[no-]commandcolor
  • --[no-]mc, --[no-]markcolor
  • --[no-]tc, --[no-]textcolor
  • --[no-]uc, --[no-]unknowncolor

    Enable or disable the use of color for the corresponding field.

  • --sdif

    Disable the following options, as is appropriate when the output is used as input to sdif: --commandcolor, --markcolor, --textcolor and --unknowncolor.

  • --[no-]old, --[no-]new, --[no-]mrg

    Print or suppress the old, new, and merged text in diff output.

  • --[no-]command

    Print or suppress the command lines preceding diff output.

  • --[no-]unknown

    Print or suppress lines that do not look like diff output.

  • --[no-]mark

    Print or suppress the marks at the beginning of diff output lines. Currently, this option is effective only for unified diff.

    The next example produces output exactly the same as the file new, except for visual effects.

      cdif -U100 --no-mark --no-old --no-command --no-unknown old new
    

    These options are provided for the watchdiff(1) command.

  • --[no-]prefix

    Recognize a prefix on diff output lines, including the one produced by the git --graph option. True by default.

  • --prefix-pattern=pattern

    Specify the prefix pattern as a regular expression. The default pattern is:

      (?:\| )*(?:  )*
    

    This pattern matches git graph style and whitespace-indented diff output.

  • --visible charname=[0,1]

    Set the visible attribute for the specified characters. A visible character is converted to the corresponding Unicode symbol character. Visible by default: nul, soh, bel, bs, vt, np, cr, esc, del, and all non-breaking/special spaces. Invisible by default: ht, nl, sp.

      NAME    CODE    Unicode NAME                      DEFAULT
      ------  ------  --------------------------------  -------
      nul     \000    SYMBOL FOR NULL                   YES
      soh     \001    SYMBOL FOR SOH                    YES
      bel     \007    SYMBOL FOR BELL                   YES
      bs      \010    SYMBOL FOR BACKSPACE              YES
      ht      \011    SYMBOL FOR HORIZONTAL TABULATION  NO
      nl      \012    SYMBOL FOR NEWLINE                NO
      vt      \013    SYMBOL FOR VERTICAL TABULATION    YES
      np      \014    SYMBOL FOR FORM FEED              YES
      cr      \015    SYMBOL FOR CARRIAGE RETURN        YES
      esc     \033    SYMBOL FOR ESCAPE                 YES
      sp      \040    SYMBOL FOR SPACE                  NO
      del     \177    SYMBOL FOR DELETE                 YES
      nbsp    \240    OPEN BOX (No-Break Space)         YES
      nnbsp   U+202F  OPEN BOX (Narrow No-Break Space)  YES
      ensp    U+2002  OPEN BOX (En Space)               YES
      emsp    U+2003  OPEN BOX (Em Space)               YES
      thinsp  U+2009  OPEN BOX (Thin Space)             YES
      hairsp  U+200A  OPEN BOX (Hair Space)             YES
      zwsp    U+200B  OPEN BOX (Zero Width Space)       YES
      idesp   U+3000  OPEN BOX (Ideographic Space)      YES
    

    Since there are no dedicated Unicode symbols for nbsp and other special space characters, OPEN BOX is used instead.

    Multiple characters can be specified at once, either by joining them with commas (,) like --visible ht=1,sp=1, or by connecting them with equal signs (=) like --visible ht=sp=1. Character names accept wildcards: --visible '*=1'.

    The sdif command also supports the --visible option, which displays horizontal tabs with better visibility.

  • --stat

    Print statistical information at the end of the output. It shows the total number of appended, deleted, and changed words as counted by cdif. Because of text filling, it is common to have many insertions and deletions of newlines, so the normal figures are followed by modified figures that ignore inserted and deleted newlines.

  • --[no-]lenience

    Suppress warning messages for unexpected input from the diff command. True by default.

  • --linebyline, --lxl
  • --style=style

    Compare input data line-by-line. The input is treated as a sequence of two-line pairs, and the result of comparing the two lines of each pair is output.

    Suppose you have a document with old and new text on lines beginning with OLD: and NEW: labels.

      OLD: this is old text
      NEW: and this is updated document
    

    Only these old/new parts can be compared by using greple’s -Mtee module as follows.

      greple -Mtee cdif --lxl -- --cm=N -GE '^OLD: (.*\n)^NEW: (.*\n)'
    

    The -Mtee module sends matched parts to the filter command and replaces them with its result. Consult App::Greple::tee for details.

    You can use teip(1) command as well.

      teip -g '^(OLD|NEW):' -- cdif --lxl
    

    If the --style=diff option is given, the two strings are output in diff format and can be used in combination with the sdif command as follows.

      cdif --lxl --style=diff ... | sdif --no-cdif
    

    Currently the only valid value for --style is diff, which affects only the behavior of --lxl option.
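
The default --prefix-pattern shown above can be tried in isolation. The following Python sketch is an illustration only: split_prefix is a hypothetical helper, and Python's re module stands in for Perl's regex engine, which accepts the same syntax for this particular pattern. It separates the recognized prefix from the rest of a line.

```python
import re

# Default pattern from the --prefix-pattern description:
# any number of "| " (git graph columns) followed by pairs of spaces.
PREFIX = re.compile(r"(?:\| )*(?:  )*")


def split_prefix(line):
    """Return (prefix, rest) for one line of possibly prefixed diff output."""
    m = PREFIX.match(line)  # always matches, possibly with an empty prefix
    return line[:m.end()], line[m.end():]


if __name__ == "__main__":
    for line in ("| | +new line", "    -old line", "+plain diff line"):
        print(split_prefix(line))
```

Because the pattern consumes spaces only in pairs, a single leading space, as found on unified diff context lines, is left in the remainder rather than being treated as part of the prefix.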

GIT

See `perldoc App::sdif` for how to use related commands in a Git environment.

ENVIRONMENT

  • CDIFOPTS

    Environment variable CDIFOPTS is used to set default options.

  • LESS
  • LESSANSIENDCHARS

    Since cdif produces the ANSI Erase Line terminal sequence, it is convenient to configure the less command to understand it.

      LESS=-cR
      LESSANSIENDCHARS=mK
    

AUTHOR

Kazumasa Utashiro

LICENSE

Copyright 1992-2026 Kazumasa Utashiro

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

SEE ALSO

App::sdif, https://github.com/kaz-utashiro/sdif-tools

sdif(1), watchdiff(1)

Getopt::EX::Colormap

App::Greple::tee

https://taku910.github.io/mecab/

BUGS

cdif is inherently not very fast because it uses the normal diff command as a back-end processor to compare words.