    XXCOPY Version 3.33.3, released 2016-10-28.  ©2016 Pixelab.  All rights reserved.

    XXCOPY TECHNICAL BULLETIN #049

    From:    Kan Yabumoto           tech@xxcopy.com
    To:      XXCOPY user
    Subject: Unicode Support in XXCOPY
    Date:    2009-12-23
    ===============================================================================
    
    Background:
    
        Starting with Version 2.97.0, XXCOPY supports Unicode (16-bit
        characters).  Earlier versions of XXCOPY could process only 8-bit
        characters in practically all aspects of their operation.  Being a
        console application, XXCOPY still receives its command line
        (keyboard) input as 8-bit characters and writes its console output
        as 8-bit strings.
        
          --------------------------------------------------------------
           Even though the code page of the CMD.EXE console can be set
           to UTF-8 (code page 65001), the console is unable to process
           the full range of Unicode characters.
          --------------------------------------------------------------
    
        Thus, earlier versions of XXCOPY had to rely on the console's code
        page, which determines the character encoding within the console
        environment and was often inadequate for accessing some of the
        files and directories present on a disk volume.
    
        With Unicode support, XXCOPY can process any file or directory
        name, since all pathnames are internally represented as Unicode
        strings.
    
    
    Input to XXCOPY using Unicode strings:
    
        Even with Unicode support, XXCOPY continues to operate in the
        console (CMD.EXE) window, whose input and output streams carry
        8-bit characters.
    
        There are times when you need to specify a directory or path name
        that contains Unicode characters that are not mapped in the current
        code page.  In such a case, the only way to specify the Unicode
        string is to put it in an external file and pass that file with
        the /CF (command file) or /EX (exclusion list) switch.  These files
        can be either in the traditional ANSI text format or in the UTF-8
        encoded Unicode text format with the Byte-Order Mark (BOM) at the
        beginning of the file.  Note that there are no explicit switches
        that control the type of input string to XXCOPY.  The UTF-8 BOM
        sequence at the beginning of the file implicitly tells XXCOPY that
        the file is formatted in UTF-8.
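
        The implicit detection described above can be sketched in a few
        lines of Python.  This only illustrates the general technique; it
        is not XXCOPY's actual code.  The function name is hypothetical,
        and code page 1252 is assumed to stand in for the "ANSI" code page
        (the real one depends on the system locale).

```python
import codecs


def decode_list_file(raw: bytes, ansi_codec: str = "cp1252") -> str:
    """Decode the bytes of a command file or exclusion list.

    A leading UTF-8 BOM (EF BB BF) selects UTF-8; without it the bytes
    are treated as ANSI text.  "cp1252" is an assumed stand-in for the
    system ANSI code page.
    """
    if raw.startswith(codecs.BOM_UTF8):
        # Strip the three BOM bytes before decoding the rest as UTF-8.
        return raw[len(codecs.BOM_UTF8):].decode("utf-8")
    return raw.decode(ansi_codec)
```

        Note that a UTF-8 file saved without the BOM would be read as ANSI
        text under this scheme, which is why the BOM matters.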
    
        You may create such a file using the ubiquitous Notepad utility and
        its Save As command with the UTF-8 encoding selected (in recent
        versions of Notepad, which list the two variants separately, choose
        "UTF-8 with BOM").  Most text editors available today should
        provide a user option to create a file in the UTF-8 format with
        the BOM header byte sequence.
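
        If you would rather generate such a file from a script, the
        following Python sketch writes text as UTF-8 with the BOM.  The
        function names and the sample entries are hypothetical; consult
        the XXCOPY manual for the exact syntax expected inside a /CF or
        /EX file.

```python
def utf8_bom_bytes(lines):
    """Return the lines as UTF-8 bytes preceded by the BOM (EF BB BF)."""
    text = "\r\n".join(lines) + "\r\n"   # Windows-style line endings
    # The "utf-8-sig" codec prepends the BOM when encoding.
    return text.encode("utf-8-sig")


def write_list_file(path, lines):
    """Write the lines to 'path' as a BOM-prefixed UTF-8 text file."""
    with open(path, "wb") as f:
        f.write(utf8_bom_bytes(lines))


# Hypothetical usage; the entries below are placeholders, not verified
# XXCOPY syntax:
#   write_list_file("exclude.txt", ["C:\\data\\日本語\\", "*.tmp"])
```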
    
    
    Output text by XXCOPY that contains Unicode strings:
    
        Another problem associated with Unicode text is XXCOPY's output.
        Since XXCOPY's console output is an 8-bit character stream, a
        character that cannot be represented in it may be displayed as a
        question mark (?) placeholder.  This is a limitation of the
        console display.
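
        The effect can be approximated in Python by forcing a name through
        an 8-bit code page.  This is a rough model only: the function name
        is hypothetical, code page 437 is an assumed example of a console
        code page, and the conversion Windows actually performs may differ
        in detail.

```python
def console_rendering(name: str, code_page: str = "cp437") -> str:
    """Approximate how a name looks after passing through an 8-bit code page.

    Characters with no mapping in the code page are replaced by '?'.
    """
    return name.encode(code_page, errors="replace").decode(code_page)
```

        Under these assumptions a name written entirely in Japanese comes
        back as a run of question marks, while a name whose characters all
        exist in the code page is unchanged.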
    
        For output files, on the other hand, you should specify the /UT
        switch if you anticipate Unicode characters that cannot be
        represented by an 8-bit character.  With the /UT switch, all
        XXCOPY output files (for /oA, /oN and /Fo) will be encoded in the
        UTF-8 format.  By default (/UT0), output files are written in
        Windows ANSI (8-bit) encoding.
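
        A script that post-processes such an output file has to decode it
        with the matching encoding.  A minimal Python sketch follows.  The
        function name is hypothetical, code page 1252 is assumed for the
        ANSI case, and the "utf-8-sig" codec is used because it accepts
        UTF-8 with or without a leading BOM (this bulletin does not state
        whether the /UT output carries one).

```python
def decode_output_file(raw: bytes, utf8: bool = True,
                       ansi_codec: str = "cp1252") -> list:
    """Decode the bytes of an XXCOPY output file into a list of lines.

    utf8=True  : the file was produced with /UT (UTF-8).
    utf8=False : the file was produced with the default /UT0 (ANSI);
                 "cp1252" is an assumed stand-in for the ANSI code page.
    """
    encoding = "utf-8-sig" if utf8 else ansi_codec
    return raw.decode(encoding).splitlines()
```

        To use it on a real file, read the file in binary mode and pass
        the bytes to the function.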
    
    
    The Special Dialog Window for User Prompts:
    
        From time to time, XXCOPY halts its operation with a user prompt,
        for example to confirm a file overwrite (/Po).  Since the console
        display is usually limited to 8-bit character strings, the
        filename displayed on the console may not be recognizable.
        
        With the /PW switch, XXCOPY will pop up a dialog window that displays
        the full pathname in Unicode even when the display on the console
        window fails to show the proper characters.
    
    
    References:
    
        Unicode      http://en.wikipedia.org/wiki/Unicode
        UTF-8        http://en.wikipedia.org/wiki/UTF-8
        Code page    http://en.wikipedia.org/wiki/Code_page
    
    
    
    