Character Processing Settings
Character processing settings control how Agent Ransack handles information found in files.
End of Line (EOL) Identifiers
Defines which other EOL identifiers Agent Ransack should recognize. Normally a Windows text file uses a CRLF combination (carriage return 0x0D, line feed 0x0A) to indicate the end of a line. However, other operating systems use different conventions, usually either a standalone CR or a standalone LF character.
Maximum characters per line - sets the limit on line length when no EOL identifier is found. Lines that exceed the maximum line length are broken into separate lines, although every piece keeps the line number of the original line.
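The following Python sketch illustrates the behaviour described above. It is not Agent Ransack's actual implementation: the function name `split_lines`, the regular expression, and the 16-character limit are assumptions chosen for the example. It treats CRLF, a standalone CR, and a standalone LF as EOL identifiers, and breaks over-long lines into pieces that share one line number.

```python
import re

def split_lines(text, eol_pattern=r"\r\n|\r|\n", max_chars=16):
    """Return (line_number, segment) pairs for the given text.

    eol_pattern and max_chars are illustrative stand-ins for the
    EOL identifier and maximum-characters-per-line settings.
    """
    segments = []
    # CRLF is listed first so it is consumed as a single EOL identifier.
    for line_number, line in enumerate(re.split(eol_pattern, text), start=1):
        if line == "":
            segments.append((line_number, ""))
            continue
        # Over-long lines are broken up, but every piece keeps the
        # line number of the line it came from.
        for start in range(0, len(line), max_chars):
            segments.append((line_number, line[start:start + max_chars]))
    return segments

sample = "alpha\r\nbeta\rgamma\n" + "x" * 40
segments = split_lines(sample)
for number, segment in segments:
    print(number, segment)
```

In this sample the 40-character final line is reported as three pieces, all numbered 4.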
Containing Text
The 'Include file name in content search' option includes the file name in the content search. For example, if a file named LondonHistory.doc was searched for Tower AND London, the file would be matched if the word Tower appeared in the document text, because the word London is already in the file name. With the option switched off, the file would be matched only if both words appeared in the document text, i.e. the document's file name would not be treated as part of the text.
With the auto-detect UTF-8 feature active, Agent Ransack reads the first 2KB of the file to see if any specific UTF-8 character sequences can be found. If they are found, it reads the file as UTF-8; otherwise, it defaults to reading the file as ASCII text.
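The LondonHistory.doc example can be sketched in Python as follows. The function `matches_all_terms` and its case-insensitive substring matching are assumptions made for illustration; the real search engine may match terms differently.

```python
def matches_all_terms(file_name, content, terms, include_file_name=True):
    """Return True if every term is found (an AND search).

    Hypothetical helper: when include_file_name is True the file name
    is treated as part of the searchable text.
    """
    searchable = content
    if include_file_name:
        searchable = file_name + "\n" + content
    searchable = searchable.lower()
    return all(term.lower() in searchable for term in terms)

name = "LondonHistory.doc"
body = "The Tower was completed around 1100."

# Option on: "London" is supplied by the file name, "Tower" by the text.
with_name = matches_all_terms(name, body, ["Tower", "London"], include_file_name=True)

# Option off: both words must appear in the document text itself.
without_name = matches_all_terms(name, body, ["Tower", "London"], include_file_name=False)

print(with_name, without_name)
```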
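A rough Python sketch of this kind of detection is shown below. The exact character sequences Agent Ransack looks for are not documented here, so the heuristic is an assumption: the sample counts as UTF-8 if it starts with a UTF-8 byte order mark, or if it contains non-ASCII bytes that all form valid UTF-8 sequences. The names `looks_like_utf8` and `detect_encoding` are hypothetical.

```python
import codecs

def looks_like_utf8(sample, complete):
    """Heuristic check (an assumption, not the product's actual rule)."""
    if sample.startswith(codecs.BOM_UTF8):
        return True
    if all(byte < 0x80 for byte in sample):
        return False  # pure ASCII: no UTF-8 specific sequences present
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        # final=False tolerates a multi-byte sequence that was cut off
        # at the end of the probe window.
        decoder.decode(sample, final=complete)
    except UnicodeDecodeError:
        return False
    return True

def detect_encoding(data, probe_size=2048):
    """Inspect only the first probe_size bytes (2KB by default)."""
    sample = data[:probe_size]
    complete = len(data) <= probe_size
    return "utf-8" if looks_like_utf8(sample, complete) else "ascii"

print(detect_encoding("Z\u00fcrich".encode("utf-8")))
print(detect_encoding(b"plain text"))
```

Because only the first 2KB is examined, a file whose first non-ASCII character appears later than that would be read as ASCII under this sketch.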