
34 Commits (df7f3b91510e124f0f5145d98f638bd4e9dafbd6)

Author SHA1 Message Date
Erik Larsson f0370bfa9c unistr.c: Unify the two defines NOREVBOM and ALLOW_BROKEN_SURROGATES.
In the mailing list discussion we came to the conclusion that there
doesn't seem to be any reason to keep these declarations separate since
they address the same issue, namely libntfs-3g's tolerance for bad
Unicode data in filenames and other UTF-16 strings in the file system,
so merge the two defines into the new define ALLOW_BROKEN_UNICODE.
2016-04-12 17:02:40 +02:00
Erik Larsson d9c61dd60e unistr.c: Enable encoding broken UTF-16 into broken UTF-8, A.K.A. WTF-8.
Windows filenames may contain invalid UTF-16 sequences (specifically
broken surrogate pairs), which cannot be converted to UTF-8 if we do
strict conversion.

This patch enables encoding broken UTF-16 into similarly broken UTF-8 by
encoding any surrogate character that doesn't have a match into a separate
3-byte UTF-8 sequence.

This is "sort of" valid UTF-8, but not valid Unicode since the code
points used for surrogate pair encoding are not supposed to occur in a
valid Unicode string... but on the other hand the source UTF-16 data is
also broken, so we aren't really making things any worse.

This format is sometimes referred to as WTF-8 (Wobbly Transformation
Format, 8-bit encoding) and is a common solution to represent broken
UTF-16 as UTF-8.

It is a lossless round-trip conversion, i.e. converting from broken
UTF-16 to "WTF-8" and back to UTF-16 yields the same broken UTF-16
sequence. Because of this property it enables accessing these files
by filename through ntfs-3g and the ntfsprogs (e.g. ls -la works as
expected).

To disable this behaviour you can pass the preprocessor/compiler flag
'-DALLOW_BROKEN_SURROGATES=0' when building ntfs-3g.
2016-04-08 05:39:48 +02:00
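
The encoding described in d9c61dd60e above can be illustrated with a minimal sketch. This is not the unistr.c implementation: the function name sketch_utf16_to_wtf8 is hypothetical, the input is assumed to be UTF-16 code units already in host byte order (on-disk NTFS strings are little-endian, and that conversion is left out), and only the UTF-16 to WTF-8 direction is shown. A matched surrogate pair becomes an ordinary 4-byte sequence; an unmatched surrogate is written on its own as a 3-byte sequence.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (an illustration, not the libntfs-3g API).
 * Encodes in_len UTF-16 code units (host byte order) as WTF-8.
 * Returns the number of bytes written, or (size_t)-1 if out is too small.
 */
size_t sketch_utf16_to_wtf8(const uint16_t *in, size_t in_len,
                            uint8_t *out, size_t out_cap)
{
    size_t i = 0, o = 0;

    assert(in != NULL && out != NULL);
    while (i < in_len) {
        uint32_t cp = in[i++];

        /* High surrogate followed by a low surrogate: a valid pair. */
        if (cp >= 0xD800 && cp <= 0xDBFF && i < in_len &&
            in[i] >= 0xDC00 && in[i] <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i++] - 0xDC00u);
        }
        /* An unmatched surrogate keeps its value and takes the
         * 3-byte branch below, which is what makes this WTF-8. */

        if (cp < 0x80) {
            if (out_cap - o < 1) return (size_t)-1;
            out[o++] = (uint8_t)cp;
        } else if (cp < 0x800) {
            if (out_cap - o < 2) return (size_t)-1;
            out[o++] = (uint8_t)(0xC0 | (cp >> 6));
            out[o++] = (uint8_t)(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            if (out_cap - o < 3) return (size_t)-1;
            out[o++] = (uint8_t)(0xE0 | (cp >> 12));
            out[o++] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (uint8_t)(0x80 | (cp & 0x3F));
        } else {
            if (out_cap - o < 4) return (size_t)-1;
            out[o++] = (uint8_t)(0xF0 | (cp >> 18));
            out[o++] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
            out[o++] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (uint8_t)(0x80 | (cp & 0x3F));
        }
    }
    return o;
}
```

With this sketch, a lone high surrogate 0xD800 comes out as the three bytes ED A0 80, which a strict UTF-8 decoder would reject but which can be mapped back to the same UTF-16 code unit.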
Erik Larsson 9893ea9ee6 Merge endianness fixes.
Conflicts:
	libntfs-3g/attrib.c
2016-01-28 09:22:42 +01:00
Erik Larsson 9cf04fd2cd Fix incorrect usage of native/little-endian types, signed types, etc.
This is harmless with regard to code generation but if we turn on strict
type checking these type mismatches will result in errors.
2015-12-21 23:55:31 +01:00
Erik Larsson dfa4a6647f Fix code to use const_cpu_to_X/const_X_to_cpu macros for constants.
This enables the compiler to optimize this code in cases where compiler
support for endianness swapping is not present.
2015-12-21 23:21:00 +01:00
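
The idea behind dfa4a6647f above can be sketched as follows. The names SKETCH_CONST_BSWAP32, sketch_bswap32 and sketch_table are hypothetical and chosen for illustration; the real const_cpu_to_X/const_X_to_cpu macros are defined in the NTFS-3G headers and may be written differently. The point is that a swap expressed purely with masks and shifts is a constant expression, so the compiler can evaluate it at build time and it can appear in static initializers, which a call to a swapping function cannot.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constant-expression byte swap, built only from masks
 * and shifts so the compiler can fold it at compile time. */
#define SKETCH_CONST_BSWAP32(x) \
    ((uint32_t)((((uint32_t)(x) & 0x000000FFu) << 24) | \
                (((uint32_t)(x) & 0x0000FF00u) << 8)  | \
                (((uint32_t)(x) & 0x00FF0000u) >> 8)  | \
                (((uint32_t)(x) & 0xFF000000u) >> 24)))

/* Runtime swap for values that are only known when the program runs. */
uint32_t sketch_bswap32(uint32_t x)
{
    return SKETCH_CONST_BSWAP32(x);
}

/* Checked at compile time: the macro result is a constant expression. */
static_assert(SKETCH_CONST_BSWAP32(0x11223344u) == 0x44332211u,
              "constant byte swap must be evaluated at compile time");

/* Usable in a static initializer, where a function call is not allowed. */
const uint32_t sketch_table[] = {
    SKETCH_CONST_BSWAP32(0x00000001u),
    SKETCH_CONST_BSWAP32(0x11223344u),
};
```

Whether a swap is needed at all depends on the host byte order; that selection is omitted here to keep the sketch short.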
Erik Larsson c9771d0509 unistr.c: Cleanup of OS X Unicode normalization code.
Normalize coding conventions to fit in with the rest of NTFS-3G,
including line breaks at column 80.
2015-06-23 06:43:17 +02:00
Jean-Pierre André e40b86a86c Upgraded the upper-case table as defined by Windows 7
Newer versions of Windows use a more recent definition of the upper-case
table defined by the Unicode Consortium. Now using the same table as
Windows 7, Windows 8 and Windows 10. This only has an effect on file
systems newly created by mkntfs.
2015-04-17 11:03:58 +02:00
Jean-Pierre André 543b17b7ef Rejected reserved file names when option windows_names is set
Windows applies legacy restrictions to file names, so when the option
windows_names is applied, reject the same reserved names, which are
CON, PRN, AUX, NUL, COM1..COM9, and LPT1..LPT9.
2014-03-11 10:56:31 +01:00
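
The check described in 543b17b7ef above might look roughly like the sketch below. The function names are hypothetical and this is not the libntfs-3g code: it works on a plain NUL-terminated ASCII string rather than on-disk UTF-16, it assumes the comparison is case-insensitive, and it only matches the bare names listed in the commit message, ignoring any handling of extensions or trailing characters.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Case-insensitive ASCII comparison of two NUL-terminated strings. */
static bool sketch_equal_nocase(const char *a, const char *b)
{
    while (*a && *b) {
        if (toupper((unsigned char)*a) != toupper((unsigned char)*b))
            return false;
        a++;
        b++;
    }
    return *a == *b; /* equal only if both strings ended together */
}

/* Hypothetical check: true if name is CON, PRN, AUX, NUL,
 * COM1..COM9 or LPT1..LPT9 (case-insensitive, exact match only). */
bool sketch_is_reserved_name(const char *name)
{
    static const char *const fixed[] = { "CON", "PRN", "AUX", "NUL" };
    size_t i;

    assert(name != NULL);
    for (i = 0; i < sizeof fixed / sizeof fixed[0]; i++)
        if (sketch_equal_nocase(name, fixed[i]))
            return true;

    /* COMn / LPTn: three letters, one digit from 1 to 9, then the end. */
    if (name[0] && name[1] && name[2] &&
        name[3] >= '1' && name[3] <= '9' && name[4] == '\0') {
        const char prefix[4] = { name[0], name[1], name[2], '\0' };

        return sketch_equal_nocase(prefix, "COM") ||
               sketch_equal_nocase(prefix, "LPT");
    }
    return false;
}
```

A caller enforcing the windows_names option would refuse to create a file when such a check returns true; names like COM0 or console are not in the list and pass.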
Jean-Pierre André 4ce33daf6c Cosmetic : fixed an indentation in unistr 2012-01-23 17:09:19 +01:00
Jean-Pierre André fa3d7a5728 minor : Fixed ntfs_upcase_build_default() returning garbage in error case (Fabian Keil) 2011-08-04 15:49:35 +02:00
Jean-Pierre André 82b00364a8 Fixed setting DOS names when defined with lower-case chars 2011-07-05 12:17:11 +02:00
Jean-Pierre André a46a395006 Updated copyright notices 2011-02-08 13:52:12 +01:00
Jean-Pierre André 4c6cf9d977 Moved the knowledge of default upcase size to unistr.c 2011-02-08 13:52:12 +01:00
Jean-Pierre André 53599b1a98 Switched to the same Upcase table as Vista 2010-12-21 15:51:08 +01:00
Jean-Pierre André 8b910e9e80 Improved names comparing on big-endian computers 2010-10-26 08:59:51 +02:00
Jean-Pierre André 008d8c5df9 Fixed character translations when standard functions are not available 2010-08-28 13:59:43 +02:00
Jean-Pierre André 4d73c7c4f1 Fixed characters not allowed by Windows in names 2010-06-03 10:13:30 +02:00
Jean-Pierre André 693aa8780d enabled case insensitive file names in lowntfs-3g 2010-05-25 10:12:44 +02:00
jpandre 195945cdc0 Evaluated file names collations in a single parsing 2009-12-16 09:45:28 +00:00
jpandre 7a876eca36 Fixed possible memory leaks after char translation errors 2009-12-09 11:20:20 +00:00
jpandre e23481624f Improved UTF8<-->UTF16 translations 2009-12-09 11:19:27 +00:00
jpandre a75724fea8 Fixed a few misleading endianness types 2009-11-24 14:18:53 +00:00
jpandre 3af7bebe7b Mac OS X Unicode normalization form conversion (Erik Larsson) 2009-11-05 11:40:44 +00:00
jpandre e4b3c59cb1 Accepted initial spaces in Win32/DOS names 2009-09-18 16:17:21 +00:00
jpandre 1d26eb2b97 Fixed checking spaces in Win32 names 2009-08-12 15:35:11 +00:00
jpandre 9a4672ca65 Developed getting and setting DOS names (short 8+3 names) 2009-07-01 19:45:59 +00:00
jpandre fc78c03c39 Fixed an endianness error in default uppercase table 2009-04-20 15:27:03 +00:00
jpandre 11216c6942 Adapted to ntfs-3g-2009.1.1 2009-01-23 11:11:44 +00:00
jpandre d3f3a19866 Adapted to ntfs-3g.1.5222-RC 2009-01-05 13:28:06 +00:00
jpandre 13552eba52 Integrated full utf-8 to utf-16le conversions, based on code by Bernhard Kaindl 2008-08-21 12:04:51 +00:00
szaka 1098244bbf copyright update 2008-06-29 23:13:32 +00:00
jpandre 53fa335624 Adapted to ntfs-3g.1.2310 2008-03-10 15:35:54 +00:00
jpandre 038156ba82 Reengineered LRU caches, made generic, and applied to finding inode numbers 2008-01-10 17:32:55 +00:00
szaka ba63b7daca initial CVS import 2006-10-30 22:32:48 +00:00