Ramblings on Chinese Character Encoding


Examples of encoding problems

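Type the two characters “联通” (China Unicom) into Notepad, the editor bundled with Windows, save the file and open it again: “联通” is gone, replaced by the garbage “ͨ”. This is a classic Chinese encoding problem on Windows. The file was saved in the ANSI encoding (on Simplified Chinese Windows this is in effect GB2312, strictly its superset GBK; more on this later), but when it was reopened Notepad interpreted the content as UTF-8, which produced mojibake. The fix is simple: choose "Open" from the "File" menu, select the saved file, pick "ANSI" as the encoding, and the long-lost “联通” reappears.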
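The article does not say why Notepad guessed UTF-8. A likely reason (my addition, not the author's): in GB2312, “联通” is the four bytes C1 AA CD A8, and each pair happens to fit the UTF-8 two-byte template 110xxxxx 10xxxxxx described later in this article, so a detector that only looks at bit patterns accepts the file as UTF-8. Decoding CD A8 that way yields U+0368, the garbage character shown above. The sketch below uses a hypothetical helper, `fits_utf8_pattern`, that checks bit patterns only; a strict UTF-8 validator would also reject C1, which can only begin an overlong sequence.

```c
#include <stddef.h>

/* Hypothetical helper: return 1 if the n bytes at p fit the UTF-8 bit
   templates (a lead byte followed by the right number of 10xxxxxx bytes),
   else 0. Only bit patterns are checked; overlong forms such as C1 AA are
   deliberately NOT rejected, mimicking a loose detection heuristic. */
int fits_utf8_pattern(const unsigned char *p, size_t n)
{
    size_t i = 0;
    while (i < n) {
        unsigned char b = p[i];
        size_t need;                            /* continuation bytes expected */
        if (b < 0x80)                need = 0;  /* 0xxxxxxx */
        else if ((b & 0xE0) == 0xC0) need = 1;  /* 110xxxxx */
        else if ((b & 0xF0) == 0xE0) need = 2;  /* 1110xxxx */
        else if ((b & 0xF8) == 0xF0) need = 3;  /* 11110xxx */
        else return 0;                          /* stray 10xxxxxx or invalid byte */
        if (n - i - 1 < need)
            return 0;                           /* sequence cut off at the end */
        for (size_t k = 1; k <= need; k++)
            if ((p[i + k] & 0xC0) != 0x80)
                return 0;                       /* expected a 10xxxxxx byte */
        i += need + 1;
    }
    return 1;
}
```

For comparison, “中文” in GB2312 (D6 D0 CE C4) fails the check at once, because D0 is not a 10xxxxxx byte, so the same heuristic would not have been fooled by it.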

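On Linux, viewing Chinese text in a file with cat or a similar command can also produce mojibake, and this too is an encoding problem. Put simply, the file was saved in encoding A, but its bytes are displayed according to encoding B, the one set by the current locale (strictly speaking, cat only copies bytes and the terminal does the decoding); whenever A and B are incompatible, you get garbage.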

Why I wrote this article

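For historical reasons Chinese encoding involves quite a few standards, and it can feel bewildering until you understand it. But understanding encoding problems does not require knowing every standard in depth: once you know how things came to be and grasp the key points, you can analyze and solve most of the encoding problems you meet in everyday development. The material and articles I had read were either incomplete or rather dry, so this article records the principles behind the Chinese encoding problems I have run into at work. It is mainly a summary for myself; if it also helps readers, that is a welcome bonus. To me, rigorous encoding standards are dull, dry and hard to remember, so this article tries to explain Chinese-encoding-related (and possibly unrelated) issues in plain, everyday language, which is why it is called "ramblings". It is surely imprecise and incomplete in places; I link to the official documents in the references, and I welcome better phrasing and corrections in the comments. Many thanks.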

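I think understanding encoding problems happens on three levels. Level one, concepts: knowing where each encoding standard is used and how they differ, and being able to analyze and solve common encoding problems. Level two, standards: mastering the details, such as code ranges and conversion rules; with these you could write your own encoding converter. Level three, usage: knowing how Chinese is stored in binary, and choosing a sensible encoding and handling Chinese correctly when developing programs. To keep readers from being sucked into the black hole of encoding standards (don't believe me? take a look at the Unicode specification), and because ready-made tools already exist for looking up and converting encodings, this article covers only level one, skips level two, and makes a few attempts at level three. The links at the end are for anyone interested in the details of the standards. Finally, this article does not cover fixing mojibake in specific software such as ssh, shell, vim or screen; those topics are left to my colleague Jianhao for a dedicated article.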

It all started because computers cannot read

µçÄԺܴÏÃ÷£¬¿ÉÒÔ°ïÎÒÃÇ×öºÜ¶àÊÂÇ飬×ʼÖ÷ÒªÊÇ¿Æѧ¼ÆË㣬ÕâÒ²ÊÇΪʲôµçÄÔ±ðÃû¼ÆËã»ú¡£µçÄÔÓֺܱ¿£¬ÔÚËýµÄÄÔ×ÓÀïÖ»ÓÐÊý×Ö£¬¼´ËùÓеÄÊý¾ÝÔÚ´æ´¢ºÍ ÔËËãʱ¶¼ÒªÊ¹Óöþ½øÖÆÊý±íʾ¡£ÕâÔÚ×î³õµçÄÔÖ÷ÒªÓÃÀ´´¦Àí´óÁ¿¸´ÔӵĿÆѧ¼ÆËãʱ²»ÊÇʲô´óÎÊÌ⵫Êǵ±µçÄÔÖð²½×ßÈëÆÕͨÈ˵ÄÉú»îʱ£¬Çé¿ö¿ªÊ¼±äÔâÁË¡£°ì¹«×Ô¶¯ »¯µÈÁìÓò×îÖ÷ÒªµÄÐèÇó¾ÍÊÇÎÄ×Ö´¦Àí£¬µçÄÔÈçºÎÀ´±íʾÎÄ×ÖÄØ£¿Õâ¸öÎÊÌ⵱ȻÄѲ»µ¹´ÏÃ÷µÄ¼ÆËã»ú¿Æѧ¼ÒÃÇ£¬ÓÃÊý×ÖÀ´´ú±í×Ö·ûß¡£Õâ¾ÍÊÇ“±àÂ딡£

Ó¢ÎĵÄÖÕ¼«½â¾ö·½°¸£ºASCII

ÿ¸öÈ˶¼¿ÉÒÔÔ¼¶¨×Ô¼ºµÄÒ»Ì×±àÂ룬ֻҪʹÓ÷½Ö®¼äÁ˽â¾ÍokÁË¡£±ÈÈç˵ÔÛÁ©Ô¼¶¨0×10±íʾa£¬0×11±íʾb¡£ÔÚÒ»¿ªÊ¼Ò²µÄÈ·ÊÇÕâÑùµÄ£¬³öÏÖÁ˸÷ ʽ¸÷ÑùµÄ±àÂë¡£ÕâÑùÓÐÁ½¸öÎÊÌ⣺1.¸÷¸ö±àÂëµÄ×Ö·û¼¯²»Ò»Ñù£¬ÓеĶ࣬ÓеÄÉÙ¡£2.Ïàͬ×Ö·ûµÄ±àÂëÒ²²»Ò»Ñù¡£ÄãÕâÀïaÊÇ0×10.ËûÄÇÀïa¿ÉÄÜÊÇ 0×30¡£ÓÚÊÇÄã±£´æµÄÎļþËû¾Í²»ÄÜÖ±½ÓÓ㬱ØÐëҪת»»±àÂë¡£Ëæ׏µÍ¨·¶Î§µÄÀ©´ó£¬²ÉÓò»Í¬±àÂëµÄÈËÃÇ»¥ÏàͨОÍÂÒÌ×ÁË£¬Õâ¾ÍÊÇÎÒÃdz£ËµµÄ£º¼¦Í¬Ñ¼½²¡£Èç ¹ûÒª±ÜÃâÕâÖÖ»ìÂÒ£¬ÄÇô´ó¼Ò¾Í±ØÐëʹÓÃÏàͬµÄ±àÂë¹æÔò£¬ÓÚÊÇÃÀ¹úÓйصıê×¼»¯×éÖ¯¾Í³ǫ̈ÁËASCII£¨American Standard Code for Information Interchange£©±àÂ룬ͳһ¹æ¶¨ÁËÓ¢Îij£Ó÷ûºÅÓÃÄÄЩ¶þ½øÖÆÊýÀ´±íʾ¡£ASCIIÊDZê×¼µÄµ¥×Ö½Ú×Ö·û±àÂë·½°¸£¬ÓÃÓÚ»ùÓÚÎı¾µÄÊý¾Ý¡£

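ASCII began as a US national standard, a common Western character encoding that different computers follow when communicating with each other, and was later adopted by the International Organization for Standardization (ISO) as the international standard ISO 646. It covers the basic, unaccented Latin letters. Standard ASCII, also called basic ASCII, uses 7 bits to represent 128 characters: all upper- and lowercase letters, the digits 0 to 9, punctuation, and the special control characters used in American English. The other 128 byte values with the high bit set (80H to FFH) are commonly called "extended ASCII"; they are not defined by ASCII itself, and the various extensions use them for box-drawing characters, some accented and phonetic characters, and other symbols.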

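Of these, 0 to 31 and 127 (33 in total) are control or communication characters, and 32 to 126 (95 in total) are printable characters (32 is the space). Among the printable ones, 48 to 57 are the ten Arabic digits 0 to 9, 65 to 90 are the 26 uppercase English letters, 97 to 122 are the 26 lowercase English letters, and the rest are punctuation, operators and similar symbols.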
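The ranges above translate directly into code. A minimal sketch; the enum and the function name `classify_ascii` are hypothetical, chosen for this illustration:

```c
/* Hypothetical categories matching the ASCII ranges described above. */
enum ascii_class {
    ASCII_CONTROL,          /* 0-31 and 127 */
    ASCII_SPACE,            /* 32 */
    ASCII_DIGIT,            /* 48-57 */
    ASCII_UPPER,            /* 65-90 */
    ASCII_LOWER,            /* 97-122 */
    ASCII_OTHER_PRINTABLE,  /* remaining punctuation and operator symbols */
    NOT_ASCII               /* anything outside 0-127 */
};

enum ascii_class classify_ascii(int c)
{
    if (c < 0 || c > 127)     return NOT_ASCII;
    if (c <= 31 || c == 127)  return ASCII_CONTROL;
    if (c == 32)              return ASCII_SPACE;
    if (c >= 48 && c <= 57)   return ASCII_DIGIT;
    if (c >= 65 && c <= 90)   return ASCII_UPPER;
    if (c >= 97 && c <= 122)  return ASCII_LOWER;
    return ASCII_OTHER_PRINTABLE;
}
```

Counting the results over 0 to 127 gives 33 control characters and 95 printable ones, matching the figures above.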

[Figure: ASCII code table]

At last, every computer using English could communicate with the same encoding. Once you understand ASCII, the encoding schemes of other alphabetic languages follow naturally.


The twists and turns of Chinese encoding

First attempt: GB2312

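ASCII is clearly fine for English, and it greatly helped information spread in the West, especially in the US. Chinese, however, has more than 6,000 commonly used characters, far too many for single-byte ASCII. To smash the shameless plot of the US imperialists to keep the Chinese people off computers by means of encodings, China's National Bureau of Standards published GB2312, the national character encoding for information interchange of the People's Republic of China, officially titled "Code of Chinese Graphic Character Set for Information Interchange: Primary Set". It took effect on May 1, 1981 and is used in mainland China. Besides commonly used Simplified Chinese characters, GB2312 also includes Greek letters, Japanese hiragana and katakana, Russian Cyrillic letters and other characters, but it omits Traditional Chinese characters and some rare characters. EUC-CN is the usual byte-level encoding of GB2312 and can in practice be treated as another name for it.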

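GB2312 is built on the zone-position code (区位码): add A0H to the zone number and to the position number and you get the GB2312 code. Since this is the first mention of the zone-position code, let me deal in one go with the whole batch of baffling "XX codes" below: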

Zone-position code, national standard code, interchange code, internal code, external code

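Zone-position code (区位码): the commonly used Chinese symbols, digits, characters and so on are sorted into categories and encoded. The code table is divided into 94 zones of 94 positions each, and each position holds one character (Chinese characters, symbols and digits all count as characters). A character's zone number and position number together form its zone-position code, usually written in decimal: 4907 means zone 49, position 7, which is the character “学” ("study"). Zones 01-09 hold symbols and digits, zones 16-87 hold Chinese characters, and zones 10-15 and 88-94 are undefined blank zones. The Chinese characters are split into two levels: level 1 contains 3,755 frequently used characters in zones 16-55, ordered by pinyin and then by stroke shape; level 2 contains 3,008 less frequently used characters in zones 56-87, ordered by radical and then by stroke count. Searching the web for "区位码查询系统" (zone-position code lookup) turns up plenty of tools that convert between characters and their zone-position codes; to avoid looking like an advertisement, and to avoid dead links, I won't name any here.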
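The zone layout above can be expressed as a small lookup. This is a sketch of the GB2312 layout as stated in the text; `split_quwei`, `gb2312_zone_kind` and the enum names are hypothetical:

```c
/* Parts of the GB2312 94x94 zone-position table, per the layout above. */
enum gb_zone_kind { ZONE_INVALID, ZONE_SYMBOLS, ZONE_HANZI_LEVEL1,
                    ZONE_HANZI_LEVEL2, ZONE_UNASSIGNED };

/* Split a decimal zone-position code such as 4907 into zone 49, position 7. */
void split_quwei(int code, int *zone, int *pos)
{
    *zone = code / 100;
    *pos  = code % 100;
}

/* Classify a zone number (valid zones are 1-94). */
enum gb_zone_kind gb2312_zone_kind(int zone)
{
    if (zone < 1 || zone > 94)     return ZONE_INVALID;
    if (zone <= 9)                 return ZONE_SYMBOLS;       /* 01-09 */
    if (zone >= 16 && zone <= 55)  return ZONE_HANZI_LEVEL1;  /* 3,755 common characters */
    if (zone >= 56 && zone <= 87)  return ZONE_HANZI_LEVEL2;  /* 3,008 less common ones */
    return ZONE_UNASSIGNED;                                   /* 10-15 and 88-94 */
}
```

For example, `split_quwei(4907, &z, &p)` yields zone 49 and position 7, and `gb2312_zone_kind(49)` reports a level-1 zone, consistent with “学” being a common character.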

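National standard code (国标码): the zone-position code cannot be used directly for transmitting Chinese text, because its values can collide with the communication control codes 00H~1FH (that is, 0 to 31; remember the range of ASCII's special characters?). ISO 2022 therefore requires 32 (binary 00100000, hex 20H) to be added to both the zone number and the position number, yielding the national standard interchange code, called the national standard code or interchange code for short. In binary, zone 49 is 00110001 and position 7 is 00000111, so the interchange code of “学” is computed as: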

  00110001 00000111
+ 00100000 00100000
-------------------
  01010001 00100111

In hexadecimal, that is 5127H.

Interchange code (交换码): short for national standard interchange code; the same thing as the national standard code above.

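Internal code (内码): text usually mixes Chinese characters with Western characters, and unless Chinese character data is specially marked it gets confused with single-byte ASCII. One solution is to treat each Chinese character as two extended-ASCII bytes, so that the high bit of both bytes of a GB2312 character is 1; in other words, add 128 (binary 10000000, hex 80H) to each byte of the national standard code. This double-byte code with the high bits set is the machine internal code of GB2312 characters, internal code for short. Since 20H + 80H = A0H, this is where the common rule "add A0H to the zone number and the position number to get the GB2312 code" comes from. Starting again from the zone-position code of “学”: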

  00110001 00000111
+ 10100000 10100000
-------------------
  11010001 10100111

In hexadecimal, that is D1A7H.

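External code (外码): short for machine-external code, i.e. the Chinese input code, an encoding designed for entering Chinese characters into a computer through keyboard keys. When typing English you press the key for the character you want, so the external code and the internal code are the same. When typing Chinese, it may take several keystrokes to enter one character. There are hundreds or even thousands of Chinese input schemes, but all of these different external codes are converted into the same internal code once they enter the computer.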

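To sum up the concepts above: China's National Bureau of Standards encoded the commonly used Chinese characters into 94 zones of 94 positions each. A character's zone number and position number together form its zone-position code, written in decimal; 4907 means zone 49, position 7, which is “学”. Because the range of zone-position codes collides with the communication control codes 00H~1FH (0 to 31), 32 (20H) is added to both the zone number and the position number, giving the national standard code, also called the interchange code; for “学” it is 5127H. Because text usually mixes Chinese and Western characters, each Chinese character is treated as two extended-ASCII bytes, i.e. the high bit of both bytes is set to 1, so that Chinese data cannot be confused with single-byte ASCII; the result is the GB2312 internal code, D1A7H for “学”. Whatever input method you use and whatever keys you press to enter “学”, its internal code on a computer using GB2312 (or an encoding compatible with it) is always D1A7H.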
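The arithmetic summarized above fits in a few lines of C. A sketch only: `quwei_to_guobiao`, `guobiao_to_internal` and `quwei_to_internal` are hypothetical names, and the functions assume valid inputs (zone and position both between 1 and 94).

```c
#include <stdint.h>

/* Zone-position code -> national standard (interchange) code:
   add 0x20 to the zone and to the position, then pack them into two bytes. */
uint16_t quwei_to_guobiao(int zone, int pos)
{
    return (uint16_t)(((zone + 0x20) << 8) | (pos + 0x20));
}

/* National standard code -> GB2312 internal code: add 0x80 to each byte.
   Both bytes of a national standard code lie in 0x21-0x7E, so their high
   bit is clear and setting it with OR is the same as adding 0x80. */
uint16_t guobiao_to_internal(uint16_t gb)
{
    return (uint16_t)(gb | 0x8080);
}

/* Shortcut: zone-position code -> internal code, i.e. add 0xA0 to each part. */
uint16_t quwei_to_internal(int zone, int pos)
{
    return (uint16_t)(((zone + 0xA0) << 8) | (pos + 0xA0));
}
```

`quwei_to_internal(49, 7)` returns 0xD1A7 for “学”, and the zone-position codes 5448 and 4636 of “中” and “文” give D6D0 and CEC4, the same bytes that appear in the iconv experiment later in this article.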

Second attempt: GBK

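GB2312 basically met the needs of processing Chinese on computers, but since, as mentioned, it lacks Traditional and rare characters, it could not handle the uncommon characters found in personal names, classical Chinese and elsewhere. This led to the Chinese Internal Code Extension Specification (GBK) in 1995. GBK is a superset of GB2312 and fully backward compatible with it: not only are the same characters included, they also have the same codes. At the repertoire level it supports all 20,902 CJK (Chinese, Japanese, Korean) ideographs of ISO/IEC 10646-1 and GB 13000-1. GBK also adds characters missing from GB2312, such as radical symbols and vertical-text punctuation. CP936 differs very slightly from GBK, and in most cases CP936 can be treated as an alias of GBK.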

Third attempt: GB18030

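GB18030 is backward compatible with GBK and GB2312. It includes every character in Unicode 3.1, including the scripts of China's ethnic minorities and the Korean characters that GBK lacks; one could say it covers the scripts of most of the world's peoples. GBK and GB2312 encode Chinese characters as fixed-width double bytes; counting the single bytes they share with ASCII, they can also be seen as variable-length encodings that mix single and double bytes. GB18030 is a variable-length encoding with single-byte, double-byte and four-byte forms.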

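You don't need to memorize these three standards; just remember that the code range kept expanding to meet application needs, from GB2312 to GBK to GB18030, each covering more characters than the last. Fortunately they have always stayed backward compatible: a character that exists in an older standard has exactly the same code in the newer ones. What they have in common is variable-length encoding with single-byte ASCII compatibility; other characters take two bytes in GB2312 and GBK, and GB18030 additionally has a four-byte form. These encodings have two big problems. 1. In GBK and GB18030 the second (trail) byte can fall in 0x40-0x7E, which overlaps ASCII, so a byte on its own cannot tell you whether it is part of a Chinese character or a standalone English character (in pure GB2312 both bytes are 0xA1 or above). 2. Given two characters encoded as A1A2 B1B2, it can happen that A2B1 is also a valid character code. So you cannot simply use a standard string-matching function to check whether a string contains a given character; you first have to determine the character boundaries and only then match.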
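To make problem 2 concrete: in GB2312, “中文” is D6 D0 CE C4 (see the experiment later in this article), and the middle pair D0 CE is itself a valid GB2312 code (zone 48, position 46, inside the level-1 character zones). A plain `strstr` therefore reports a match that straddles two characters. The sketch below, built around a hypothetical `gbk_find`, only tries matches at character boundaries. It assumes well-formed GB2312/GBK input in which every byte >= 0x80 starts a two-byte character; GB18030 four-byte sequences are not handled.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: find needle in a GB2312/GBK string, trying matches
   only at character boundaries. Returns a pointer to the match, or NULL. */
const char *gbk_find(const char *hay, const char *needle)
{
    size_t n = strlen(needle);
    size_t len = strlen(hay);
    size_t i = 0;
    while (i + n <= len) {
        if (memcmp(hay + i, needle, n) == 0)
            return hay + i;
        /* step over one whole character: 1 byte for ASCII, 2 for a Chinese character */
        i += ((unsigned char)hay[i] < 0x80) ? 1 : 2;
    }
    return NULL;
}
```

With `text` holding D6 D0 CE C4, `strstr(text, "\xD0\xCE")` returns `text + 1` (a false match), while `gbk_find` returns NULL for that needle and still finds the genuine “文” (CE C4) at `text + 2`.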

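Finally, a small aside: everything above concerns the Chinese encoding standards promoted in mainland China. In communities that use Traditional Chinese, the most common character set standard is Big5, which contains 13,060 Chinese characters, two of which are encoded twice (which really should not have happened). Although Big5 is widespread in Traditional Chinese regions such as Taiwan, Hong Kong and Macau, for a long time it was not an official standard there, only an industry standard. The character sets of major systems such as the ETen Chinese System and Windows are based on Big5, but vendors each added and removed characters, spawning many different versions. In 2003 Big5 was included in an appendix of an official Taiwanese standard and gained a more formal status; this latest version is called Big5-2003.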

One code to unite them all: Unicode

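Feeling a little dizzy after all these Chinese encodings? Now imagine the same problem across countless countries and languages. Every country and region had its own text encoding rules, they conflicted with one another, and that caused great trouble for global information exchange.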

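To solve the problem once and for all, patching things up by extending ASCII, as described above, no longer works. What is needed is a brand-new encoding system that considers all scripts together (Chinese, Japanese, French, German and so on) and assigns every character its own code. And so Unicode was born. Unicode (also called the unified code or universal code) assigns a unified and unique number to every character of every language on Earth (and later, perhaps, of Mars, Venus and the cats' home planet), to meet the needs of cross-language, cross-platform text conversion and processing. In Unicode all characters are treated equally: a Chinese character is no longer represented as "two extended-ASCII bytes" but as "one Unicode character". Every character is handled as one character and has a unique Unicode code. Unicode maps characters to the numbers 0 to 0x10FFFF, so it can hold at most 1,114,112 characters; in other words, it has 1,114,112 code points (a code point is a number that can be assigned to a character).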

No discussion of Unicode is complete without the UCS (Universal Character Set), the character set defined by ISO's standard ISO 10646 (also called ISO/IEC 10646). UCS-2 uses two bytes per character and UCS-4 uses four. Unicode is the encoding scheme defined by unicode.org. ISO and unicode.org are two different organizations and originally produced different standards, but their goals were the same. So, starting with Unicode 2.0, Unicode uses the same repertoire and code assignments as ISO 10646-1, and ISO has promised never to assign UCS-4 codes beyond 0x10FFFF, so the two stay in sync. For practical purposes you can treat UCS and Unicode as the same thing.

In Unicode, the character “字” corresponds to the number 23383 (0x5B57). There are several ways to represent 23383 as data in a program: UTF-8, UTF-16 and UTF-32. UTF stands for "UCS Transformation Format", i.e. a way of turning the numbers defined by Unicode into program data. For example, the characters “汉字” correspond to 0x6C49 and 0x5B57, and the encoded program data is:

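#include <stdint.h>
typedef uint8_t BYTE; typedef uint16_t WORD; typedef uint32_t DWORD;  /* same meaning as the Windows types */
BYTE  data_utf8[]  = {0xE6, 0xB1, 0x89, 0xE5, 0xAD, 0x97};  // UTF-8 encoding
WORD  data_utf16[] = {0x6C49, 0x5B57};                      // UTF-16 encoding
DWORD data_utf32[] = {0x6C49, 0x5B57};                      // UTF-32 encoding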

Here BYTE, WORD and DWORD denote unsigned 8-bit, 16-bit and 32-bit integers. UTF-8, UTF-16 and UTF-32 use BYTE, WORD and DWORD respectively as their code units. “汉字” takes 6 bytes in UTF-8, two WORDs (4 bytes) in UTF-16, and two DWORDs (8 bytes) in UTF-32. Depending on byte order, UTF-16 can be serialized as UTF-16LE or UTF-16BE, and UTF-32 as UTF-32LE or UTF-32BE.


The following sections cover UTF-8, UTF-16, UTF-32 and the BOM.

UTF-8

UTF-8 encodes Unicode in units of bytes. Code points map to UTF-8 as follows:

Unicode code point (hex)   UTF-8 byte sequence (binary)
000000 - 00007F            0xxxxxxx
000080 - 0007FF            110xxxxx 10xxxxxx
000800 - 00FFFF            1110xxxx 10xxxxxx 10xxxxxx
010000 - 10FFFF            11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

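UTF-8 uses sequences of different lengths for different ranges of characters. For characters 0x00-0x7F, UTF-8 is identical to ASCII. A UTF-8 sequence is at most 4 bytes long: the 4-byte template above has 21 x positions, enough for 21 binary digits, and Unicode's largest code point, 0x10FFFF, needs only 21 bits. The pattern in short: in a multi-byte sequence, the number of leading 1 bits in the first byte gives the total number of bytes, and every following byte starts with 10; a byte starting with 0 is a single ASCII character. Because ASCII bytes (0xxxxxxx), lead bytes (11xxxxxx) and continuation bytes (10xxxxxx) never overlap, UTF-8 avoids both problems of the Chinese encodings: every byte tells you whether it is ASCII, the start of a character or the middle of one, and a complete encoded character can never be matched across a character boundary.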
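These byte rules can be checked mechanically. The sketch shows two hypothetical helpers: `utf8_seq_len` reads the sequence length from a lead byte, and `utf8_char_start` steps back from any byte to the start of its character, which is only possible because continuation bytes can never be mistaken for lead bytes.

```c
/* Length of a UTF-8 sequence judged from its first byte (the number of
   leading 1 bits; 0xxxxxxx means one byte). Returns 0 for a continuation
   byte 10xxxxxx or a byte that cannot start a sequence. Only the bit
   pattern is checked: overlong lead bytes such as C0/C1 are not rejected. */
int utf8_seq_len(unsigned char b)
{
    if (b < 0x80)           return 1;  /* 0xxxxxxx */
    if ((b & 0xE0) == 0xC0) return 2;  /* 110xxxxx */
    if ((b & 0xF0) == 0xE0) return 3;  /* 1110xxxx */
    if ((b & 0xF8) == 0xF0) return 4;  /* 11110xxx */
    return 0;                          /* 10xxxxxx or 11111xxx */
}

/* Given p pointing anywhere inside a well-formed UTF-8 string that begins
   at s, step back to the first byte of the character containing p. */
const char *utf8_char_start(const char *s, const char *p)
{
    while (p > s && ((unsigned char)*p & 0xC0) == 0x80)  /* skip 10xxxxxx */
        p--;
    return p;
}
```

In the UTF-8 bytes of “中文” (E4 B8 AD E6 96 87), starting from the fifth byte (96) leads back to E6, the first byte of “文”; no such resynchronization is possible in GBK, where a trail byte may look like a lead byte or an ASCII character.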

Example 1: the Unicode code point of “汉” is 0x6C49. Since 0x6C49 lies between 0x0800 and 0xFFFF, the 3-byte template is used: 1110xxxx 10xxxxxx 10xxxxxx. In binary 0x6C49 is 0110 1100 0100 1001; filling the template's x positions with these bits in order gives 11100110 10110001 10001001, i.e. E6 B1 89.

Example 2: the code point 0x20C30 lies between 0x010000 and 0x10FFFF, so the 4-byte template is used: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx. Writing 0x20C30 as a 21-bit binary number (padding with leading zeros) gives 0 0010 0000 1100 0011 0000; filling the x positions with these bits in order gives 11110000 10100000 10110000 10110000, i.e. F0 A0 B0 B0.
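Both worked examples can be reproduced with a small encoder that fills in the four templates from the table. This is a minimal sketch with a hypothetical function name, `utf8_encode`; beyond what the table shows, it also refuses the surrogate code points D800-DFFF, which RFC 3629 forbids in UTF-8.

```c
#include <stdint.h>

/* Encode one code point as UTF-8 using the four templates from the table.
   Writes 1-4 bytes to out and returns the count, or 0 if cp is a surrogate
   (0xD800-0xDFFF) or lies beyond 0x10FFFF. */
int utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                                  /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                                 /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                               /* 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                             /* 11110xxx + three 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

`utf8_encode(0x6C49, buf)` writes E6 B1 89 and returns 3; `utf8_encode(0x20C30, buf)` writes F0 A0 B0 B0 and returns 4, matching examples 1 and 2.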

UTF-16

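UTF-16 uses 16-bit unsigned integers as its unit. Write the code point as U. The rules are: if U < 0x10000, the UTF-16 encoding of U is simply U as a 16-bit unsigned integer (called a WORD below for brevity). The basic CJK Unified Ideographs block (4E00-9FBF when this was written; the block now runs to 9FFF) lies in this range, so common Chinese characters take 2 bytes in UTF-16, while rare characters in the extension blocks, such as U+20C30, take 4. If U ≥ 0x10000, first compute U' = U - 0x10000 and write U' in binary as yyyy yyyy yyxx xxxx xxxx; the UTF-16 encoding of U (in binary) is then 110110yyyyyyyyyy 110111xxxxxxxxxx.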
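The UTF-16 rule, including the U' = U - 0x10000 step, can be sketched as follows. `utf16_encode` is a hypothetical name; surrogate code points are rejected as input because they are reserved for this very mechanism.

```c
#include <stdint.h>

/* Encode one code point as UTF-16. Returns the number of 16-bit units
   written (1 or 2), or 0 for a surrogate code point or a value > 0x10FFFF. */
int utf16_encode(uint32_t u, uint16_t out[2])
{
    if (u < 0x10000) {
        if (u >= 0xD800 && u <= 0xDFFF)
            return 0;                             /* not a character on its own */
        out[0] = (uint16_t)u;                     /* one WORD, value unchanged */
        return 1;
    }
    if (u > 0x10FFFF)
        return 0;
    u -= 0x10000;                                 /* U': 20 bits, yyyyyyyyyy xxxxxxxxxx */
    out[0] = (uint16_t)(0xD800 | (u >> 10));      /* 110110yyyyyyyyyy */
    out[1] = (uint16_t)(0xDC00 | (u & 0x3FF));    /* 110111xxxxxxxxxx */
    return 2;
}
```

For U+20C30, U' = 0x10C30; its upper ten bits are 0x043 and its lower ten bits 0x030, giving the pair D843 DC30 that appears in the byte-order table below.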

UTF-32

UTF-32 uses 32-bit unsigned integers as its unit; the UTF-32 encoding of a code point is simply that code point as a 32-bit unsigned integer.

Byte order

Depending on byte order (if you are not familiar with it, see http://en.wikipedia.org/wiki/Endianness), UTF-16 can be serialized as UTF-16LE (little endian) or UTF-16BE (big endian), and UTF-32 as UTF-32LE or UTF-32BE. For example:

Code point   UTF-16LE      UTF-16BE      UTF-32LE      UTF-32BE
0x006C49     49 6C         6C 49         49 6C 00 00   00 00 6C 49
0x020C30     43 D8 30 DC   D8 43 DC 30   30 0C 02 00   00 02 0C 30
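The table above is just the same code units written out in two orders. A sketch with hypothetical helpers `put16` and `put32`, where the `big_endian` flag picks the order:

```c
#include <stdint.h>

/* Write a 16-bit code unit in little-endian (big_endian == 0) or
   big-endian (big_endian != 0) byte order. */
void put16(uint16_t v, int big_endian, unsigned char out[2])
{
    if (big_endian) { out[0] = (unsigned char)(v >> 8); out[1] = (unsigned char)v; }
    else            { out[0] = (unsigned char)v;        out[1] = (unsigned char)(v >> 8); }
}

/* Write a 32-bit code unit (one UTF-32 character) in LE or BE order. */
void put32(uint32_t v, int big_endian, unsigned char out[4])
{
    for (int i = 0; i < 4; i++) {
        int shift = big_endian ? 8 * (3 - i) : 8 * i;  /* BE: most significant byte first */
        out[i] = (unsigned char)(v >> shift);
    }
}
```

`put16(0xD843, 0, b)` followed by `put16(0xDC30, 0, b + 2)` produces 43 D8 30 DC, the UTF-16LE entry for 0x020C30, and `put32(0x20C30, 1, b)` produces its UTF-32BE entry 00 02 0C 30.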

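So how do you tell the byte order of a byte stream? The Unicode standard recommends a BOM (Byte Order Mark): before the data itself, send the character ZERO WIDTH NO-BREAK SPACE as the BOM. Its code is FEFF, while the byte-swapped values are not valid characters: U+FFFE is a noncharacter (UTF-16), and FFFE0000 (UTF-32) lies beyond 0x10FFFF, so neither should appear in real data. UTF-8 has no byte-order question; there the BOM only serves as an encoding signature. The BOMs of the UTF encodings are: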

UTF encoding   Byte Order Mark
UTF-8          EF BB BF
UTF-16LE       FF FE
UTF-16BE       FE FF
UTF-32LE       FF FE 00 00
UTF-32BE       00 00 FE FF
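A reader that honours BOMs can compare the first few bytes against the table above. A minimal sketch; `detect_bom` and the `bom_kind` enum are hypothetical, and UTF-32LE has to be tested before UTF-16LE because FF FE 00 00 also begins with FF FE.

```c
#include <stddef.h>
#include <string.h>

enum bom_kind { BOM_NONE, BOM_UTF8, BOM_UTF16LE, BOM_UTF16BE, BOM_UTF32LE, BOM_UTF32BE };

/* Report which BOM (if any) the n bytes at p start with; *bom_len gets its size. */
enum bom_kind detect_bom(const unsigned char *p, size_t n, size_t *bom_len)
{
    /* Longer marks first: FF FE 00 00 (UTF-32LE) also starts with FF FE (UTF-16LE). */
    static const struct {
        enum bom_kind kind;
        size_t len;
        unsigned char bytes[4];
    } marks[] = {
        { BOM_UTF32LE, 4, {0xFF, 0xFE, 0x00, 0x00} },
        { BOM_UTF32BE, 4, {0x00, 0x00, 0xFE, 0xFF} },
        { BOM_UTF8,    3, {0xEF, 0xBB, 0xBF} },
        { BOM_UTF16LE, 2, {0xFF, 0xFE} },
        { BOM_UTF16BE, 2, {0xFE, 0xFF} },
    };
    for (size_t i = 0; i < sizeof marks / sizeof marks[0]; i++) {
        if (n >= marks[i].len && memcmp(p, marks[i].bytes, marks[i].len) == 0) {
            *bom_len = marks[i].len;
            return marks[i].kind;
        }
    }
    *bom_len = 0;
    return BOM_NONE;
}
```

Applied to the bytes of the UTF-16 file produced in the experiment below (ff fe 2d 4e 87 65), it reports UTF-16LE with a 2-byte BOM. A buffer without a BOM gets BOM_NONE, and its encoding then has to come from somewhere else, such as metadata or the locale.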

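To sum up: ISO and unicode.org both realized that only a unified and unique code for every character of every language could end encoding conflicts in the exchange of information between computers. That produced UCS and Unicode, and the two specifications are kept in sync. In Unicode all characters are treated equally: every character is handled as one character and has a unique Unicode code point. UTF-8, UTF-16 and UTF-32 each define how to turn the numbers defined by Unicode into program data. UTF-8 encodes in units of bytes: an English character takes 1 byte and a common Chinese character 3 bytes (characters outside the Basic Multilingual Plane take 4). UTF-16 encodes in 16-bit unsigned units: common Chinese and English characters both take 2 bytes (4 for characters outside the Basic Multilingual Plane). UTF-32 encodes in 32-bit unsigned units: every character, Chinese or English, takes 4 bytes. You can look up the Unicode code points of Chinese characters, along with their UTF-8, UTF-16 and UTF-32 encodings, at http://www.unicode.org/charts/unihan.html.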

How Chinese is stored in binary

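After all this encoding theory, what does a file actually contain? Let's find out by experiment: below are the contents of files holding the two characters “中文” ("Chinese") saved in different encodings on my Linux machine. Here is the procedure; feel free to repeat it on your own machine. The situation on Windows is similar, so I won't go into it.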

The experiment uses two tools:

  1. od, to inspect file contents: http://www.gnu.org/software/coreutils/manual/html_node/od-invocation.html

  2. iconv, an encoding conversion tool: http://www.gnu.org/software/libiconv/

Char  Unicode (UCS-2), decimal  UTF-8     UTF-16  UTF-32    Zone-position  GB2312/GBK/GB18030
中    20013                     E4 B8 AD  4E2D    00004E2D  5448           D6D0
文    25991                     E6 96 87  6587    00006587  4636           CEC4

Machine environment:
OS: Red Hat Enterprise Linux AS release 4
CPU: Intel(R) Xeon(R) CPU
locale: LC_ALL=zh_CN.utf-8

# Create a file containing "中文" in UTF-8 (the encoding of the current locale)
$ echo -n "中文" > foo.utf8

# Inspect the bytes of foo.utf8
$ od -t x1 foo.utf8
0000000 e4 b8 ad e6 96 87

# Convert to UTF-16
$ iconv -f utf-8 -t utf-16 foo.utf8 > foo.utf16

# Inspect foo.utf16
$ od -t x1 foo.utf16
0000000 ff fe 2d 4e 87 65
# ff fe is the BOM (remember: the BOM tells you the byte order of the stream); the rest is indeed UTF-16LE

# Convert to UTF-32
$ iconv -f utf-16 -t utf-32 foo.utf16 > foo.utf32

# Inspect foo.utf32
$ od -t x1 foo.utf32
0000000 ff fe 00 00 2d 4e 00 00 87 65 00 00
# ff fe 00 00 is the BOM; the rest is indeed UTF-32LE

# Convert to GB2312
$ iconv -f utf-8 -t gb2312 foo.utf8 > foo.gb2312
$ od -t x1 foo.gb2312
0000000 d6 d0 ce c4

# Convert to GBK
$ iconv -f utf-8 -t gbk foo.utf8 > foo.gbk
$ od -t x1 foo.gbk
0000000 d6 d0 ce c4

# Convert to GB18030
$ iconv -f utf-8 -t gb18030 foo.utf8 > foo.gb18030
$ od -t x1 foo.gb18030
0000000 d6 d0 ce c4
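The iconv command used above is built on the iconv(3) C interface, which glibc provides. The sketch below wraps it in a hypothetical `convert()` helper; it assumes the GB2312 converter is installed (glibc's gconv modules normally include it).

```c
#include <iconv.h>
#include <stddef.h>

/* Convert inlen bytes of `in` from encoding `from` to encoding `to` with
   iconv(3). Returns the number of bytes written to out, or -1 if the pair
   is unsupported, the input is invalid, or out is too small. */
long convert(const char *to, const char *from,
             const char *in, size_t inlen, char *out, size_t outcap)
{
    iconv_t cd = iconv_open(to, from);          /* note: target first, source second */
    if (cd == (iconv_t)-1)
        return -1;
    char *inp = (char *)in;                     /* iconv() takes char **, not const */
    char *outp = out;
    size_t inleft = inlen, outleft = outcap;
    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    if (rc != (size_t)-1)                       /* flush any pending shift state */
        rc = iconv(cd, NULL, NULL, &outp, &outleft);
    iconv_close(cd);
    return rc == (size_t)-1 ? -1 : (long)(outcap - outleft);
}
```

Converting the six UTF-8 bytes of “中文” to "GB2312" should return 4 and leave d6 d0 ce c4 in `out`, matching the od output above. Asking glibc for "UTF-16LE" rather than "UTF-16" yields 2d 4e 87 65 without a BOM, since the byte order is then explicit.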

Handling Chinese in C

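First, two concepts: a program's internal encoding and its external encoding. The internal encoding is the form in which Chinese characters are held in memory while the program runs. The external encoding is the form used when they are stored or transmitted; the most obvious example is the encoding you choose when saving Chinese text to a file on disk.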

Depending on whether the internal and external encodings match, there are two common ways to handle Chinese in C/C++:

  1. Same internal and external encoding. No conversion is needed on input or output, and inside the program Chinese characters are processed as an ordinary binary byte stream.

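  2. Different internal and external encodings. On input and output, convert to or from whatever encoding the application needs; inside the program, process everything in one uniform encoding.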

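The advantage of approach 1 is obvious: since inside and outside agree, no conversion is needed. The drawback is that if the encoding is not one the C standard library supports, you have to write the string-handling functions yourself. For example, the standard strlen counts bytes, not characters, so it cannot give the character length of a string in a Chinese encoding or in UTF-8; you have to implement that according to the encoding. For GBK and other Chinese encodings you must also write your own string-matching functions, not just length functions (see the summary of Chinese encodings above for why). As the number of encodings to support grows, developing and maintaining these functions gets more and more expensive.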
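Here is what "implementing it yourself" might look like for string length. A sketch with two hypothetical helpers; both count characters rather than bytes and assume well-formed input.

```c
#include <stddef.h>

/* Number of characters (not bytes) in a UTF-8 string:
   count every byte that is not a continuation byte 10xxxxxx. */
size_t utf8_strlen(const char *s)
{
    size_t n = 0;
    for (; *s; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}

/* Number of characters in a GB2312/GBK string: a byte >= 0x80 starts a
   two-byte character. GB18030 four-byte sequences are not handled. */
size_t gbk_strlen(const char *s)
{
    size_t n = 0;
    while (*s) {
        if ((unsigned char)*s >= 0x80 && s[1] != '\0')
            s += 2;                 /* lead byte + trail byte */
        else
            s += 1;                 /* ASCII (or a truncated lead byte) */
        n++;
    }
    return n;
}
```

For “中文”, strlen returns 6 on the UTF-8 bytes and 4 on the GB2312 bytes, while `utf8_strlen` and `gbk_strlen` both return 2.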

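Approach 2 addresses the drawbacks of approach 1. Internally you can prefer an encoding the C standard library supports, or implement full support for one particular encoding yourself; any encoding can then be converted into the supported one first, which makes the code much more general.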

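So how well does the C standard library support Chinese encodings? Linux generally uses the GNU C Library, which has built-in support for the single-byte char and for the wide character type wchar_t. Everyone knows char, so let's focus on wchar_t, which is what handling Chinese needs. In terms of implementation, on Linux you can think of wchar_t as a 4-byte integer that stores the character's UTF-32 value. The standard library already supports wchar_t fairly completely, e.g. wcslen to compute string length and wcscmp to compare strings. So a fairly simple approach is to use approach 2 above with wchar_t as the internal character representation. This is easy enough: on input and output, use mbrtowc/wcrtomb to convert single characters between the external and internal encodings, and mbsrtowcs/wcsrtombs to convert whole strings. Two points need attention: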

  1. Character and string constants are written differently in code. For example: char c = 'a'; wchar_t wc = L'中';

  2. The conversion functions above depend on the locale: the locale, or more precisely LC_CTYPE, determines the external encoding. By default a program starts in the standard "C" locale, not the one specified by the LC_* environment variables, so you must first call setlocale(LC_ALL, ""); the program then uses the locale the user selected through the LC_* environment variables.
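Putting the two points together, here is a minimal sketch of approach 2 with wchar_t, assuming glibc on Linux. `mb_char_count` is a hypothetical helper; it converts with mbsrtowcs and then measures the wide string with wcslen.

```c
#include <locale.h>   /* setlocale(), which the caller must use first */
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Convert a multibyte string in the current LC_CTYPE encoding to wchar_t
   and return its length in characters, or (size_t)-1 if the bytes are
   invalid in that encoding or memory runs out. */
size_t mb_char_count(const char *mb)
{
    mbstate_t st;
    const char *src = mb;
    memset(&st, 0, sizeof st);                   /* initial conversion state */
    size_t n = mbsrtowcs(NULL, &src, 0, &st);    /* dry run: count only */
    if (n == (size_t)-1)
        return (size_t)-1;
    wchar_t *w = malloc((n + 1) * sizeof *w);
    if (w == NULL)
        return (size_t)-1;
    src = mb;
    memset(&st, 0, sizeof st);
    mbsrtowcs(w, &src, n + 1, &st);              /* real conversion, including L'\0' */
    size_t len = wcslen(w);                      /* equals n; wide-string API in action */
    free(w);
    return len;
}
```

In a real program call `setlocale(LC_ALL, "")` once at startup, as described above; under a UTF-8 locale such as zh_CN.utf-8, `mb_char_count` on the UTF-8 bytes of “中文” then returns 2 where strlen returns 6.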

Locales are a big topic, so I won't go deeper here; that will have to wait for another article.

So the method above is perfect, right? Not quite. All these benefits come at a price, and the most obvious one is memory: on Linux every character stored in a wchar_t, Chinese or English, takes 4 bytes. That alone is enough to rule this approach out where memory use matters.

Handling Chinese in Python

Python has built-in Unicode support, so converting on input and output while keeping text as Unicode internally is a good approach. The tutorial section at http://docs.python.org/tutorial/introduction.html#unicode-strings is a good starting point for anyone who wants to learn more on their own.

Recommendations for choosing an encoding:

  1. English only: without hesitation, use ASCII both internally and externally; it is universal and cheap to store.

  2. Mostly Chinese, and storage size matters: depending on which characters you need, use GB2312 or GBK both internally and externally, and implement the string functions you need yourself.

  3. Generality first, simple processing: use UTF-8 externally, and UTF-8 or UTF-32 (i.e. wchar_t) internally.

References:

http://baike.baidu.com/view/25492.htm

http://baike.baidu.com/view/25421.htm

http://baike.baidu.com/view/40801.htm

http://www.sac.gov.cn/SACSearch/search?channelid=160591&templet=gjcxjg_detail.jsp&searchword=STANDARD_CODE='GB%202312-1980'&XZ=Q

http://www.sac.gov.cn/SACSearch/search?channelid=160591&templet=gjcxjg_detail.jsp&searchword=STANDARD_CODE='GB%2018030-2005'&XZ=Q

http://www.sac.gov.cn/SACSearch/search?channelid=160591&templet=gjcxjg_detail.jsp&searchword=STANDARD_CODE='GB%2013000-2010'&XZ=Q

http://www.unicode.org/

http://www.ibm.com/developerworks/cn/linux/i18n/unicode/linuni/

http://www.gnu.org/software/libc/manual/html_node/index.html

http://www.gnu.org/software/libiconv/


Editor: Wang Xueyan. Source: 搜索技术博客 (Search Technology Blog)