Mailing List Archive

Lucene introduction in Chinese
http://www.chedong.com/tech/lucene.html

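Adding Full-Text Search to Your Application
An Introduction to Lucene, a Java-Based Full-Text Indexing Engine

Author: Che Dong (chedong@bigfoot.com)

Last updated: 2002-08-11 02:08:46

Copyright: may be freely reproduced, provided the original source and author information are clearly credited.

Keywords: Lucene, full-text search engine, Chinese word segmentation

Abstract: Lucene is a Java-based full-text indexing toolkit.

Contents:

  Lucene, a Java-based full-text indexing engine: the author and the history of Lucene
  How full-text search is implemented: Lucene's full-text index compared with database indexes
  Chinese word segmentation: dictionary-based versus automatic segmentation
  Installation and usage: system structure and a demonstration
  Hacking Lucene: a simplified query parser, deleting documents, custom sorting, extending the application interface
  What else we can learn from Lucene

Lucene, a Java-based full-text indexing/search engine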

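Lucene is not a complete full-text search application. It is a full-text indexing engine toolkit written in Java that can easily be embedded in all kinds of applications to give them application-specific full-text indexing and search.

About the author: Lucene's contributor, Doug Cutting, is a veteran full-text indexing and search expert. He was a principal developer of the V-Twin search engine (one of the achievements of Apple's Copland operating system effort), later served as a senior system architect at Excite, and currently works on research into Internet infrastructure. His goal in contributing Lucene is to let small and medium-sized applications add full-text search.

History: Lucene was first published on the author's own site, www.Lucene.com, was later hosted on SourceForge, and at the end of 2001 became a subproject of the Apache Software Foundation's Jakarta project: http://jakarta.apache.org/Lucene/

Applications built on Lucene:
Many Java projects already use Lucene as their back-end full-text indexing engine. The better-known ones include:

JIVE: a web forum system.
Eyebrows: a mailing-list HTML archiving/browsing/search system. The author of this article's main reference, "The Lucene search engine: Powerful, flexible, and free", is one of the principal developers of EyeBrows, and EyeBrows has become the main mailing-list archive system for the Apache projects.

Cocoon: an XML-based web publishing framework whose full-text search component uses Lucene.

For Chinese users, the most pressing question is whether Lucene supports full-text search of Chinese. As the later description of Lucene's structure shows, Lucene is well architected, and a few simple interface extensions are enough to support Chinese search.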

How full-text search is implemented

Lucene's API is fairly generic. Its input and output structures closely resemble a database's table ==> record ==> field hierarchy, so the files, databases, and other data of many traditional applications map easily onto Lucene's storage structures and interfaces. Broadly speaking, you can start by thinking of Lucene as a database system that supports full-text indexing.

Comparing Lucene with a database:

Lucene

  Data to index:  doc(field1,field2...)  doc(field1,field2...)
                          \  indexer  /
                        _______________
                        | Lucene Index |
                        ---------------
                          / searcher \
  Result output:  Hits(doc(field1,field2) doc(field1...))

Database

  Data to index:  record(field1,field2...)  record(field1..)
                          \  SQL: insert  /
                        _______________
                        |   DB Index   |
                        ---------------
                          / SQL: select \
  Result output:  results(record(field1,field2..) record(field1...))

Corresponding concepts:

  Lucene: Document, a "unit" to be indexed; a Document consists of multiple fields
  Database: Record, a record containing multiple fields

  Lucene: Field
  Database: Field

  Lucene: Hits, the query result set, made up of the matching Documents
  Database: RecordSet, the query result set, made up of multiple Records

Full-text search ≠ like "%keyword%"

Thick books often carry a keyword index at the back (for example: Beijing: pp. 12, 34; Shanghai: pp. 3, 77; ...), which helps readers find the pages for a topic quickly. A database index speeds up queries on the same principle: imagine how many times faster it is to look something up in the index at the back of a book than to leaf through the book page by page. The other reason an index is efficient is that it is sorted. At its core, a retrieval system is a sorting problem.

Database indexes are not designed for full-text indexing, so a query using like "%keyword%" cannot use them. The search degenerates into something like a page-by-page scan of the book, which is why LIKE is so damaging to the performance of a database service that runs fuzzy queries. When several keywords must be matched (like "%keyword1%" and like "%keyword2%" ...), the efficiency is easy to imagine.

So the key to an efficient retrieval system is an inverted index, much like the index of a scientific or technical book. While the data source (say, a set of articles) is stored in order, a separate sorted keyword list stores the keyword ==> article mapping: [keyword ==> IDs of the articles containing the keyword, number of occurrences (possibly with positions: start offset, end offset), frequency]. Retrieval then turns a fuzzy query into a logical combination of several exact lookups that can use the index, which greatly improves the efficiency of multi-keyword queries. In the end, full-text search comes down to a sorting problem.
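
To make the keyword ==> article mapping concrete, here is a minimal sketch in Java. It is not how Lucene stores its index: the class TinyInvertedIndex and its methods add, searchAll, and count are hypothetical names invented for illustration, and the tokenization (lowercase, split on anything that is not a letter or digit) is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy inverted index: sorted term -> (docId -> occurrence count). Hypothetical, not Lucene's format. */
class TinyInvertedIndex {
    // TreeMap keeps the term list sorted, like the keyword index at the back of a book.
    private final Map<String, TreeMap<Integer, Integer>> postings = new TreeMap<>();
    private int nextDocId = 0;

    /** Tokenizes the text, records every term occurrence, and returns the new document's ID. */
    int add(String text) {
        int docId = nextDocId++;
        for (String term : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (term.isEmpty()) continue; // split() yields "" when the text starts with a separator
            postings.computeIfAbsent(term, t -> new TreeMap<>()).merge(docId, 1, Integer::sum);
        }
        return docId;
    }

    /** AND query: one exact index lookup per term, then intersect the document IDs. */
    List<Integer> searchAll(String... terms) {
        List<Integer> result = null;
        for (String term : terms) {
            Map<Integer, Integer> hit = postings.getOrDefault(term.toLowerCase(), new TreeMap<>());
            if (result == null) {
                result = new ArrayList<>(hit.keySet());
            } else {
                result.retainAll(hit.keySet());
            }
        }
        return result == null ? new ArrayList<>() : result;
    }

    /** Number of times the term occurs in the given document (0 if absent). */
    int count(String term, int docId) {
        Map<Integer, Integer> hit = postings.get(term.toLowerCase());
        return hit == null ? 0 : hit.getOrDefault(docId, 0);
    }
}
```

With documents "net net com", "Netherlands travel", and "com before net", a lookup for net returns only the first and third documents, whereas like "%net%" would also match "Netherlands"; the per-document counts are what a relevance score can be built on.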

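This shows that, compared with a database's exact queries, fuzzy querying is a very open-ended problem, which is why most databases offer only limited support for full-text search. Lucene's core feature is a special index structure that implements the full-text indexing that traditional databases are poor at, together with extension interfaces that make it easy to customize for different applications.

The following table compares Lucene with a database's fuzzy queries:

Indexing
  Lucene full-text indexing engine: builds an inverted index over all the data in the data source.
  Database: for LIKE queries, the traditional index is of no use at all. Records have to be traversed one by one and matched grep-style, which is several orders of magnitude slower than an indexed search.

Match quality
  Lucene full-text indexing engine: matches on terms. By implementing the language analysis interface, non-English languages such as Chinese can be supported.
  Database: like "%net%" also matches netherlands. For multi-keyword fuzzy matching, like "%com%net%" cannot match the reversed word order xxx.net..xxx.com.

Relevance
  Lucene full-text indexing engine: has a relevance algorithm and ranks results with a higher degree of match (similarity) first.
  Database: has no control over degree of match. A record in which net occurs 5 times and one in which it occurs once are treated the same.

Result output
  Lucene full-text indexing engine: a special algorithm outputs the 100 best-matching results first; the result set is read in small buffered batches.
  Database: returns the whole result set. When there are very many matches (tens of thousands, say), a large amount of memory is needed to hold the temporary result set.

Customizability
  Lucene full-text indexing engine: by implementing different language analysis interfaces, you can easily tailor the indexing rules to the application's needs (including support for Chinese).
  Database: no interface, or a complicated one; cannot be customized.

Conclusion
  Lucene full-text indexing engine: suits heavily loaded fuzzy-query applications that need complex fuzzy-query rules and index a large volume of data.
  Database: suits low usage, simple fuzzy-matching rules, or a small volume of data to query.

What is new in Lucene:

Most search (database) engines maintain their index in a B-tree, and index updates cause a large amount of I/O. Lucene improves on this slightly in its implementation: instead of maintaining a single index file, it keeps creating new index files as the index grows, and then periodically merges these new small index files into the existing large index (the batch size can be tuned for different update strategies). This improves indexing efficiency without hurting search efficiency.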

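Lucene compared with some other full-text search systems/applications:

Incremental and batch indexing
  Lucene: supports incremental indexing (append) as well as batch indexing of large data sets, and the interface is designed to optimize both batch indexing and small incremental updates.
  Other open-source full-text search systems: many support only batch indexing, and sometimes have to rebuild the index when the data source grows even slightly.

Data sources
  Lucene: defines no specific data source, only a document structure, so it adapts very flexibly to all kinds of applications (as long as a suitable converter in front turns the data source into that structure).
  Other open-source full-text search systems: many target web pages only and lack the flexibility to handle other document formats.

Content segmentation
  Lucene: a document consists of multiple fields, and you can control which fields are indexed and which are not. Indexed fields are further divided into:
    fields that are tokenized, e.g. title and article body;
    fields that are not tokenized, e.g. author and date.
  Other open-source full-text search systems: lack generality and often index the document as a whole.

Language analysis
  Lucene: implemented through different analyzer extensions:
    filtering out unwanted words: an, the, of, and so on;
    Western-language stemming: jumps, jumped, and jumper are all reduced to jump for indexing/search;
    non-English support: indexing support for Asian languages and Arabic.
  Other open-source full-text search systems: lack a generic interface.

Query analysis
  Lucene: by implementing the query analysis interface you can define your own query syntax, e.g. the + - and or relations between multiple keywords.

Concurrent access
  Lucene: supports use by multiple users.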

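Word segmentation for Asian languages

For Chinese, full-text indexing first has to solve a language analysis problem. In English, the words in a sentence are naturally separated by spaces, but in Chinese, Japanese, and Korean the characters of a sentence run together one after another. So if sentences are to be indexed by "word", how to cut out those words is a major problem.

First, single characters (unigrams) clearly cannot be used as the index unit; otherwise a search for "上海" (Shanghai) would also match text containing "海上" (at sea).

But given the phrase "北京天安门" (Beijing Tiananmen), how is a computer to segment it the way a Chinese reader would?
"北京 天安门", or "北 京 天安门"? For a computer to segment according to language conventions, it generally needs a fairly rich dictionary before it can recognize the words in a sentence with reasonable accuracy.

Another solution is automatic segmentation: cut the text into overlapping two-character units (bigrams), for example:
"北京天安门" ==> "北京 京天 天安 安门".

At query time, whether the query is "北京" or "天安门", the query phrase is segmented by the same rule: "北京", or "天安 安门". The resulting keywords are combined with "and", and so they map correctly onto the corresponding index entries. The same approach works for other Asian languages such as Korean and Japanese.

The biggest advantages of automatic segmentation are that there is no dictionary to maintain and that it is simple to implement. The drawback is lower index efficiency, but for small and medium-sized applications bigram segmentation is good enough.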
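
The bigram rule described above can be sketched in a few lines of Java. This is an illustration only, not Lucene's analyzer API: the class BigramSegmenter and its method bigrams are hypothetical names, the handling of a one-character input is an assumption, and the code works on UTF-16 char units, which is adequate for the common CJK characters in the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that cuts a run of CJK text into overlapping two-character tokens. */
class BigramSegmenter {
    /** Returns the overlapping bigrams of text; a one-character input is returned as a single token. */
    static List<String> bigrams(String text) {
        List<String> tokens = new ArrayList<>();
        if (text.length() == 1) {
            tokens.add(text); // assumption: keep a lone character rather than drop it
            return tokens;
        }
        for (int i = 0; i + 2 <= text.length(); i++) {
            tokens.add(text.substring(i, i + 2));
        }
        return tokens;
    }
}
```

Applied to "北京天安门" this yields the four tokens from the example, and applied to the query "天安门" it yields "天安" and "安门", both of which are among the indexed tokens, so an "and" of the two finds the document.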

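Automatic segmentation versus dictionary-based segmentation:

Implementation
  Automatic segmentation: very simple to implement.
  Dictionary-based segmentation: complex to implement.

Querying
  Automatic segmentation: makes query analysis more complex.
  Dictionary-based segmentation: suited to implementing more complex query syntax rules.

Storage efficiency
  Automatic segmentation: high index redundancy; the index is almost as large as the source text.
  Dictionary-based segmentation: efficient index, about 30% of the size of the source text.

Maintenance cost
  Automatic segmentation: no dictionary to maintain.
  Dictionary-based segmentation: very high dictionary maintenance cost. Chinese, Japanese, Korean, and other languages must each be maintained separately, and the dictionary also has to include word-frequency statistics and the like.

Suitable uses
  Automatic segmentation: embedded systems (limited runtime resources); distributed systems (no dictionary synchronization problem); multilingual environments (no dictionary maintenance cost).
  Dictionary-based segmentation: professional search engines with high demands on query and storage efficiency.

The language analysis algorithms of today's larger search engines generally combine the two mechanisms above. For more on Chinese language analysis algorithms, search Google for the keywords "word segment search".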

Installation and usage

Download: http://jakarta.apache.org/Lucene/

Note: some of the more complex lexical analysis in Lucene is generated with JavaCC (Java Compiler Compiler, a pure-Java lexer/parser generator). If you build from source, need to modify QueryParser, or want to write your own lexical analyzer, you also need to download JavaCC from http://www.webgain.com/products/java_cc/.

Lucene's structure: for external applications, the index module (index) and the search module (search) are the main entry points.

org.apache.lucene.search/       search entry point
org.apache.lucene.index/        indexing entry point
org.apache.lucene.analysis/     language analyzers
org.apache.lucene.queryParser/  query parser
org.apache.lucene.document/     storage structures
org.apache.lucene.store/        low-level I/O and storage structures
org.apache.lucene.util/         shared data structures

A simple example demonstrates how Lucene is used.

Indexing: read (multiple) file names from the command line, store each file as two fields, the path (path field) and the content (body field), and full-text index the content. The unit of indexing is the Document object. Each Document contains multiple Field objects, and depending on a field's properties and on what data must be returned, different indexing/storage rules can be chosen for it, as listed here:

Method                                       Tokenized  Indexed  Stored  Use
Field.Text(String name, String value)        Yes        Yes      Yes     Tokenized, indexed, and stored; e.g. title and content fields
Field.Text(String name, Reader value)        Yes        Yes      No      Tokenized and indexed but not stored; e.g. META information that is not returned for display but must be searchable
Field.Keyword(String name, String value)     No         Yes      Yes     Indexed without tokenizing, and stored; e.g. date fields
Field.UnIndexed(String name, String value)   No         No       Yes     Not indexed, only stored; e.g. file path
Field.UnStored(String name, String value)    Yes        Yes      No      Full-text indexed only, not stored

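    import java.io.FileInputStream;
    import java.io.InputStream;
    import java.io.InputStreamReader;
    import java.io.Reader;

    import org.apache.lucene.analysis.SimpleAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;

    public class IndexFiles {
        // Usage: IndexFiles [index output directory] [files to index] ...
        public static void main(String[] args) throws Exception {
            String indexPath = args[0];

            // Build an index writer with the chosen language analyzer. The third
            // argument is the create flag: false appends to an existing index,
            // true creates a new one.
            IndexWriter writer = new IndexWriter(indexPath, new SimpleAnalyzer(), false);

            for (int i = 1; i < args.length; i++) {
                System.out.println("Indexing file " + args[i]);
                InputStream is = new FileInputStream(args[i]);

                // Build a Document with two Fields:
                //   path: not indexed, only stored
                //   body: full-text indexed; a Reader-valued Field.Text is not stored
                Document doc = new Document();
                doc.add(Field.UnIndexed("path", args[i]));
                doc.add(Field.Text("body", (Reader) new InputStreamReader(is)));
                // Write the document to the index
                writer.addDocument(doc);
                is.close();
            }
            // Close the index writer
            writer.close();
        }
    }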
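The indexing process shows that:

The language analyzer is an abstract interface, so language analysis (the Analyzer) can be customized. Lucene ships two fairly general analyzers by default, SimpleAnalyzer and StandardAnalyzer, but neither supports Chinese out of the box, so adding Chinese segmentation rules means modifying these two analyzers.

Lucene does not prescribe a data source format. It only provides a generic structure (the Document object) to receive index input, so the data source can be a database, a Word document, a PDF document, an HTML document, and so on. As long as you can write a parser/converter that turns the data source into Document objects, it can be indexed.

For bulk indexing of large data sets, you can also improve efficiency by tuning IndexWriter's merge-frequency property (mergeFactor).

Searching and displaying results:

A search returns a Hits object, through which you can then access the contents of Document ==> Field.

Assuming a full-text search on the body field, the path field of each result and its match score (score) can be printed as follows: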

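    import org.apache.lucene.analysis.SimpleAnalyzer;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.Searcher;

    public class Search {
        public static void main(String[] args) throws Exception {
            String indexPath = args[0], queryString = args[1];
            // A searcher pointing at the index directory
            Searcher searcher = new IndexSearcher(indexPath);
            // Query parser: use the same language analyzer as was used for indexing
            Query query = QueryParser.parse(queryString, "body", new SimpleAnalyzer());
            // The search results are held in a Hits object
            Hits hits = searcher.search(query);
            // Through hits you can reach each field's data and the match score
            for (int i = 0; i < hits.length(); i++) {
                System.out.println(hits.doc(i).get("path") + "; Score: " + hits.score(i));
            }
            searcher.close();
        }
    }

Throughout the search process, the language analyzer, the query parser, and even the searcher (Searcher) are abstract interfaces that can be customized as needed.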
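Hacking Lucene

A simplified query parser

My personal feeling is that since Lucene became a Jakarta project, too much time has been spent debugging the increasingly complex QueryParser, most of which is unfamiliar to the majority of users. The syntax Lucene currently supports:

Query ::= ( Clause )*
Clause ::= ["+", "-"] [<TERM> ":"] ( <TERM> | "(" Query ")" )

The logic in between includes and, or, +, -, &&, ||, and other symbols, plus "phrase queries" and prefix/fuzzy queries aimed at Western languages. For ordinary applications these features strike me as somewhat flashy but impractical; query analysis along the lines of what Google currently offers is already enough for most users. So the QueryParser from early versions of Lucene is still a good choice.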
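
As an illustration of how small a Google-style parser can be, here is a minimal sketch in Java. It is not Lucene's QueryParser, old or new: the class SimpleQuery, its three lists, and the rule that a bare "+" or "-" is ignored are all assumptions made for this example. It supports only whitespace-separated terms with an optional + (required) or - (prohibited) prefix, with no fields, parentheses, or phrases.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical result of parsing a query such as "+lucene -database search". */
class SimpleQuery {
    final List<String> required = new ArrayList<>();   // terms prefixed with +
    final List<String> prohibited = new ArrayList<>(); // terms prefixed with -
    final List<String> optional = new ArrayList<>();   // terms with no prefix

    /** Splits the input on whitespace and sorts each term into one of the three lists. */
    static SimpleQuery parse(String input) {
        SimpleQuery q = new SimpleQuery();
        for (String token : input.trim().split("\\s+")) {
            if (token.isEmpty() || token.equals("+") || token.equals("-")) {
                continue; // nothing to search for
            }
            if (token.startsWith("+")) {
                q.required.add(token.substring(1));
            } else if (token.startsWith("-")) {
                q.prohibited.add(token.substring(1));
            } else {
                q.optional.add(token);
            }
        }
        return q;
    }
}
```

Each list would then be turned into exact term lookups against the index: required terms intersected, prohibited terms subtracted, optional terms used only for ranking.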

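Adding, modifying, and deleting a specific record (Document)

Lucene provides a mechanism for extending an index, so growing an index dynamically should not be a problem. Modifying a specific record, however, apparently can only be done by deleting the record and adding it again. How do you delete a specific record? It is simple: at indexing time, index the record ID from the data source as a separate field, then use IndexReader.delete(Term term) to delete the corresponding Document by that record ID.

Sorting by the value of a field

By default Lucene ranks results with its own relevance algorithm (score), but sorting results by some other field is a question that comes up often on the Lucene developer mailing list: many applications that used to be database-backed need ordering other than by match score. From the principles of full-text search we know that any part of the search process that does not use the index is very inefficient. If sorting by another field requires reading stored fields during the search, speed drops dramatically, so that approach is not acceptable.

There is a compromise, though. The only two values in the index that can affect result order during a search are docID and score. So ordering by something other than score can be achieved by sorting the data source in advance and then ordering the results by docID. This avoids both re-sorting the results outside Lucene and reading a field value that is not in the index during the search.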

What needs to change is the HitCollector code in IndexSearcher:

    ...
    scorer.score(new HitCollector() {
        private float minScore = 0.0f;
        public final void collect(int doc, float score) {
            if (score > 0.0f &&                      // ignore zeroed buckets
                (bits == null || bits.get(doc))) {   // skip docs not in bits
                totalHits[0]++;
                if (score >= minScore) {
                    /* Originally Lucene put the docID and its match score into the hit queue:
                     *   hq.put(new ScoreDoc(doc, score));   // update hit queue
                     * Substituting doc or 1/doc for score orders the hits by docID,
                     * ascending or descending. If the data source was already sorted by
                     * some field when it was indexed, ordering by docID amounts to
                     * ordering by that field. More elaborate blends of score and docID
                     * are possible too.
                     */
                    hq.put(new ScoreDoc(doc, (float) 1 / doc));
                    if (hq.size() > nDocs) {         // if hit queue overfull
                        hq.pop();                    // remove lowest in hit queue
                        minScore = ((ScoreDoc) hq.top()).score;  // reset minScore
                    }
                }
            }
        }
    }, reader.maxDoc());
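
The same bounded-queue idea can be tried outside Lucene with a standard-library sketch. It is only a model of the technique: it uses java.util.PriorityQueue in place of Lucene's hit queue, and it compares docIDs as integers rather than substituting the float 1/doc for score (a deliberate swap, since 1/doc loses precision for large docIDs and is infinite for doc 0). The class DocIdTopN and its method topByDocId are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical model of collecting the first n hits in docID order with a bounded queue. */
class DocIdTopN {
    /** Keeps the n smallest docIDs seen among the matching docs and returns them in ascending order. */
    static List<Integer> topByDocId(int[] matchingDocs, int n) {
        // Max-heap on docID: the head is the largest docID currently kept,
        // i.e. the entry to evict when the queue overflows.
        PriorityQueue<Integer> queue = new PriorityQueue<>(Collections.reverseOrder());
        for (int doc : matchingDocs) {
            queue.add(doc);
            if (queue.size() > n) {
                queue.poll(); // queue overfull: drop the largest docID
            }
        }
        List<Integer> result = new ArrayList<>(queue);
        Collections.sort(result);
        return result;
    }
}
```

If the documents were added to the index in, say, date order, the returned docIDs are the n earliest matches; reversing the comparison would give the n latest.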
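A more general input/output interface

Although Lucene does not define a fixed input document format, more and more people have thought of using a standard intermediate format as Lucene's data import interface. Other data, such as PDF, then only needs a parser that converts it into the standard intermediate format before it can be indexed. The intermediate format is usually XML, and there are already at least four or five such implementations:

    Data sources:  WORD   PDF   HTML   DB
                      \    |     |    /
                   XML intermediate format
                           |
                      Lucene INDEX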

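Learning more from Lucene

Lucene really is a model of object-oriented design:

Every problem is handled through an extra layer of abstraction that eases later extension and reuse: you can reach your goal by reimplementing one piece, without touching the other modules.

Simple application entry points, Searcher and Indexer, call a series of lower-level components that cooperate to complete the search task.

Every object has a very focused job. In a search, for example, QueryParser analyzes the query string and converts it into a combination of exact queries (Query); the low-level index reading structure, IndexReader, reads the index; and the corresponding scorer scores and ranks the results. Finally, only the top 100 results are placed in the result cache, until later results need to be read. Because the functional modules are so fine-grained, any one of them can be reimplemented without modifying the rest of the program.

Besides its flexible application interface design, Lucene also ships language analyzer implementations suited to most applications (SimpleAnalyzer, StandardAnalyzer), which is one of the important reasons new users can get started quickly.

These strengths are all well worth learning from in future development. As a general-purpose library, Lucene really does make life much easier for developers who need to embed full-text search in their applications.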

References:

Apache: Lucene Project
http://jakarta.apache.org/Lucene/

Lucene mailing list archives
Lucene-dev@jakarta.apache.org
Lucene-user@jakarta.apache.org

The Lucene search engine: Powerful, flexible, and free
http://www.javaworld.com/javaworld/jw-09-2000/jw-0915-Lucene_p.html

Chinese word segmentation

Lucene Tutorial
http://www.darksleep.com/puff/lucene/lucene.html

Notes on distributed searching with Lucene
http://home.clara.net/markharwood/lucene/

An introduction to search engine tools
http://searchtools.com/

Search engine industry research
http://www.searchenginewatch.com/



--
To unsubscribe, e-mail: <mailto:lucene-dev-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-dev-help@jakarta.apache.org>
Re: Lucene introduction in Chinese
Thank you for this.
I think we should add this to the contribution page or some other place
on the Lucene site (I'll take a look in a bit).
I would like to just add a link to it.

Note: the link to Lucene's home page at the bottom of the page is
wrong: http://jakarta.apache.org/Lucene/
should be
http://jakarta.apache.org/lucene/

Thanks,
Otis


--- "Che, Dong" <chedong@yahoo.com> wrote:
> http://www.chedong.com/tech/lucene.html
Re: Lucene introduction in Chinese
Otis Gospodnetic wrote:
> I think we should add this to the contribution page or some other place
> on the Lucene site (I'll take a look in a bit).
> I would like to just add a link to it.

I think we should add this directly to the Lucene site. Lucene strives
to be an internationalized package, and translated documentation is a
big part of internationalization. What do others think?

Perhaps we should even add Che Dong as a Lucene committer so that he can
maintain this, as well as other Asian language support. Thoughts?

Doug


Re: Lucene introduction in Chinese
> Otis Gospodnetic wrote:
> > I think we should add this to the contribution page or some other place
> > on the Lucene site (I'll take a look in a bit).
> > I would like to just add a link to it.
>
> I think we should add this directly to the Lucene site. Lucene strives
> to be an internationalized package, and translated documentation is a
> big part of internationalization. What do others think?
>
> Perhaps we should even add Che Dong as a Lucene committer so that he can
> maintain this, as well as other Asian language support. Thoughts?
>
> Doug

I would be very glad to be a committer on the Lucene project, and to translate documentation if needed.

Che, Dong
Re: Lucene introduction in Chinese
--- Doug Cutting <cutting@lucene.com> wrote:
> Otis Gospodnetic wrote:
> > I think we should add this to the contribution page or some other place
> > on the Lucene site (I'll take a look in a bit).
> > I would like to just add a link to it.
>
> I think we should add this directly to the Lucene site. Lucene strives
> to be an internationalized package, and translated documentation is a
> big part of internationalization. What do others think?

Sure. I first wanted to put it as the link on the left side of the
main page, but the name was long, and it's the only link of that kind
that we have, so I put it in resources. Feel free to change it.

> Perhaps we should even add Che Dong as a Lucene committer so that he can
> maintain this, as well as other Asian language support. Thoughts?

He's submitted a number of things (classes, code snippets, docs, etc.)
over time, I don't see why not. People who have time, energy and
knowledge are always welcome.
He's got +1 from me, if you want to start counting.

Otis

