Thread subject: Is Google a Semantic Search Engine?
Category: Information Retrieval | Semantic Web
admin (original post):


    http://www.readwriteweb.com/archives/is_google_a_semantic_search_engine.php

    Is Google a Semantic Search Engine?

    Written by Guest Author / March 26, 2007

    Written by Phill Midwinter, a search engineer from the UK. This is a great follow-up to our article last Friday, Hakia Takes On Google With Semantic Technologies (http://www.readwriteweb.com/archives/hakia_takes_on_google_semantic_search.php).

    What is a Semantic Engine?
    Semantics are said to be ‘the next big thing’ in search engine technology. We technology bloggers routinely drum up articles about it and sell it to you, the adoring masses, as a product that will change your web experience forever. Problem is, we often forget to tell you exactly what semantics are - we just get so excited. So let's explore this...

    Wikipedia says:

    “Semantics (Greek semantikos, giving signs, significant, symptomatic, from sema, sign) refers to the aspects of meaning that are expressed in a language, code, or other form of representation. Semantics is contrasted with two other aspects of meaningful expression, namely, syntax, the construction of complex signs from simpler signs, and pragmatics, the practical use of signs by agents or communities of interpretation in particular circumstances and contexts. By the usual convention that calls a study or a theory by the name of its subject matter, semantics may also denote the theoretical study of meaning in systems of signs.”

    ...which is absolutely no help.

    Semantics as it relates to our topic, search engines, actually covers a few closely related fields. Here, what we are trying to work out (as a basic example) is whether a computer can discern that there is a link between two words, such as cat and dog. You and I both know that cats and dogs are common household pets and can be categorized as such. The human brain seems to grasp this easily, but for a computer it is a much more complex task, and one I won’t go into here because it would most likely bore you.
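    One common statistical way to let a computer notice that cat and dog are related, without anyone tagging them, is distributional similarity: describe each word by counts of the words it co-occurs with, then compare those count vectors with cosine similarity. The sketch below is illustrative only and assumes a tiny hypothetical corpus in which each sentence is treated as one co-occurrence window; it is not a description of how Google or the author's engine actually works.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus; each string is treated as one co-occurrence window.
CORPUS = [
    "the cat is a common household pet",
    "the dog is a common household pet",
    "feed the cat and walk the dog",
    "the bank raised its interest rate",
    "the bank approved the loan",
]

STOPWORDS = {"the", "is", "a", "and", "its"}


def cooccurrence_vectors(corpus):
    """Map each word to a Counter of the other words in its windows."""
    vectors = defaultdict(Counter)
    for window in corpus:
        words = [w for w in window.split() if w not in STOPWORDS]
        for w in words:
            for other in words:
                if other != w:
                    vectors[w][other] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


vectors = cooccurrence_vectors(CORPUS)
print(round(cosine(vectors["cat"], vectors["dog"]), 3))  # high: shared contexts
print(round(cosine(vectors["cat"], vectors["bank"]), 3))  # zero: no shared contexts
```

    On a real index the windows would number in the billions, which is one reason this kind of analysis is expensive.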

    If we take it as read, then, that the search engine now has semantic functionality, how does that enable it to refine its search capability?

    It can automatically place pages into dynamic categories, or tag them without human intervention. Knowing what topic a page relates to is invaluable for returning relevant results.
    It can offer related topics and keywords to help you narrow your search successfully. With a keyword like sport, the engine might offer you a list of sports as well as sports-related news and blogs.
    Instead of offering you the related keywords, the engine can incorporate them directly back into the search with less weight than the ones the user entered. It’s still contested whether this will produce better results or just more varied ones.
    If the engine uses statistical analysis to retrieve its semantic matches for a keyword (as Google is likely to do), then keywords currently associated with hot news topics are likely to be pulled in as well. For example, searching my engine for the keyword police brought up peerages (relating to the UK’s recent cash-for-honours scandal).
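    The idea of folding related keywords back into the query with less weight than the ones the user typed can be sketched as a simple weighted term-match score. The related-term table, the 0.3 weight and the documents are hypothetical values chosen for illustration; a real engine would derive related terms statistically and tune the weight.

```python
# Hypothetical related-term table; a real engine would learn this statistically.
RELATED = {
    "sport": ["football", "tennis", "cricket"],
}

EXPANSION_WEIGHT = 0.3  # assumed: expanded terms count less than user terms


def expand_query(user_terms):
    """Return {term: weight}: user terms at 1.0, related terms at a lower weight."""
    weights = {t: 1.0 for t in user_terms}
    for t in user_terms:
        for related in RELATED.get(t, []):
            weights.setdefault(related, EXPANSION_WEIGHT)
    return weights


def score(document, weights):
    """Sum the weights of query terms that appear in the document."""
    words = set(document.lower().split())
    return sum(w for term, w in weights.items() if term in words)


docs = [
    "latest sport headlines and results",
    "tennis results from the weekend",
    "recipes for a quick dinner",
]
query = expand_query(["sport"])
ranked = sorted(docs, key=lambda d: score(d, query), reverse=True)
for d in ranked:
    print(round(score(d, query), 2), d)
```

    The tennis page is now retrieved even though it never mentions sport, but it still ranks below the page that matches the user's own keyword; whether that yields better or merely more varied results is exactly the open question raised above.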
    So, according to me:

    “A semantic search engine is a search engine that takes the sense of a word as a factor in its ranking algorithm or offers the user a choice as to the sense of a word or phrase.”

    This is not in line with the purists of what is known as ‘The Semantic Web’, who believe that for some reason we should spend all our time tagging documents, pages and images to make them readable by a computer. Well, I’m sorry, but I’m not going to waste my time tagging when a computer is able to derive context and do it for me. I may have offended Tim Berners-Lee by saying this, but as the creator of the Web he should know better.

    How does Google match up?
    Until extremely recently, Google’s semantic technology (which they’ve had for quite a while now) was limited to matching AdSense blocks to your website’s content. This is neat, and a good practical example of the technology, but not relevant to their core search product. However, if you make a single-keyword search today, chances are you will spot a block like this at the bottom of your results page:

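    [Screenshot: Google's suggested alternatives at the bottom of the results page for the single-keyword search citizen]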

    This is more or less exactly what I was just writing about. They’re offering you alternatives based upon your initial search, which in this case was obviously for citizen. Citizen is a bank, a watchmaker and (if I’m not mistaken) it means you’re a member of a country or something. This is the first clear example of Google employing a semantic engine that works by analyzing the context of words in their index and returning likely matches for sense.
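    The citizen example hints at how senses can be found from the index itself: the words that co-occur with citizen tend to fall into separate groups (banking, watches, nationality). One simple way to recover such groups, assumed here purely for illustration, is to link context words that appear in the same document and treat each connected component of that graph as a candidate sense. The snippets are hypothetical, and this textbook-style graph-based sense induction is not claimed to be Google's method.

```python
from collections import defaultdict

# Hypothetical index snippets that all contain the query word.
DOCS = [
    "citizen bank savings account",
    "citizen bank loan rates",
    "citizen watch quartz strap",
    "citizen watch eco drive",
    "citizen passport nationality rights",
    "citizen rights vote",
]


def induce_senses(docs, target):
    """Group the context words of `target` into senses via connected components."""
    graph = defaultdict(set)
    for doc in docs:
        context = [w for w in doc.split() if w != target]
        for w in context:
            graph[w].update(x for x in context if x != w)
    senses, seen = [], set()
    for start in list(graph):
        if start in seen:
            continue
        # Depth-first search collects one connected component.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        senses.append(sorted(component))
    return senses


for sense in induce_senses(DOCS, "citizen"):
    print(sense)
```

    Each printed group could be offered to the user as one of the alternatives in the block shown above; a production system would need far more robust clustering than plain connectivity.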

    Some of you may be wondering why they aren’t doing this for multiple-keyword phrases; I can take a guess at the reason from some of my own work. Analyzing the context of a word statistically is intensive and slow, and if you try to analyze two, you slow the process further, and so on. They likely have problems doing so for more than one keyword at present, and Google, as ever, is cautious about changing its interface too radically too quickly. This implementation of semantics gives hope that they haven’t adopted the purist view of ‘The Semantic Web’, where everything is tagged and filed neatly into nice little packages.

    Google is all too aware of the following very large problems with that idea:

    Users are stupid.
    Users are lazy.
    Redefining the way they’ve indexed what is assumed to be petabytes of data would require them to effectively start again.
    It’s not as powerful or dynamic.
    How Google can utilize Semantic technologies
    It’s my belief that Google will increasingly tie this technology into their core search experience as it improves in speed and reliability. It has some phenomenally powerful uses and I’ve taken the liberty of laying out a few of my suggestions on where they can go with this:

    Self-aware pages

    Tagging pages with keywords has always been used on the internet to let search engines know what kind of content the page contains.
    Using a Google API, we can generate the necessary keywords on the fly as the page loads. This cuts out a large amount of work for SEO.
    A Google API enabled engine wouldn’t even need to look at these keywords; it could generate them itself.
    Pages aren’t the only things that can be self-aware these days; people tag everything, including links. The Google API could conceivably be used to tag every single word on a page, creating a page that covers every possible keyword. This is overkill, but a demonstration of the power available.
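    The "Google API" for generating page keywords is hypothetical, so the sketch below swaps in TF-IDF, a standard and much simpler technique, to pick the words most characteristic of one page relative to a small collection of other pages. The pages, stopword list and top_k cutoff are assumptions for illustration.

```python
import math
from collections import Counter

# Hypothetical pages standing in for a crawl.
PAGES = {
    "pets": "cats and dogs are popular pets and dogs need walks",
    "banking": "the bank offers savings accounts and loans",
    "watches": "quartz watches keep time and need a battery",
}

STOPWORDS = {"a", "and", "are", "the"}


def tokenize(text):
    """Lowercase, split on whitespace and drop stopwords."""
    return [w for w in text.lower().split() if w not in STOPWORDS]


def tfidf_keywords(pages, page_id, top_k=3):
    """Return the top_k words of one page ranked by TF-IDF."""
    docs = {pid: tokenize(text) for pid, text in pages.items()}
    n_docs = len(docs)
    # Document frequency: how many pages contain each word.
    df = Counter(word for words in docs.values() for word in set(words))
    tf = Counter(docs[page_id])
    total = len(docs[page_id])
    scores = {
        word: (count / total) * math.log(n_docs / df[word])
        for word, count in tf.items()
    }
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k]


print(tfidf_keywords(PAGES, "pets"))
```

    Words shared across many pages (here need) are pushed down, while words concentrated on the page (dogs) rise, which is the basic property a keyword generator needs.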
    Narrow Search

    When you begin a search, you enter just one or two keywords on the topic you’re interested in.
    Related keywords appear, which you can then select from to target your search and, for example, remove any doubt about a word’s dual meanings.
    This step repeats every time you search; opinionated search is also possible.
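    The Narrow Search steps amount to a loop: filter the results by the keywords chosen so far, then suggest the most frequent remaining co-occurring words as the next refinements. A minimal sketch, assuming a tiny hypothetical document set (reusing the citizen example) and plain frequency counts as the suggestion signal:

```python
from collections import Counter

# Hypothetical documents for illustration.
DOCS = [
    "citizen bank savings account",
    "citizen bank loan rates",
    "citizen watch quartz strap",
    "citizen watch battery eco",
]


def matching(docs, keywords):
    """Documents that contain every selected keyword."""
    return [d for d in docs if set(keywords) <= set(d.split())]


def suggest(docs, keywords, n=2):
    """Most frequent co-occurring words among the current results."""
    counts = Counter(
        w for d in matching(docs, keywords) for w in d.split() if w not in keywords
    )
    # Break frequency ties alphabetically so suggestions are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, _ in ranked[:n]]


selected = ["citizen"]
print(suggest(DOCS, selected))  # first round: one suggestion per sense
selected.append("watch")  # the user picks the watchmaker sense
print(matching(DOCS, selected))
print(suggest(DOCS, selected))  # next round of refinements within that sense
```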
    Opinionated Search

    Because of the way Google statistically finds the senses of keywords from the mass of pages in its index, what in fact it finds is the majority opinion from those pages of what the sense of a word is.
    At the base level, you can select from the average opinion of related keywords and subjects from its entire index.
    You can find the opinion at other levels as well though, and this is where the power comes in in terms of really targeting what the user is looking for quickly and efficiently. All the following mean that this is the first true example of social search:
    Find the opinion over a range of dates, good for current events, modern history, changes in trends.
    Find the opinion over areas of geography, or by domain extension (.co.uk, .com).
    Find the opinion over a certain group of websites, or just one website in particular - compare that with another site.
    Find the opinion not only over the above things but also subjects, topics, social and religious groups.
    At the most ridiculous level, you could even find what topics 18-year-olds on MySpace living in Leeds most talk about - but that I could probably guess. The point is that this targets demographics on a truly unprecedented level.
    Add to your personal profile the sites or web pages that you think most closely reflect your opinions; this data can then be taken into account in all future searches, returning results of greater personal relevance.
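    Finding "the opinion" over a date range or a domain extension can be read as computing the same co-occurrence statistics on a filtered slice of the index. A minimal sketch, assuming each indexed page carries URL and publication-date metadata; the Page record, its fields and all URLs and texts are hypothetical, and the police/peerages pairing simply echoes the author's earlier example.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from urllib.parse import urlparse


@dataclass
class Page:
    """Hypothetical index record with the metadata an opinion filter needs."""
    url: str
    published: date
    text: str


INDEX = [
    Page("http://news.example.co.uk/a", date(2007, 3, 1), "police peerages inquiry"),
    Page("http://news.example.co.uk/b", date(2007, 3, 9), "police peerages loans"),
    Page("http://example.com/c", date(2006, 5, 2), "police traffic patrol"),
    Page("http://example.com/d", date(2006, 7, 4), "police traffic fines"),
]


def related_terms(index, keyword, domain_suffix=None, start=None, end=None, n=2):
    """Most frequent co-occurring words for `keyword` within a filtered slice."""
    counts = Counter()
    for page in index:
        host = urlparse(page.url).hostname or ""
        if domain_suffix is not None and not host.endswith(domain_suffix):
            continue
        if start is not None and page.published < start:
            continue
        if end is not None and page.published > end:
            continue
        words = page.text.split()
        if keyword in words:
            counts.update(w for w in words if w != keyword)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, _ in ranked[:n]]


print(related_terms(INDEX, "police"))  # whole index
print(related_terms(INDEX, "police", domain_suffix=".co.uk"))  # UK slice
print(related_terms(INDEX, "police", start=date(2006, 1, 1), end=date(2006, 12, 31)))
```

    The same keyword yields different "majority opinions" depending on the slice, which is the effect the list above describes.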
    Conclusion
    Google is using semantic technology, but is not yet a fully fledged semantic search engine. It does not use NLP (Natural Language Processing), but this is not a barrier to producing some truly web-changing technology with a bit of thought and originality. NLP may well be (I hate myself for writing this) Web 4.0 and semantics Web 3.0 - in my eyes they are different enough to be classified as such, and the technology Hakia is developing (http://www.readwriteweb.com/archives/hakia_takes_on_google_semantic_search.php) is certainly markedly distinct from Google’s semantic efforts.

    There are barriers that Google needs to overcome. Is it capable of becoming fully semantic without modifying its index too drastically? Can Google continue to keep the results simple and navigable for its varied user base? Most importantly, does Google intend to become a fully semantic search engine, and to do so within a timescale that won’t damage its position and reputation? I like to think that although the dragon is sleeping, that doesn’t mean it’s not dreaming!






    ----------------------------------------------

    -----------------------------------------------

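    Chapter 12, Section 1: “Creating Resource-Oriented Services with ROR”
    Chapter 12, Section 2: “Creating Resource-Oriented Services with Restlet”
    Chapter 3: “What’s Different About RESTful Services”
    InfoQ SOA chief editor Hu Jian reviews the Chinese edition of “RESTful Web Services”
    [InfoQ article] Answering ten common questions about REST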

    Posted: 2007/3/27 9:49:00
     
scromp (reply #2):
    Boss, is there a Chinese version of this?
    Posted: 2007/3/27 16:30:00
     
pig-can (reply #3):
    Heh, looking forward to Web 4.0~~

    ----------------------------------------------
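    Think about the world philosophically and systematically, adopt suitable modeling methods, design excellent algorithms and system architectures, tackle problems as a harmonious team, and, with a grateful heart, enjoy everything life gives us.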

    Posted: 2007/3/28 13:18:00
     
qxr777 (reply #4):
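    As the undisputed leader among search engines, Google will surely become a pioneer in putting the semantic search engine into practice.
    For now, though, Google seems to be still brewing something (or so we hope, heh).
    Semantic annotation of web content and logical reasoning over user queries are both complex and time-consuming, and they would have a major impact on the traditional keyword-based way Google's search engine currently works.
    To significantly improve precision, the semantic search engine is the real way forward.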
    Posted: 2007/3/28 13:32:00
     
pig-can (reply #5):
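    In my humble opinion, as the article says, building a semantic model from text is itself where most of the complexity lies: relying entirely on manual annotation is impossible, and who knows how many years NLU will take. So for now the most practical approach is still to extract and "learn" some semantics from massive amounts of data as automatically as possible, through statistical processing of the text itself, NLP (does simply extracting feature vectors and feature matrices count as NLP?), and analysis of user behavior. I'd guess this is the real core of semantic search engine technology.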

    This isn't my field, so I'm just rambling; OP, please come and correct me! And recommend a few classic papers~~ ^_^

    Quoting qxr777's post of 2007-3-28 13:32:00 (above).


    Posted: 2007/3/28 15:35:00
     
superc_7 (reply #6):
    I still don't understand what he means by a semantic search engine...
    Posted: 2007/3/28 21:03:00
     
qxr777 (reply #7):
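    Hakia, from the article the moderator recommended ("Hakia Takes On Google With Semantic Technologies"), is a good semantic search engine.
    Fully automatic annotation without any human intervention would of course be ideal, but it seems impossible at present.
    On the other hand, applying NLP to the query sentences users submit, then performing the necessary logical reasoning and using its conclusions to pin down what the user really wants, should be entirely feasible. This could actually filter out a lot of "noise" from the search results and improve precision.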
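    The reasoning step described here can be illustrated with the kind of inference the ontology-based Semantic Web relies on: if an ontology states that cat and dog are subclasses of pet, a query for pet can be answered with documents annotated only as cat or dog. The ontology, annotations and function names below are hypothetical, and only transitive subclass inference (in the spirit of rdfs:subClassOf) is shown; real systems use RDF/OWL stores and dedicated reasoners.

```python
# Hypothetical ontology: class -> direct parent classes (like rdfs:subClassOf).
SUBCLASS_OF = {
    "cat": {"pet"},
    "dog": {"pet"},
    "pet": {"animal"},
}

# Hypothetical documents, each annotated with the classes it is about.
ANNOTATIONS = {
    "doc1": {"cat"},
    "doc2": {"dog"},
    "doc3": {"car"},
}


def superclasses(cls):
    """All classes `cls` belongs to under transitive subclass inference."""
    result, stack = {cls}, [cls]
    while stack:
        for parent in SUBCLASS_OF.get(stack.pop(), set()):
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


def answer(query_class):
    """Documents whose annotations entail membership in `query_class`."""
    return sorted(
        doc for doc, classes in ANNOTATIONS.items()
        if any(query_class in superclasses(c) for c in classes)
    )


print(answer("pet"))     # doc1 and doc2, although neither is annotated "pet"
print(answer("animal"))  # inferred through pet -> animal
```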
    Posted: 2007/3/28 21:11:00
     
zhaonix (reply #8):
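    Having read the article, I think the answer should be "no", even though the author thinks it is "yes".
    I say "no" because the "semantic" in this article is not the same thing as the "semantic" in the ontology-centered "Semantic Web" we talk about. As comment #9 on the original article puts it: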
        "The only thing Google is doing at the moment is using statistical information to help people specify or narrow down their search queries. In my opinion, this has little to do with semantics."

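    In fact, based on my admittedly limited understanding, I'm rather pessimistic about the practical prospects of the logic-centered Semantic Web that academia is currently so keen on. I'll start a new thread to discuss it in a couple of days.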

    Posted: 2007/4/18 23:33:00
     
admin (reply #9):
    Quoting zhaonix's post of 2007-4-18 23:33:00 (above).


    Agreed. Relying on logic alone, the SW (Semantic Web) will have a hard time reaching practical use.


    Posted: 2007/4/20 0:30:00
     
l.hongjun (reply #10):
    Is semantics really that great?
    Whatever works is best; semantics can be really convoluted sometimes!
    Posted: 2007/9/2 16:03:00
     
     GoogleAdSense
      
      
      等级:大一新生
      文章:1
      积分:50
      门派:无门无派
      院校:未填写
      注册:2007-01-01
    给Google AdSense发送一个短消息 把Google AdSense加入好友 查看Google AdSense的个人资料 搜索Google AdSense在『 最新动态 & 业界新闻 』的所有贴子 访问Google AdSense的主页 引用回复这个贴子 回复这个贴子 查看Google AdSense的博客广告
    2024/4/27 10:36:43

    This thread has 33 posts across 4 pages.
