    bioinformatics(4)                     


    From: happymood (土豆块儿), Board: Bioinformatics
    Subject: bioinformatics(4)
    Posted at: Peking University Weiming BBS (Tuesday, April 10, 2001, 16:15:35), local mail

    From: spaceman (estranged), Board: LifeScience
    Subject: Bioinformatics, Genomics, and Proteomics (forwarded)
    Posted at: SMTH BBS (Shuimu Tsinghua) (Wed Nov 29 10:34:37 2000)

    The Scientist 14[23]:26, Nov. 27, 2000
    PROFILE
    Bioinformatics, Genomics, and Proteomics
    Scientific discovery advances as technology paves the path
    By Christopher M. Smith
    Data Mining Software for Genomics, Proteomics and Expression Data (Part 1)
    Data Mining Software for Genomics, Proteomics and Expression Data (Part 2)
    High-throughput (HT) sequencing, microarray screening and protein expression profiling technologies drive discovery efforts in today's genomics and proteomics laboratories. These tools allow researchers to generate massive amounts of data, at a rate orders of magnitude greater than scientists ever anticipated. Initiatives to sequence entire genomes have resulted in single data sets ranging in size from 1.8 million nucleotides (Haemophilus influenzae genome) to more than 3 billion (human genome)--a single microarray assay can easily produce information on thousands of genes, and a temporal protein expression profile may capture a data picture of 6,000 proteins.1
    [Figure: Integration of Genomica's LinkMapper with ABI's Gene Mapper]
    It's what you do with the data that counts, however, and that's where bioinformatics takes over. Researchers in bioinformatics are dedicated to the development of applications that can store, compare, and analyze the voluminous quantities of data generated by the use of new technologies.
    One of the original functions of bioinformatics was to provide a mechanism to compare a query DNA or protein sequence against all sequences in a database. Several comparison algorithms have provided some successful and powerful computational applications,2 such as Smith-Waterman, FASTA, and BLAST. Early on, query sequences or sets of query sequences were relatively small, ranging from a few to 10,000 nucleotides, and 10- to 1,000-sequence query sets. Because of the proliferation and improvement of HT sequencing technologies, it is now common to find query sequences with 10,000 nucleotides and data sets containing up to 1 million sequences.
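    The comparison algorithms named above all rest on sequence alignment. As a rough illustration (not part of the original article), the sketch below implements the core dynamic-programming recurrence of Smith-Waterman local alignment in Python; the match/mismatch/gap scores are arbitrary placeholders, and real tools add substitution matrices, affine gap penalties, and heavy optimization.

        # Minimal Smith-Waterman local alignment (score only); scoring values are illustrative.
        def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
            """Return the best local alignment score between sequences a and b."""
            rows, cols = len(a) + 1, len(b) + 1
            # H[i][j] = best score of a local alignment ending at a[i-1] and b[j-1].
            H = [[0] * cols for _ in range(rows)]
            best = 0
            for i in range(1, rows):
                for j in range(1, cols):
                    diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
                    # The 0 option lets a poor alignment restart from scratch.
                    H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
                    best = max(best, H[i][j])
            return best

        if __name__ == "__main__":
            print(smith_waterman("ACACACTA", "AGCACACA"))  # best local alignment score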
    The kinds of data developed and the methods for processing and analysis also have changed. Previously, small-scale DNA sequencing projects would perhaps generate 100 sequences (usually 50-400 nucleotides) that could be assembled relatively easily into a contiguous DNA sequence (a contig). Today, contig assembly may involve 1 million sequences with up to 5,000 nucleotides. The burgeoning fields of proteomics and microarray technologies provide another degree of complexity, adding multidimensional information to the biological data cornucopia.
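    To make the idea of contig assembly concrete, here is a toy greedy assembler (my illustration, not from the article): it repeatedly merges the pair of reads with the longest exact suffix/prefix overlap. The reads below are made up and error-free; real assemblers must cope with sequencing errors, repeats, and millions of reads, so they use far more sophisticated graph-based methods.

        # Toy greedy assembly: repeatedly merge the two reads with the longest
        # exact suffix/prefix overlap until no overlap of MIN_OVERLAP or more remains.
        MIN_OVERLAP = 3

        def overlap(a, b):
            """Length of the longest suffix of a that equals a prefix of b."""
            for n in range(min(len(a), len(b)), MIN_OVERLAP - 1, -1):
                if a.endswith(b[:n]):
                    return n
            return 0

        def assemble(reads):
            reads = list(reads)
            while len(reads) > 1:
                best_n, best_i, best_j = 0, None, None
                for i, a in enumerate(reads):
                    for j, b in enumerate(reads):
                        if i != j and overlap(a, b) > best_n:
                            best_n, best_i, best_j = overlap(a, b), i, j
                if best_n == 0:   # nothing overlaps well enough; stop merging
                    break
                merged = reads[best_i] + reads[best_j][best_n:]
                reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)] + [merged]
            return reads

        if __name__ == "__main__":
            # Hypothetical reads drawn from one underlying sequence.
            print(assemble(["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]))
            # -> ['ATTAGACCTGCCGGAATAC']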
    New Scientific Challenges
    The exponential rate of discovery in the era of modern molecular biology has been nothing short of phenomenal, culminating with the announcement in June 2000 that preliminary sequencing of the human genome had been completed.3 However, this achievement is just a taste of the scientific successes that are to come in the 21st century. As impressive as it is, the determination of the sequence of the approximately 3.2 billion nucleotides of the human genome, encoding an estimated 100,000 proteins, represents only the first step down a long road. Gene identification does not automatically translate into an understanding of gene function. Although mapping and cloning studies have linked a number of genes to heritable genetic diseases, the true (i.e., "normal") function of a majority of these genes remains unknown.
    This dichotomy between gene identity and function will be one source of new research challenges in the 21st century, encompassing problems in biological science, computational biology, and computer science. Biologists will need to decipher the genetic makeup of genomes, map genotypes with phenotypic traits, determine gene and protein structure and function, design and develop therapeutic agents (recombinant and genetically engineered proteins, and small molecule ligands), and unravel biochemical pathways and cellular physiology. Tackling these biological issues will require innovations in computational biology that will be met by the development of new algorithms and methods for comparison of DNA and protein sequence, design of novel metrics for similarity and homology analyses, tools to outline biochemical pathways and interactions, and construction of physiological models. Success in the computational biology arena will require improvements in computational and informatics infrastructure, including development of novel databases as well as annotation, curation, and dissemination tools for the databases; design of parallel computation methods; and development of supercomputers. These latter challenges are particularly important, as high performance computing (HPC) and bioinformatics applications need to be retooled to accommodate the fast interrogation of a plethora of databases, comparisons between relatively long strings of data, and data with varying degrees of complexity and annotation.
    The lion's share of interest and effort over the past few years has been directed toward protein identification (proteomics), structure-function characterization (structural bioinformatics), and bioinformatics database mining. The pharmaceutical industry has for the most part driven these efforts in the search for new therapeutic agents. Identifying proteins from the cellular pool and/or determining structure-function in the absence of concrete biological data is a daunting task, but novel technological approaches are helping scientists to make headway on these fronts.
    Proteomics: Protein Expression Profiling
    Proteomics refers to the science and the process of analyzing and cataloging all the proteins encoded by a genome (a proteome). Since the majority of all known and predicted proteins have no known cellular function, the hope is that proteomics will bridge the chasm separating what raw DNA and protein primary sequence reveals about a protein and its cellular function. Determining protein function on a genomewide scale can provide critical pieces to the metabolic puzzle of cells. Because proteins are involved in one measure or another in disease states (whether induced by bacterial or viral infection, stress, or genetic anomaly), complete descriptions of proteins, including sequence, structure, and function, will substantially aid the current pharmaceutical approach to therapeutics development. This process, known as rational drug design, involves the use of specific structural and functional aspects of a protein to design better proteins or small molecule ligands that can serve as activators or inhibitors of protein function. A recent technology profile in LabConsumer4 and a meeting review5 detail companies providing proteomics tools.
    The multidimensional nature of proteomics data (for example, 2D-PAGE gel images) presents novel collection, normalization, and analysis challenges. Data collection issues are being overcome by sophisticated proteomic systems that semiautomate and integrate the experimental process with data collection. Improvements in the experimental technology have increased the number of proteins that can be identified, with consistency, within a single gel; however, making comparisons and looking for patterns and relationships between proteins and/or particular environmental, disease, or developmental states requires data mining and knowledge discovery tools.
    Finding the Needle in the Haystack
    Data mining refers to a new genre of bioinformatics tools used to sift through the mass of raw data, finding and extracting relevant information and developing relationships between them.6 As advances in instrumentation and experimental techniques have led to the accumulation of massive amounts of data, data mining applications are providing the tools to harvest the fruit of these labors. Maximally useful data mining applications should:
    * process data from disparate experimental techniques and technologies and data that has both temporal (time studies) and spatial (organism, organ, cell type, sub-cellular location) dimensions;
    * be capable of identifying and interpreting outlying data;
    * use data analysis in an iterative process, applying gained knowledge to constantly examine and reexamine data; and
    * use novel comparison techniques that extend beyond the standard Bayesian (similarity search) methods.
    Data mining applications are built on complex algorithms that derive explanatory and predictive models from large sets of complex data by identifying patterns in data and developing probable relationships. Data mining workbenches also incorporate mechanisms to filter, standardize/normalize, cluster data, and visualize results.
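    One of those workbench steps, clustering, can be sketched in a few lines. The example below is an illustration, not part of the article: the expression matrix is invented, and SciPy's hierarchical clustering is assumed as the tool. It groups genes whose expression patterns correlate across conditions, which is the kind of operation an expression data mining workbench performs at much larger scale.

        # Hierarchical clustering of toy gene expression profiles (genes x conditions).
        # The matrix is made-up data; real studies cluster thousands of genes.
        import numpy as np
        from scipy.cluster.hierarchy import linkage, fcluster

        expression = np.array([
            [0.1, 0.2, 5.1, 5.0],   # geneA: induced in conditions 3 and 4
            [0.0, 0.3, 4.8, 5.2],   # geneB: profile similar to geneA
            [4.9, 5.1, 0.2, 0.1],   # geneC: opposite pattern
            [5.0, 4.7, 0.0, 0.2],   # geneD: profile similar to geneC
        ])
        genes = ["geneA", "geneB", "geneC", "geneD"]

        # Average-linkage clustering on correlation distance between profiles.
        tree = linkage(expression, method="average", metric="correlation")
        labels = fcluster(tree, t=2, criterion="maxclust")  # cut the tree into 2 clusters

        for gene, label in zip(genes, labels):
            print(gene, "-> cluster", int(label))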
    As a tool to identify open reading frames (ORFs) or hypothetical genes in genomic data, data mining is a new twist on existing gene discovery applications, such as programs that identify intron/exon boundaries in genomic DNA. One of data mining's greatest practical applications will be in the area of HT, microarray-based gene- and protein-expression profiling, where massive data sets need to be examined to identify sometimes subtle intrinsic patterns and relationships. Differential gene analysis has the potential to explicitly describe the interrelationships of genes during development, under physiological stress, and during pathogenesis. The data mining approach taken to analyze microarray data is a function of experimental design and purpose. Investigations analyzing defined perturbations of a given genetic stasis use hypothesis-testing computational methods, whereas genetic surveys and research into fundamental cellular biology use statistical methods. Similarly, the same methods are utilized in analyzing large-scale proteomics data sets.
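    The ORF-finding task mentioned at the start of that paragraph reduces, in its simplest form, to scanning reading frames for a start codon followed in-frame by a stop codon. The sketch below is only an illustration: the sequence and the minimum-length cutoff are arbitrary, it covers only the forward strand of unspliced DNA, and real gene finders also scan the reverse complement and apply statistical gene models.

        # Minimal ORF scan: report ATG...stop open reading frames on the forward
        # strand, in all three reading frames. Coordinates are 0-based, end-exclusive.
        STOP_CODONS = {"TAA", "TAG", "TGA"}

        def find_orfs(seq, min_codons=3):
            seq = seq.upper()
            orfs = []
            for frame in range(3):
                start = None
                for i in range(frame, len(seq) - 2, 3):
                    codon = seq[i:i + 3]
                    if codon == "ATG" and start is None:
                        start = i
                    elif codon in STOP_CODONS and start is not None:
                        if (i + 3 - start) // 3 >= min_codons:
                            orfs.append((start, i + 3, seq[start:i + 3]))
                        start = None
            return orfs

        if __name__ == "__main__":
            # Hypothetical fragment containing one short ORF (ATG AAA TTT TAA).
            for begin, end, orf in find_orfs("CCATGAAATTTTAAGG"):
                print(begin, end, orf)   # -> 2 14 ATGAAATTTTAA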
    An extension of data mining is the concept of knowledge discovery (KD), in which the results of data mining experiments open up new avenues of research,7 with obvious and subtle findings forming the basis of new questions from different perspectives. Some of the more prominent data mining applications and KD workbenches are described in the accompanying table.
    Predicting Protein Structure and Function
    Structural bioinformatics involves the process of determining a protein's three-dimensional structure using comparative primary sequence alignment, secondary and tertiary structure prediction methods, homology modeling, and crystallographic diffraction pattern analyses. Currently, there is no reliable de novo predictive method for protein 3D-structure determination. Over the past half-century, protein structure has been determined by purifying a protein, crystallizing it, then bombarding it with X-rays. The X-ray diffraction pattern from the bombardment is recorded electronically and analyzed using software that creates a rough draft of the 3D structure. Biological scientists and crystallographers then tweak and manipulate the rough draft considerably. The resulting spatial coordinate file can be examined using modeling-structure software to study the gross and subtle features of the protein's structure.
    One major bottleneck associated with this classic crystallography technology is the inordinate amount of time it takes to successfully grow protein crystals. This problem is being addressed by HT technology under development that streamlines the crystallization process. This HT crystallography technology performs many crystallization conditions in parallel with real-time photo-video crystal monitoring. This enables the researcher to test thousands of crystallization conditions simultaneously, aborting those conditions that do not work at an early stage and selecting "perfect" crystals suitable for X-ray analysis.
    Efforts to bypass the excessive time needed to tweak the rough draft of X-ray crystallographic structures have led to the advancement of computational modeling (homology and ab initio modeling) approaches. These techniques have been under development, in one form or another, since the first protein structure (of myoglobin) was determined in the late 1950s.8 Computational modeling utilizes predictive and comparative methods to fashion a new protein structure. Ab initio methods use the physicochemical properties of the amino acid sequence of a protein to literally calculate a 3D structure (lowest energy model) based on protein folding. As opposed to determining the structure of an entire protein, ab initio methods are typically used to predict and model protein folds (domains). This method is gaining considerably, in part due to the development of novel mathematical approaches, a boost in available computational resources (for example, tera- and petaFLOPS supercomputers), and considerable interest from researchers investigating protein-ligand (or drug) interactions. Having the structure, even if only hypothetical, for a part of the protein that interacts with a ligand can potentially hasten drug exploration research.
    In homology modeling, the structural and functional characteristics of known proteins are used as a template to create a hypothesized structure for an "unknown" protein with similar functional and structural features. Protein structure researchers estimate that 10,000 protein structures will provide enough data to define most, if not all, of the approximately 1,000 to 5,000 different folds that a protein can assume;9 hence, predictive structure modeling will become more accurate and important as more and more structures are derived. The homology modeling approach has become very important to the pharmaceutical industry, where expense and time are major drawbacks to the classical methods of determining protein structure, even if automation shortens the discovery cycle. Hypothesized models provide an electronic footprint with which researchers may computationally design various "shoes," such as inhibitors, activators, and ligands.10 This provides for better engineering of potential drugs and reduces the number of compounds that need to be tested in vitro and in vivo.
    A variety of companies and research initiatives have undertaken these modern approaches to 3D protein structure determination. Most produce structure prediction/modeling applications useful in drug development and basic science research, provide access to proprietary structure databases, and/or will develop customized analysis services for researchers. LabConsumer will present a profile on molecular modeling applications, including those that are key players in homology modeling, early next year.
    Tools for the 21st Century
    Modern experimental technologies are providing seemingly endless opportunities to generate massive amounts of sequence, expression, and functional data. The drive to capitalize on this enormous pool of information in order to understand fundamental biological phenomena and develop novel therapeutics is pushing the development of new computational tools to capture, organize, categorize, analyze, mine, retrieve, and share data and results. Most current computational applications will suffice for analyses of specific questions using relatively small data sets. But to expand scientific horizons, to accommodate the larger and larger data sets, and to find patterns and see relationships that span temporal and spatial scales, new tools that broaden the scope and complexity of the analyses are needed. Many of these data mining tools are available from the companies highlighted in the accompanying table. These new products and those listed in a previous LabConsumer profile11 have the capacity to expand research opportunities immeasurably.

    Christopher M. Smith (csmith@sdsc.edu) is a freelance science writer in San Diego.
    References
    1. W.P. Blackstock, M.P. Weir, "Proteomics: quantitative and physical mapping of cellular proteins," Trends in Biotechnology, 17:121-7, 1999.
    2. R.F. Doolittle, "Computer methods for macromolecular sequence analysis," Methods in Enzymology, Vol. 206, San Diego, Academic Press, 1996.
    3. A. Emmett, "The Human Genome," The Scientist, 14[15]:1, July 24, 2000.
    4. L. De Francesco, "One step beyond: Going beyond genomics with proteomics and two-dimensional technology," The Scientist, 13[1]:16, January 4, 1999.
    5. S. Borman, "Proteomics: Taking over where genomics leaves off," Chemical & Engineering News, 78[31]:31-7, July 31, 2000.
    6. J.L. Houle et al., "Database mining in the human genome initiative," www.biodatabases.com/whitepaper.html, Amita Corp., 2000.
    7. G. Zweiger, "Knowledge discovery in gene-expression-microarray data: mining information output of the genome," Trends in Biotechnology, 17:429-36, 1999.
    8. J.C. Kendrew et al., "Structure of myoglobin," Nature, 185:422-7, 1960.
    9. L. Holm, C. Sander, "Mapping the protein universe," Science, 273:595-602, 1996.
    10. J. Skolnick, J.S. Fetrow, "From genes to protein structure and function: Novel applications of computational approaches in the genomics era," Trends in Biotechnology, 18:34-9, 2000.
    11. C. Smith, "Computational gold: Data mining and bioinformatics software for the next millennium," The Scientist, 13[9]:21-3, April 26, 1999.
    12. R.H. Gross, "CMS molecular biology resource," Biotech Software & Internet Journal, 1:5-9, 2000.
    Bioinformatics on the Web
    Portals to data analysis
    The heart of bioinformatics analyses is the software and the databases upon which many of the analyses are based. Traditionally, bioinformatics software has required high-end workstations (desktop to mid-range servers) with a multitude of visualization plug-ins and/or peripheral equipment, and a user (or administrator) willing to routinely download database updates. The mid-range UNIX server is still the standard bioinformatics platform, though there are also a fair number of Microsoft Windows and Apple PowerMac computers. There are also a number of specialized platforms that integrate hardware and custom software into a powerful data analysis tool, such as DeCypher, produced by Incline Village, Nev.'s TimeLogic (http://www.timelogic.com/); Bioccelerator, from Compugen Ltd. of Tel Aviv, Israel (http://www.cgen.com/); and GeneMatcher, manufactured by Paracel Inc. (http://www.paracel.com/) of Pasadena, Calif. Yet the amount of time, money, and effort needed to purchase and maintain the hardware, software, and databases required for bioinformatics research can be a considerable burden to a research laboratory.
    [Figure: 2D-gel analysis with Compugen's Z3OnWeb.com]
    To circumvent many of these problems, a few commercial entities are now providing fee-based bioinformatics analysis services through the World Wide Web. These services offer several advantages over local stand-alone or server-based analyses. Because they are provided through a Web interface, these services are platform-independent and may be accessed by practically any Web browser. Also, they are world accessible. No longer must researchers struggle with different applications (doing the same function), different computer systems, file formats, and other hurdles to access their data and results. Bioinformatics Web portals truly provide universal access. Some of the more recent application service providers of Web-based bioinformatics tools are presented below.
    Bionavigator (http://www.bionavigator.com/) is a product of eBioinformatics Inc., of Sunnyvale, Calif., a spin-off venture of the Australian National Genomic Information Service. This service primarily targets academic researchers and provides access to more than 20 databases and 200 analytical tools, including those for database searching, DNA/protein sequence analysis, phylogenetic analyses, and molecular modeling. Another attractive and useful feature of the Bionavigator is that it can generate publication-quality result output (for example, color-coded multiple sequence alignments and graphic phylogenetic trees).
    Doubletwist.com, formerly Pangea Systems of Oakland, Calif., is a major purveyor of annotated sequence data through its Prophecy database. DoubleTwist has recently added fee-based bioinformatics services through an integrated life science portal. Using any one of a number of "research agents," researchers can analyze protein and DNA sequence data. DNA analysis tools provide for the identification of new gene family members, potential full-length cDNAs, and sequence homologs, whereas the protein tools include routines to identify protein family associations, protein-protein interactions, and conserved protein domains.
    GeneSolutions.com, a product of HySeq Inc., of Sunnyvale, Calif., provides access to information describing proprietary gene sequences and related data from more than 1.4 million expressed sequence tags (EST) analyzed by HySeq using its proprietary SBH process. The GeneSolutions Portfolio contains gene sequences, homology data, and gene expression data generated by HySeq. More than 35,000 genes are reported to have been identified and characterized in HySeq's proprietary databases.
    IncyteGenomics OnLine Research (www.incyte.com/online) provides a Web portal to the numerous databases developed and maintained by Incyte Genomics Inc., of Palo Alto, Calif., and a personal workbench where researchers can store their sequences, perform analyses, and search the company's databases.
    LabOnWeb.com (http://www.labonweb.com/), developed by Compugen Ltd., is an Internet life science research engine providing access to a variety of gene discovery tools. First introduced in December 1999, the latest version (2.0), released in September 2000, includes a variety of tools for the prediction of open reading frames and polypeptides (including an InstantRACE module that uses public and proprietary databases to return a complete cDNA sequence given an input EST), alternative splicing sites, gene function (by similarity to protein domain profiles), and tissue distribution, among others.
    Z3OnWeb.com (http://www.2dgels.com/) is another service provided by Compugen for the analysis of 2D-gel image data using Z3 software. Researchers have the option of purchasing and operating the software from their own workstations, or they may upload their image data to the Web-accessible Z3 platform for analysis.
    For researchers working on a nonexistent bioinformatics budget, there are still a host of powerful bioinformatics applications, accessible without charge, on the Web. If the researcher needs only to perform one or two types of analyses, and if data security, having to work through several disparate applications, and output format are not critical issues, then these gratis Web tools are a bargain.
    A comprehensive listing of more than 2,300 Web-based bioinformatics tools (and information sources), organized according to the type of analyses they perform, is available through the CMS Molecular Biology Resource12 (www.sdsc.edu/restools) at the San Diego Supercomputer Center, University of California. A good place to start is the National Institutes of Health's National Center for Biotechnology Information Web site (http://www.ncbi.nlm.nih.gov/). This server contains sequencing and mapping data for nearly 800 different organisms through the GenBank database, all searchable using the BLAST tool. NCBI also contains an ORF finder, the Online Mendelian Inheritance in Man (OMIM) database of human genes, and a variety of other useful tools, most of them cross-indexed to the NCBI PubMed MEDLINE database.
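    As a concrete way to reach those GenBank BLAST searches from a script, the sketch below uses Biopython's NCBIWWW interface. Biopython is not mentioned in the article and is assumed here only for illustration; the query sequence is a made-up placeholder, and the call needs network access to the NCBI servers.

        # Submitting a BLAST query to NCBI programmatically (Biopython is assumed
        # here for illustration; it is not part of the article). Needs network access.
        from Bio.Blast import NCBIWWW, NCBIXML

        query = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"  # placeholder nucleotide sequence

        # Run blastn against the nt database and parse the XML that NCBI returns.
        result_handle = NCBIWWW.qblast("blastn", "nt", query)
        record = NCBIXML.read(result_handle)

        for alignment in record.alignments[:5]:      # report the top five hits
            best_hsp = alignment.hsps[0]
            print(alignment.title[:60], "E =", best_hsp.expect)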
    --Christopher M. Smith

    --
    ※ Source: Peking University Weiming BBS (bbs.pku.edu.cn) [FROM: 166.111.185.231]

