A variety of software products and services are being
introduced to analyze complex biological and chemical data in an intuitive and
efficient manner.
Introduction
Data mining has been defined as "the nontrivial
extraction of implicit, previously unknown, and potentially useful information
from data"1. In areas other than the life sciences and healthcare, data
mining is a huge industry, with more than a hundred companies providing a vast
array of software products and services to clients that obtain, generate, and
rely on large quantities of data. The industries that rely daily on data mining
for a number of their functions include marketing, manufacturing, database
providers, government, the travel industry, banking and the financial industry,
telecommunications, and engineering, among others. The common theme is that
these industries all have truly massive amounts of information—about their
operations and also about their clients—collected in a variety of ways. In
order to maximize the usefulness of this information, they rely on software
that helps glean specific patterns and trends from the data, in addition to
making predictions and offering simulations of future events.
It should come as no surprise that the biopharmaceutical
industry is increasingly employing a variety of data-mining methodologies to
help it deal with the enormous amounts of biological information of various
forms that the industry collects. From annotated databases of disease profiles and molecular pathways to sequences, structure–activity relationships (SAR), chemical structures of combinatorial compound libraries, and individual and population clinical trial results, the biopharmaceutical industry is inundated with information, and data mining is the centerpiece of advanced methodologies for dealing with this information overload2 (see Competitive business intelligence, pp. 5–6).
The technology
Data mining uses so-called machine learning and also
statistical and visualization methodologies to discover and represent knowledge
in a form that is easily understood by humans. The objective is to reduce
complexity and extract, or mine, as much relevant and useful information from a
large data set as possible. It is important not to confuse data mining in the biopharmaceutical industry with bioinformatics, which typically focuses on sequence-based extraction of specific patterns or motifs and on specific pattern matching (see Bioinformatics, pp. 31–34).
The biopharmaceutical industry is generating more chemical
and biological screening data than it knows what to do with or how best to
handle. As a result, deciding which target and lead compound to develop further
is often a long and arduous task. Any technology that reduces the
"noise" in the system and makes better use of the vast reams of
information collected would represent a significant competitive advantage. One contributor to this inefficiency is the existing software for analyzing and interpreting chemical and biological information, which by all accounts has not kept pace with the development of new discovery methodologies. For example, software currently used by medicinal chemists to analyze screening results presents data for individual compounds either as plain tables or as SAR correlations in tables of structures, neither of which is user-friendly when it comes to designing compounds for further testing. Enter data mining, which aims at nothing less than making sense of these complex data sets in an intuitive and efficient manner.
Current state
Table 1 lists selected companies that offer specific
data-mining products and services tailored to the biopharmaceutical industry.
The specific examples illustrate the breadth of data-mining applications, and
also how they differ from more traditional bioinformatics ones. For example,
Chiron Informatics focuses on healthcare delivery systems and is developing
decision-support, data mining–based products and services to facilitate the
implementation of comprehensive medical management.
Table 1: Selected companies with data-mining products and
services
Another example, Lexical Technology, founded in 1984,
specializes in the development of lexically based products and services for
healthcare vendors and enterprises. Lexical's Metaphrase software product
family was developed with contributions from collaborators including the National Library of Medicine, the National Cancer Institute, Kaiser Permanente, the Mayo Clinic, and the American College of Physicians.
Bioreason is a new company with proprietary chemoinformatic
knowledge discovery and data-mining software to help identify
structure–activity relationships in large quantities of relevant data.
Finally, Columbus Molecular Software develops and markets
its LeadScope software for visualizing, browsing, and interpreting chemical and
biological screening data, thus accelerating the extraction of information that
helps validate targets and leads for further preclinical or clinical
development.
It is interesting to note that although data-mining
companies that specialize in the biopharmaceutical sector are relatively few
(again, excluded here are traditional bioinformatics companies), more than 100
general data-mining companies serve other industries, with very significant
revenues2.
Methodologies and applications
Data-mining applications are being developed using
essentially six major approaches, which lend themselves to different types of
biological data analysis. The first approach is generically known as
influence-based mining. Here, complex and granular (as opposed to linear) data
in large databases are scanned for influences between specific data sets, and
this is done along many dimensions and in multi-table formats. These systems
find applications wherever there are significant cause-and-effect relationships between data sets, as occurs, for example, in large multivariate gene-expression studies, which underpin areas such as pharmacogenomics.
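As a minimal sketch of the idea (the gene names and expression values here are hypothetical), an influence-based miner can scan every pair of variables in a data set and flag the pairs whose correlation exceeds a threshold:

```python
from itertools import combinations
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def influences(table, threshold=0.9):
    """Return variable pairs whose absolute correlation clears the threshold."""
    return [(a, b) for a, b in combinations(sorted(table), 2)
            if abs(pearson(table[a], table[b])) >= threshold]

# Hypothetical expression levels of three genes across five samples.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 4.0, 6.2, 7.9, 10.1],   # tracks geneA closely
    "geneC": [5.0, 1.0, 4.0, 2.0, 3.0],    # unrelated
}
print(influences(expression))  # only the geneA/geneB pair surfaces
```

Real systems scan many dimensions and multi-table layouts at once, but the core operation is this kind of exhaustive pairwise influence scan.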
A variant of influence-based mining is the method
generically referred to as affinity-based mining. Again, large and complex data
sets are analyzed across multiple dimensions, and the data-mining system
identifies data points or sets that tend to be grouped together. These systems
differentiate themselves by providing hierarchies of associations and showing
any underlying logical conditions or rules that account for the specific
groupings of data. This approach is particularly useful in biological motif analysis, where it is important to distinguish "accidental" or incidental motifs from ones with biological significance.
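A toy illustration of affinity-based grouping (the motif names and observation records are invented for the example): count how often items co-occur across records and keep the pairs whose support clears a threshold, which separates recurring associations from incidental ones:

```python
from itertools import combinations
from collections import Counter

def affinities(records, min_support=0.5):
    """Return item pairs that co-occur in at least min_support of records."""
    counts = Counter()
    for record in records:
        for pair in combinations(sorted(set(record)), 2):
            counts[pair] += 1
    n = len(records)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

# Hypothetical motifs observed together in a set of protein records.
observations = [
    {"motifX", "motifY", "motifZ"},
    {"motifX", "motifY"},
    {"motifX", "motifY", "motifW"},
    {"motifZ", "motifW"},
]
print(affinities(observations))  # motifX/motifY co-occur in 3 of 4 records
```

Full affinity-mining systems go further, building hierarchies of associations and exposing the rules behind each grouping, but pair support is the underlying measurement.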
Yet another approach is generically referred to as
time-delay data mining. Here, the data set is not available immediately and in
complete form, but is collected over time. The systems designed to handle such
data look for patterns that are confirmed or rejected as the data set increases
and becomes more robust. This approach is geared toward long-term clinical
trial analysis and multicomponent mode of action studies, for example.
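The idea can be sketched as follows, with a hypothetical marker threshold standing in for the candidate pattern: support for the pattern is re-evaluated as each batch of data arrives, and a verdict is issued only once the data set is robust enough:

```python
class TimeDelayMiner:
    """Track a candidate pattern as the data set grows over time;
    confirm or reject it only once enough observations have accumulated."""

    def __init__(self, predicate, min_observations=10, min_support=0.8):
        self.predicate = predicate            # the pattern being tested
        self.min_observations = min_observations
        self.min_support = min_support
        self.hits = 0
        self.total = 0

    def add_batch(self, batch):
        """Fold a newly arrived batch of observations into the counts."""
        for item in batch:
            self.total += 1
            self.hits += bool(self.predicate(item))

    def status(self):
        if self.total < self.min_observations:
            return "pending"                  # data set not yet robust enough
        support = self.hits / self.total
        return "confirmed" if support >= self.min_support else "rejected"

# Hypothetical pattern: treated samples show a marker reading above 5.
miner = TimeDelayMiner(lambda reading: reading > 5)
miner.add_batch([6.1, 7.0, 5.5, 6.8])                  # first wave of data
print(miner.status())                                  # still "pending"
miner.add_batch([6.2, 5.9, 7.4, 6.0, 4.8, 6.6])        # later wave
print(miner.status())                                  # now decidable
```

The design choice to withhold judgment below a minimum sample size mirrors how long-running trials accumulate evidence before a pattern is accepted or discarded.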
In the fourth approach, trends-based data mining, the
software analyzes large and complex data sets in terms of any changes that
occur in specific data sets over time. The data sets can be user-defined, or
the system can uncover them itself. Essentially, the system reports on anything
that is changing over time. This is especially important in cause-and-effect
biological experiments. Screening is a good example, where responses over time
to particular drugs or other stimuli are being collected for analysis. The
software is designed specifically for this purpose, and can identify multiple
trends very efficiently.
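One simple way to realize this, sketched here with invented screening data, is to fit a least-squares slope to each response series and report every series that is changing appreciably over time:

```python
def slope(values):
    """Least-squares slope of a series against time steps 0..n-1."""
    n = len(values)
    mt, mv = (n - 1) / 2, sum(values) / n
    num = sum((t - mt) * (v - mv) for t, v in enumerate(values))
    den = sum((t - mt) ** 2 for t in range(n))
    return num / den

def trending(series, min_slope=0.5):
    """Report every series whose response is changing over time."""
    return {name: round(slope(vals), 2)
            for name, vals in series.items()
            if abs(slope(vals)) >= min_slope}

# Hypothetical screening responses measured at five time points.
responses = {
    "compound1": [0.1, 1.2, 2.0, 3.1, 4.0],   # rising response
    "compound2": [2.0, 2.1, 1.9, 2.0, 2.1],   # flat
    "compound3": [5.0, 4.1, 2.9, 2.0, 1.1],   # falling response
}
print(trending(responses))  # compound1 and compound3 are trending
```

A production system would handle many trend shapes, not just linear ones, but flagging series whose fitted slope exceeds a cutoff captures the essence of "report anything that is changing over time."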
The fifth approach is generically known as comparative data
mining, and it focuses on overlaying large and complex data sets that are
similar to each other and comparing them. This is particularly useful in all
forms of clinical trial meta-analyses, where data collected at different sites
over different time periods, and perhaps under similar but not always identical
conditions, need to be compared. Here, the emphasis is on finding
dissimilarities, not similarities.
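A minimal sketch of the comparison step, using hypothetical two-site trial numbers: overlay the data sets on their shared keys and report only the entries that disagree beyond a tolerance, since the dissimilarities are what matter:

```python
def dissimilarities(site_a, site_b, tolerance=0.2):
    """Overlay two data sets keyed by measurement and report entries
    whose values disagree by more than the tolerance."""
    shared = site_a.keys() & site_b.keys()
    return {key: (site_a[key], site_b[key])
            for key in sorted(shared)
            if abs(site_a[key] - site_b[key]) > tolerance}

# Hypothetical trial endpoints collected at two sites.
site1 = {"response_rate": 0.61, "dropout": 0.10, "adverse_events": 0.05}
site2 = {"response_rate": 0.59, "dropout": 0.35, "adverse_events": 0.06}
print(dissimilarities(site1, site2))  # only dropout differs materially
```

Restricting the comparison to shared keys reflects the "similar but not always identical conditions" of multi-site trials: only measurements collected at both sites can be meaningfully overlaid.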
Finally, data mining falls somewhat short if it cannot also offer a framework for making simulations, predictions, and forecasts based on the data sets it has analyzed. So-called predictive data mining combines pattern matching, influence relationships, time-set correlations, and dissimilarity analysis to offer simulations of future data sets. One advantage here is that these systems can incorporate entire data sets into their workings, not just samples, which makes their accuracy significantly higher. Predictive data mining is often used in clinical trial analysis and in structure–function correlations.
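As a toy sketch of the predictive step (the readings are invented), a model can be fit to the entire series rather than a sample and then extrapolated forward:

```python
def forecast(values, steps=1):
    """Fit a least-squares line to the entire series (not a sample of it)
    and extrapolate the next values."""
    n = len(values)
    mt, mv = (n - 1) / 2, sum(values) / n
    num = sum((t - mt) * (v - mv) for t, v in enumerate(values))
    den = sum((t - mt) ** 2 for t in range(n))
    slope = num / den
    intercept = mv - slope * mt
    return [round(intercept + slope * (n + k), 2) for k in range(steps)]

# Hypothetical readings over five time points; predict the next two.
readings = [1.0, 2.0, 3.0, 4.0, 5.0]
print(forecast(readings, steps=2))  # → [6.0, 7.0]
```

Real predictive miners combine far richer models, but the contrast the text draws, fitting on the whole data set rather than a sample, is visible even in this one-line regression.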
The future
A key application of data mining is the protein-folding process and the derivation of structure–function relationships. Here, related fields, such as machine learning developed in engineering, are making significant contributions, producing software better able to handle this complex task, and we will see many more of these approaches in the future3.
In addition, data mining methodologies will be increasingly
applied to the extraction of information not just from biological data, such as
sequences, but also from the scientific literature itself. With the increase in
electronic publications, there is an opportunity and a need to develop
automated ways of searching and summarizing the literature. A recent report
describes the use of automated keyword extraction to produce up-to-date entries
on human inherited diseases from the OMIM (Online Mendelian Inheritance in Man)
database4.
Another major development for the future is the application
of data mining to clinical information databases, such as heart disease
databases. The methodology here can help reveal patients at higher risk of heart disease and therefore promises significant preventive potential5.
Finally, data mining methods are used to improve computer-assisted drug design, using techniques such as genetic algorithms to detect chemical-entity features that occur in clusters within the high-dimensional analytical data of drug-design experiments6. This type of cluster analysis helps optimize the search for relevant new drug structures and is therefore of major importance to the industry.
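One common form of such cluster analysis can be sketched as a single-pass "leader" clustering of hypothetical compound descriptor vectors; this is an illustrative technique under assumed toy data, not the method of any particular product:

```python
from math import dist  # Euclidean distance, Python 3.8+

def leader_cluster(points, radius):
    """Single-pass 'leader' clustering: assign each point to the first
    cluster whose leader lies within radius, else start a new cluster."""
    clusters = []  # list of (leader_vector, member_names)
    for name, vec in points:
        for leader_vec, members in clusters:
            if dist(vec, leader_vec) <= radius:
                members.append(name)
                break
        else:
            clusters.append((vec, [name]))
    return [members for _, members in clusters]

# Hypothetical 2D descriptor vectors (e.g. scaled logP and molecular weight).
compounds = [
    ("cmpd1", (0.1, 0.2)), ("cmpd2", (0.15, 0.25)),
    ("cmpd3", (0.9, 0.8)), ("cmpd4", (0.92, 0.85)),
]
print(leader_cluster(compounds, radius=0.2))  # two tight clusters emerge
```

Grouping compounds with similar descriptors lets a chemist examine one representative per cluster instead of every hit, which is exactly the search optimization the text describes.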
Conclusions
The explosive growth of biological data generation and
availability has shifted bottlenecks in drug development from the discovery
phase to the high-throughput analysis phase, where humans alone cannot sift through the vast data sets being generated. Data mining is emerging within the biopharmaceutical industry as a significant ally in this effort, in a way that complements and expands traditional bioinformatics. Eventually, data
mining and bioinformatics will be indistinguishable, but for the time being
they are distinct. It is important to remember that this is an example of a
technology that has been successfully deployed in many other industries whose data
requirements are similar to those of the biopharmaceutical industry. Data
mining has met with very significant success in these other industries, and it
is expected that in the next few years it will contribute significantly in the
optimization of the data analysis process of the biopharmaceutical industry as
well.
Artificial intelligence (AI) is defined as intelligence exhibited by an artificial entity; such systems are generally implemented as computers. Intelligence is created and built into a machine (a computer) so that it can do work the way humans do. Fields that make use of artificial intelligence include expert systems, computer games, fuzzy logic, neural networks, and robotics.
Many tasks that seem hard for human intelligence pose relatively little problem for informatics: transforming equations, solving integrals, or playing chess or backgammon, for example. On the other hand, tasks that appear to demand only a little human intelligence, such as object or face recognition and playing football, are still difficult to realize in informatics.
Although AI has a strong science-fiction connotation, it forms a very important branch of computer science, dealing with behavior, learning, and intelligent adaptation in machines.
Research in AI involves building machines that automate tasks requiring intelligent behavior. Examples include control, planning and scheduling, diagnosis, the ability to answer customer questions, and the recognition of handwriting, voices, and faces. These have become separate disciplines in their own right, each focused on providing solutions to real-life problems. AI systems are now commonly used in economics, medicine, engineering, and the military, and are built into many home-computer software applications and video games.
'Artificial intelligence' is not just about understanding what intelligent systems are, and there is no fully satisfactory definition of 'intelligence':
1. Intelligence: the ability to acquire knowledge and use it; or
2. Intelligence: that which is measured by an 'intelligence test'.
Broadly speaking, AI divides into two schools of thought: conventional AI and computational intelligence (CI). Conventional AI mostly involves methods now classified as machine learning, characterized by formalism and statistical analysis. It is also known as symbolic AI, logical AI, neat AI, and good old-fashioned AI (GOFAI, Good Old-Fashioned Artificial Intelligence). Its methods include:
• Expert systems: the capability to apply judgment to reach conclusions. An expert system can process large amounts of known information and provide conclusions based on that information.
• Case-based reasoning
• Bayesian networks
• Behavior-based AI: a modular method for building AI systems by hand
Computational intelligence involves iterative development or learning (such as parameter tuning in connectionist systems). Learning is based on empirical data, and the school is associated with non-symbolic AI, scruffy AI, and soft computing. Its basic methods include:
• Neural networks: systems with very strong pattern-recognition capabilities
• Fuzzy systems: techniques for reasoning under uncertainty, used extensively in modern industrial and consumer-product control systems
• Evolutionary computation: applies biologically inspired concepts such as populations, mutation, and "survival of the fittest" to produce better solutions.
These methods divide mainly into evolutionary algorithms (e.g., genetic algorithms) and swarm intelligence (e.g., ant algorithms).
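As an illustration of the evolutionary-algorithm idea, here is a tiny genetic algorithm over bit strings; the "OneMax" fitness function (count the 1-bits) is a standard toy problem, and the structure shows the three ingredients named above: selection keeps the fittest half, one-point crossover recombines parents, and point mutation perturbs each child:

```python
import random

def genetic_search(fitness, length=10, pop_size=20, generations=40, seed=0):
    """Tiny genetic algorithm over bit strings: selection, one-point
    crossover, point mutation, and 'survival of the fittest'."""
    rng = random.Random(seed)  # seeded for reproducibility
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]        # survival of the fittest
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)     # select two parents
            cut = rng.randrange(1, length)      # one-point crossover
            child = a[:cut] + b[cut:]
            child[rng.randrange(length)] ^= 1   # point mutation: flip a bit
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

# OneMax: the GA should drive the population toward the all-ones string.
best = genetic_search(fitness=sum)
print(best, sum(best))
```

Swapping in a different fitness function is all that is needed to aim the same machinery at another search problem.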
With hybrid intelligent systems, experiments are designed to combine these two groups. Expert inference rules can be generated through a neural network, or production rules can be derived from statistical learning, as in ACT-R. A promising new approach is intelligence amplification, which tries to achieve artificial intelligence through an evolutionary development process, as a side effect of amplifying human intelligence through technology.
HISTORY
In the early 17th century, René Descartes proposed that the bodies of animals are nothing but complicated machines. Blaise Pascal invented the first mechanical digital calculating machine in 1642. In the 19th century, Charles Babbage and Ada Lovelace worked on programmable mechanical calculating machines.
Bertrand Russell and Alfred North Whitehead published Principia Mathematica, which revolutionized formal logic. Warren McCulloch and Walter Pitts published "A Logical Calculus of the Ideas Immanent in Nervous Activity" in 1943, laying the foundation for neural networks.
The 1950s were a period of active effort in AI. The first working AI programs were written in 1951 to run on the Ferranti Mark I machine at the University of Manchester (UK): a draughts-playing program written by Christopher Strachey and a chess program written by Dietrich Prinz. John McCarthy coined the term "artificial intelligence" at the first conference devoted to the subject, in 1956. He also invented the Lisp programming language. Alan Turing introduced the "Turing test" as a way to operationalize a test of intelligent behavior. Joseph Weizenbaum built ELIZA, a chatterbot implementing Rogerian psychotherapy. During the 1960s and 1970s, Joel Moses demonstrated the power of symbolic reasoning for integration problems in the Macsyma program, the first successful knowledge-based program in mathematics. Marvin Minsky and Seymour Papert published Perceptrons, demonstrating the limits of simple neural networks, and Alain Colmerauer developed the computer language Prolog. Ted Shortliffe demonstrated the power of rule-based systems for knowledge representation and inference in medical diagnosis and therapy, in what is sometimes called the first expert system. Hans Moravec developed the first computer-controlled vehicle able to negotiate cluttered, obstacle-strewn courses on its own.
In the 1980s, neural networks with the backpropagation algorithm, first described by Paul John Werbos in 1974, came into wide use. The 1990s marked major achievements across many areas of AI, with demonstrations of numerous applications. Most notably, Deep Blue, a chess-playing computer, beat Garry Kasparov in a famous six-game match in 1997. DARPA stated that the costs saved by applying AI methods to unit scheduling in the first Gulf War repaid the U.S. government's entire investment in AI research since the 1950s.
The DARPA Grand Challenge, which began in 2004 and continues to this day, is a race for a $2 million prize in which vehicles must drive themselves, without communication with humans, several hundred miles across challenging desert terrain using GPS, computers, and sophisticated arrays of sensors.
PHILOSOPHY
The debate over strong AI versus weak AI remains a hot topic among AI philosophers. It involves the philosophy of mind and the mind-body problem. Roger Penrose in his book The Emperor's New Mind and John Searle with his "Chinese room" thought experiment argue that true consciousness cannot be achieved by formal logic systems, while Douglas Hofstadter in Gödel, Escher, Bach and Daniel Dennett in Consciousness Explained argue in support of functionalism. In the opinion of many supporters of strong AI, artificial consciousness is regarded as the holy grail of artificial intelligence.
SCIENCE FICTION
In science fiction, AI is commonly portrayed either as a future force that will try to overthrow human authority, as with HAL 9000, Skynet, Colossus, and The Matrix, or as a human likeness delivering services, such as C-3PO, Data, the Bicentennial Man, the Mechas in A.I., or Sonny in I, Robot. The notion that AI world domination is inevitable, sometimes called "the Singularity," is also disputed by science writers such as Isaac Asimov, Vernor Vinge, and Kevin Warwick. In works like the Japanese manga Ghost in the Shell, the existence of intelligent machines calls into question the definition of a living organism as more than just one category within a broader class of independent entities, establishing the notion of a systemic intelligence that has ideas. See the list of fictional computers and the list of fictional robots and androids.
The BBC television series Blake's 7 features a number of intelligent computers, including Zen, the computer that controls the starship Liberator; ORAC, an advanced supercomputer in a portable perspex box with the ability to think and even predict the future; and Slave, the computer aboard the starship Scorpio.