Our
exclusive team of experts at
Teamwebpower.com specializes in
Internet Data Mining, also known as
Knowledge Discovery in Databases (KDD), and Research.
We currently provide Data Mining and Internet
Research Services to a range of industries
and business people, such as Importers, Exporters,
Research institutes, Internet Marketing companies,
Technical consultants, Business Directory
developers and Portal developers.
We provide accurate, useful data that
helps you do business confidently and carry out
research work more effectively and
efficiently.
Our pricing is also
competitive.
Definition and Process of Data Mining
What is data mining?
The past two decades have seen a dramatic increase in
the amount of information or data being stored in
electronic format. This accumulation of data has taken
place at an explosive rate. It has been estimated that
the amount of information in the world doubles every
20 months and the size and number of databases are
increasing even faster. The increase in use of
electronic data gathering devices such as
point-of-sale or remote sensing devices has
contributed to this explosion of available data.
The Growing Base of Data
Data storage became easier as large amounts of
computing power became available at low cost; that is,
the falling cost of processing power and storage made
data cheap. There was also the introduction of new
machine learning methods for knowledge representation,
based on logic programming and similar approaches, in
addition to traditional statistical analysis of data.
The new methods tend to be computationally intensive,
hence the demand for more processing power.
With so much
attention concentrated on the accumulation of
data, the problem became what to do with this valuable
resource. It was recognized that information is at the
heart of business operations and that decision-makers
could make use of the data stored to gain valuable
insight into the business. Database Management systems
gave access to the data stored but this was only a
small part of what could be gained from the data.
Traditional on-line transaction processing systems,
OLTPs, are good at putting data into databases
quickly, safely and efficiently but are not good at
delivering meaningful analysis in return. Analyzing
data can provide further knowledge about a business by
going beyond the data explicitly stored to derive
knowledge about the business. This is where Data
Mining or Knowledge Discovery in Databases (KDD) has
obvious benefits for any enterprise.
The term
data mining has been stretched beyond its limits to
apply to any form of data analysis. Some of the
numerous definitions of Data Mining, or Knowledge
Discovery in Databases are:
Data Mining, or Knowledge Discovery in Databases (KDD)
as it is also known, is the nontrivial extraction of
implicit, previously unknown, and potentially useful
information from data. This encompasses a number of
different technical approaches, such as clustering,
data summarization, learning classification rules,
finding dependency networks, analyzing changes, and
detecting anomalies.
William J Frawley, Gregory Piatetsky-Shapiro and
Christopher J Matheus
Data mining is the search for relationships and global
patterns that exist in large databases but are
`hidden' among the vast amount of data, such as a
relationship between patient data and their medical
diagnosis. These relationships represent valuable
knowledge about the database and the objects in the
database and, if the database is a faithful mirror, of
the real world registered by the database.
Marcel
Holsheimer & Arno Siebes (1994)
The
analogy with the mining process is described as:
Data mining refers to "using a variety of techniques
to identify nuggets of information or decision-making
knowledge in bodies of data, and extracting these in
such a way that they can be put to use in the areas
such as decision support, prediction, forecasting and
estimation. The data is often voluminous, but as it
stands of low value as no direct use can be made of
it; it is the hidden information in the data that is
useful"
Clementine User Guide, a data mining toolkit
Basically, data mining is concerned with the analysis
of data and the use of software techniques for finding
patterns and regularities in sets of data. It is the
computer which is responsible for finding the patterns
by identifying the underlying rules and features in
the data. The idea is that it is possible to strike
gold in unexpected places as the data mining software
extracts patterns not previously discernible or so
obvious that no one has noticed them before.
Data mining analysis tends to work from the data up
and the best techniques are those developed with an
orientation towards large volumes of data, making use
of as much of the collected data as possible to arrive
at reliable conclusions and decisions. The analysis
process starts with a set of data, uses a methodology
to develop an optimal representation of the structure
of the data during which time knowledge is acquired.
Once knowledge has been acquired, it can be extended
to larger sets of data, working on the assumption that
the larger data set has a structure similar to the
sample data. Again, this is analogous to a mining
operation where large amounts of low-grade material
are sifted through in order to find something of
value.
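As a minimal sketch of this idea, the Python example
below learns a single cut-off rule from a small sample
and then applies it to a larger set. The records, their
meaning (monthly spend and whether the customer
responded) and the names fit_threshold and apply_rule
are all hypothetical, and a one-rule model is only a
simple stand-in for the methodologies referred to
above.

```python
# Hypothetical sample: (monthly_spend, responded_to_offer)
SAMPLE = [(12, False), (18, False), (25, False),
          (40, True), (55, True), (63, True)]


def fit_threshold(sample):
    """Learn the cut-off on x that classifies the sample best.

    Candidate cuts are the midpoints between consecutive distinct
    values. Returns None if the sample has fewer than two distinct values.
    """
    best_cut, best_correct = None, -1
    xs = sorted({x for x, _ in sample})
    for lo, hi in zip(xs, xs[1:]):
        cut = (lo + hi) / 2
        # Count records for which the rule "x > cut" matches the label.
        correct = sum((x > cut) == label for x, label in sample)
        if correct > best_correct:
            best_cut, best_correct = cut, correct
    return best_cut


def apply_rule(cut, values):
    """Extend the learned rule to data that was not in the sample."""
    return [x > cut for x in values]


cut = fit_threshold(SAMPLE)
# Assumption: the larger set has a structure similar to the sample.
larger_set = [10, 30, 35, 70]
predictions = apply_rule(cut, larger_set)
print(cut, predictions)
```

The rule is only as good as the assumption behind it:
if the larger set is structured differently from the
sample, the learned cut-off will not transfer.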
Data Mining Models
IBM have identified two types of model or modes of
operation which may be used to unearth information of
interest to the user.
Verification Model
The verification model takes a hypothesis from the
user and tests its validity against the data.
The emphasis is on the user, who is responsible for
formulating the hypothesis and issuing the query on
the data to affirm or negate the hypothesis.
In a marketing division, for example, with a limited
budget for a mailing campaign to launch a new product,
it is important to identify the section of the
population most likely to buy the new product. The
user formulates a hypothesis to identify potential
customers and the characteristics they share.
Historical data about customer purchases and
demographic information can then be queried to reveal
comparable purchases and the characteristics shared by
those purchasers, which in turn can be used to target a
mailing campaign. The whole operation can be refined
by `drilling down' so that the hypothesis reduces the
`set' returned each time until the required limit is
reached.
The
problem with this model is that no new
information is created in the retrieval process;
the queries only ever return records that
verify or negate the hypothesis. The search process
here is iterative in that the output is reviewed, a
new set of questions or hypotheses is formulated to
refine the search, and the whole process is repeated. The
user is discovering the facts about the data using a
variety of techniques such as queries,
multidimensional analysis and visualization to guide
the exploration of the data being inspected.
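The drill-down loop described above can be sketched in
a few lines of Python. This is an illustration under
stated assumptions, not a description of any particular
product: the customer records, the field names, the
three refinements and the function name drill_down are
all hypothetical. Note that every condition is supplied
by the user; the code only filters.

```python
# Hypothetical customer history; every field and value is illustrative.
CUSTOMERS = [
    {"id": 1, "age": 34, "region": "north", "bought_similar": True},
    {"id": 2, "age": 52, "region": "south", "bought_similar": True},
    {"id": 3, "age": 29, "region": "north", "bought_similar": False},
    {"id": 4, "age": 41, "region": "north", "bought_similar": True},
    {"id": 5, "age": 38, "region": "east", "bought_similar": True},
    {"id": 6, "age": 36, "region": "north", "bought_similar": True},
    {"id": 7, "age": 61, "region": "north", "bought_similar": True},
    {"id": 8, "age": 33, "region": "west", "bought_similar": False},
]

# Each refinement is a condition the user proposes, in the order the
# user chooses to apply it. The system answers; it suggests nothing new.
HYPOTHESIS = [
    ("bought a comparable product", lambda c: c["bought_similar"]),
    ("lives in the north region", lambda c: c["region"] == "north"),
    ("aged 30 to 45", lambda c: 30 <= c["age"] <= 45),
]


def drill_down(records, refinements, limit):
    """Narrow the result set one refinement at a time until it fits the limit."""
    current = list(records)
    for description, condition in refinements:
        if len(current) <= limit:
            break  # the mailing budget is already satisfied
        current = [r for r in current if condition(r)]
        print(f"{description}: {len(current)} customers remain")
    return current


targets = drill_down(CUSTOMERS, HYPOTHESIS, limit=3)
target_ids = [c["id"] for c in targets]
print(target_ids)  # [1, 4, 6]
```

With a larger budget (for instance limit=4) the loop
stops one refinement earlier, which mirrors how the
user decides when the returned set is small enough.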
Discovery Model
The discovery model differs in its emphasis: here the
system automatically discovers important
information hidden in the data. The data is sifted in
search of frequently occurring patterns, trends and
generalizations about the data without intervention or
guidance from the user. The discovery or data mining
tools aim to reveal a large number of facts about the
data in as short a time as possible.
An example of such a model is a bank database which is
mined to discover the many groups of customers to
target for a mailing campaign. The data is searched
with no hypothesis in mind other than for the system
to group the customers according to the common
characteristics found.
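To make the bank example concrete, here is a small
Python sketch that groups customers with no hypothesis
supplied. It uses plain k-means clustering, which is
one possible grouping technique chosen here for
illustration; the text above does not name a specific
algorithm. The customer figures (age, balance in
thousands), the choice of two groups and the function
name kmeans are assumptions.

```python
import math

# Hypothetical bank customers: (age, account balance in thousands)
CUSTOMERS = [(25, 2), (60, 50), (27, 3), (62, 55), (24, 1), (58, 48)]


def kmeans(points, k, iterations=20):
    """Plain k-means. Returns (centroids, labels).

    Centroids start at the first k points so the result is repeatable.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                mean = tuple(sum(v) / len(members) for v in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[c])  # keep an empty group's centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, labels


centroids, labels = kmeans(CUSTOMERS, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

The two groups that emerge (younger customers with low
balances, older customers with high balances) were not
specified in advance; the user only chose how many
groups to look for. In practice the features would
normally be rescaled first so that no single attribute
dominates the distance calculation.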
Contact
us at 91-9444113140 to learn more, or email
info@teamwebpower.com