
Designing and Mining a Blood-Bank Management Database System

A Thesis Presented to the Ahsanullah University of Science and Technology
In Partial Fulfillment of the Requirements for the Degree of
Bachelor of Science in Computer Science and Engineering

By
Deepa Das            : 07.01.04.006
Fareal Ahmed         : 07.01.04.024
Syeda Shabnam Hasan  : 07.01.04.030
Jannatul Ferdous     : 07.01.04.047

March 2011

Thesis supervisor: Ms. Rosina Surovi Khan

Acknowledgment

The ensuing pages of this work are the cumulative result of extensive guidance and arduous work. We wish to acknowledge and express our gratitude to all those without whom our thesis could not have become a reality. We are delighted to have this rare opportunity to express our profound reverence and indebtedness to our thesis supervisor, Ms. Rosina Surovi Khan, for the information she provided to us through the lecture sittings and for her invaluable, timely advice and guidance. We would like to extend our sincere thanks to her for giving us her precious time and for always being available to clarify our doubts regarding the thesis. This thesis is dedicated to our parents, who have given us the opportunity of education at the best institutions and have supported us throughout our lives. Last but not least, we would like to thank all those who have directly or indirectly helped and cooperated in accomplishing this thesis.

Abstract

A database is a highly useful environment in which to store data and an ideal tool to manage and manipulate that data. A well-structured database brings substantial benefits, including increased efficiency and time savings. Our team's interest is centered on this area. First, we create a database for a blood-bank management system, using Microsoft SQL Server. We determine the entities and attributes and figure out the relationships among the entities. Then we draw the entity-relationship diagram, convert it to a relational model (relational tables) and normalize the tables. We implement the design, create the tables and insert values into them using SQL Server. We execute sample queries on the system and verify that it contains all the required information, making retrieval of the information fast and efficient. In Part II of the thesis, we convert the database tables of the system to text files. Using exact and approximate string matching algorithms, we match a query string with the strings in the text files: the exact algorithms return the index of the exactly matched strings, and the approximate algorithms return the approximately matched strings together with the edit distance between the query and each match.

CONTENTS

Part-I

1 Introduction
    1.1 What is a database?
    1.2 Advantages of Database
    1.3 Disadvantages of Database
    1.4 Components of Database Design
2 Database Design
    2.1 The Entity-Relationship Model
    2.2 Relational Schemas
    2.3 Normalization
    2.4 Tables with sample values after Normalization
    2.5 Queries

Part-II

3 Text File Exportation
    3.1 Exporting Text Files From Database Tables
    3.2 Elementary program in C to read strings of text files
        3.2.1 Procedure
        3.2.2 The code
    3.3 Converted Text Files
    3.4 Selecting a text file for mining
4 Exact String Matching Algorithms
    4.1 Research on Exact String Matching Algorithms
        4.1.1 Brute Force Algorithm
        4.1.2 Morris-Pratt Algorithm
        4.1.3 Knuth-Morris-Pratt Algorithm
        4.1.4 Boyer-Moore Algorithm
        4.1.5 Analysis
5 Approximate String Matching Algorithms
    5.1 Algorithm on finding Edit Distance between two strings
        5.1.1 Definition of String Edit Distance
        5.1.2 Algorithm
        5.1.3 Dry Run
    5.2 Introduction to Approximate String Matching Algorithms
        5.2.1 Brute Force Algorithm
        5.2.2 Lipschitz Embeddings Algorithm
        5.2.3 Ball Partitioning Algorithm
        5.2.4 Analysis
6 Conclusion and Future work
    6.1 Conclusion
    6.2 Future work
8 Appendix
    8.1 Code to measure Edit Distance
    8.2 Implementation of Exact String Matching Algorithms
        8.2.1 Brute Force Algorithm
        8.2.2 Boyer Moore Algorithm
        8.2.3 Knuth Morris Pratt Algorithm
        8.2.4 Morris Pratt Algorithm
    8.3 Implementation of Approximate String Matching Algorithms
        8.3.1 Brute Force
        8.3.2 Lipschitz Embeddings
        8.3.3 Ball Partitioning Method

Part-I

Chapter 1
Introduction

1.1 What is a database?

A database is an organized collection of interrelated data. Traditionally, the data is presented in tabular form, like this:

firstname   surname   Dob
John        Smith     01/12/76
Sara        Jones     13/06/69
Fred        Bloggs    11/11/73

Tables in a database are used for storing specific collections of data.

1.2 Advantages of Database

- All of the information is kept together.
- The information can be portable if stored on a laptop.
- The information is easy to access at any time.
- It is easily retrievable.
- Many people can access the same database at the same time.
- Improved data security.
- Reduced data redundancy.
- Reduced updating errors and increased consistency.
- Greater data integrity and independence from application programs.
- Improved data access for users through the use of host and query languages.
- Reduced data entry, storage, and retrieval costs.
- Easier development of new application programs.

1.3 Disadvantages of Database

- Database systems are complex, difficult, and time-consuming to design.
- Initial training is required for all programmers and users.
- Suitable hardware and software entail start-up costs.
- Individual applications may have a longer running time.
- Damage to the database affects virtually all application programs.
- Moving from a file-based system to a database system involves extensive conversion costs.

1.4 Components of Database Design

- Entity-relationship model
- Relational model (relational tables)
- Normalization of tables
- Implementation in SQL Server
- Usage of the system (execution of sample complex queries)

Chapter 2
Database Design

2.1 The Entity-Relationship Model

The entity-relationship (E-R) model was developed to facilitate database design by allowing specification of an enterprise schema that represents the overall logical structure of a database. The E-R data model is one of several semantic data models; the semantic aspect of the model lies in its representation of the meaning of the data. The E-R model is very useful in mapping the meanings and interactions of real-world enterprises onto a conceptual schema. The E-R data model has three basic notions: entity sets, relationship sets and attributes.

Entity sets: An entity is a thing or object in the real world that is distinguishable from all other objects. It has a set of properties, and the values for some set of properties may uniquely identify an entity. An entity set is a set of entities of the same type that share the same properties.

Relationship sets: A relationship is an association among several entities. A relationship set is a set of relationships of the same type. The association between entity sets is referred to as participation. The function that an entity plays in a relationship is called that entity's role.

Attributes: An entity is represented by a set of attributes. Attributes are descriptive properties possessed by each member of an entity set. The designation of an attribute for an entity set expresses that the database stores similar information concerning each entity in the entity set; however, each entity may have its own value for each attribute. [1]

Some important features of the E-R model:

Mapping Cardinality: Mapping cardinalities express the number of entities to which another entity can be associated via a relationship set. For a relationship set between a first and a second entity set, the cardinality can be:

- One-to-one: an entity of the first set is associated with at most one entity of the second set, and vice versa.
- One-to-many: an entity of the first set is associated with any number of entities of the second set, while an entity of the second set is associated with at most one entity of the first set.
- Many-to-one: an entity of the first set is associated with at most one entity of the second set, while an entity of the second set can be associated with any number of entities of the first set.
- Many-to-many: an entity of either set can be associated with any number of entities of the other set.

E-R diagram: It expresses the overall logical structure of a database graphically. A diagram consists of the following major components:

# Rectangles: represent entity sets.
# Ellipses: represent attributes.
# Diamonds: represent relationship sets.
# Lines: link attributes to entity sets and entity sets to relationship sets. [1]

Figure 2.1.1: ER diagram of Blood Bank Management System

Our E-R diagram represents the Blood-Bank Management system. It has eight entity sets:

a) Donor: (Attributes- dName, dId, sex, age, dreg date, db grp).
b) District: (Attributes- dis id, dis name).
c) Registration Staff: (Attributes- rs id, rs name, sex).
d) Blood Recipient: (Attributes- rId, sex, age, r regdate, rName, b qnty, rb grp).
e) Blood Sample: (Attributes- b group, sample no, status).
f) Disease Recognizer: (Attributes- drecog id, drecog name, sex).
g) Blood Processing Manager: (Attributes- bm id, bm name, sex).
h) Hospital: (Attributes- hId, hName, hb grp, hb qnty).

The abbreviated attribute names are explained in the relational schemas (Section 2.2).

Some notes about the entity sets, their attributes and the cardinalities among them:

Donor- A person who donates blood. When a donor donates, an id (a serial number that serves as the primary key) is assigned; the age, sex, name, registration date (dreg date) and blood group are stored in the database under the entity Donor.
District- The id of every district/location is different (primary key).
Registration Staff- Registration staff register the information of the donors and the recipients.
Disease Recognizer- A disease recognizer tests blood samples to determine whether they are contaminated or okay.
Blood Processing Manager- Blood processing managers take orders from the hospitals and fulfill their requirements for blood samples.
Blood Sample- The quantities of blood that the blood bank holds. The group, sample no and status of each sample are stored.
Hospital- The hospitals of each district where blood samples are needed are also included in the database.
Blood Recipient- A person who needs blood. A recipient's id, name, age and sex, and the blood group needed, are stored in the database.

Cardinality:

District & Donor- (Relationship- (stays in), 1 to many). One donor stays in one district. In one district, many donors can stay.

Registration Staff & Donor- (Relationship- (registers), 1 to many). A staff member can register many donors. One donor is registered by one staff member.

Registration Staff & Blood Recipient- (Relationship- (records), 1 to many). A staff member can register many blood recipients. One blood recipient is registered by one staff member.

District & Blood Recipient- (Relationship- (resides in), 1 to many). One recipient resides in one district. In one district, many recipients can reside.

District & Hospital- (Relationship- (belongs to), 1 to many). In a district, there are many hospitals. One hospital belongs to one district.

Blood Processing Manager & Hospital- (Relationship- (submit orders to), 1 to many). A blood processing manager can get orders from many hospitals. One hospital submits its orders to one blood processing manager.

Blood Processing Manager & Blood Sample- (Relationship- (processes sample), 1 to many). A manager can process many samples of blood. One blood sample is processed by one blood processing manager.

Disease Recognizer & Blood Sample- (Relationship- (verify), 1 to many). A disease recognizer can verify many blood samples. One blood sample is verified by one disease recognizer.

Blood Processing Manager & Blood Recipient- (Relationship- (request to), 1 to many). The samples of blood are given according to the needs of the recipients and are processed by the manager. A manager can process many samples of blood that are requested by recipients, but one recipient can submit requests to only one blood processing manager.

2.2 Relational Schemas

Donor
Table 2.2.1

Attribute Name   Description                      Type
dName            Name of the donor                varchar
dId              Id of the donor                  int
sex              Sex of the donor                 char
age              Age of the donor                 int
dreg date        Registration date of the donor   date
rs id (fk)       Id of the registration staff     int
dis id (fk)      District id                      int
db grp           Donor's blood group              varchar

The relationship between Registration Staff and Donor is 1 to many. That is why the primary key of Registration Staff is used as a foreign key in Donor.
The relationship between District and Donor is 1 to many. That is why the primary key of District is used as a foreign key in Donor.

District
Table 2.2.2

Attribute Name   Description            Type
dis id           District id            int
dis name         Name of the district   varchar

Registration Staff
Table 2.2.3

Attribute Name   Description                      Type
rs id            Id of the registration staff     int
rs name          Name of the registration staff   varchar
sex              Sex of the registration staff    char

Blood Recipient
Table 2.2.4

Attribute Name   Description                          Type
rId              Id of the recipient                  int
sex              Sex of the recipient                 char
age              Age of the recipient                 int
r regdate        Registration date of the recipient   date
rName            Name of the recipient                varchar
b qnty           Needed quantity of blood             int
rb grp           Recipient's blood group              varchar
rs id (fk)       Id of the registration staff         int
dis id (fk)      District id                          int
bm id (fk)       Blood processing manager's id        int

The relationship between Registration Staff and Blood Recipient is 1 to many. That is why the primary key of Registration Staff is used as a foreign key in Blood Recipient.
The relationship between District and Blood Recipient is 1 to many. That is why the primary key of District is used as a foreign key in Blood Recipient.
The relationship between Blood Processing Manager and Blood Recipient is 1 to many. That is why the primary key of Blood Processing Manager is used as a foreign key in Blood Recipient.

Blood Sample
Table 2.2.5

Attribute Name   Description                     Type
b group          Blood group of the sample       varchar
sample no        Sample identification number    int
status           Status of the blood sample      varchar
drecog id (fk)   Disease Recognizer's id         int
bm id (fk)       Blood processing manager's id   int

The relationship between Disease Recognizer and Blood Sample is 1 to many. That is why the primary key of Disease Recognizer is used as a foreign key in Blood Sample.
The relationship between Blood Processing Manager and Blood Sample is 1 to many. That is why the primary key of Blood Processing Manager is used as a foreign key in Blood Sample.

Disease Recognizer
Table 2.2.6

Attribute Name   Description                 Type
drecog id        Disease Recognizer's id     int
drecog name      Disease Recognizer's name   varchar
sex              Disease Recognizer's sex    char

Blood Processing Manager
Table 2.2.7

Attribute Name   Description                       Type
bm id            Blood processing manager's id     int
bm name          Blood processing manager's name   varchar
sex              Blood processing manager's sex    char

Hospital
Table 2.2.8

Attribute Name   Description                             Type
hId              Hospital's id                           int
hb qnty          Needed quantity of blood in a hospital  int
hb grp           Needed blood group                      varchar
hName            Hospital's name                         varchar
dis id (fk)      District's id                           int
bm id (fk)       Blood processing manager's id           int

The relationship between District and Hospital is 1 to many. That is why the primary key of District is used as a foreign key in Hospital.
The relationship between Blood Processing Manager and Hospital is 1 to many. That is why the primary key of Blood Processing Manager is used as a foreign key in Hospital.
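The foreign-key rule used in these schemas (the "many" side stores the primary key of the "one" side) can also be illustrated outside the database. The following is a minimal sketch in C and not part of our SQL Server implementation: the struct layouts and the function names district_exists and count_dangling_donors are assumptions made for illustration, and only the columns needed for the District and Donor relationship are kept.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative in-memory mirrors of two tables from Section 2.2
   (assumed layout, reduced to the columns needed here). */
struct district {
    int dis_id;             /* primary key */
    const char *dis_name;
};

struct donor {
    int d_id;               /* primary key */
    const char *d_name;
    int dis_id;             /* foreign key referencing district.dis_id */
};

/* Returns 1 if a district row with the given id exists, 0 otherwise. */
int district_exists(const struct district *d, size_t n, int dis_id)
{
    assert(d != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (d[i].dis_id == dis_id)
            return 1;
    return 0;
}

/* Counts donors whose dis_id does not reference any district row,
   i.e. rows that would violate the foreign-key constraint. */
size_t count_dangling_donors(const struct donor *donors, size_t nd,
                             const struct district *districts, size_t ns)
{
    size_t bad = 0;
    assert(donors != NULL || nd == 0);
    for (size_t i = 0; i < nd; i++)
        if (!district_exists(districts, ns, donors[i].dis_id))
            bad++;
    return bad;
}
```

Because each donor row holds exactly one dis id, a donor can stay in only one district, while nothing prevents many donor rows from holding the same dis id; this is how the table layout encodes the 1 to many relationship.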

2.3 Normalization

E. F. Codd introduced a number of 'normal forms' (1970-1972). They are principles that may or may not hold for a given relation.

The formal definition of normalization is: it is the sequence of steps by which a relational database model is both created and improved upon. The steps involved in the normalization process are called normal forms. Essentially, normal forms applied during a process of normalization allow creation of a relational database model as a step-by-step progression.

Normal Forms:

First Normal Form (1NF): A relation is in first normal form if it contains only simple, atomic values for attributes and no sets; that is, if its attributes do not have sub-attributes.

Second Normal Form (2NF): A relation is in second normal form if it is in 1NF and every non-primary key attribute is fully functionally dependent on the primary key of the relation.
Example:
Relation: (A, B, C, D)
{A} → {B}
{A} → {C}
{A} → {D}
It is in 2NF because it is in 1NF and every non-primary key attribute is fully functionally dependent on the primary key of the relation. [2]

Third Normal Form (3NF): A relation is in third normal form if it is in 2NF and no non-primary key attribute is transitively dependent on the primary key.
Example:
Relation: (A, B, C, D, E)
{A, B} → {C}
{A, B} → {E}
This relation is in third normal form because it is in 2NF and no non-primary key attribute is transitively dependent on the primary key.

Boyce-Codd Normal Form (BCNF): A relation is in BCNF if, for every non-trivial functional dependency X → Y that holds, X is a candidate key (more generally, a superkey). If part of the primary key is functionally dependent on a non-primary key attribute, a BCNF violation occurs.
Example:
Relation: (A, B, C, D)
{A, B} → {C, D}
{C} → {A}
The relation is in 1NF, 2NF and 3NF, but not in BCNF, because A, which is part of the primary key, is functionally dependent on the non-primary key attribute C. We have to split the original relation on the violating dependency {C} → {A}:
(C, A), (B, C, D)
Now both relations are in BCNF, and because C is the key of (C, A), joining them on C restores the original relation.

Advantages of normalization:
i. Many unnecessary redundancies are avoided.
ii. Anomalies with input, deletion and updates can be avoided.
iii. Fully normalized relations tend to need less space than unnormalized ones.

Disadvantages of normalization:
i. Normalization splits entities and relationships into many relations, thus making them harder to understand.
ii. Queries become more complex because they have to involve more relations.
iii. Response times are longer because of a higher number of joins in the queries. [2]

Normalization of the Blood Bank database:

1. Donor (dId, dName, sex, age, dreg date, rs id, dis id, db grp)
{dId} → {dName} (functional dependency exists, because two different dNames do not correspond to the same dId).
{dId} → {sex} (functional dependency exists).
{dId} → {age} (functional dependency exists).
{dId} → {dreg date} (functional dependency exists).
{dId} → {rs id} (functional dependency exists).
{dId} → {dis id} (functional dependency exists).
{dId} → {db grp} (functional dependency exists).
The relation is in 1NF because its attributes do not have sub-attributes.
The relation is in second normal form, as it is in 1NF and every non-primary key attribute is fully functionally dependent on the primary key of the relation.
The relation is in third normal form, as it is in 2NF and no non-primary key attribute is transitively dependent on the primary key.
No part of the primary key is functionally dependent on a non-primary key attribute. So, the relation is in BCNF.

2. District (dis id, dis name)
{dis id} → {dis name} (functional dependency exists).
The relation is in 1NF.
The relation is in second normal form.
The relation is in third normal form.
The relation is in BCNF.

3. Registration staff (rs id, rs name, sex)
{rs id} → {rs name} (functional dependency exists).
{rs id} → {sex} (functional dependency exists).
The relation is in 1NF.
The relation is in second normal form.
The relation is in third normal form.
The relation is in BCNF.

4. Blood recipient (rId, sex, age, r regdate, rName, b qnty, rb grp, rs id, dis id, bm id)
{rId} → {sex} (functional dependency exists).
{rId} → {age} (functional dependency exists).
{rId} → {r regdate} (functional dependency exists).
{rId} → {rName} (functional dependency exists).
{rId} → {b qnty} (functional dependency exists).
{rId} → {rb grp} (functional dependency exists).
{rId} → {rs id} (functional dependency exists).
{rId} → {dis id} (functional dependency exists).
{rId} → {bm id} (functional dependency exists).
The relation is in 1NF.
The relation is in second normal form.
The relation is in third normal form.
The relation is in BCNF.

5. Blood Sample (b group, sample no, status, drecog id, bm id)
{b group, sample no} → {status} (functional dependency exists).
{b group, sample no} → {drecog id} (functional dependency exists).
{b group, sample no} → {bm id} (functional dependency exists).
The relation is in 1NF.
The relation is in second normal form.
The relation is in third normal form.
The relation is in BCNF.

6. Disease recognizer (drecog id, drecog name, sex)
{drecog id} → {drecog name} (functional dependency exists).
{drecog id} → {sex} (functional dependency exists).
The relation is in 1NF.
The relation is in second normal form.
The relation is in third normal form.
The relation is in BCNF.

7. Blood processing manager (bm id, bm name, sex)
{bm id} → {bm name} (functional dependency exists).
{bm id} → {sex} (functional dependency exists).
The relation is in 1NF.
The relation is in second normal form.
The relation is in third normal form.
The relation is in BCNF.

8. Hospital (hId, hb qnty, hb grp, dis id, bm id, hName)
{hId} → {hName, dis id, bm id} (functional dependency exists).
{hId, hb grp} → {hb qnty} (functional dependency exists).
The relation is in 1NF.
The relation is not in second normal form: the primary key is {hId, hb grp}, but hName, dis id and bm id depend on hId alone, so not every non-primary key attribute is fully functionally dependent on the primary key. So, we have to split the relation:
Hospital 1 (hId, hName, dis id, bm id)
Hospital 2 (hId, hb grp, hb qnty)
Now both relations are in 2NF.
Both relations are in third normal form.
Both relations are in BCNF.

2.4 Tables with sample values after Normalization

Blood-Processing-Manager: Table 2.4.1
Blood Recipient: Table 2.4.2

Blood Sample: Table 2.4.3
Disease-Recognizer: Table 2.4.4
District: Table 2.4.5
Donor: Table 2.4.6
Hospital 1: Table 2.4.7
Hospital 2: Table 2.4.8
Registration Staff: Table 2.4.9
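The functional-dependency reasoning of Section 2.3 can be tested mechanically against sample rows. The following minimal sketch in C checks, for rows of the unsplit Hospital relation, whether hId alone determines hName and whether hId alone determines hb qnty. The struct layout and the function names are assumptions made for illustration. Such a check on sample rows can only refute a dependency; it cannot prove that the dependency holds for all possible data.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One row of the unsplit Hospital relation (assumed layout,
   reduced to the attributes needed for the check). */
struct hospital_row {
    int h_id;
    const char *h_name;
    const char *hb_grp;
    int hb_qnty;
};

/* Returns 1 if {hId} -> {hName} holds in the given rows, else 0:
   no two rows may share an hId and differ in hName. */
int hid_determines_hname(const struct hospital_row *r, size_t n)
{
    assert(r != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (r[i].h_id == r[j].h_id &&
                strcmp(r[i].h_name, r[j].h_name) != 0)
                return 0;
    return 1;
}

/* Returns 1 if {hId} -> {hb qnty} holds in the given rows, else 0. */
int hid_determines_qnty(const struct hospital_row *r, size_t n)
{
    assert(r != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (r[i].h_id == r[j].h_id && r[i].hb_qnty != r[j].hb_qnty)
                return 0;
    return 1;
}
```

For a hospital that needs two blood groups in different quantities, the first check succeeds and the second fails: hName depends on only part of the key {hId, hb grp}, which is the 2NF violation that leads to the split into Hospital 1 and Hospital 2.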

Implementation in SQL Server

Figure 2.4.1: Blood-Bank Management Database System

2.5 Queries

1. Show the uncontaminated blood samples verified by Dr. Shila.
2. Show the donors whose blood groups are required by recipients living in the same district. Show the recipient details as well.
3. Show the details of the donors and recipients who have the same blood group and were registered by staff member Tania on the same date.
4. Show detailed information on the recipients and hospitals of Dhaka city that need blood group A.
5. Find the name of each recipient who took type A blood from a donor (also show the donor's name), where the district ids of both must be '10'.
6. Find the name and id of each donor registered by the registration staff member with id '104', and show the registration staff member's name as well.
7. List the name, age and id of each donor who is registered by registration staff member 'Bushra' or who has blood group B.
8. Find all information in Hospital 2 about hospitals that have not been processed by the blood processing manager with id '6'.

Part-II

Chapter 3
Text File Exportation

We export text files from the database tables in order to mine data. Mining data in our thesis means matching a query string against the strings of the text files; if a match is found, the algorithm returns either the index of the exactly matched string or the approximately matched string along with its edit distance, depending on the algorithm.

3.1 Exporting Text Files From Database Tables

The "MS SQL Server Export Table to Text File" software is used to export text files from the database tables.

The steps are:

1. First, we connected our database to the local server.
Figure 3.1.1
Figure 3.1.2

2. Then we opened the tables one by one.
Figure 3.1.3

3. We selected the table names with their respective fields to export as text files.
Figure 3.1.4
Figure 3.1.5

4. We selected a particular delimiter to separate the fields of the tables.
Figure 3.1.6

5. Finally, we got our required text files after clicking finish.
Figure 3.1.7
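Each line of an exported file is one table row whose fields are separated by the chosen delimiter. As a minimal sketch of how such a line can be taken apart again, the following C function splits a line in place; the function name split_fields and its parameters are assumptions made for illustration, and the sketch assumes that the delimiter never occurs inside a field.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Splits line in place at every occurrence of delim and stores a
   pointer to the start of each field in fields[]. At most max
   fields are stored; text after the last stored field is dropped.
   Empty fields are kept. Returns the number of fields stored. */
size_t split_fields(char *line, char delim, char **fields, size_t max)
{
    size_t n = 0;
    char *start;

    assert(line != NULL && fields != NULL && delim != '\0');
    line[strcspn(line, "\r\n")] = '\0';   /* drop a trailing new line */
    start = line;
    while (n < max) {
        char *p = strchr(start, delim);
        fields[n++] = start;
        if (p == NULL)
            break;                        /* last field reached */
        *p = '\0';                        /* terminate this field */
        start = p + 1;
    }
    return n;
}
```

Called on a made-up staff row such as "104,Tania,F" with ',' as the delimiter, the function stores three fields. The line must be a modifiable character array, because the delimiters are overwritten with string terminators.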

3.2 Elementary program in C to read strings of text files

The C code that writes the output of the "MS SQL Server Export Table to Text File" software into another file in an arranged, formatted way is as follows.

3.2.1 Procedure

a) Open the source file.
b) Create the destination file.
c) Count the total number of new lines in the source file.
d) Rewind the file pointer to the starting position of the source file.
e) The first line (row) contains the names of the columns, so find the length of each name including its trailing spaces.
f) Store these lengths in an array.
g) When the values under each column are printed, each value of a row keeps its place: its length is compared with the stored length of the column and the value is padded accordingly, so that it is printed in an arranged way.
h) Stop the loop before printing the last line, and start a new loop that prints the last line following the same procedure.
i) This is done only to avoid infinite looping.
j) After finishing the job, close the files.

3.2.2 The code

    #include <stdio.h>
    #include <stdlib.h>

    /* Pads a value of length count up to the stored column width;
       a value as long as or longer than its column gets one space. */
    static void pad(FILE *fw, int width, int count)
    {
        int j, num = width - count;
        if (num < 1)
            num = 1;
        for (j = 0; j < num; j++)
            fputc(' ', fw);
    }

    int main(void)
    {
        FILE *fr, *fw;
        int ch;                         /* int, so that EOF can be detected */
        int i = 0, k = 0, word = 0, c = 0, count = 0, j, ar[20], line = 0;

        fr = fopen("BR.TXT", "r");      /* source file */
        if (fr == NULL) {
            puts("Can't open source file");
            exit(1);
        }
        fw = fopen("BRN.TXT", "w");     /* destination file where output will be written */
        if (fw == NULL) {
            puts("Can't open destination file");
            fclose(fr);
            exit(1);
        }

        while ((ch = fgetc(fr)) != EOF) /* count the new lines of the source file */
            if (ch == '\n')
                k++;
        k = k - 1;                      /* new lines that end a data row */
        rewind(fr);

        /* read the source file and write the destination file */
        while ((ch = fgetc(fr)) != EOF) {
            if (i == 0) {               /* first row only: the column names */
                while (ch != '\n' && ch != EOF) {
                    fputc(ch, fw);
                    word++;
                    ch = fgetc(fr);
                    if (ch == ',') {    /* end of a column name */
                        ch = fgetc(fr);
                        for (j = 0; j < 4; j++) {   /* four spaces after the name */
                            fputc(' ', fw);
                            word++;
                        }
                        ar[c] = word;   /* each column's length is stored in the array */
                        c++;            /* so that values can be placed under the column */
                        word = 0;
                    }
                }
                if (ch == '\n') {       /* the array now holds every column's length */
                    i++;
                    fputc(ch, fw);
                    ar[c] = word;
                    c = 0;
                    word = 0;
                }
            } else {                    /* other rows, printed under their own columns */
                while (ch != ',' && ch != '\n' && ch != EOF) {
                    fputc(ch, fw);
                    count++;
                    ch = fgetc(fr);
                }
                if (ch == ',') {
                    pad(fw, ar[c], count);
                    count = 0;
                    c++;
                }
                if (ch == '\n' && line < k) {
                    line++;
                    fputc(ch, fw);
                    count = 0;
                    c = 0;
                    if (line == k) {    /* only the last row remains */
                        ch = fgetc(fr);
                        while (ch != EOF) {     /* printed separately, following the */
                            if (ch == ',') {    /* same procedure, to stop infinite looping */
                                pad(fw, ar[c], count);
                                count = 0;
                                c++;
                            } else {
                                fputc(ch, fw);
                                count++;
                            }
                            ch = fgetc(fr);
                        }
                    }
                }
            }
        }
        fclose(fr);
        fclose(fw);
        return 0;
    }
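The core of the procedure in Section 3.2.1, measuring the column widths from the first row and padding every value to the width of its column, can also be sketched on strings instead of files. The following is a minimal sketch under stated assumptions and not the program we ran: the function names header_widths and format_row are hypothetical, the fields are assumed to be separated by commas, and, unlike the program, the sketch also pads the last column.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define GAP 4   /* spaces written after each column name in the first row */

/* Computes the width of each column from a comma-separated header
   row: the length of the name plus GAP. Returns the number of
   columns found (at most max). */
size_t header_widths(const char *header, size_t *widths, size_t max)
{
    size_t n = 0;
    assert(header != NULL && widths != NULL);
    while (n < max) {
        size_t len = strcspn(header, ",\r\n");
        widths[n++] = len + GAP;
        if (header[len] != ',')
            break;                  /* end of the header row */
        header += len + 1;          /* skip the name and the comma */
    }
    return n;
}

/* Writes one comma-separated row into out, padding every field with
   spaces up to the width of its column. A field that is as long as
   or longer than its column is followed by a single space.
   Returns 0 on success and -1 if out is too small. */
int format_row(const char *row, const size_t *widths, size_t ncols,
               char *out, size_t outsz)
{
    size_t used = 0;
    assert(row != NULL && widths != NULL && out != NULL);
    if (outsz == 0)
        return -1;
    for (size_t c = 0; c < ncols; c++) {
        size_t len = strcspn(row, ",\r\n");
        size_t pad = len < widths[c] ? widths[c] - len : 1;
        if (used + len + pad + 1 > outsz)
            return -1;
        memcpy(out + used, row, len);
        memset(out + used + len, ' ', pad);
        used += len + pad;
        if (row[len] != ',')
            break;                  /* end of the row */
        row += len + 1;
    }
    out[used] = '\0';
    return 0;
}
```

With the header "rs id,rs name,sex" the widths are 9, 11 and 7, so every value of the second column starts at position 9 of its output line and every value of the third column at position 20.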

3.3 Converted Text Files

The tables that were converted into text files using the software are shown in Figure 3.3.1 through Figure 3.3.9.

Final Outputs of Formatted Text Files

The final formatted text files are shown in Figure 3.3.10 through Figure 3.3.18.

3.4 Selecting a text file for mining

In this code, the strings of a text file are taken as input and the output is also a text file. The number of columns is counted from the first row of the text file.

If the first string of the first row (the 1st column) is numbered "1", then, for a file with three columns, the corresponding strings of this column in the other rows are numbered 4, 7, 10, 13, etc. The second string is numbered "2" (the 2nd column), and the corresponding strings of this column are numbered 5, 8, 11, 14, etc. In this way, each string is numbered according to its column. The column wanted as output, here the 2nd column with its 50 strings, is chosen at runtime, and an output file is created into which the strings of this column are written. The output file, named Staff name, will eventually be mined using exact and approximate string matching algorithms. The other database tables that were converted to text files in the previous section could be mined similarly.

Input file:


Output file:

The code for selecting a text file for mining:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Reads one line from the keyboard and removes the trailing new line. */
    static void read_line(char *buf, int size)
    {
        if (fgets(buf, size, stdin) == NULL)
            buf[0] = '\0';
        buf[strcspn(buf, "\n")] = '\0';
    }

    int main(void)
    {
        struct readf {
            char mm[20];
        } r[170];
        int i = 0, j = 1, point = 0, first = 0, col = 0;
        int c;                          /* int, so that EOF can be detected */
        char newa[20], fname[10], dname[20], column[20];
        FILE *fp;
        FILE *fm;

        printf("\n Give the input filename : ");
        read_line(fname, sizeof fname);
        printf("\n Give the output filename : ");
        read_line(dname, sizeof dname);
        fp = fopen(fname, "r");
        if (fp == NULL) {
            printf("\n Error in opening file!!");
            exit(1);
        }

        /* split the file into the strings r[1], r[2], ... at spaces and new lines */
        while (1) {
            c = fgetc(fp);
            if (c != ' ' && c != '\n' && c != EOF) {
                if (i < 19)
                    newa[i++] = (char)c;
                continue;
            }
            if (i > 0 && j < 170) {     /* a complete string has been read */
                newa[i] = '\0';
                strcpy(r[j].mm, newa);
                j++;
                i = 0;
            }
            if (c == '\n' && col == 0)
                col = j - 1;            /* number of strings in the first row */
            if (c == EOF)
                break;
        } /* while ends */
        fclose(fp);
        if (col == 0)
            col = j - 1;                /* the file has a single row */

        printf("\n\n");
        for (i = 1; i <= col; i++)
            printf("\t\t %s", r[i].mm);
        printf("\n Give the column name : ");
        read_line(column, sizeof column);
        for (i = 1; i <= col; i++) {
            if (strcmp(r[i].mm, column) == 0) {
                point = i;
                break;
            }
        }
        if (point == 0) {
            printf("\n No such column!!");
            exit(1);
        }
        printf("\n %s----position : %d", r[point].mm, point);

        fm = fopen(dname, "w");
        if (fm == NULL) {
            printf("\n Error in opening file!!");
            exit(1);
        }
        /* the strings of the chosen column are r[point + col], r[point + 2*col], ... */
        for (i = point + col; i < j; i += col) {
            fprintf(fm, "%s ", r[i].mm);
            first++;
            if (first == 2) {           /* two strings per output line */
                first = 0;
                fputc('\n', fm);
            }
        }
        fclose(fm);
        return 0;
    }
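The numbering scheme described at the start of Section 3.4 is plain index arithmetic: in a file with ncols columns, the data strings of column col have the numbers col + ncols, col + 2*ncols, and so on. A minimal sketch in C, where the function name column_positions and its parameters are assumptions made for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Collects the 1-based numbers of the data strings that belong to
   column col (1-based) of a file with ncols columns and ntokens
   strings in total, the header row included. The header string
   itself (number col) is skipped. At most max numbers are written
   to pos[]. Returns how many numbers were written. */
size_t column_positions(size_t ncols, size_t ntokens, size_t col,
                        size_t *pos, size_t max)
{
    size_t n = 0;
    assert(pos != NULL && ncols > 0 && col >= 1 && col <= ncols);
    for (size_t p = col + ncols; p <= ntokens && n < max; p += ncols)
        pos[n++] = p;
    return n;
}
```

For a file with three columns and four data rows (15 strings in total), column 1 yields the numbers 4, 7, 10, 13 and column 2 yields 5, 8, 11, 14, which matches the numbering given in the text. The arithmetic assumes that every row contains exactly ncols strings, so a value containing a space would break it.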

Chapter 4
Exact String Matching Algorithms

Why are we researching exact string matching algorithms?

A given string is called a pattern, and a text contains a number of strings. Using exact string matching algorithms, we can match a string in question with the strings in the text file and get the index of the exactly matched strings in the latter. In short, we are trying to mine the database text file by matching a query string exactly with a string in the text file.

4.1 Research on Exact String Matching Algorithms

4.1.1 Brute Force Algorithm

Main features
- no preprocessing phase;
- constant extra space needed
