10.4 Connecting to Databases
When a program deals with a small amount of data or deals with data whose structure is simple, but the program needs a persistent existence for the data, a text file or a DBM database works well. We already know from our discussion in Section 10.1 that a DBM database is essentially very simple. A DBM database contains a set of key-value pairs, and can be associated easily with a Perl hash. In Perl, a key for a hash is a scalar. A value in a hash is a scalar as well. Thus, we can compare a DBM database to a simple data table containing two columns: a key column and a value column. The content of each column is a scalar, in particular, a string. No other data types can be used. For example, if we want the key or the value to correspond to a date in a specific format (say, yyyymmdd) or a number in a specific format (say, floating point number
with 10 digits including 4 after the point), we cannot really enforce the requirement unless we write the data validation code ourselves. If we want the data table to contain five columns for an individual, say, a last name, a first name, a social security number, a phone number and an address, we cannot do so in a straight-forward manner. If we want the data to contain several related tables (say, a table containing identifying information for individual customers, and another table containing orders customers have placed at a store), we cannot do so straight-forwardly either. In addition, a DBM database becomes inefficient beyond a few tens of thousands of key-value pairs. Additionally, a DBM database can be queried in only one way: given a key, find the value. Sometimes, we may prefer to have more flexibility in querying the database. For example, we may want to find all individuals that have an associated value larger than a certain threshold.
When we are interested in storing a large amount of data with rich structure, and also want to be able to query the data in a flexible and efficient manner, we need to use a database. There are several models that can be used to build a database. Building a database system is very complex, expensive and time-consuming, and only a large organization or a dedicated group of individuals can do so in practice. The relational model is the dominant model at this time. The relational model was introduced by Codd [Cod70] and is based on the Entity-Relationship (ER) Model of data representation, introduced by Chen [Che76]. Simply speaking, the relational model looks at data as one or more tables of data. Each piece of data in a table in a row or a record. A record or row
has several fields or columns. One field is required to be unique across all records in a particular table. This field is called the key for the table. The key can be a combination of several fields as well. If there are several tables, they are not all independent; they are usually related using so-called foreign keys. However, this text book is not the place if one is not familiar with databases. One is referred to a book such as [NE02] for a tutorial introduction to databases.
There are many database servers available in the market for use by a programmer. The prices vary from being free to hundreds of thousands of dollars depending on the sophistication of the system, and the philosophy of the manufacturers. Freely available database systems include the widely-used MySQL system, and also the Postgresql system. There are many commercial database systems. These include Access and SQLServer from Microsoft, Inc., various versions of Oracle databases from Oracle, Inc., FileMaker Pro, originally from Apple Computers, Inc., and now from FileMaker, Inc., etc.
A database consists of the data and the engine that accesses and manipulates the data based on user or administrator requests, usually called queries, sometimes called statements. All relational databases use a language called the Structured Query Language (SQL) to query the actual storehouse of data. The database stores all the tables and for a program to be able to use the knowledge stored, the program has to be able to access and manipulate the contents of the database. All databases provide user interfaces in which a query can be typed and the response obtained from the database. SQL queries can be embedded in programs in other languages as well. SQL is supposed to be a standardized language, but every database manufacturer seems to add a little extra to it, making it somewhat incompatible from system to system. Thus, in practice there are many
dialects of SQL. In this chapter, we primarily look at the MySQL database and the version of SQL that it uses.
First, to work with a database, the database server must be running and must be accessible to the program. The database does not have to run on the same machine as the program that tries to access it. The database server may run on another machine across the Internet. However, wherever the database program may run, the Perl program must be able to make a connection to it, send it SQL queries, and receive responses in return from it. Usually, any database that is properly set up, specifies a set of authorized users, the machines they can login from, a password for each user-machine combination, and the type of access allowed for each user-machine combination. It is possible for a database to have very fine-grained authentication and access mechanisms. This is especially necessary if the database is accessible over the Internet with many different types of users having access to it, or if the database contains sensitive
information crucial to an organization’s functioning.
To access a specific database from a Perl program, we need two modules: DBI.pm and a driver module. The DBI.pm module is a database access module. It defines a consistent database access interface irrespective of the actual database used behind the scene. By a consistent interface, we mean a set of methods, variables and conventions that all program have access to and use.
An actual database has a database engine or a database server. We need a driver specific to the actual database to be able to communicate with the actual database. Thus, we need to install the appropriate driver modules. The programming interface provided by the DBI.pm module actually talks to low-level driver modules that do most of the work.
The DBI.pm module sends the method calls from the Perl program to the appropriate database-specific driver, if necessary. The DBI.pm module also loads the appropriate drivers. The drivers have private interface functions that implement certain specific functions the specific database server or engine. For example, for connecting to a MySQL database server, we need to install the mysql.pm driver module. It is a DBD module. One way to install the mysql.pm module is to download and install the Msql-Mysql-modules. If one wants to install just the mysql.pm module, one should indicate during the installation process that the MSQL-specific module msql.pm is not needed.
Two terms are frequently used in programs that use the DBI.pm module.
• A database handle: A database handle is used to open a connection with an actual database. Later, it is used in all steps required in query processing. We will discuss the steps in detail later.
• A statement handle: To execute an SQL statement from within a Perl program, several steps are necessary. As a result, it helps to have an SQL statement. We discuss the steps later.
We see how these two handles are used in detail in the rest of the section. There are other handles that can be created and used within the Perl program as well.