GIS: A Computing Perspective, Second Edition, by Michael F. Worboys and Matt Duckham. This book has an emphasis on spatial modelling. Geographic information systems (GISs) are computer-based information systems that are able to capture, model, store, retrieve, share, manipulate and analyze geographically referenced data. GIS: A Computing Perspective, Second Edition, provides a full, up-to-date overview of GIS.
The CPU is itself made up of several subcomponents. The CPU executes machine instructions by fetching data into special registers and then performing computer arithmetic upon them.
Computer arithmetic is commonly performed on two types of numbers (fixed point and floating point) and is handled by the ALU.
The various components of the CPU are synchronized by the control unit, with the aid of the system clock. Each instruction cycle (performing the processing required for a single machine instruction) is actually a sequence of micro-operations that generate control signals.
These signals originate in the control unit and are received by the ALU, registers and other infrastructure of the CPU. One instruction cycle is divided into one or more machine cycles, defined to be the time required for bus access (see next section).
Thus, the number of machine cycles that make up an instruction cycle depends upon the number of communications that the CPU makes with external devices to execute that instruction. Such connectivity is provided by the bus, which is a communication channel between two or more internal modules. A bus may be shared by several devices, and a message sent on the bus by one module may be received by any of the other modules attached to it.
However, only one module may send a message along the bus at any one time. In order to synchronize events in a computer system, a system bus may contain a clock line that transmits a regular sequence of signals based upon an electronic clock. As the computer is essentially a digital machine, the implication is that the data must be converted to strings of bits in an appropriate format and then directed to the correct places in the computer system.
The performance of this task is accomplished by means of data capture, transfer and input devices. Input devices can feed data directly into the CPU. We may distinguish between primary data (data captured directly from the application world, for example remotely sensed data or river pollutant data directly from source), and secondary data (data to be captured from devices that are already storing the data in another form, for example paper-based maps).
Table 1. The numerical and textual data capture and input devices are standard for all information systems. A more detailed description of two primary spatial data capture devices follows:
Remote sensing: Captures digital data by means of sensors on satellite or aircraft that provide measurements of reflectances or images of portions of the Earth. Remotely sensed data are usually raster in structure. They are characterized by high data volumes and high resolution, but a low level of abstraction in the data model. Thus, a river (high-level concept) is captured as a set of pixels within a range of reflectances (low-level).
Global positioning systems (GPS): Allow capture of terrestrial position and vehicle tracking, using a network of navigation satellites. The data are structured as sequences of points, that is, in vector format.
For secondary data capture, usually from existing paper-based maps, either scanners or digitizers are the traditional means of capturing two-dimensional data, and stereoplotters for three dimensions.
Scanners: Convert an analogue data source (usually a printed map, in the case of GIS) into a digital dataset. The data are structured as a raster. The process is to measure the characteristics of light that is transmitted or reflected from the gridded surface of the data source. The principle is precisely the same as for remote sensing. In this case, the sensor is much closer to the data source. However, the characteristic of a large volume of low-level data is still operative.
Digitizers: Provide a means of converting an analogue spatial data source to a digital dataset with a vector structure. Digitizing is a manual process (although it is possible to provide semi-automatic or even automatic tools) using a digitizing tablet upon which the source is placed. A cursor is moved about the surface of the data source and its position is recorded as it goes, along with non-spatial attributes. For example, the cursor may be moved along the boundary of an administrative district on a paper-based map of political regions, a sequence of points recorded, and the whole point sequence labelled with a boundary identifier. In this way a vector structure is established, containing nodes, arcs, and polygonal areas enclosed by collections of arcs.
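The digitizing process just described can be sketched in a few lines of Python. This is an illustrative sketch only: the `Arc` class, the identifier `district-7` and the coordinates are hypothetical, and a real digitizer's output format will differ. It shows a recorded point sequence labelled with a boundary identifier, and how a closed arc determines a polygonal area.

```python
from dataclasses import dataclass

Point = tuple[float, float]

@dataclass
class Arc:
    """A digitized boundary: an identifier plus the recorded cursor positions."""
    boundary_id: str
    points: list[Point]

    def is_closed(self) -> bool:
        # A closed arc starts and ends at the same node, so it encloses an area.
        return len(self.points) > 3 and self.points[0] == self.points[-1]

def enclosed_area(arc: Arc) -> float:
    """Area enclosed by a closed arc, computed with the shoelace formula."""
    if not arc.is_closed():
        raise ValueError("arc does not enclose an area")
    total = 0.0
    for (x1, y1), (x2, y2) in zip(arc.points, arc.points[1:]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

# Hypothetical district boundary traced as five recorded cursor positions.
district = Arc("district-7",
               [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0), (0.0, 0.0)])
print(district.boundary_id, enclosed_area(district))  # district-7 12.0
```

Note how few numbers are needed to describe the district in vector form, compared with the many pixels a scanner would record for the same boundary.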
When the data are captured, they are transferred to the database within the GIS by file transfer devices along computer communication channels. The main problem is to get the data into a format that is acceptable, even understandable, to the particular proprietary GIS that is the recipient.
To this end, data transfer standards have been agreed (see Appendix 2).
From an application perspective, any storage device is measured in three dimensions. Persistent data, which are to be retained beyond immediate processing requirements, must also be stored on media that are non-volatile, where data are not lost by natural decay or when the power is removed. Devices for holding data are called storage media (Figure 1.).
Primary storage: Storage that can be directly manipulated by the CPU. Primary storage is currently based on semiconductor technology.
Secondary storage: Examples include magnetic and optical disks.
Tertiary storage: Storage devices that are off-line to the computer in normal activity, and used to archive very large amounts of information that could not be efficiently handled by secondary storage. An example is a tape library of satellite imagery.
Primary storage may be divided into CPU local register memory and other main memory (core). Also the control unit will require some internal memory. Such register memory is the fastest and most expensive of all types (Figure 1.). Primary storage is generally volatile, data being lost when the system is in a power-off state. Main memory originally consisted of an array of ferrite cores, and the term core memory is still in use.
However, the technology is now based on semiconductor chips. The first and most common is random access memory (RAM), a misnomer because all main memory is randomly accessible.
RAM is volatile, thus requiring a continuous power supply, and capable of both read and write access. Read-only memory (ROM), by contrast, has its contents hard-wired into the chip. This hard-wiring takes place during the ROM chip manufacture. ROM is used to hold data that are fixed and often required, such as systems programs and code for often-used functions (for example, graphics functions in a GIS).
Data from ROM are thus permanently in main memory. ROM has the disadvantage (apart from only allowing read access) of being expensive to manufacture and therefore only feasible for manufacture in large quantities.
Thus they are suitable for specialized chips required in small quantities. Writing of new data is performed electrically, as with PROM.
However, its flexibility may give it the advantage in the future. Critical to the day-to-day performance of a GIS is the balance between primary and secondary storage. With current technology and dataset sizes, it is impractical to handle all the data required by a GIS in primary storage. Therefore, the structuring of data files on secondary storage devices has been the key factor. Much effort and ingenuity has been spent by computer scientists devising suitable data structures to ensure good performance with spatial data.
To understand the issues that arise with such data structures, it is worth examining in some detail the properties of the typical and prevalent secondary storage technology, the magnetic disk.
A magnetic disk has a capacity, which is the number of bytes that may be stored on it at any one time. Current floppy disks for microcomputers may hold about a megabyte (Mbyte) of data.
Hard disks for micros usually have a capacity of a few Table 1. Large disk packs for standard computers typically have capacities measured in gigabytes (Gbyte). The technology changes rapidly and disk capacities are growing all the time. A magnetic disk is literally a disk coated with magnetic material, having a central hole so that it may be rotated on a spindle.
Disks may be single-sided (holding data on only one side) or double-sided (holding data on both sides), and may come in disk packs for storage of larger data volumes. The magnetic disk drive is often cited as the prime example of a direct access device (that is, storage where the time to access a data item is independent of its position on the storage medium).
However, this is not quite the case. Although access times are tiny compared with a device such as a tape drive (sequential access), there is still some dependence upon position. The time taken to move the head (seek time) is the over-riding factor in access to data on a disk. Thus there is an advantage in physically structuring the data on the disk or disk pack so as to minimize this time as far as possible. The physical structure of spatial data on a disk is a question that has received attention from the computer science community.
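To see why physical placement matters, consider a rough cost model. The sketch below is an assumption-laden illustration, not a description of any real drive: the function names and the timing parameters (`seek_ms_per_track`, `rotation_ms`, `transfer_ms`) are hypothetical, and seek time is taken to be simply proportional to the number of tracks crossed.

```python
def access_time_ms(current_track, target_track,
                   seek_ms_per_track=0.1, rotation_ms=8.0, transfer_ms=0.5):
    """Estimated time to read one block: seek + rotational latency + transfer."""
    seek = abs(target_track - current_track) * seek_ms_per_track
    latency = rotation_ms / 2  # on average the disk must turn half a rotation
    return seek + latency + transfer_ms

def total_time_ms(tracks, start_track=0):
    """Total time to read blocks from the given tracks, in the order given."""
    total, head = 0.0, start_track
    for track in tracks:
        total += access_time_ms(head, track)
        head = track  # the head stays where the last access left it
    return total

clustered = [100, 100, 101, 101]  # related data stored on neighbouring tracks
scattered = [100, 900, 50, 700]   # the same four blocks spread over the disk

print("clustered:", total_time_ms(clustered))
print("scattered:", total_time_ms(scattered))
```

Under these assumed figures the scattered layout takes several times longer than the clustered one, and the whole of the difference comes from head movement.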
The block is the unit of data access from secondary storage. The magnetic disk is important because it can store and provide direct access to large amounts of data. Another non-primary storage device is the magnetic tape medium.
Although tapes can store large amounts of data more cheaply than a disk, they only allow sequential read and write access. Thus, tapes are more useful as tertiary storage devices.
Other storage media based upon optical technology are becoming more important.
Access methods provide the means of getting data items into and out of computer storage. They come in four main types: sequential, direct, random and associative. These types are distinguished as follows:
Sequential access: Made in a specific sequence. The magnetic tape device is an example of a sequential access device. For example, if census records are stored on Figure 1. As with playing an audio cassette, to access a track, we have to start from where the tape is after the previous use and wind to the required position(s). Access times vary with the amount of wind required.
Direct access: To a specific neighbourhood, based upon a unique physical address. The magnetic disk drive is an example of a direct access device. After reaching the neighbourhood, a sequential search or other mechanism leads to the precise block. When locating a bar of music on a compact disk, we may go directly to the track and then scan through the track for the bar we want. Access times vary but are much less variable than for sequential access.
Random access: To the precise location, based upon a unique physical address. Main memory is randomly accessible. The access time is independent of the position of the data on the medium.
Associative access: To the precise location, based upon the contents of the data cell. All cells are accessed simultaneously and their data matched against a given pattern. Data items in cells for which there is a match are retrieved. Some cache memories use associative access.
Textual and numerical data may be output using devices such as printers and visual display units (VDUs). Output from a spatial information system often comes in the form of maps. Maps may be output using plotters and printers. If permanent records are not required, then VDUs with high-powered graphics functionality may be used to view output.
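The four access methods described earlier (sequential, direct, random and associative) can be contrasted with a toy model. The sketch below is purely illustrative: costs are counted in abstract steps rather than real time, all function names are hypothetical, and the associative lookup is simulated by a loop although the hardware compares all cells at once.

```python
def sequential_cost(current_pos, target_pos):
    # Tape: wind from wherever the previous access left the medium.
    return abs(target_pos - current_pos)

def direct_cost(offset_in_track):
    # Disk: one jump to the neighbourhood (track), then a scan to the block.
    return 1 + offset_in_track

def random_cost():
    # Main memory: the same cost wherever the item is stored.
    return 1

def associative_lookup(cells, pattern):
    # Every cell is matched against the pattern; matching contents are returned.
    return [cell for cell in cells if pattern in cell]

print(sequential_cost(0, 500))  # 500
print(direct_cost(3))           # 4
print(random_cost())            # 1
print(associative_lookup(["river:Trent", "road:A50", "river:Dove"], "river"))
# ['river:Trent', 'river:Dove']
```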
As hardware becomes cheaper, the general movement has been to provide high-level programming languages that have more features, are more complex and have greater power (more processing per line of program).
However, the traditional CPU has supported low-level processing operations. The main features of RISC architecture are: Today, there are several proprietary machines available that are based upon RISC architecture.
Parallel processing
The traditional, so-called von Neumann architecture of a computer assumes sequential processing, where each machine instruction is executed in turn before control passes to the next instruction. As processors have become smaller and cheaper, the possibility of allowing the simultaneous, parallel execution of machine instructions on separate processors within the same computer is being explored.
The goal is to improve performance (by the simultaneous execution of independent operations) and reliability (by the simultaneous execution of the same operation). These developments may be particularly beneficial to GIS, where the processing demands are extreme and the opportunities for parallel execution natural, for example in some raster operations where each pixel or group of pixels may be processed independently.
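A raster operation of this independent kind might be sketched as follows. The example is hypothetical throughout: the reflectance values and the "water" range of 0.05 to 0.15 are invented, and the thread pool is used only to show how the work decomposes row by row (in standard Python, threads will not by themselves speed up this arithmetic).

```python
from concurrent.futures import ThreadPoolExecutor

def classify_row(row, low=0.05, high=0.15):
    # 1 marks a pixel whose reflectance lies inside the assumed water range.
    return [1 if low <= value <= high else 0 for value in row]

def classify_raster(raster, workers=4):
    # No row depends on any other, so rows can be handed to separate workers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(classify_row, raster))

raster = [
    [0.30, 0.10, 0.32],
    [0.28, 0.12, 0.08],
    [0.31, 0.29, 0.07],
]
print(classify_raster(raster))  # [[0, 1, 0], [0, 1, 1], [0, 0, 1]]
```

The pixels marked 1 trace a river-like feature across the grid: a high-level concept recovered from low-level reflectance values.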
There is a growing body of work in the application of parallel architectures to GIS, and the bibliography at the end of the chapter provides some starting points. The next generation of systems will provide an integrated service to users, who may be physically distant from each other, giving them the opportunity to interact with a large number of services provided by different computers but perceived as a seamless whole. A distributed system is a collection of autonomous computers linked in a network, together with software that will support integration.
Such systems began to emerge in the early s following the construction of high-speed local area networks. Examples of distributed computing are:
Communication between computer users via electronic mail.
A remote login to a distant computer for a specialized application.
A transfer of files between computers using the file transfer protocol (FTP).
A hypertext document with embedded links to services around the globe in a distributed network.
To understand GIS in this distributed context, it is important to take stock of some of the key general notions in this area. This section provides a general background, beginning with an emphasis on hardware and the principles of information transmission, continuing with a consideration of network technology and concluding with some applications of distributed computing.
Such signals may be either continuous (analog), where the signal strength varies smoothly over time, or discrete (digital), where the signal jumps between a given finite number (almost always two) of levels. Examples of these signal types are shown in Figure 1. Some signals may be periodic, where the pattern of the signal repeats cyclically.
A particularly simple and important signal type is the sine wave, shown approximately in Figure 1. A binary digital signal (or just digital signal) transmits bit-streams of data in the form of a wave with higher intensities corresponding to 1 and lower intensities corresponding to 0. Thus a bit-stream may be represented by the idealized digital signal shown in Figure 1.
Any electromagnetic signal may be expressed as a combination of sine waves. The rate of repetition of a sine wave is called its frequency and is measured in hertz (Hz, cycles per second). Thus, any periodic wave may be expressed as a combination of frequencies of its component sine waves. The spectrum of a signal is the range of frequencies that it contains and the bandwidth of the signal is the width of its spectrum.
There is a clear relationship between the bandwidth of a signal and the amount of information that it can carry. The higher the bandwidth, the greater the amount of information.
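The ideas of component sine waves, spectrum and bandwidth can be made concrete with a small sketch. The function names and the choice of components are assumptions made for illustration: the odd harmonics of a 1 Hz wave, with amplitudes 1/k, are a standard way to approximate a square (digital-looking) wave.

```python
import math

def signal_value(t, components):
    """Signal strength at time t for a list of (frequency_hz, amplitude) pairs."""
    return sum(a * math.sin(2 * math.pi * f * t) for f, a in components)

def bandwidth(components):
    """Width of the spectrum: highest minus lowest component frequency."""
    freqs = [f for f, _ in components]
    return max(freqs) - min(freqs)

# Four sine waves at 1, 3, 5 and 7 Hz roughly approximate a square wave.
square_ish = [(k, 1.0 / k) for k in (1, 3, 5, 7)]

print(bandwidth(square_ish))  # 6
for t in (0.0, 0.125, 0.25, 0.375):
    print(t, round(signal_value(t, square_ish), 3))
```

Adding further harmonics sharpens the edges of the wave and widens the spectrum, which is one way to see why a signal carrying more detail needs more bandwidth.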
Signal encoding: Numbers, whether integer or rational, may be represented as a bitstream using base two arithmetic. Numbers and text may be represented as a string of characters, which are then encoded as binary sequences. Precise details of the code are not needed for this text and may be found in many introductory texts on computing.
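As a brief illustration of these encodings, the sketch below converts a non-negative integer and a short text into bit-streams. The helper names are hypothetical, and the use of 8-bit ASCII for characters is an assumption made here for the example; the text does not commit to a particular code.

```python
def int_to_bits(n, width=8):
    """Base-two representation of a non-negative integer, padded to width bits."""
    return format(n, f"0{width}b")

def text_to_bits(text):
    """Each character encoded as an 8-bit ASCII code (an assumed code)."""
    return " ".join(format(byte, "08b") for byte in text.encode("ascii"))

print(int_to_bits(13))      # 00001101
print(text_to_bits("GIS"))  # 01000111 01001001 01010011
```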
Images may be encoded in several different ways for the purpose of communications, amongst which are:
Analog Facsimile (FAX): An image is coded as a sequence of horizontal lines, each of which has a continuously varying intensity.
Digital Facsimile and Raster-scan Graphics: An image is coded as an array of pixels, each pixel representing either black or white (or a colour code if using colour graphics).
Vector Graphics: An image is decomposed into a set of points, lines and other geometric objects (rectangles, circles, etc.). Each object is specified by giving its position, size and other determining characteristics.
Transmission terminology and media
Digital data are transmitted as a bit-stream using an electromagnetic signal as a carrier. We have noted that the information carrying capacity of a signal is determined by its bandwidth (the width of the spectrum of frequencies that it contains).
Data transmission is a communication between transmitter and receiver. Only in the ideal world is it the case that the signal transmitted is equal to the signal received. In practice the signal is degraded in three ways:
Attenuation: The weakening of a signal as it travels through the transmission medium. It is important that attenuation does not reduce the strength of the signal so far that background noise becomes significant.
Delay: Caused by transmission along cables. The delay of a component of a signal depends upon its frequency. As with attenuation, this variation may cause distortion. Delay places a limit upon data transmission rates through cables.
Noise: Extraneous electromagnetic energy in and around the medium during transmission. Noise has a variety of causes, some related to the properties of the transmission media and others to environmental effects such as other signals and electromagnetic storms. As with attenuation and delay, noise can distort the signal, particularly if the signal strength is of the same order of magnitude as the noise. The solution is to provide a strong signal and error detection and correction mechanisms.
Attenuation, delay and noise place limitations upon the information carrying capacity of a transmission channel. These limitations are measured using bandwidth, data rate (in bits per second), noise and error rate.
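One of the simplest error detection mechanisms is the parity bit. The text does not name a particular mechanism, so the even-parity scheme below is offered only as a hedged example, with hypothetical function names; it detects any single flipped bit but, as the last line shows, misses an even number of errors.

```python
def add_parity(bits):
    """Append an even-parity bit to a string of '0' and '1' characters."""
    return bits + str(bits.count("1") % 2)

def parity_ok(frame):
    """A received frame is accepted if it holds an even number of 1s."""
    return frame.count("1") % 2 == 0

def flip(frame, i):
    """Simulate noise by inverting the bit at position i."""
    return frame[:i] + ("1" if frame[i] == "0" else "0") + frame[i + 1:]

sent = add_parity("1011001")
print(sent)                              # 10110010
print(parity_ok(sent))                   # True
print(parity_ok(flip(sent, 2)))          # False: single error detected
print(parity_ok(flip(flip(sent, 2), 5))) # True: double error goes unnoticed
```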
Transmission media come in a variety of types and have varying channel characteristics. Media types may be broadly separated into guided and unguided. Guided media restrict the signal to a predetermined physical path, such as a cable or fibre. Unguided media place no such restriction. With a guided medium, both transmitter and receiver are directly connected to the medium. Examples are twisted pair (of two insulated copper wires), coaxial cable (hollow outer cylindrical conductor surrounding single wire conductor) and optical fibre (flexible and thin glass or plastic optical conductor).
Optical fibre (fibre-optic) is the newest and will surely become the dominant guided transmission medium. With unguided media, both transmission and reception are by means of antennae. A useful frequency band for unguided transmission is microwave (2 to 40 GHz). Terrestrial microwave is an alternative to coaxial cable for long-distance telecommunications.
It requires line-of-sight transmission and therefore long distances require intermediate stations (but fewer than coaxial cable). Other applications include short-range data links (maybe between buildings). Satellite microwave uses earth-orbital satellites to reflect signals, thus avoiding the problem of line-of-sight in terrestrial microwave. Satellite microwave has a huge and growing number of applications, from commercial television to business data communications.
This cooperation is made possible through the use of computer networks.
A computer network is a collection of computers, called stations, each attached to a node. The nodes are communications devices acting as transmitters and receivers, capable of sending data to and receiving data from one or more other nodes in the network. Computer networks are customarily divided into two types, wide-area networks and local-area networks.
A wide-area network (WAN) is a network of computers spread over a large area. Such an area may be as small as a city or as large as the globe. WANs transmit data at relatively low speeds between stations linked together by dedicated node computers called packet switches. The packet switches route messages (packets) between stations.
This routing process makes transmission between stations slower than in local-area networks, where direct transmission takes place.
A local-area network (LAN) is a network of computers spread over a small area. Such an area may be a single building or group of buildings.
LANs transmit data between nodes at relatively high speeds using a transmission medium such as optical fibre or coaxial cable. Their main feature is direct communications between all nodes in the network with the consequence that no routing of messages is required. Most computerized work environments are connected as a local-area network.
High-speed networks
It is almost impossible to keep up-to-date in this fast moving area of computing. At present, high-speed networks are developing that take advantage of transmission technologies such as ATM (Asynchronous Transfer Mode) using optical fibre and satellite microwave transmission media. ATM currently allows data transfer at rates approaching 1 gigabit per second. The optical fibre cabling of cities will allow the transmission of video, voice and bulk data city-wide, with typical transmission times (latencies) of less than one millisecond.
OSI is a layered model, each of the seven layers dealing with a separate component or service required in distributed computing. The seven layers are given in Table 1. Suppose that the computer is connected to a single network, for example a local-area network. A specific protocol, depending upon the nature of the LAN and setting out the logic of the network connection, will be required at each of the three lowest OSI layers.
Session and transport layer protocols will be provided to give reliable data transfer and dialogue control between our computer and others on the network. File transfer, electronic mail and other application facilities are provided by the top layer.
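The layering idea can be sketched as successive wrapping of a message, each layer adding its own header on the way down and removing it on the way up. This is a deliberate simplification with hypothetical function names: real protocols use binary headers (and sometimes trailers), and the physical layer transmits bits rather than adding a header. The seven layer names are those of the standard OSI reference model.

```python
OSI_LAYERS = ["application", "presentation", "session", "transport",
              "network", "data link", "physical"]

def encapsulate(message):
    """Pass a message down the stack; each layer prepends a header."""
    frame = message
    for layer in OSI_LAYERS:  # top layer first, physical layer last
        frame = f"[{layer}]{frame}"
    return frame

def decapsulate(frame):
    """Pass a frame up the stack; each layer removes its own header."""
    for layer in reversed(OSI_LAYERS):  # physical layer first
        header = f"[{layer}]"
        if not frame.startswith(header):
            raise ValueError(f"missing {layer} header")
        frame = frame[len(header):]
    return frame

frame = encapsulate("census file")
print(frame)
print(decapsulate(frame))  # census file
```

Because each layer only inspects its own header, the implementation of one layer can be replaced without disturbing the others.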
Apart from the detail, the important advance that OSI provides is the standardization of interfaces for inter-computer communication. This makes possible the introduction of new software and hardware into a networked system, provided that there is conformance to the OSI standards. OSI allows us to select hardware, software and services for a networked system from a variety of vendors.
Client-server computing
Client-server computing is a form of distributed computing in which some stations act as servers and some as clients.
A server holds data and services available for transmission to other stations, clients, in the network. The servers are usually higher specification machines, capable of running complex system software. Clients typically may be PCs or workstations. Client stations request information from servers, which they then subject to their own specialized processing requirements. Thus, the network may have servers holding national census data, administrative boundary polygons and medical statistics.
A client may then request census information for a particular region from a server, combine this with medical statistics obtained from a second server and use the boundary data from a third server to conduct a spatial analysis, code for which may be obtained from yet another server.
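The pattern can be sketched with two in-process stand-ins for servers. Everything here is hypothetical: the `Server` class, the region names and the figures are invented, and a real system would make the requests over a network rather than through method calls. The point is only that the client combines data from separate servers in its own processing.

```python
class Server:
    """A stand-in for a station that holds data and answers requests."""
    def __init__(self, name, data):
        self.name = name
        self._data = data

    def request(self, region):
        return self._data[region]

# Invented holdings for two servers.
census = Server("census", {"north": 12000, "south": 8000})
medical = Server("medical", {"north": 36, "south": 40})

def cases_per_thousand(region):
    """Client-side processing that combines answers from both servers."""
    population = census.request(region)
    cases = medical.request(region)
    return 1000 * cases / population

for region in ("north", "south"):
    print(region, cases_per_thousand(region))
# north 3.0
# south 5.0
```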
For GIS applications, client-server computing holds many possibilities. A GIS at present combines information system, spatial analysis and presentation components.
The X-server controls a display, which comprises keyboard, mouse and one or more screens. The X-server usually starts by presenting to the user a login window. The user logs in to begin the X-session and to initiate the X-clients.
The X-server provides the user with a root window, within which other client windows are arranged. The window manager is a special client that provides functionality to arrange and manipulate client windows on the root window. Communications between clients and server take place using the X protocol.
Connection between client and server is established with a connection setup packet (message) sent by the client, followed by a reply from the server. During connection, the types of messages possible include: The clients act upon the server in terms of resources. Resources are abstract objects, examples of which are windows, fonts, cursors and graphics contexts.
Information about each client may therefore be expressed in terms of attributes associated with its resources, collectively known as the state of the client. Thus, a network allows all connected users to share information, subject to constraints imposed by the network architecture.
The next logical step is to allow networks to share information. This is the concept of an internet. An internet is an interconnected collection of networks (called subnetworks), each subnetwork retaining its own individual characteristics but the entire collection appearing as a single network to a user. The technology of an internet requires a further set of components:
Gateway: Used to connect networks with different protocols; operates in the OSI application layer.
A special and important example of internetworking is the Internet, which originated from a research initiative under the auspices of the US Government. The Internet is a fast-growing global collection of networks. At the time of writing summer , the Internet comprises approaching constituent networks, a new network being added to the system about every 10 minutes, and about 20 million people have access to Internet resources such as electronic mail and file transfer.
The Web runs on the Internet with client programs at each member station displaying hypermedia objects. For example, a document with buttons for sounds, images, video and text may be displayed, and pressing a button invokes a call for those data to a different server on the Internet. Thus, the user can trace routes through to new documents and links. It is the hypermedia format that gives the Web its particular flavour. While menus and directory structures are available, hypermedia provides powerful tools for communicating and gaining information.
The Web user interface is the same regardless of client or server, thus insulating users from unnecessary protocol details. In summary, the Web provides:
A network with vast and interconnected sources of information on every conceivable topic.
A simple hypertext mark-up language (HTML) that is used to construct the constituent hypertext documents.
Laurini and Thompson, Burrough and Maguire et al. Peuquet and Marble provide an edited collection of papers covering a range of issues for GIS. Goodchild summarizes GIS functionality for environmental applications. The inspiration for some of the example Potteries applications was provided by Phillips. Overview books on general database technology are Date; Elmasri and Navathe. Ozsu and Valduriez and Bell and Grimson specialize in distributed databases.
Distributed GIS are the subject of an overview by Laurini. Software design in the context of GIS is discussed in Marble, Mayo and Flanagan et al. Generalization is covered in the book by McMaster and Shea and the chapter by Muller. The classic book on computer graphics is by Foley et al. Luse discusses image representations, colour models, compression techniques and standard formats. A collection of research papers on multimedia is edited by Herzner and Kappe. Visualization is treated in the edited collection of Hearnshaw and Unwin and the chapter by Buttenfield. A general overview of hardware is provided by Stallings, and performance considerations for GIS with an emphasis on parallel architectures are discussed by Gittings et al.
Hopkins et al. Computer communications are described in the texts by Stallings and van Slyke and Coulouris et al.
An understanding of GIS technology cannot be achieved without some knowledge of the principles of databases. Many existing GIS are built upon general purpose relational databases. This chapter introduces the reader to the main principles of databases. The general database approach is discussed in section 2.
At present, almost all databases are relational, therefore the relational model and its major interaction language SQL are described in detail in section 2. The reader is introduced to database design principles in section 2.
Object-oriented databases provide extensions of relational functionality, particularly at the modelling level, that could be very useful for GIS and these are described in section 2.
In the bad old days, the almost universal use of a computer was to convert one dataset into another by means of a large and complex transformation process (typically a FORTRAN program).
For example, we might wish to apply a transportation model to datasets of city population sizes and road distances between cities in order to predict average annual traffic flows. The inputs are datasets of city populations and road distances; the output is a dataset of predicted traffic flows.
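A transformation of this kind might look like the following sketch. The text does not say which transportation model is meant, so a simple gravity model (flow proportional to the product of the populations divided by the squared distance) is used here as a stand-in; the city names, populations, distances and the constant `k` are all invented.

```python
def predicted_flow(pop_a, pop_b, distance_km, k=1e-6):
    """Gravity-model estimate of traffic flow between two cities (assumed model)."""
    return k * pop_a * pop_b / distance_km ** 2

# Input datasets: city populations and road distances (hypothetical values).
populations = {"A": 200_000, "B": 50_000, "C": 100_000}
distances = {("A", "B"): 20.0, ("A", "C"): 50.0, ("B", "C"): 40.0}

# Output dataset: one predicted flow per pair of cities.
flows = {
    pair: predicted_flow(populations[pair[0]], populations[pair[1]], d)
    for pair, d in distances.items()
}

for pair, flow in flows.items():
    print(pair, round(flow, 3))
```

The program consumes two datasets and emits a third, and nothing about the data persists inside the program itself.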
This approach to computation is shown in Figure 2. The alternative offered by the database approach is shown in Figure 2. In this case, the computer acts as a useful repository of data, by facilitating its deposit, storage and retrieval.
It also allows the data to be modified and analysed while it is in the store. Data owners and depositors must have confidence that the data will not be used in unauthorized ways (security) and that the system has fail-safe mechanisms in case of unforeseen events such as power failure (reliability). Both depositors and data users must be assured that as far as possible the data are correct (integrity). There should be sufficient flexibility to give different classes of users different types of access to the store (user views).
Not all users will be concerned with how the database works and they should not be exposed to low-level database mechanisms (independence).
Data retrievers will need a flexible method for finding out what is in the store (metadata support) and for retrieving it according to their requirements and skills (human-database interaction). The database interface should be sufficiently flexible to respond differently to both single-time users with unpredictable and varied requirements, and regular users with little variation in their requirements.
Data should be retrieved as quickly as possible (performance). It should be possible for users to link pieces of information together in the database to get added value (relational database).
Many users may wish to use the store, maybe even the same data, at the same time (concurrency) and this needs to be controlled. Data stores may need to communicate with other stores for access to pieces of information not in their local holding (distributed systems). All this needs to be managed by a complex piece of software (database management system).
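The idea of linking pieces of information together can be illustrated with a small relational example, using Python's built-in `sqlite3` module and an in-memory database. The table and column names and the data are hypothetical, chosen to echo the city populations and road distances mentioned earlier; the query joins each road to the populations of the cities at its two ends.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE city (name TEXT PRIMARY KEY, population INTEGER);
    CREATE TABLE road (from_city TEXT, to_city TEXT, distance_km REAL);
    INSERT INTO city VALUES ('A', 200000), ('B', 50000);
    INSERT INTO road VALUES ('A', 'B', 20.0);
""")

# Link each road to the populations of both of its end cities.
rows = conn.execute("""
    SELECT r.from_city, c1.population, r.to_city, c2.population, r.distance_km
    FROM road AS r
    JOIN city AS c1 ON c1.name = r.from_city
    JOIN city AS c2 ON c2.name = r.to_city
""").fetchall()

print(rows)  # [('A', 200000, 'B', 50000, 20.0)]
```

Here the relationship between the two tables is expressed in the query, not buried in an application program, and the data remain in the store for other users and other queries.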
So a synthesis must be achieved. In the early days of computers the emphasis was on the processes in computations but, particularly for corporate applications, weaknesses resulting from the imbalance between data and process became apparent. We illustrate this with a fictitious example. The owner-manager, a computer hobbyist, decided to apply computer technology to help with some aspects of the management of the restaurant.
She began by writing a suite of programs to handle the menu. Menu items were stored as records in a menu file, programs were written to allow the file to be modified (items deleted, inserted and updated) and for the menu to be printed each day. Figure 2.
The menu file is held in the operating system and accessed when required by the programs. Time passed, the menu system was successful and the owner gained the confidence to extend the system to stock management.
A similarly structured stock system was set up, consisting of a stock file, and programs to modify the stock file and periodically print a stock report. Once stock and menu details were in the system, it became apparent that changes in stock costs influenced menu prices.
A program was written to use stock costs to price the menu. Stage two of the Nutty Nuggets system is shown in Figure 2.
The system continued to grow with new files for supplier and customer details added. But as the system became enlarged, some problems began to emerge, including: Loss of integrity: Linkages between programs and files became complex. The programs made the relationships between the data in the files: The development of software was becoming complex and costly and errors crept in: