Introduction and Issues

What is a distributed system?

“A collection of autonomous computers linked by a network with software designed to produce an integrated facility”

“A collection of independent computers that appear to the users of the system as a single computer”


Distributed systems

  • Department computing cluster

  • Corporate systems

  • Cloud systems (e.g. Google, Microsoft, etc.)

Application examples

  • Email

  • News

  • Multimedia information systems - video conferencing

  • Airline reservation system

  • Banking system

  • File downloads (BitTorrent)

  • Messaging



A Distributed System


A Centralized Multi-User System


An Application Example

Advantages of Distributed Systems vs. Centralized


  • Commodity microprocessors have better price/performance than mainframes


  • Collective power of large number of systems is potentially infinite

Geographic and Responsibility Distribution

  • Can provide better speed in geographic regions by not going over slower transoceanic cables


  • One machine’s failure need not bring down the system


  • Computers and software can be added incrementally

Advantages of Distributed Systems vs. Standalones

Data Sharing

  • Multiple users can access common database, data files

Device/Resource Sharing

  • e.g., printers, servers, other CPUs…


  • Communication with other systems…


  • Spread workload to different & most appropriate systems


  • Add resources and software as needed

Disadvantages of Distributed Systems


  • Little software exists compared to PCs (for example) but the situation is improving with the cloud.


  • Still slow and can cause other problems (e.g., when disconnected)


  • Data may be accessed by unauthorized users through network interfaces


  • Data may be accessed securely but without the owner’s consent (significant issue in modern systems)

Key Characteristics

  • Support for resource sharing

  • Openness

  • Concurrency

  • Scalability

  • Fault Tolerance (Reliability)

  • Transparency

Resource Sharing

Share hardware, software, data and information

Hardware Devices

  • printers, disks, memory, sensors

Software Sharing

  • compilers, libraries, toolkits, computational kernels


  • databases, files

Resources Must be Managed


Resources Must Be Managed


Client-Server Model for Resource Sharing


Determines whether the system can be extended in various ways without disrupting existing system and services

Hardware extensions (adding peripherals, memory, communication interfaces..)

Software extensions
  • Operating System features

  • Communication protocols

Mainly achieved using published interfaces, standardization

Open Distributed Systems

Are characterized by the fact that their key interfaces are published

Based on the provision of a uniform interprocess communication mechanism and published interfaces for access to shared resources

Can be constructed from heterogeneous hardware and software.


  • In a single system several processes are interleaved

  • In distributed systems - there are many systems with one or more processors

  • Many users simultaneously invoke commands or applications (e.g., Netscape..)

  • Many server processes run concurrently, each responding to different client request, e.g., File Server


Opportunities for Concurrency


Scale of system

  • Few PCs servers –> Dept level systems –> Local area network –> Internetworked systems —>Wide area network…

  • Ideally - system and applications software should not (need to) change as systems scales

Scalability depends on all aspects

  • Hardware

  • Software

  • Networks

  • Storage

Fault Tolerance

  • Ability to operate under failure(s) - possibly at a degraded performance level

  • Two Approaches - Hardware redundancy - use of redundant components - Software Recovery - design of programs to recover

  • In distributed systems - servers can be replicated - databases may be replicated - software recovery involves the design so that state of permanent data can be recovered

  • Distributed systems, in general, provide a high(er) degree of availability


Transparency “is the concealment from the user of the separation of components of a distributed system so that the system is perceived as a whole”.


  • Access Transparency - enables local and remote objects to be accessed using identical operations (e g., read file..)

  • Location transparency - location of resources is hidden

  • Migration transparency - resources can move without changing names

  • Replication Transparency - users cannot tell how many copies exist

  • Concurrency Transparency - multiple users can hare resources automatically

  • Parallelism Transparency - activities can happen in parallel without user knowing about it

  • Failure Transparency - concealment of faults

Design Issues

  • Openness

  • Resource Sharing

  • Concurrency

  • Scalability

  • Fault-Tolerance

  • Transparency

  • High-Performance

Issues arising from Distributed Systems

  • Naming - How to uniquely identify resources

  • Communication - How to exchange data and information reliably with good performance

  • Software Structure - How to make software open, extensible, scalable, with high-performance

  • Workload Allocation - Where to perform computations and various services

  • Consistency Maintenance - How to keep consistency at a reasonable cost


  • A resource must have a name (or identifier) for access

  • Name: Can be interpreted by user, e.g., a file name

  • Identifier - Interpreted by programs, e.g., port number

Naming - Name Resolution

  • “resolved” when it is translated into a form to be used to invoke an action on the resource

  • Usually a communication identified PLUS other attributes

  • E.g., Internet communication id

    • host id:port no

    • also known as “IP address:port no”

    • 192:130.228.6:8000

  • Name resolution may involve several translation steps

Naming - Design Considerations

  • Name space for each type of resource

    • e.g., files, ports, printers, etc.

  • Must be resolvable to communication Ids

    • typically achieved by names and their translation in a “name service”

    • You must have come across “DNS” when using the WWW!!

  • Frequently accessed resources, e.g., files are resolved by resource manager for efficiency

  • Hierarchical Name Space - each part is resolved relative to current context, e.g., file names in UNIX


Communication is an essential part of distributed systems - e.g., clients and servers must communicate for request and response

Communication normally involved - transfer of data from sender to receiver - synchronization among processes

Communication accomplished by message passing

Synchronous or blocking - sender waits for receiver to execute a receive operation

Asynchronous or non-blocking

Types of Communication

  • Client-Server

  • Group Multicast

  • Function Shipping

  • Performance of distributed systems depends critically on communication performance

  • We will study the software components involved in communication

Client-Server Communication

  • Client sends request to server process

  • Server executes the request

  • Server transmits a reply and data, e.g., file servers, web server…


Client-Server Communication

Client-Server Communication

  • Message Passing Operations

    • send

    • receive

  • Remote Procedure Call (RPC)

    • hides communication behind procedure call abstraction

    • e.g., read(fp,buffer,….)

    • Files reside with the server, thus there will be communication between client and server to satisfy this request

Group Multicast

  • A very important primitive for distributed systems

  • Target of a message is a group of processes

    • e.g., chat room, I sending a message to class list, video conference

  • Where is multicast useful?

    • Locating objects - client multicasts a message to many servers; server that can satisfy request responds

    • Fault-tolerance - more than one server does a job; even if one fails, results still available

    • Multiple updates

  • Hardware support may or may not be available

    • if no hardware support, each recipient is sent a message


Group Multicast

Software Structure

  • In a centralized system, O/S manages resources and provides essential services

  • Basic resource management

    • memory allocation and protection

    • process creation and processor scheduling

    • peripheral device handling

  • User and application services

    • user authentication and access control (e.g., login)

    • file management and access facilities

    • clock facilities

Distributed System Software Structure

  • It must be easy to add new services (flexibility, extensibility, openness requirements)

  • Kernel is normally restricted to

    • memory allocation

    • process creation and scheduling

    • interposes communication

    • peripheral device handling

  • E.g., Microkernels - represent light weight O/S, most services provided as applications on top of microkernels


Distributed System Software Structure

Consistency Management

  • When do consistency problems arise?

    • concurrency

    • sharing data

    • caching

  • Why cache data?

    • for performance, scalability

  • How?

    • Subsequent requests (many of them) need not go over the NETWORK to SERVERS

    • better utilized servers, network and better response

  • Caching is normally transparent, but creates consistency problems


  • Suppose your program (pseudocode) adds numbers stored in a file as follows (assume each number is 4 bytes:

    for I= 1, 1000
           tmp = read next number from file
           sum = sum + tmp
    end for
  • With no caching, each read will go over the network, which will send a new 4 byte number. Assuming 1 millisecond (ms) to get a number, requres a total of 1s to get all of the numbers.

  • With caching, assuming 1000 byte pages, 249 of the 250 reads will be local requests (from the cache).


  • Update consistency

    • when multiple processes access and update data concurrently

    • effect should be such that all processes sharing data see the same values (consistent image)

    • E.g., sharing data in a database

  • Replication consistency

    • when data replicated and once process updates it

    • All other processes should see the updated data immediately

    • e.g., replicated files, electronic bulletin board

  • Cache consistency

    • When data (normally at different levels of granularity, such as pages, disk blocks, files…) is cached and updates by one process, it must be invalidated or updated by others

    • When and how depends on the consistency models used

Workload Allocation

  • In distributed systems many resources (e.g., other workstations, servers etc.) may be available for “computing”

  • Capacity and size of memory of a workstation or server may determine what applications may are able to run

  • Parts of applications may be run on different workstations for parallelism (e.g., compiling different files of the same program)

  • Some workstations or servers may have special hardware to do certain types of applications fast (e.g., video compression)

  • Idle workstations may be utilized for better performance and utilization

Processor Pool Model

In a processor pool model, processes are allocated to processors for their lifetime (e.g the Amoeba research O/S supports this concept).


Processor Pool Model


Quality of Service (a.k.a. QoS) refers to performance and other service expectations of a client or an application.

  • Performance

  • Reliability and availability

  • security

Examples where this is important.

  • Voice over IP (VOIP) and telephony

  • Video (e.g. Netflix and friends)