Security techniques for the management of data
Data management - according to DAMA, this refers to the "development and execution of architectures, policies, practices and procedures that properly manage the full data lifecycle needs of an enterprise".
Security techniques - according to DAMA's DMBOK framework, data security management covers the following four areas:
- Data access
- Data erasure
- Data privacy
- Data security
Therefore, security techniques for the management of data covers the development and execution of architectures, policies, practices and procedures that cover data access, data erasure, data privacy and data security. The dot point in the syllabus covers two techniques, Disaster Recovery Plans and Audit Trails, as examples only. There may be an expectation to know about other techniques, too.
Here are some other techniques, listed under the DMBOK framework headings. We will not cover these in class.
Disaster Recovery Plan
A disaster recovery plan (DRP) is a documented process or set of procedures to recover and protect a business IT infrastructure in the event of a disaster. The DRP specifies the procedure(s) an organisation will follow in the event of a disaster. The DRP outlines comprehensively the actions to be taken before, during and after a disaster. The disaster could be natural, environmental or man-made.
Planning for disasters
Given organisations' increasing dependency on IT infrastructure to run their operations, it is vital for businesses to put in place a disaster recovery plan. The main objective of a disaster recovery plan is to minimise downtime and data loss.
Preparing for a disaster
- Complete a risk assessment/business impact analysis
- Identify critical systems and networks
- Prioritise the recovery time objectives
- Identify and implement preventative controls
- Develop an IT contingency plan by delineating the steps to restart, reconfigure and recover the critical systems and networks
- Plan testing, training and exercising
- List the supplier contacts and system experts that will be required for a smooth recovery
Immediately before a disaster
If there is prior knowledge that a disaster may be impending, steps can be taken before it arrives to minimise the disruption and/or damage to the IT infrastructure. These steps will be built into the DRP and may include:
- Monitor weather/fire/flood etc advisories
- Power off electrical equipment
- Notify employees/customers of business closure
- Attempt to suppress fire in early stages
- Determine potential damage and prepare
- Shut off utilities
During a disaster
Depending on the type of disaster there may be a number of actions to take.
- Evacuate building/or take cover
- Contact emergency services
After a disaster
Depending on the type of disaster there may be a number of actions to take.
- Assess damage
- Determine impact on equipment and facilities
- Follow procedures in the DRP to bring services back online (restart, reconfigure and/or recover systems/networks)
Reasons to implement a DRP
Measuring Success of a DRP
Recovery Point Objective (RPO)
The maximum age of a backup before it ceases to be useful. If you can afford to lose a day’s worth of data in a given system, you set an RPO of 24 hours.
Recovery Time Objective (RTO)
The maximum amount of time that should be allowed to elapse before the backup is implemented and normal services are resumed.
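The two objectives above can be sketched as simple time comparisons. The figures, function names and dates below are illustrative assumptions, not part of any standard:

```python
from datetime import datetime, timedelta

# Hypothetical targets for one system: lose at most 24 hours of data (RPO)
# and resume normal service within 4 hours of the failure (RTO).
RPO = timedelta(hours=24)
RTO = timedelta(hours=4)

def meets_rpo(last_backup: datetime, failure_time: datetime) -> bool:
    """The backup satisfies the RPO if it is no older than RPO at failure."""
    return failure_time - last_backup <= RPO

def meets_rto(failure_time: datetime, restored_time: datetime) -> bool:
    """Recovery satisfies the RTO if service resumed within RTO of failure."""
    return restored_time - failure_time <= RTO

failure = datetime(2024, 3, 5, 9, 0)
print(meets_rpo(datetime(2024, 3, 4, 22, 0), failure))  # backup 11 h old -> True
print(meets_rto(failure, datetime(2024, 3, 5, 15, 0)))  # 6 h outage -> False
```

In practice these checks drive decisions such as backup frequency (to keep the backup age inside the RPO) and investment in standby infrastructure (to keep recovery inside the RTO).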
Audit Trails
An audit trail (also called an audit log) is a chronological set of records that provide documentary evidence of the sequence of activities in a system, application or by a user. Audit trails are used to determine the cause of irregular activity in a system or application. The record of system activities enables the reconstruction and examination of a sequence of events.
Information stored in an audit log
The information stored in an audit log should provide enough detail so that the person looking into the irregular activity can reconstruct the events that occurred and determine the cause of the problem. Once the cause has been found, the issue can be resolved so that it will not happen again.
At a minimum, the information stored in an audit log should log for each event:
- What the event was
- What user, system or application launched the event (this may include IP address and device type)
- The date and time the event took place
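The minimum fields above can be captured in a structured log entry. This is a minimal sketch; the field names and the JSON-lines format are assumptions for illustration, not part of any standard:

```python
import json
from datetime import datetime, timezone

def audit_entry(event: str, actor: str, ip: str, device: str) -> str:
    """Build one audit-log line holding the minimum fields listed above."""
    record = {
        "event": event,    # what the event was
        "actor": actor,    # what user, system or application launched it
        "ip": ip,          # optional extra detail: source IP address
        "device": device,  # optional extra detail: device type
        "timestamp": datetime.now(timezone.utc).isoformat(),  # when it occurred
    }
    return json.dumps(record)

print(audit_entry("login_failed", "user:jsmith", "10.0.0.5", "laptop"))
```

Writing one self-contained line per event keeps the log append-only and easy to search when reconstructing a sequence of events.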
Types of audit trails
System-level audit trail
A system-level audit trail exists at the whole-system level; on a computer, this is at the operating system level. For example, on a Mac, you can view system.log from the Console application.
A system-level audit trail may include:
- any attempt to log on
- the applications that the user tried to invoke
- information that is not specifically security related such as system operations and network performance
Application-level audit trail
System-level audit trails may not be able to track and log events within applications, or may not be able to provide the level of detail needed by application or data owners. An application-level audit trail exists within an application itself, or the application implements the protocols to use the system-level audit trail. For example, Microsoft Access has an implementation of logging which can be used through the Windows operating system's error log.
An application-level audit trail may include:
- monitor and log user activities - e.g. data files opened and closed, read/edit/delete records or fields
- before and after of each modified record
User-level audit trail
User audit trails can usually log:
- all commands directly initiated by the user
- all identification and authentication attempts
- files and resources accessed (including options, parameters)
Types of audit logs
An event log contains records describing system events, application events, or user events. An event log should include sufficient information to establish what events occurred and who (or what) caused them. 
Keystroke monitoring refers to the process used to view or record both the keystrokes entered by a computer user and the computer's response during an interactive session. Keystroke monitoring is usually considered a special case of audit trails. 
Other aspects of managing audit trails include:
- Protecting audit trail data
- Review of audit trail data
- Tools for audit trail analysis
- Cost considerations
Reasons to use an audit log
- Individual accountability
- Reconstruction of events
- Intrusion detection
- Problem analysis
Types of backup techniques and archiving of data
Backup - A copy of data that is used to restore the original in the event that data is lost or damaged. If a company experiences data loss due to hardware failure, human error or disaster, a backup can be used to quickly restore data.
Archiving - a collection of historical records that are kept for long-term retention and used for future reference. Typically archived data is not actively used.
There are a variety of different techniques that can be applied to backup or archive data. This dot point from the syllabus mentions a number of them listed below.
Difference between backing up and archiving
A data backup is helpful in the event of major data loss, as it allows you to quickly restore data back to its original state in a data loss event. However, backing up large amounts of data can cause the backup infrastructure to run slowly.
File archiving works with data backup to reduce backup costs and reduce the strain of large amounts of data put on your storage infrastructure. The goal of file archiving is not to restore lost data quickly, but to store data and organise it in a way that you can easily search through it to find specific information.
Backups and archiving should not be pitted against each other but instead should be used together to more efficiently store and recover data and reduce backup time and cost. 
When data is lost, a backup helps restore the system to its original state at a particular timestamp. Data loss may occur in the event of hardware failure, human error or a disaster, so having a backup allows the data to be restored to the state that it was in at the time of being backed up. By backing up data frequently, the difference in data between the time of the data loss and the time of the last backup (called the delta) can be reduced, which can help reduce the amount of data that is lost. Sometimes, the delta can be manually repaired if the current system data is available.
Backing up in the cloud
A number of backup services can now be achieved through the cloud or through hybrid (cloud and on-site server) solutions.
Backup SaaS (Software as a service)
A web-native application hosted at a central location and accessed through a browser-based interface. Agents residing on the systems to be protected pass data from the primary site to the cloud.
Cloud storage solutions
The data is backed up with software on site (and sometimes stored on hardware on site), and then leverages off-site services or infrastructure (massive data centres). 
The backup and restoration process
Prior to backing up
- Ensure enough space is available
- Set up a backup schedule
- Ensure you have a backup that can successfully restore
- Test the backup
Performing a backup
- Typically automated through the backup schedule
- A manual backup may be performed when required
Restoring a backup
- Make a copy of the current system
- Start the backup restore
- Monitor for errors
- Upon completion, test the system works as expected
Typically, backups would be stored on data servers both on and off site. The reason for this is that in the event of a disaster where the on-site servers are destroyed, there is a backup located off-site that can be used to restore the system. As backups consist of all the data within a system at a given point in time, they take up a lot of storage space. The amount of space taken up will depend on the amount of data in the system, but also on the type of backup technique that is used.
How to choose the 'right' backup/archive combination
Choosing the right backup and archive techniques will help you to best balance the storage and infrastructure requirements and the need to be able to restore services in a quick and easy manner. There is no 'right' combination that will work for all organisations and this will ultimately depend upon the needs of each individual organisation. A combination of both backup and archive is recommended to try and reduce costs for infrastructure and storage, while allowing data to be restored quickly and easily in the event of data loss.
Full backup
A full backup is a method of backup where all the files and folders selected for the backup will be backed up. It is commonly used as an initial backup, followed with subsequent incremental or differential backups. After several incremental or differential backups, it is common to start over with a fresh full backup again.
Advantages:
- Restores are fast and easy to manage as the entire list of files and folders is in one backup set.
- Easy to maintain and restore different versions.
Disadvantages:
- Backups can take very long as each file is backed up again every time the full backup is run.
- Consumes the most storage space compared to incremental and differential backups. The exact same files are stored repeatedly, resulting in inefficient use of storage.
Example of a full backup
You set up a full backup job or task to be done every night from Monday to Friday. Assume you do your initial backup on Monday; this first backup will contain your entire list of files and folders in your backup job. On Tuesday, at the next backup run, the entire list of files and folders will be copied again. On Wednesday, the entire list of files and folders will be copied again, and the cycle continues like this. At each backup run, all files designated in the backup job will be backed up again, including files and folders that have not changed.
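The behaviour above can be sketched in a few lines: every file under the source folder is copied on every run, regardless of whether it changed. Paths and the function name are illustrative assumptions:

```python
import shutil
from pathlib import Path

def full_backup(source: Path, dest: Path) -> int:
    """Copy EVERY file under source into dest, mirroring the folder layout."""
    copied = 0
    for f in source.rglob("*"):
        if f.is_file():
            target = dest / f.relative_to(source)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(f, target)  # copy contents and file metadata
            copied += 1
    return copied
```

Because unchanged files are re-copied on each run, the storage cost grows with every backup set kept, which is exactly the disadvantage noted above.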
Differential backup
Differential backups fall in the middle between full backups and incremental backups. A differential backup is a backup of all changes made since the last full backup. With differential backups, one full backup is done first and subsequent backup runs are the changes made since the last full backup. The result is a much faster backup than a full backup for each backup run. Storage space used is less than with a full backup but more than with incremental backups. Restores are slower than with a full backup but usually faster than with incremental backups.
Advantages:
- Much faster backups than full backups.
- More efficient use of storage space than full backups, since only files changed since the last full backup will be copied on each differential backup run.
- Faster restores than incremental backups.
Disadvantages:
- Backups are slower than incremental backups.
- Not as efficient a use of storage space as incremental backups. All files added or edited after the initial full backup will be duplicated again with each subsequent differential backup.
- Restores are slower than with full backups.
- Restores are a little more complicated than full backups but simpler than incremental backups. Only the full backup set and the last differential backup are needed to perform a restore.
Example of a differential backup
You set up a differential backup job or task to be done every night from Monday to Friday. Assume you perform your first backup on Monday. This first backup will be a full backup since you haven’t done any backups prior to this. On Tuesday, the differential backup will only back up the files that have changed since Monday and any new files added to the backup folders. On Wednesday, the files changed and files added since Monday’s full backup will be copied again. While Wednesday’s backup does not include the files from the first full backup, it still contains the files backed up on Tuesday.
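The selection rule behind a differential backup can be sketched as below: pick every file modified after the last FULL backup, so each run re-copies everything changed since that full backup. The function name and timestamp convention are assumptions for illustration:

```python
from pathlib import Path

def differential_candidates(source: Path, last_full_time: float) -> list[Path]:
    """Select files whose modification time is after the last FULL backup.

    Note the reference point never advances between differential runs,
    which is why each run re-copies everything changed since the full backup.
    """
    return [f for f in source.rglob("*")
            if f.is_file() and f.stat().st_mtime > last_full_time]
```

Passing the same `last_full_time` on Tuesday and Wednesday reproduces the behaviour in the example above: Wednesday's set still contains Tuesday's changed files.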
Incremental backup
An incremental backup is a backup of all changes made since the last backup. The last backup can be a full backup or simply the last incremental backup. With incremental backups, one full backup is done first and subsequent backup runs are just the changed files and new files added since the last backup.
Advantages:
- Much faster backups.
- Efficient use of storage space as files are not duplicated. Much less storage space used compared to running full backups and even differential backups.
Disadvantages:
- Restores are slower than with full backups and differential backups.
- Restores are a little more complicated. All backup sets (the first full backup and all incremental backups) are needed to perform a restore.
Example of an incremental backup
You set up an incremental backup job or task to be done every night from Monday to Friday. Assume you perform your first backup on Monday. This first backup will be a full backup since you haven’t done any backups prior to this. On Tuesday, the incremental backup will only back up the files that have changed since Monday and any new files added to the backup folders. On Wednesday, only the changes and new files since Tuesday's backup will be copied. The cycle continues this way.
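The restore-side consequence of this chain can be sketched as follows: the full backup must be replayed first, then every incremental set in order. The backup sets here are illustrative dicts mapping filename to contents, not a real backup format:

```python
def restore(full: dict, increments: list[dict]) -> dict:
    """Rebuild system state from a full backup plus ALL incrementals, in order."""
    state = dict(full)
    for inc in increments:   # each incremental only holds that run's changes
        state.update(inc)
    return state

full = {"a.txt": "v1", "b.txt": "v1"}   # Monday's full backup
mon  = {"a.txt": "v2"}                  # Tuesday: only a.txt changed
tue  = {"c.txt": "v1"}                  # Wednesday: only c.txt was added

print(restore(full, [mon, tue]))
# {'a.txt': 'v2', 'b.txt': 'v1', 'c.txt': 'v1'}
```

Losing any one incremental set breaks the chain, which is why incremental restores are the most fragile and complicated of the three techniques.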
Daily backup
A daily backup copies all the selected files that have been modified on the day the daily backup is performed. A daily backup examines the modification date stored in each file's directory entry to determine which files should be backed up.
Advantages:
- Requires little space to store.
- Can be applied to specialised scenarios.
Disadvantages:
- Not as reliable as the other forms of backup - there is a logical gap: if someone working late at night modifies a file after the backup takes place, it may not be backed up.
- Can be difficult to restore.
- Daily backups don't reset the archive bit.
Example of a daily backup
You set up a daily backup job or task to be done every night from Monday to Friday. Assume you perform your first backup on Monday. This first backup will be a full backup since you haven’t done any backups prior to this. On Tuesday, the backup will only back up the files that have been changed or added on Tuesday. On Wednesday, the backup will only back up the files that have been changed or added on Wednesday. The cycle continues this way.
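The daily selection rule can be sketched as a date comparison: a file is selected only if its modification date matches the day the backup runs. This also illustrates the reliability gap noted above, since a file changed just after midnight carries the next day's date. Names are illustrative assumptions:

```python
from datetime import date, datetime
from pathlib import Path

def daily_candidates(source: Path, run_day: date) -> list[Path]:
    """Select files whose modification DATE equals the day the backup runs."""
    return [f for f in source.rglob("*")
            if f.is_file()
            and datetime.fromtimestamp(f.stat().st_mtime).date() == run_day]
```

Compare this with the differential and incremental rules: those compare against a previous backup's timestamp, while the daily rule compares against a calendar date, which is what opens the gap.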
Online data storage methods
Online - networked (not necessarily to the internet)
Data storage - storing information in memory for later use
Methods - ways or means
This dot point of the syllabus refers to the way or means of storing data, particularly for an organisation, in a networked manner so that multiple users can access it.
A data warehouse is a large store of data accumulated from a range of sources within a company and used to guide management decisions. It is a system used for reporting and data analysis, and is considered a core component of business intelligence. Data Warehouses are central repositories of integrated data from one or more disparate sources. They store current and historical data in one single place and are used for creating analytical reports for knowledge workers throughout the enterprise.
Why create a data warehouse?
By combining data sources across the company and integrating them into one place, the integrated data can be analysed across the company to see how data in one area or department might be affecting data in other areas or departments. If a data warehouse did not exist, the data sits in operational silos and would need to be manually integrated to make inferential decisions which can be tedious and time-consuming.
To combine data from multiple sources, a process to cleanse the data needs to take place. Data cleansing refers to the process of detecting and correcting corrupt or inaccurate records from a data set. It is important when warehousing data because data from different sources will need to be mapped together where data may be referenced in different manners in their original source.
For example, a person may be referenced using an email address as a primary key in one database, where as in another database they may be referenced using a user id made from their first and surnames. To bring these two databases together, a suitable primary key must be chosen and then data cleansed and mapped accordingly.
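The mapping described above can be sketched in a few lines. Here the chosen shared key is a lower-cased email address; the column names, sample rows and the normalisation rule are assumptions for illustration only:

```python
# Two hypothetical sources that reference the same person differently.
crm = [{"email": "Ann.Lee@example.com", "phone": "555-0100"}]
hr  = [{"user_id": "alee", "email": "ann.lee@example.com", "dept": "Sales"}]

def merge_on_email(a: list[dict], b: list[dict]) -> dict:
    """Cleanse and merge rows from both sources onto one normalised key."""
    merged: dict = {}
    for row in a + b:
        key = row["email"].lower()  # cleansing step: normalise before matching
        merged.setdefault(key, {}).update(
            {k: v for k, v in row.items() if k != "email"})
    return merged

print(merge_on_email(crm, hr))
# {'ann.lee@example.com': {'phone': '555-0100', 'user_id': 'alee', 'dept': 'Sales'}}
```

Without the normalisation step, the two spellings of the address would load as two different people, which is exactly the kind of inconsistency data cleansing exists to catch.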
A data mart is the access layer of the data warehouse environment that is used to get data out to the users. The data mart is a subset of the data warehouse and is usually oriented to a specific business line or team. Whereas data warehouses have an enterprise-wide depth, the information in data marts pertains to a single department. In some deployments, each department or business unit is considered the owner of its data mart including all the hardware, software and data.
Data in the Cloud
Cloud storage is a model of data storage in which the digital data is stored in logical pools, the physical storage spans multiple servers (and often locations), and the physical environment is typically owned and managed by a hosting company. These cloud storage providers are responsible for keeping the data available and accessible, and the physical environment protected and running. People and organizations buy or lease storage capacity from the providers to store user, organization, or application data.
Examples of Cloud Storage
- Google Drive
- Microsoft OneDrive
- Apple iCloud
Application Data in the Cloud
Many applications will save data in the cloud on their own servers for their own applications. Rather than storing data on the device itself, data is stored 'in the cloud' so it can be accessed from any device through the application. Facebook is a prime example where users' posts, photos and videos are stored on Facebook servers and can be accessed through any device via the Facebook application.
Purpose of Data Mining
Data mining is the computing process of discovering patterns in large data sets. The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use. An example that utilises data mining is Google AdWords, which extracts personal information from your account details and Google search history to personalise marketing towards your likes/dislikes, hobbies and so forth. Beware that it is also a buzzword, frequently applied to any form of large-scale data or information processing that is done by humans, too; however, this is not the true sense of the word.
Data-mining and 'data mining'
As mentioned, data mining gets used as a buzzword. The formal usage of the term refers to a computational process (an algorithm) that runs in the background to elicit information from large datasets (big data). However, the term is also used when humans perform analysis on large sets of data. It may be useful to clarify this in an exam situation and cover both bases.
The term big data refers to extremely large datasets that are usually analysed computationally (they are too big to analyse without computers). When the term data mining is used, it is normally associated with big data. As the Internet of Things (IoT) becomes more prominent and more data is collected more often, data mining will help humans make meaning from the sheer amount of information.
Why data mine?
Computers are really good at performing repetitive tasks on large data sets. Programmers can take advantage of this by providing an algorithm that trawls through large amounts of data to provide meaningful results for end-users of that information. Data mining of big data can help us make meaning by finding patterns, trends and associations in, amongst other things, human behaviour and interactions.
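One classic pattern-finding task of this kind is spotting items that frequently occur together, for example products bought in the same transaction. The sketch below uses made-up sample data purely for illustration:

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping transactions (each set is one basket of items).
transactions = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"milk", "eggs"},
    {"bread", "milk"},
]

pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):  # every item pair in the basket
        pair_counts[pair] += 1

print(pair_counts.most_common(1))  # [(('bread', 'milk'), 3)]
```

Real data mining runs this style of counting over millions of records, where the patterns found (bread and milk are usually bought together) feed decisions such as store layout or targeted advertising.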
Data mining and privacy
The fact is that a number of companies now have access to and have the ability to data mine your personal information. Large gatherers of data, such as social media sites and search engines, make their profits through access to your data (barely anything is truly free!) through advertising and on-selling of data trends to businesses. While most of this use is legal and outlined in their privacy statements, there are some cases where it may be considered unethical. (See articles below.)
Processing of data considering security of data
A password is a word or string of characters used for user authentication to prove identity or access approval in order to gain access to a resource (for example, an access code is a type of password). It is to be kept secret from those not allowed access.
Creating secure passwords
Tips for creating strong passwords from Google.
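One common approach is to generate a long random password and, on the system side, store only a salted hash of it rather than the password itself. This is a minimal sketch using the Python standard library; the length, character classes and iteration count are illustrative choices, not an official rule:

```python
import hashlib
import os
import secrets
import string

def generate_password(length: int = 16) -> str:
    """Strong random password drawn from letters, digits and punctuation."""
    alphabet = string.ascii_letters + string.digits + string.punctuation
    return "".join(secrets.choice(alphabet) for _ in range(length))

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Systems store a salted hash, never the plain password."""
    salt = os.urandom(16)  # random salt so identical passwords hash differently
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest
```

To verify a login attempt, the system recomputes the hash from the stored salt and the supplied password and compares digests; the original password is never kept.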
A firewall is a network security system that monitors and controls the incoming and outgoing network traffic based on predetermined security rules. A firewall typically establishes a barrier between a trusted, secure internal network and another outside network, such as the Internet, that is assumed not to be secure or trusted. Firewalls are often categorised as either network firewalls or host-based firewalls.
Network firewalls filter traffic between two or more networks; they are either software appliances running on general purpose hardware, or hardware-based firewall computer appliances.
Host-based firewalls provide a layer of software on one host that controls network traffic in and out of that single machine.
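The core idea behind both kinds of firewall is matching traffic against predetermined rules. The sketch below is a heavily simplified illustration; real firewalls match on many more fields (source address, protocol, direction) and the rules shown are assumptions:

```python
# Hypothetical rule table: allow specific ports, deny everything else.
RULES = [
    {"port": 22, "action": "allow"},  # allow SSH
    {"port": 80, "action": "allow"},  # allow HTTP
]
DEFAULT_ACTION = "deny"  # deny-by-default: anything not explicitly allowed

def filter_packet(port: int) -> str:
    """Return the action for a packet based on the first matching rule."""
    for rule in RULES:
        if rule["port"] == port:
            return rule["action"]
    return DEFAULT_ACTION

print(filter_packet(80))    # allow
print(filter_packet(3306))  # deny
```

The deny-by-default stance shown here is the usual security posture: the trusted side of the barrier only receives traffic a rule explicitly permits.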
Biometrics refers to metrics related to human characteristics. Biometrics authentication (or realistic authentication) is used in computer science as a form of identification and access control. It is also used to identify individuals in groups that are under surveillance.
Biometric identifiers are distinctive, measurable characteristics used to label and describe individuals. They are often categorised as physiological versus behavioural characteristics.
Physiological characteristics are related to the shape of the body. Examples include, but are not limited to
- palm veins
- face recognition
- palm print
- hand geometry
- iris recognition
Behavioural characteristics are related to the pattern of behaviour of a person, including but not limited to
- typing rhythm
Anti-virus software (often abbreviated as AV), sometimes known as anti-malware software, is computer software used to prevent, detect and remove malicious software.
Antivirus software was originally developed to detect and remove computer viruses, hence the name. However, with the proliferation of other kinds of malware, antivirus software started to provide protection from other computer threats. In particular, modern antivirus software can protect from:
- malicious browser helper objects (BHOs)
- browser hijackers
- trojan horses
- malicious LSPs
Some products also include protection from other computer threats, such as
- infected and malicious URLs
- scam and phishing attacks
- online identity (privacy)
- online banking attacks
- social engineering techniques
- advanced persistent threat (APT)
- botnet DDoS attacks
Digital signatures use a mathematical technique to validate the authenticity of a message. A digital code (based on public key cryptography) is attached to an electronic document transmitted over the web to verify the sender and the contents. A valid signature allows a recipient to believe the message was created by a known sender.
Digital certificates are like an electronic ‘passport’: they verify the sender over the web, allow the receiver to encode a reply, and provide a public key which proves ownership. A certificate contains a serial number, the name of the certificate holder, expiration dates, a copy of the certificate holder’s public key and the digital signature of the certificate-issuing authority.
Encryption is the encoding of messages so that only authorised people can read the information. The message to be sent via the web is encrypted by applying an encryption algorithm and an encryption key. The recipient then decrypts the message by reversing the process with the appropriate key. This increases the security of the message while in transit.
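The encrypt-then-decrypt round trip can be illustrated with a toy XOR cipher, where applying the same key a second time reverses the operation. This is purely a teaching sketch; real systems use vetted algorithms such as AES, never hand-rolled XOR:

```python
import secrets

def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Toy symmetric cipher: XOR each byte with the matching key byte."""
    return bytes(b ^ k for b, k in zip(data, key))

message = b"meet at noon"
key = secrets.token_bytes(len(message))  # random key as long as the message

ciphertext = xor_cipher(message, key)    # encrypt: unreadable without the key
recovered = xor_cipher(ciphertext, key)  # decrypt: same operation, same key
print(recovered == message)              # True
```

An eavesdropper who intercepts only the ciphertext learns nothing useful, which is the point of encrypting data in transit.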
The concept of user-generated content
User-generated content can refer to any form of content, such as videos, blogs and stories, published onto the Internet by consumers as opposed to producers; regular people as opposed to businesses. User-generated content is usually made publicly available to everyone online without cost.
User-generated content is used for a wide range of applications, including problem processing, news, entertainment, advertising, gossip and research. It is an example of the democratization of content production, where, as new technologies have become more accessible, user friendly and affordable to the general public, large numbers of individuals are able to post text, digital photos and digital videos online, with little or no "gatekeepers" or filters.
Advantages and Disadvantages of User-Generated Content
Advantages:
- Usually free of charge
- Usually publicly available
- Can generate vast quantities of data, quickly
- From a business standpoint, a great advantage to use in various advertising campaigns
- Less onerous to generate
- Allows consumers to feel a sense of involvement
- More easily maintained and kept up-to-date
Disadvantages:
- Quality of data is not always accurate or reliable as it is not vetted
- Businesses might receive unwanted public feedback which may tarnish the brand of the business
- Placement of bias
Concept of Hypertext Markup Language
Concept of Web 2.0 and 3.0
Concepts of Web 2.0
Web 2.0 can be described in three parts:
- Rich Internet application (RIA) — defines the experience brought from desktop to browser, whether it is "rich" from a graphical point of view or a usability/interactivity or features point of view.
- Web-oriented architecture (WOA) — defines how Web 2.0 applications expose their functionality so that other applications can leverage and integrate the functionality providing a set of much richer applications. Examples are feeds, RSS feeds, web services, mashups.
- Social Web — defines how Web 2.0 websites tend to interact much more with the end user and make the end user an integral part of the website, whether by adding their profile, commenting on content, uploading new content, or adding user-generated content (e.g., personal digital photos).
Features of Web 2.0
- Folksonomy - free classification of information; allows users to collectively classify and find information (e.g. "tagging" of websites, images, videos or links)
- Rich user experience - dynamic content that is responsive to user input (e.g., a user can "click" on an image to enlarge it or find out more information)
- User participation - information flows two ways between site owner and site users by means of evaluation, review, and online commenting. Site users also typically create user-generated content for others to see (e.g., Wikipedia, an online encyclopedia that anyone can write articles for or edit)
- Software as a service (SaaS) - Web 2.0 sites developed APIs to allow automated usage, such as by a Web "app"(software application) or a mashup
- Mass participation - near-universal web access leads to differentiation of concerns, from the traditional Internet user base (who tended to be hackers and computer hobbyists) to a wider variety of users
Features of Web 3.0
- dynamic applications
- computers interpret information
- computers distribute useful information that is tailored
- computers generate own raw data
- semantic searches
- search engines tailor searches to users
- natural language searches
- data mining
Benefits of Web 3.0 over Web 2.0
- enabled more efficient searching
- contributed to improved marketing
- contributed to improved web browsing
- enabled predictive searches
- enabled advertisers to better target profiles
Purpose and Features of Content Management Systems
A content management system (CMS) is a computer application that supports the creation and modification of digital content. It is often used to support multiple users working in a collaborative environment.
Benefits of CMSs
- supports the creation and modification of digital content using a common user interface
- facilitates central management
- enables multiple users
- facilitates a collaborative environment
- facilitates ease of management
Purpose of World Wide Web Consortium (W3C)
The World Wide Web Consortium (W3C) is the main international standards organization for the World Wide Web (abbreviated WWW or W3).
Founded and currently led by Tim Berners-Lee, the consortium is made up of member organisations which maintain full-time staff for the purpose of working together in the development of standards for the World Wide Web. As of 1 April 2017, the World Wide Web Consortium (W3C) has 461 members.
The organisation tries to foster compatibility and agreement among industry members in the adoption of new standards defined by the W3C. Incompatible versions of HTML are offered by different vendors, causing inconsistency in how web pages are displayed. The consortium tries to get all those vendors to implement a set of core principles and components which are chosen by the consortium.
Working draft (WD)
After enough content has been gathered from 'editor drafts' and discussion, it may be published as a working draft (WD) for review by the community. A WD document is the first form of a standard that is publicly available. Commentary by virtually anyone is accepted, though no promises are made with regard to action on any particular element commented upon.
At this stage, the standard document may have significant differences from its final form. As such, anyone who implements WD standards should be ready to significantly modify their implementations as the standard matures.
Candidate recommendation (CR)
A candidate recommendation is a version of a standard that is more mature than the WD. At this point, the group responsible for the standard is satisfied that the standard meets its goal. The purpose of the CR is to elicit aid from the development community as to how implementable the standard is.
The standard document may change further, but at this point, significant features are mostly decided. The design of those features can still change due to feedback from implementors.
Proposed recommendation (PR)
A proposed recommendation is the version of a standard that has passed the prior two levels. The users of the standard provide input. At this stage, the document is submitted to the W3C Advisory Council for final approval.
While this step is important, it rarely causes any significant changes to a standard as it passes to the next phase.
Both candidate recommendations and proposed recommendations may enter a "last call" period, signalling that any further feedback must be provided by a stated deadline.
W3C recommendation (REC)
This is the most mature stage of development. At this point, the standard has undergone extensive review and testing, under both theoretical and practical conditions. The standard is now endorsed by the W3C, indicating its readiness for deployment to the public, and encouraging more widespread support among implementors and authors.
Recommendations can sometimes be implemented incorrectly, partially, or not at all, but many standards define two or more levels of conformance that developers must follow if they wish to label their product as W3C-compliant.
Purpose of W3C Conventions
Most W3C work revolves around the standardisation of Web technologies. To accomplish this work, W3C follows processes that promote the development of high-quality standards based on the consensus of the community. W3C processes promote fairness, responsiveness, and progress, all facets of the W3C mission.
W3C continues to evolve to provide the community a productive environment for creating Web standards. W3C standards:
- are created following a consensus-based decision process;
- consider aspects of accessibility, privacy, security, and internationalisation;
- reflect the views of diverse industries and global stakeholders;
- balance speed, fairness, public accountability, and quality;
- benefit from Royalty-Free patent licensing commitments from participants;
- are stable (and W3C seeks to ensure their persistence at the published URI);
- benefit from wide review from groups inside and outside W3C;
- are downloadable at no cost;
- are maintained in a predictable fashion;
- are strengthened through interoperability testing.
Purpose of the W3C's Web Design and Applications standards
"W3C standards define an Open Web Platform for application development that has the unprecedented potential to enable developers to build rich interactive experiences, powered by vast data stores, that are available on any device. Although the boundaries of the platform continue to evolve, industry leaders speak nearly in unison about how HTML5 will be the cornerstone for this platform. But the full strength of the platform relies on many more technologies that W3C and its partners are creating, including CSS, SVG, WOFF, the Semantic Web stack, XML, and a variety of APIs.
W3C develops these technical specifications and guidelines through a process designed to maximize consensus about the content of a technical report, to ensure high technical and editorial quality, and to earn endorsement by W3C and the broader community."
Extracted from https://www.w3.org/standards/ on 22 June 2017.
HTML and CSS
Audio and Video
Validation Techniques for Online Forms
When the internet became readily available, it provided a way to collect data from many people electronically through online forms, a big change from previously collecting data on paper or through a single computer. Originally, HTML forms were quite primitive and had little to no data validation, which meant the data collected was often invalid. HTML5 forms now support validation techniques, implemented using either client-side or server-side scripting.
As the internet evolved, a number of web applications became available that specialise in collecting data through forms, e.g. Google Forms, Microsoft Forms and SurveyMonkey. These applications have built-in tools for validating the data entered into forms.
Techniques for validating data in forms
- Data type - e.g. check that numerical data is numeric
- Data length - e.g. check that the number of characters fits the expected length
- Required input - e.g. check that data has been entered where a field is required
- Selection from available options (strings) - e.g. multi-choice, checkboxes or drop-down list
- Selection from available options (numbers) - e.g. linear scale
- Select from a widget to gain the correct format - e.g. date picker, time picker
- Format checking prior to submission - e.g. email address is in ___@___.___ format, value above/below a certain number
- Move to a different section of the form depending on the answer to a prior question (skip irrelevant questions)
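Several of the techniques above can be sketched as a small client-side validation routine. This is a minimal illustration in TypeScript, not taken from any particular form library; the `Rule` type and `validateField` function are hypothetical names invented for this example.

```typescript
// Hypothetical rule model covering some of the techniques listed above.
type Rule = {
  required?: boolean;   // required input
  maxLength?: number;   // data length
  numeric?: boolean;    // data type
  options?: string[];   // selection from available options
  pattern?: RegExp;     // format checking prior to submission
};

// Returns a list of error messages; an empty list means the value passed.
function validateField(value: string, rule: Rule): string[] {
  const errors: string[] = [];
  if (rule.required && value.trim() === "") {
    errors.push("This field is required");
    return errors; // skip the remaining checks on empty input
  }
  if (rule.maxLength !== undefined && value.length > rule.maxLength) {
    errors.push(`Must be at most ${rule.maxLength} characters`);
  }
  if (rule.numeric && Number.isNaN(Number(value))) {
    errors.push("Must be a number");
  }
  if (rule.options && !rule.options.includes(value)) {
    errors.push("Must be one of the available options");
  }
  if (rule.pattern && !rule.pattern.test(value)) {
    errors.push("Does not match the expected format");
  }
  return errors;
}

// Example rules resembling the techniques in the list above.
const emailRule: Rule = { required: true, pattern: /^[^@\s]+@[^@\s]+\.[^@\s]+$/ };
const ageRule: Rule = { required: true, numeric: true, maxLength: 3 };
```

Note that client-side checks like these can be bypassed, so a server-side script would typically apply the same rules again after the form is submitted.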
Analyse Sources for Verifiability, Accuracy and Currency
When looking at sources of information on the internet, you need to consider whether the information provided is reliable. In particular, consider whether the source is verifiable, accurate and current to determine whether it is trustworthy.
- Who is providing the information?
- What do you know about the author and their credentials?
- Are they an expert?
- Can you find out more and contact them?
- Search for author or publisher in search engine. Has the author written several publications on the topic?
- Have other credible people referenced this source?
- Is there a sponsor or affiliation?
- Who is linking to the page?
- Do they take responsibility for the content?
- Websites: Are credible sites linking to this page?
- Is the language free of emotion?
- Does the organisation or author suggest there may be bias? Does the bias make sense in relation to your argument?
- Is the purpose of the website to inform or to persuade towards a certain agenda?
- Are there ads? Are they trying to make money?
- Why did they write the article?
- Websites: Is the site a content farm? A content farm is a site whose content has been generated by teams of freelancers who write large amounts of low-quality text to raise the site’s search engine rankings.
- Copy and paste a sentence into Google to see if the text can be found elsewhere.
- Websites: Are there links to related sites? Are they organised?
- Are there citations or a bibliography provided? Do they cite their sources?
- Is the data accurate? Can I confirm the accuracy of the data somewhere else?
- Is the source comprehensive? Are there facts that are not presented?
- When was the source last updated?
- Does the source have a date?
- Does the source appear professional?
- Does it seem like current design?
- Was it reproduced? If so, from where? Type a sentence in Google to verify.
- If it was reproduced, was it done so with permission? Copyright/disclaimer included?
Extracted from http://www.easybib.com/guides/students/writing-guide/ii-research/c-evaluating-sources-for-credibility/ on 5 July 2017.