In response to: Enterprise Report Management (the other ERM)
Have any other countries set a target date to stop using
In UK news today: cheques (or checks, for American readers) will be
phased out by October 2018 due to the payment method being in
"terminal decline", but only if adequate alternatives are
Banks and credit providers have been investing in chips which allow
a customer to pay when the chip is pushed against a sensor, known
as contactless technology. Using a mobile phone to pay is another
Full article - http://news.bbc.co.uk/2/hi/business/8414341.stm
Hello, and welcome to this inaugural post with the aim of discussing Enterprise Report Management (ERM) related topics. Please look elsewhere for discussions about enterprise risk management or enhanced remote mirroring.
ERM is not new; the technology has been around for over 20 years, with products like IBM Content Manager OnDemand, IBM FileNet COLD, IBM FileNet Report Manager, and IBM Report Management and Distribution System.
During this time, many organizations have woven ERM applications into the backbone of their businesses to manage the storage of, and access to, formatted high-volume computer output and reports in support of customer service and, more recently, customer self-service.
Other applications include online check storage and retrieval. If your internet banking application allows you to view your checks online, chances are they are being stored in an IBM Content Manager OnDemand system.
Historically ERM has been viewed as a standalone application. But within the past 3-4 years, ERM products have been increasingly integrated with other ECM products to support content, records and business process management applications. Not surprisingly, leading analysts now track ERM as a subcomponent of the Enterprise Content Management (ECM) market.
I look forward to discussing the use of ERM within the broader ECM community and beyond. Here’s looking forward to the next 20 years.
Although I have been involved in document capture for over 20 years, it was not until Datacap joined forces with IBM in 2010 that we started to meet regularly with large banks to help them address their massive mortgage processing challenges. Even given all the things that I had learned over the years about high-volume document capture, I have been surprised by just how many nuances and special considerations there are when it comes time to scan a mortgage.
Are you considering scanning and advanced document capture in your mortgage business (or are you just interested in learning more capture tricks-of-the-trade)? If so, then here's my list of the two most important ways that mortgage document capture is special:
1) Document = Batch
Most document capture applications are batch oriented. Why? Because it is almost always more efficient to scan a number of documents all at once (a "batch") than one at a time. Grouping documents into a batch is also a very useful simplification technique, because it reduces the number of "things" to track. For example, if a batch consists of 50 documents, then there is a 50-to-1 reduction in "things" to track.
There are some situations, however, where each document is its own batch. For example, this is often the case when the capture system reads from faxes. Typically each transmission is read into its own batch, and the sender is typically sending one document. Bank branch batch capture (described here) is another good example, where a customer hands over a document to a branch officer and that officer scans that document as a “batch.”
But mortgages are different. Depending on how you count documents, a mortgage packet of 200 or 250 pages may consist of anywhere from 15 or 20 fairly generic document types to 50 or 75 very specific ones. In other words, the one meta-document, the "mortgage," is made up of many different individual documents, e.g. the loan agreement, proof of employment, liens, etc.
2) The primacy of document classification
For many years, advanced document capture was called "forms processing" because the task was to read data off of fixed forms. The archetypical application of forms processing technology was reading tax returns for government revenue departments. There may be different tax forms and schedules, but typically they had bar codes or other easy-to-identify distinguishing marks. (Read the Virginia Department of Taxation case study.)
A mortgage "document" with all its sub-documents is a completely different beast. In the packet there may be some forms with bar codes, but there are many pages that have to be "read" to figure out what they are. The biggest task - by far - when processing a mortgage is to figure out what each of the sub-documents is, and where they end and the next begins. There's no easy one-size-fits-all solution. Doing a good job requires an armory of techniques, some simple and fast like bar code recognition, and some much more sophisticated such as fingerprint matching and textual classification via content analytics.
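To make the "armory of techniques" idea concrete, here is a minimal Python sketch of a classification cascade: try the cheap technique (a bar code lookup) first, fall back to a crude key-phrase match on the page text, and start a new sub-document whenever the recognized type changes. Everything in it is an assumption made for illustration: the `Page` fields, the `BARCODE_TYPES` and `KEY_PHRASES` tables, and the rule that an unrecognized page continues the previous document. It is not how Datacap or any other product works, and a real system would use fingerprint matching or a trained text classifier where this sketch uses substring matching.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Page:
    barcode: Optional[str]  # decoded bar code value, if the page has one (hypothetical field)
    text: str               # OCR text of the page (hypothetical field)


# Hypothetical lookup table: bar code value -> document type.
BARCODE_TYPES = {"LA-001": "loan agreement"}

# Hypothetical key phrases per document type; a stand-in for a trained classifier.
KEY_PHRASES = {
    "loan agreement": ["promissory", "borrower agrees"],
    "proof of employment": ["employer", "salary"],
    "lien": ["lien", "encumbrance"],
}


def classify_page(page: Page) -> Optional[str]:
    """Return a document type for the page, or None if nothing matches."""
    # Step 1: the simple, fast technique. A known bar code settles it.
    if page.barcode in BARCODE_TYPES:
        return BARCODE_TYPES[page.barcode]
    # Step 2: the slower fallback. Count key-phrase hits in the page text.
    text = page.text.lower()
    scores = {
        doc_type: sum(phrase in text for phrase in phrases)
        for doc_type, phrases in KEY_PHRASES.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def split_packet(pages: List[Page]) -> List[Tuple[Optional[str], List[Page]]]:
    """Group consecutive pages into sub-documents.

    Assumption: an unrecognized page, or a page of the same type as the
    current sub-document, is treated as a continuation of that sub-document.
    """
    documents: List[Tuple[Optional[str], List[Page]]] = []
    for page in pages:
        doc_type = classify_page(page)
        if documents and (doc_type is None or doc_type == documents[-1][0]):
            documents[-1][1].append(page)
        else:
            documents.append((doc_type, [page]))
    return documents
```

Calling `split_packet` on the scanned pages of one packet returns a list of `(document type, pages)` pairs in scan order. The ordering of the cascade is the design point: the expensive techniques only run on the pages that the cheap ones could not identify.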
Of course, mortgage processing shares many challenges and processing characteristics with other large-scale document capture environments. For example, demands for timeliness are high – getting the documents into the repository at the first possible moment in order to make them available for loan servicing or other parts of the organization. And there is a role – in some organizations – for remote capture in a browser or through MFPs of mortgages and/or related follow-on documents.
Mortgage processing is a bit different from many, perhaps most, document capture applications. But if you have any experience in document capture, you know that one of the enduring characteristics of capture is that it is “hard” exactly because each application is different. Even within the category of mortgage processors (e.g. originators, wholesale, correspondent), each has different needs regarding which document sub-types it wants to identify. The knowledge and experience of one implementation can help with the next, but it is never just a matter of plugging in the same application for two different banks and expecting them to both work the same way!
Ready to learn more? Check this out: Intelligent Imaging for Financial services White Paper
Follow me on Twitter @CaptureGuru
Guest post by Scott Blau, WW Director of Document Capture, IBM ECM
In the 1960s, America was riven by “the generation gap”: elders who supported government and traditional family relations vs. the boomer generation of rebels in culture and politics. With the challenges of the 1970s the split started to lose relevance, until in the 1980s the boomers were getting older and more like their parents. The generation gap just faded. Or was it just replaced by a newer gap?
In a recent conversation with a bank CIO, I learned that the bank has a paper problem, but not the one I expected. The bank had installed a scanning system to capture paper documents and turn them into images, yet had been unable to realize the dream of a “paperless office.” With a little research the CIO and his team discovered something that many organizations are now facing: there is a new generation gap, and it is all wrapped up in paper.
Although the bank has a stated policy that discourages printing documents, some loan officers continued to copy each and every loan package (even mortgage applications of 200 or more pages) for their own reference. That is in spite of the fact that the loan package is next sent for central scanning and then published on an internal portal for reference, with handy indexing by document type.
Why were some of the most experienced – and productive – loan officers doing this? At first, the bank thought that the loan officers were simply being careful: “I’d better make a copy just in case the document is lost during transit.” But even after the bank installed scanning software in every branch, they were still making personal copies.
After he described this puzzling trend, I told the banker that he didn’t have a technology problem; he had a dependency problem, much like alcoholism or drug addiction in his company. Only in this case, people have become addicted to paper. Forget that an electronic document has been declared just as legal as a paper document. Forget that an electronic document can be retrieved much more quickly than a paper document; that you can copy it and paste it; keep it handy on your desktop, and even annotate it and share it in seconds with a colleague halfway across the country. The problem is you can’t hold it.
Even more interesting, though, is the fact that paper addiction seems to be generational. The older and more mature loan officers were the offenders. There is a cut-off – somewhere around 1984, when the post-WWII generation gap supposedly disappeared – after which anyone born has no use for paper documents. The newer generation has grown up with computers in school, using calculators instead of slide rules and word processors instead of typewriters, and its relationship to a sheet of paper is different from the previous generation's.
The previous generation, which I fall into, required reams of paper to get through a school day. We grew up writing term papers either in longhand or by carefully typing, and re-typing, them. We calculated algebra equations on scratch pads. When we started our professional lives, there were secretaries in typing pools, clerks whose job it was to wheel around the office delivering interoffice mail, and miles of aisles in the basement filled with file folders.
We got so used to being able to hold a document in our hands that we became dependent on it, like a 3-year-old cuddling a teddy bear in bed. And now many in my generation can’t do without it.
So I suggested to the banker that he could either wait 20 years for the young generation to replace the aging paper addicts – and who knows what the next generation gap may bring – or begin an awareness campaign around the advantages of electronic documents over paper and wean people off their dependency. But the irony was not lost on me: after 25 years of hearing about the paperless office, here it is, finally within reach, and the last obstacle is simply people being unwilling to give up the comfort of holding a document in their hands.
Guest blog by Alan Horton-Bentley ECM WW Industry Marketing Manager - Banking & Financial Markets
Modern banking has improved by leaps and
bounds when it comes to extending a variety of services to customers: multiple
access channels, a wide variety of products and services, and 24/7 access to
information and help, all making banking simple and easy for customers. On the
other side of the counter, inside the bank, things have become very complex; in
order to satisfy this ever-increasing customer expectation and competitive
Not so long ago, the process of
opening a new bank account or for that matter executing most banking
transactions was a simple matter of a customer visiting a branch location, filling
out a form or two and they were done.
Today, however, even the basic
functions of account opening and loan processing are much more complex. Banks
have to make seamless provisions for the multiple channels for account opening,
the wide variety of account products, to meet regulatory requirements and counter
Loan origination and processing is
also much more complex. It includes all kinds of customer profiling and
assessments to perform, and new regulation such as QRM (qualified residential
mortgage) requires the lender to validate the borrower’s ability to repay the
loan, resulting in a growing number of documents and more stringent information
validation, and transforming the primary business processes into complex customer
More customer information and new data
types, delivered through a growing number of channels, make it difficult to
capture, classify and assimilate into actionable content when the customer is
There is no argument: leveraging increased customer information in
real time will have a positive impact on credit risk management, fraud
interdiction, revenue growth and compliance. But because financial institutions
are inundated with both structured and unstructured data, they are being
overwhelmed with information and have outstripped traditional front office
In order to remain competitive and drive efficiency in business
processes, banking institutions need to know which business functions have grown
complex enough to warrant taking a new approach: managing these
complex processes as a “case” not as a
To learn more, attend the IBM
Case Manager and IBM Forms Deliver for Union Bank and ELG-2844 Improving
Information Economics with Defensible Disposal at BNY Mellon sessions at Information
On Demand 2012.
These are two of the over 700 exciting sessions offered at Information On
Demand 2012. Don’t forget to register before August 31 to
save $300 off your registration fee.
If you already registered to attend, why not build a sample agenda?
It’s a simple tool that allows you to search by industry, program, track or
Guest Blog post by Julie Vaccaro, Offering Manager IBM Content Classification
Everything in our life is categorized and classified in some
Ask 4 people in one household “where is the proper place to
store the toothpaste?” and you will likely get 4 different answers, including “on
the counter”, “in the toothbrush holder”, “under the sink” and “in a drawer”. This
may work well for a household environment, since every person probably has
their own “instance” of a toothpaste tube. But, what if this is a shared
toothpaste tube, that everyone needs access to? Where is the right place to
store it so that each person can get to it when they need it?
This may seem like a simplistic
analogy, but think about these questions. What if you walked into the Library
of Congress and there was no classification system? What if you went into the
hardware store and the items were not organized by their department or use,
such as Plumbing, Electrical, Paint, etc.? How would you ever find anything?
Now think about your business and all of its unstructured
content. Where do you store content so that anyone who needs it can access it,
use it, govern it and analyze it?
Individuals make classification judgments every day. I might
think it best to categorize all resumes into a single category called “Human
Resources Resumes” and store them all together. Another person, from the Human
Resources department, may believe that you should have a category for each
skill set, such as Marketing Resumes, Development Resumes, Janitorial Resumes,
and the like.
Content should be classified and organized such that it is
accessible, so that you can find it when you need it. Content needs to be
usable so that it is available when business decisions are made, either through
manual or automated processes. Content must be governed so that a business
complies with local, state, federal and business mandates. And finally, content
needs to be analyzed and understood to realize its full value.
Properly organizing content is like building a good
foundation. You need to build a house or some other structure on a good
foundation. When you do that the building of the structure becomes easier,
lasts longer and is easier to change later. If you don’t build a strong
foundation, it does not necessarily mean the structure will collapse, but it
will likely cause problems down the road.
The Bottom Line: To start extracting value out of content, a clear Classification strategy is a must.
See what's possible in Content Classification in your industry. IBM's largest EXPO invites
you to experience products, services and solutions in action.
Whether attending the conference or not, you have a unique opportunity via
Social Media to get involved at Information On Demand 2011. Whether you
tweet, blog, share photos or videos - get involved today and add your voice to the conversation
using the official Information On Demand 2011 Social Media Aggregator (SMA)
This site provides real-time updates of all social activity surrounding
the conference, including tweets, blog posts, event photos and video. Join today
After you join, here are specific examples of how to get involved:
- Stay informed by visiting the SMA site frequently to listen to the conversation, and enjoy the photos and videos.
- Visit other ECM social media channels below to stay informed, as well.
- Using Twitter, send a tweet along with the #ibmecm hashtag, and share a
question, comment, or thought related to Information On Demand 2011,
ECM, or our industry. Or share via one of the other ECM social media
- Share this blog post and ECM social media channels with others, encouraging them to join the conversation.
ECM Social Media Channels
- Using Twitter, reply to specific tweets from others that interest you; engage in a tweet conversation with them.
- Reply to a question, comment, or thought posted on one of the ECM social media channels below.
| eConnection Partner Blog
If you have any questions or comments, please let me know. @EricVonheim
Still finalizing your plans for Connect 2013? We've got a
few sessions that I think will interest you – be sure to add these to your
calendar! These are excellent opportunities for you to pose your questions to
our subject matter and industry experts, along with some IBM ECM customers.
Interested in meeting with ECM executives? We've got you
here to request a meeting with either Doug Hunt, ECM Business Leader, Ken
Bisconti, Vice President ECM Products and Strategy, or Carol Taylor, WW Sales
Leader for Social Content Management.
We also invite you to find us in the exhibit hall at IBM
booth 23 – stop by for a demo of what Social Content Management can do for your
Monday, January 28
11am (Swan Hotel, room 1,2): Genworth Financial, Work Smarter,
Not Harder, presented by Tim Perry, CTO of Genworth Financial
Tuesday, January 29
10am (Swan Hotel, room 9,10): Slumberland Furniture:
Using IBM Software to Deliver Consistently Superior Customer Experiences,
presented by Jamie Page, Director, Slumberland Furniture
11:15am (Swan Hotel, Pelican 1,2): Living Social, Its Not
Just About the Conversations and Topics, a panel discussion of experts,
including Joe Shepley, Doculabs, Larry Hawes, Dow Brook Advisory Services,
Cengiz Satir, IBM, and Steve Studer, IBM
1:30pm (Dolphin Hotel, S. Hemisphere
IV, V): Content & Social Ignites Context: IBM’s Content Platform of
Engagement, presented by Tim Perry, CTO of Genworth Financial, Doug Hunt,
IBM ECM Business Leader, and Ken Bisconti, Vice President of IBM ECM Products
5:30pm (Dolphin Hotel, S.
Hemisphere I): Ignite business performance in real-time with social
collaboration, mobile and content, presented by Ian Story, IBM and Steve
Wednesday, January 30
10am (Swan Hotel, room 4): Reduce, Reuse, and Recycle
Corporate Content, presented by Maig Worel, IBM
1:30pm (Swan Hotel, Mockingbird 1,2): Improving your
Information Economics with Complete Lifecycle Governance, presented by Mark
Thursday, January 31
7am (Swan Hotel, Toucan 1): Archiving and de-duplicating Email,
Files, and Social Content, presented by Cengiz Satir, IBM
Stay Social with us during the show #IBMConnect – @IBM_ECM @csatir
An InformationWeek Live WebCast:
The Myth of Systems of Record vs. Systems of Engagement
Date: Tuesday, December 11, 2012
Time: 9:00 AM PT / 12:00 PM ET
Duration: 60 minutes
Many business people are familiar with Geoffrey Moore's dichotomy of "systems of record" and "systems of engagement." While this construct is a simple, clear way of categorizing software purpose and functionality, it doesn't reflect reality in most organizations. Every day, workers create business records within email, collaboration, and social networking applications. At the same time, they seek to communicate and work with others within the context of work processes supported by their organizations' back-end systems, including those used to manage content.
In this InformationWeek Webcast, Larry Hawes, principal at Dow Brook Advisory Services, will make the case for digital environments that provide the data and content, as well as the communication and collaboration tools, needed to perform specific tasks, while shielding most workers from the complexity associated with the capture and management of legal business records.
VP and editor in chief of InformationWeek
Principal and Founder,
Dow Brook Advisory Services
Enterprise Content Management,
Business content is easily produced by Microsoft-based
desktop productivity tools such as MS Word, Excel and PowerPoint. Internally, many organizations share and
collaborate on content using MS SharePoint.
As content moves through collaboration cycles, its value to an
organization increases - both for the collaboration team and for its value to
others across the organization.
I’d like to show you how users can leverage IBM ECM today through
collaboration applications like MS SharePoint and how users can leverage MS
desktop productivity and email tools to directly access ECM services.
Attend a live product demonstration of the latest release of Content Collector
for MS SharePoint (formerly known as FileNet Connector for MS SharePoint) and
MS Office direct integration with IBM FileNet Content Manager.
• Stay in the familiar MS Office
environment, while leveraging IBM FileNet Content Manager’s ECM capabilities to
help provide full lifecycle and compliance management.
• Access IBM ECM services directly
through the MS Office 2007 suite of products, providing access to the most relevant
ECM features to help users get their daily work done without spending time
learning yet another new application.
Date: December 4, 2012
Time: 9:00 pacific / 12:00 eastern
Panelist: Maig Worel, Consulting IT Specialist - Social Business - IBM ECM NA
Technical SWAT Team
You can register for this event at: http://bit.ly/QuyTLS
A better mobile content experience is here. See it in
action - live.
Improved access, insight and interaction. From nearly
anywhere. That's what you'll get from the latest
addition to the IBM Enterprise Content Management
solution family: IBM Content Navigator.
Attend a live,
online demonstration of IBM Content Navigator to
get a firsthand look at a richer, more collaborative and
mobile content experience. Content Navigator allows
users to access, manage and work with enterprise content
directly from nearly any mobile device, practically
anytime and from virtually anywhere - even across
multiple systems and enterprise content management
Attend this complimentary demo, and you'll learn how
Content Navigator can help you:
- Collaborate from nearly anywhere, any time on
virtually any digital device
- Add photos from mobile devices to business processes
- Find the exact content you need through rapid,
- Establish project teamspaces that streamline
is complimentary, but you must register
Insurers – do you find it difficult to provide quality,
cost-effective customer service to your policyholders and agents? You're not
alone. Many insurers face the same challenges in today's market. Paper-based
processing environments are not conducive to providing high levels of service,
which your policyholders and agents expect. But how can you overcome these
Join IBM and TriTek Solutions for a one-hour webcast on
November 27, 2012 which will cover key points to maximize your investments in
document capture and retrieval solutions. During this session, we will also
share case studies of several successful implementations at both Life and
P&C insurance organizations.
A live Q&A session will be hosted at the end of the
webcast, so bring your questions for our experts!
There is NO COST to attend, but you must register: http://bit.ly/Quva0R
November 27, 2012
10am PT/1pm ET
Hosted by Insurance & Technology
Ready? Forward march! How is your organization preparing?
The battle against paper isn't over yet, my friends, and now there are new
technologies in the mix that are not only increasing your organization's
content, but dispersing it in entirely new ways. How your organization responds
to the BYOD (Bring Your Own Device) challenges will affect your employees'
productivity and could either strengthen your business or leave you shaking in
Join IBM at four of this fall's AIIM Boot Camp events to
learn how organizations like yours are outlining their battle plans. Network
with your peers and hear first-hand what works and what doesn't and then meet
with trusted vendors to help you reach those goals.
There's NO COST to attend, just register online:
October 2: Minneapolis,
October 4: Chicago,
October 10: Toronto,
October 16: Washington,
IBM will present some helpful ways to strategize, including
customer case stories about planning a social content management strategy –
should it be directed or viral? – and tackling the never-ending surge of paper.
If you did not attend the live or on-demand webcast of the IBM Content Manager (CM8) product update, here are some answers to questions that were asked by attendees on topics discussed during the live webcast.
A demo of the new user experience - Content Navigator - is available in the webcast.
Question 1: Can the Content Navigator out of the box application be customized?
Answer 1: Yes - details at: http://goo.gl/cn4PH
Question 2: Is there a forum for discussion about Content Navigator?
Answer 2: Developerworks forum at: http://goo.gl/wm4tP
Question 3: Does Content Navigator provide a framework for customization?
Answer 3: Yes - details at http://goo.gl/jNdmG
Question 4: Does Content Navigator operate with Content Manager 8 on z/OS? Are there any restrictions?
Answer 4: Yes - though the web application server cannot be operated on z/OS. Web app server operation is supported on zLinux, or on one of the other supported distributed platforms. Details at: http://goo.gl/IfdhF
Question 5: Does Content Navigator support single sign-on?
Answer 5: Yes - detailed at http://goo.gl/fxsE4
The IBM ECM Innovation Awards program recognizes IBM ECM clients who have
demonstrated excellence in deriving exceptional business value from IBM ECM
software. Winners will be selected from among those organizations who have
implemented ingenious solutions using IBM ECM software and recognized at
the Information On Demand 2012 Conference, October 21 - 25, 2012, Las Vegas, NV. Attendance at the
conference is not mandatory to win an award.
To receive the Innovation Award submission form, email: Amit Kumar, firstname.lastname@example.org
date for submitting your implementation for this award is 15th August 2012
The IBM Enterprise Content Management Customer Innovation Awards have a long
history of recognizing outstanding companies that have implemented innovative
ECM solutions combining business and technical vision with demonstrable results.
Past winners include:
Bluecross BlueShield of
State of North Dakota
Standard Chartered Bank
U.S. Nuclear Regulatory
Tejon Ranch Company
Novartis International AG
Apart from the prestige
associated with the awards, it presents a unique opportunity for our customers
to showcase their innovative use of ECM technology and to:
Be distinguished as a
technology leader in your industry for solving specific business challenges.
Be recognized at
the IBM Information on Demand Global Conference in October with an award
crystal and other recognitions.
This year there are
four categories to cover the entire spectrum of ECM capabilities
and Socialize: Best IBM Social
Content/Capture Award - Recognizes the best use of innovative IBM ECM
Social Content or Capture/Imaging software
2. Activate: Best IBM Case Management Award - Recognizes
the best use of innovative IBM Case Management software solutions.
3. Analyze: Best IBM Content Analytics Award - Recognizes
the best use of innovative IBM Content Analytics software solutions.
4. Govern: Best IBM Information Lifecycle Governance Award -
Recognizes the best use of innovative IBM Information Lifecycle Governance
categories, judges will look for deployed applications that solve challenging
or unique business problems. Extra emphasis will be placed on quantifiable
return on investment or creative deployments that lead specific industries for
Guest post by: Richard Joltes Software Developer, Content Discovery and Management, IBM Enterprise Content Management
In today’s market, I.T. dollars are in short supply and there’s
an increasing requirement for organizations to reduce operating costs. Projects
are scrutinized closely in order to ensure a solid ROI before any significant budgetary
expenditure can be authorized. In this restricted operational model, automated document
classification can easily demonstrate its value simply on the basis of the hardware
and storage savings that it can permit.
We know that unstructured data generally accounts for about
80% of all content in a given organization. It’s also true that organizations
can lose track of data due to mergers, organizational changes, lack of a consistently
applied document management policy, and other factors. Unrealistic email
retention policies, unmanaged file shares, or a general “save everything” mentality
can result in the accumulation of massive archives containing data that is, to
be frank, largely useless. What’s the point of having every file or email ever
sent by each employee if (a) no one is interested in them, (b) few employees
know they exist, and (c) the cost of maintaining the servers outweighs any
possible benefit of retention?
Given a well-structured taxonomy, a coherent document
retention policy, and a well-trained classifier, organizations suffering from
the type of storage nightmare described above can easily eliminate a
significant percentage of pointlessly archived data, thus realizing a huge ROI
while easing access to, and availability of, truly actionable materials hidden within
their existing repositories.
Evaluating the long-term cost savings of such a project
requires a solid analysis of existing archival data and its overall relevance
to current business and regulatory requirements. Once such an analysis has been
performed, and a content classifier has been trained to provide a level of
accuracy appropriate to the data to be classified, the ongoing task generally involves
monitoring activity and making corrections via a feedback mechanism as content
changes over time. Each content item will be evaluated by the classifier and (variously)
re-filed in a centralized repository, left in place, or removed from the
system. Individual organizations can design their own solution and final
document disposition policies based on specific organizational requirements and
solution design requirements.
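As a rough illustration of the three outcomes mentioned above (re-filed, left in place, or removed), the disposition step can be sketched in a few lines of Python. The category names, the `POLICY` table, and the `CONFIDENCE_THRESHOLD` value are all hypothetical, and the classifier result is modeled as a plain record rather than the output of any particular product. Each organization would substitute its own policy.

```python
from dataclasses import dataclass
from enum import Enum


class Disposition(Enum):
    REFILE = "re-file in the centralized repository"
    LEAVE = "leave in place"
    REMOVE = "remove from the system"


@dataclass
class Classification:
    category: str      # category assigned by the classifier
    confidence: float  # classifier confidence, from 0.0 to 1.0


# Hypothetical disposition policy; real policies come from the organization's
# retention and regulatory requirements.
POLICY = {
    "contract": Disposition.REFILE,
    "personal": Disposition.REMOVE,
}

# Assumed cut-off below which the classifier's answer is not acted upon.
CONFIDENCE_THRESHOLD = 0.8


def decide(result: Classification) -> Disposition:
    """Map a classifier result to a disposition.

    Low-confidence results and categories without a policy entry are left in
    place, so that a person can review them and feed corrections back.
    """
    if result.confidence < CONFIDENCE_THRESHOLD:
        return Disposition.LEAVE
    return POLICY.get(result.category, Disposition.LEAVE)
```

Defaulting to "leave in place" is a deliberately conservative choice in this sketch: nothing is deleted or moved unless the classifier is confident and the policy says so explicitly.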
Think of some of the potential savings in your own
organization. Are you operating older systems solely for the purpose of
maintaining years of unorganized or semi-organized files, with no clear idea
how much of the information on these shares is usable or in use? How much does each
system, potentially running an out-of-support OS or locally developed content
archive, cost to operate in a given year? What’s the organization’s legal or
regulatory exposure should the system die unexpectedly? How much time do your
employees spend managing such servers? How much space and other resources do such
systems consume in your data center? Even worse, are some or all of these
systems located in unmanaged offices where data can be compromised or lost due
to a lack of security?
Answering these questions will help you understand the
benefits of implementing a centralized, managed content store that can also
assist in filtering out irrelevant, outdated data using automated content
Guest Post by Richard Joltes Software Developer, Content Discovery and Management, IBM Enterprise Content Management
As any I.T. veteran knows, management of unstructured data has
become increasingly difficult over the years. Web pages, PDF files, Office
documents, and email messages can (and do) accumulate within file systems and
other repositories at an alarming rate, consuming storage and other resources. Some
organizations adopted a ‘save everything’ model that has resulted in huge file
shares or email archives that likely contain only a small percentage of usable
data. Finding files or messages in these archives can be nearly impossible,
especially in situations where unmanaged repositories or departmental file
shares are involved. Additionally, this storage model can result in legal
headaches if a lawsuit or other action results in a demand to produce all
documents related to a given case. Searching vast archives of potentially
relevant materials can consume significant resources over a long period of
Automated content classification can help mitigate such
problems, but groundwork and planning, as well as a solid understanding of the
content to be classified and how it can be logically divided into various
categories, are needed in order to ensure success. As a good starting point for
this process, consider the following questions.
What’s my taxonomy?
You can’t categorize documents, or anything else for that
matter, without a coherent list of known categories and criteria that
distinguish one from the others. This list, along with the characteristics of
each element, is known as a taxonomy,
and most people make use of them in everyday life without even knowing it. We
instinctively know the difference between a laptop and desktop computer, and
most people can articulate what those differences are with relative ease.
The same is true when documents are involved. What’s the
difference, for instance, between an “Accounting and Finance” document and one
from “Engineering”? Are there key phrases, terms, and intents that could help
an employee distinguish one from the other with a reasonable level of
confidence? If the answer is yes, then it is likely that software such as IBM
Content Classification™ will be
able to distinguish one from the other once it has been trained to recognize
Certain categories may be more problematic: “Legal” and
“Regulatory” may involve significant overlap of intent and language, for
instance. The rule of thumb is simple. If a human can’t classify documents into
selected categories with a high level of certainty, then a computer won’t be
able to either. It’s as simple as that.
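To make the idea of key phrases and category overlap concrete, here is a toy
Python sketch. It is plain phrase counting, not the way IBM Content
Classification works; the category names come from this post, while the key
phrases and the confidence measure are invented for illustration.

```python
from collections import Counter

# Hypothetical taxonomy: the key phrases are assumptions for this example.
TAXONOMY = {
    "Accounting and Finance": ["invoice", "ledger", "accounts payable"],
    "Engineering": ["tolerance", "schematic", "firmware"],
    "Legal": ["compliance", "liability", "statute", "contract"],
    "Regulatory": ["compliance", "statute", "audit", "filing deadline"],
}


def classify(text, taxonomy=TAXONOMY):
    """Return (best_category, confidence).

    Confidence is the best category's share of all phrase hits, so
    categories that share vocabulary pull each other's confidence down.
    """
    lowered = text.lower()
    hits = Counter()
    for category, phrases in taxonomy.items():
        hits[category] = sum(lowered.count(phrase) for phrase in phrases)
    total = sum(hits.values())
    if total == 0:
        return None, 0.0  # nothing recognizable in the text
    best, best_hits = hits.most_common(1)[0]
    return best, best_hits / total
```

A sentence about invoices and ledgers lands cleanly in "Accounting and
Finance", whereas "Compliance with the statute is required" matches "Legal" and
"Regulatory" equally, which is exactly the overlap problem described above.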
Do I understand my content?
Generally, creating a taxonomy only works if you understand
the content you intend to classify. A review of the content to be classified
(not just document titles, but some amount of actual content, along with
associated metadata) should be conducted as part of the taxonomy creation process.
If multiple content sources with multiple types of documents
and intents are to be classified, then a sample from each must be reviewed in
order to determine how its specific content might affect the outcome of the
classification process. There may also be cases where certain file types, such
as image-format PDFs or encrypted data, can’t be read successfully by
text-oriented classification software. Document language must also be taken
into account, since automated classification software must be trained on a
It’s also necessary to consult appropriate internal authorities,
such as legal advisors and regulatory affairs personnel, in order to determine
how long various document types must be retained. While questions such as these
are more directly related to retention and file policies, they’re also relevant
to automated document classification. Certain document types may contain specific
terms and phrases that the software can be configured to search for, resulting
in higher confidence levels when performing classification tasks.
What’s the goal?
This question must obviously be asked before undertaking any
I.T. related project, since the cost and effort must be justified by a
measurable return on investment. The business case for automated content
classification depends on the industry, current practice, and the desired
outcome. Do you need to consolidate content sources as the result of an
acquisition or merger? Are regulatory needs driving the requirement for
efficient, legally defensible document management practices? Is your email
server laboring under the burden of 10 years’ worth of potentially useless
Done correctly, an automated classification project can offer
a solid ROI in a fairly short period of time. Lower storage and infrastructure
costs, easier access to relevant data, and less exposure to litigation-related
issues are obvious benefits that can justify the time and expense involved. Tasks
such as taxonomy creation and an initial document review generally should be
performed in advance if at all possible.
Doing so will help ensure success while preserving schedules and keeping
implementation costs to a minimum.
In a recent AIIM survey, over 70% of respondents said that they
find it easier to find information online than content on their company’s
intranet. Many of us have wondered at one time or another why our intranet
search cannot work like the popular internet search engines.
The answer is simple and complex.
The simple answer is that no company has a multi-billion dollar
server farm to enable search on its intranet, which makes comparisons between
internet and intranet search unfair.
The complex answer lies in the fundamental differences
between what people search for, and why, on the intranet and the internet.
Search experts distinguish between discovery search and retrieval search; in
layman’s terms, we search either to gain knowledge about a subject or to find a
specific object. Most internet searches are discovery (knowledge) searches, and
most intranet searches are retrieval (object) searches.
If I want to know more about a product, I am more likely to
use a discovery method to find information sources pertaining to the subject of
interest. I read the articles, listen to the podcasts, and view the videos to
gain the required knowledge. My expectations of finding the information quickly
are very low, I am relatively agnostic about the information source, and I am
ready to invest effort to collect information snippets and then string them
together to build my knowledge base.
But when I am searching for a specific document, the
search is part of a larger task, and any delay in finding the document will
delay delivery. My expectations for accuracy are absolute: I want to find that
particular document instantly, not hidden on the third page of the search
results.
Apart from the fundamental usage difference, there are
definite technical differences stemming from the larger number of data types
that intranet searches need to tackle, the federation of information sources,
the lack of vested interest of authors to manually embed rich metadata with the
Traditional enterprise search engines relied heavily on
metadata to index documents, and the accuracy of a search depended on how well
the crawler extracted meta-tags from the content: file name, author, date,
information source. Of late there has been increased adoption of content
analytics to enable semantic and faceted search, which has significantly
improved the accuracy of search results.
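For readers who have not met the term, faceted search can be pictured with a
small Python sketch: count the values of a few attributes across the search
hits, then let the user narrow the list by picking one value. The sample hits
and field names below are hypothetical; in a real system such attributes might
come from metadata or be derived from the text by content analytics.

```python
from collections import Counter, defaultdict

# Hypothetical search hits for the query "Q3".
HITS = [
    {"title": "Q3 budget", "department": "Finance", "type": "spreadsheet"},
    {"title": "Q3 forecast memo", "department": "Finance", "type": "document"},
    {"title": "Q3 test plan", "department": "Engineering", "type": "document"},
]


def facet_counts(hits, fields):
    """Count how many hits carry each value of each facet field."""
    counts = defaultdict(Counter)
    for hit in hits:
        for field in fields:
            if field in hit:
                counts[field][hit[field]] += 1
    return {field: dict(counter) for field, counter in counts.items()}


def refine(hits, field, value):
    """Keep only the hits whose facet field has the chosen value."""
    return [hit for hit in hits if hit.get(field) == value]
```

Calling `facet_counts(HITS, ["department", "type"])` gives the counts a search
page would show beside the results, and `refine(HITS, "department", "Finance")`
narrows the three hits to the two Finance documents.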
Next week I will continue this discussion to talk about how
analytics improves search.
The IBM FileNet P8 V5.1 information center was updated recently. Updates include:
- Instructions for using an encrypted password with the Content
Engine Bulk Import tool.
- Enhanced query syntax documentation for IBM Content Search Services.
- A new topic, Subscribable and Auditable Events, that enumerates
the Content Engine events that you can subscribe to or audit.
- Descriptions for the columns in the Content Engine database table
For more information on this and earlier updates, see http://ibm.co/tdaVE5
Guest post by Steve Studer, Offering Manager - IBM ECM Marketing
Products and Strategy
Observing the success of social networking and content
search tools reminds me of one of my favorite books, "Connections" by James
Burke. What I find they have in common
is that a business's success depends heavily on content being utilized in
meaningful ways, e.g. connecting that content to the right people who can then
leverage the information for multiple purposes. Mr. Burke cites the
establishment of the Library in Venice as one of the biggest catalysts that
changed the world: every ship arriving to do trade
was asked to provide books that were then reproduced by scribes and made
available to anyone who came to the library of Venice. What I find so fascinating is how the
ideas presented in this book were also adapted into a popular television series
on PBS and are now available electronically. To me the essence of Social
Collaboration is to connect content to people and processes and to provide
context where ideas can be freely exchanged. This typifies why IBM
Connections Enterprise Content Edition is so important to business today.
By the way, anyone interested in reading the book or watching the TV series can
search for "Connections by James Burke", view the multi-part
documentary on YouTube, or buy the e-book.
As James Burke often points out, the library concept expands
knowledge transfer, and at the same time it can be a great catalyst for change.
Connecting content with people can be the greatest incubator for expanding
abstract ideas. A case in point happened to me just last week, when one of
my colleagues found a wiki link which I had authored and tagged in IBM's
internal Connections site. What I found
so interesting was how the person discovered me. I had tagged a document inside the wiki with
metadata tags for content analytics and social collaboration. It turns out this
person is working with several customers and was looking for an expert on the
topic of Social Content and any presentation materials that could serve as
an educational tool for the customer.
Reaching the assets was only a small part of the solution to her
challenge. The fundamental piece she
needed was a connection with the right subject matter expert who could help
present these concepts. I'm happy to say
I made her day because she was able to intuitively navigate our internal Social
Content community, locating both the content and the expert, literally giving
life to the meaning of “context to content and people”.
In closing, as James Burke so brilliantly points out, the
transfer of knowledge is the greatest catalyst for change. Consider what
wonders man was able to achieve when the transfer of information
happened at a wind-and-sail pace and the medium was primarily paper in the
form of books or correspondence. Back
then, the sharing of content and knowledge was rarely done face-to-face because
of the time it took to travel long distances, e.g. weeks, months, and in many
instances even years. It is unfathomable to me what impact these new social
tools will have when you consider that content and people on
opposite ends of the world can now be linked at the speed of Ethernet, not to
mention the ability to share that information through multiple social and
mobile content mediums. These new
technologies will definitely contribute to "The Day the Universe Changed"
or at minimum a Smarter