The Chinese may now have personal information on 4 Million US Government employees

Yet another sensational data breach headline – not even shocking anymore.  Yawn.  But listening to the story on the radio on the way home last night after being slaughtered in softball again, I started thinking.  And I dug a little deeper into the story when I got home.  I was shocked.

The systems penetrated belong to the US Government Office of Personnel Management.  Yep, that’s the United States Federal Government Human Resources Department.  It holds personal information for everyone who works for the US Federal government.  It’s the agency that hands out security clearances.  Think about this.  Let it sink in.

The Chinese broke into the system that US Government investigators use to store information about background checks for people who want security clearances.  That’s right.  If you applied to the US Government for a security clearance, it’s a good bet the Chinese know a lot about you now.  Which means you’ll probably be the target of some finely crafted spear phishing campaigns for the next several years.

And that’s only one system out of 47 operated by the Office of Personnel Management (OPM).  It’s not the only one the Chinese penetrated.

Update:  According to this Washington Post article (PDF here in case the link breaks), the Chinese breached the system managing sensitive information about Federal employees applying for security clearances in March 2014.  The latest OPM breach targeted a different data center housed at the Interior Department.

Update June 12, 2015:  The original reports were bad.  Now it’s even worse.  It now seems the Chinese have detailed information on every US Federal employee.  14 million, not 4 million.  And people may die directly because of this breach. But even now, we don’t know the extent of the damage.  This article from Wired Magazine sums it up nicely.

Reactions from high government officials were typical. They all take the problem seriously.  Bla bla bla.  According to the Wall Street Journal:

“We take very seriously our responsibility to secure the information stored in our systems, and in coordination with our agency partners, our experienced team is constantly identifying opportunities to further protect the data with which we are entrusted,” said Katherine Archuleta, director of the Office of Personnel Management.

Here’s another one, from the New York Times:

“The threat that we face is ever-evolving,” said Josh Earnest, the White House press secretary. “We understand that there is this persistent risk out there. We take this very seriously.”

This one from the same Washington Post article is my favorite:

“Protecting our federal employee data from malicious cyber incidents is of the highest priority at OPM,” Director Katherine Archuleta said in a statement.

Do I really need to ask the question?  Katherine, if it’s such a high priority then why didn’t you address the problem?

As I mentioned in a blog post way back in Feb. 2014, about dealing with disclosures, we’ve heard lots of noise about this breach but very little useful information.  Here’s what we do know.  I want to thank David E. Sanger, lead author of the New York Times article, “U.S. Was Warned of System Open to Cyberattacks,” for sending me a link to the 2014 Federal Information Security Management Act Audit report.  In case that link breaks, here is a PDF.

We know the Chinese penetrated the OPM in fall 2014 and stole at least 4 million records over the next six months.  That’s it. As usual, nobody I can find is forthcoming with details.

The report from the Office of Inspector General (OIG) gives us some clues.  Apparently, the various program offices that owned major computer systems each had their own designated security officers (DSO) until FY 2011.  The DSOs were not security professionals and they had other jobs, which means security was a bolted-on afterthought.  In FY 2012, OPM started centralizing the security function.  But by 2014, only 17 of the agency’s 47 major systems operated under this tighter structure.

All 47 major systems are supposed to undergo a comprehensive assessment every three years that attests that a system’s security controls meet the security requirements of that system.  It’s a rigorous certification process called Authorization.  Here’s what the report said:

“However, of the 21 OPM systems due for Authorization in FY 2014, 11 were not completed on time and are currently operating without a valid Authorization (re-Authorization is required every three years for major information systems). The drastic increase in the number of systems operating without a valid Authorization is alarming, and represents a systemic issue of inadequate planning by OPM programming offices to authorize the information systems that they own.”

Remote access also had problems.  Apparently the VPN vendor OPM uses claims the ability to terminate VPN sessions after an idle timeout.  But the idle timeout doesn’t work and the vendor won’t supply a patch to fix it.

Identity management was also weak.  Although OPM requires multi-factor authentication to enter the network, none of the application systems do.  So if a Chinese bad guy penetrates the network, he apparently has free rein over everything in it once inside.  And since OPM had no inventory of what systems it owned, where they were, or what they did, OPM had no way to know the Chinese were plundering their data.

It adds up to a gigantic mess.  And an embarrassment, which probably explains why nobody wants to talk about details.

Wonderful.  So what can a small IT contractor from Minnesota offer the multi-trillion-dollar United States Federal Government to address this problem?  Here are some suggestions from an outsider who wrote a book about data breaches.

Three attributes will keep our systems safe.  Sharing, diligence, and topology.

Sharing drives it all.  So first and foremost – move from a culture of hierarchy, secrecy, and “need to know” to a culture of openness, especially around security.  What does that even mean?  For an answer, check out the new book by Red Hat CEO Jim Whitehurst, “The Open Organization,” published by the Harvard Business Review.

The Chinese, and probably others, penetrate our systems because a government culture of secrecy and “need to know” keeps our teams isolated and inhibits collaboration and incentives for excellence.  It’s a traditional approach to a new problem that defies tradition.  I’ll bet the Chinese collaborate with each other, and probably also with the North Koreans.

Instead of a closed approach, adopt an open one.  Publish source code, build communities around each of those 47 systems, and share them with the world.  The best way to protect it all is to show the world how it works.

And when breaches happen, don’t tell us how you take security seriously.  You’re supposed to take security seriously.  It’s your job.  Tell us what happened and what steps you’re taking to fix the problem.  Instead of hiding behind press releases, engage with your community.

And use open source tools for all your security.  All of it.  Firewalls, VPN systems, IDS/IPS (Intrusion detection/Intrusion prevention systems), traffic analyzers, everything.  Breaches occur with open source software, just like proprietary software, but when they happen, the open source community fixes them quickly. Why? Because the developers’ names are on the headers and they care about their reputations.  You won’t need to wait years for a VPN patch in the open source world.

Openness doesn’t mean granting access to everyone.  Openness means building communities around the software systems OPM uses and accepting patches and development from the community.  Community members are compensated with recognition and opportunities for paid engagements.  OPM is rewarded with hardened, peer reviewed software driven by some of the smartest people on the planet.

When teams move away from hierarchy to an open culture, diligence and topology will follow.  There is no substitute for diligence and no technology to provide it. Teach everyone to be diligent and practice it often with drills.  Reward the cleverest phishing scheme or simulated attack and reward the cleverest defense.

And topology – put layers of security in front of key databases.  Put in appropriate access and authorization controls for key databases to ensure personal information stays personal.  Consider physically segregating these database systems from the general network and setting up a whitelist for their interactions with the world.
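To make that whitelist idea concrete, here is a minimal sketch of the kind of audit check I have in mind, assuming a hand-maintained list of approved clients and a simple connection log.  The addresses, ports, and log format are made-up placeholders, not anything OPM actually runs; the point is that any host talking to a sensitive database outside the approved list should raise a flag.

```python
# Hypothetical sketch: flag any connection to a sensitive database server
# that does not come from an approved (source address, port) pair.
# The whitelist and log format are illustrative assumptions only.

ALLOWED_CLIENTS = {
    ("10.1.5.20", 1521),   # HR application server -> database listener
    ("10.1.5.21", 1521),   # reporting server -> database listener
}

def audit_connections(conn_log):
    """conn_log is an iterable of (source_ip, dest_port) tuples."""
    return [(ip, port) for ip, port in conn_log
            if (ip, port) not in ALLOWED_CLIENTS]

if __name__ == "__main__":
    sample_log = [("10.1.5.20", 1521), ("203.0.113.77", 1521)]
    for ip, port in audit_connections(sample_log):
        print(f"ALERT: unexpected client {ip} hit database port {port}")
```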

None of this proposed culture shift needs to cost a fortune.  And in fact, in this era of doing more with less, it might save taxpayer money by igniting passion at the grass roots of the OPM IT staff.

Am I proposing radical change to a government that resists change?   Yup.  So why do it?  I’ll answer that question with my own question – given the recent headlines and your own Inspector General audit reports from the past several years, how’s the current method working out?

(I first published this on my Infrasupport website on June 6, 2015.  I backdated here to match the original posting date.)

What is redundancy anyway?

I’ve been in the IT industry my entire adult life, so sometimes I use words and just assume everyone thinks they mean the same thing I think they mean.  I was recently challenged with the word, “redundancy.”

“What does that even mean?” asked my friend.

“It means you have more than one.”

“So what?”

“So if one breaks, you can use the other one.”

“Yeah, everyone knows that, but what does it mean with IT stuff?”

Seems simple enough to me, but as I think about it, maybe it’s not so simple.  And analyzing how things can fail and how to mitigate it is downright complex.

Redundancy is almost everywhere in the IT world.  Almost, because it’s not generally found in user computers or cell phones, which explains why most people don’t think about it and why these systems break so often.  In the back room, nearly all modern servers have at least some redundant components, especially around storage.  IT people are all too familiar with the acronym, RAID, which stands for Redundant Array of Independent Disks.  Depending on the configuration, RAID sets can tolerate one and sometimes two disk failures and still continue operating.  But not always.  I lived through one such failure and documented it in a blog post here.

Some people use RAID as a substitute for good backups.  The reasoning goes like this:  “Since we have redundant hard drives, we’re still covered if a hard drive dies, so we should be OK.”  It’s a shame people don’t think this through.  Forget about the risk of a second disk failure for a minute.  What happens if somebody accidentally deletes or messes up a critical data file?  What happens if a Cryptolocker type virus sweeps through and scrambles everyone’s files?  What happens if the disk controller in front of that RAID set fails?

Redundancy is only one component in keeping the overall system available.  It’s not a universal cure-all. There will never be a substitute for good backups.

Virtual environments have redundancy all over the place.  A virtual machine is software pretending to be hardware, so it’s not married to any particular piece of hardware.  So if the physical host dies, the virtual machine can run on another host.  I have a whole discussion about highly available clusters and virtual environments here.

With the advent of the cloud, doesn’t the whole discussion about server redundancy become obsolete?  Well, yeah, sort of.  But not really.  It just moves somewhere else.  Presumably all good cloud service providers have a well thought out redundancy plan, even including redundant data centers and replicated virtual machines, so no failure or natural disaster can cripple their customers.

With the advent of the cloud, another area where redundancy will become vital is the boundary between the customer premise and the Internet.  I have a short video illustrating the concept here.

I used to build systems I like to call SDP appliances.  SDP – Software Defined Perimeter, meaning with the advent of cloud services, company network perimeters won’t really be perimeters any more.  Instead, they’ll be sets of software directing traffic to/from various cloud services to/from the internal network.

Redundancy takes two forms here.  First is the ability to juggle multiple Internet feeds, so when the primary feed goes offline, the company can route via the backup feed. Think of two on-ramps to the Interstate highway system, so when one ramp has problems, cars can still get on with the other ramp.

The other area is redundant SDP appliances. The freeway metaphor doesn’t work here. Instead, think of a gateway, or a door through which all traffic passes to/from the Internet.  All gateways, including Infrasupport SDP appliances, use hardware, and all hardware will eventually fail.  So the Infrasupport SDP appliances can be configured in pairs, such that a backup system watches the primary. If the primary fails, the backup assumes the primary role. Once back online, the old primary assumes a backup role.

Deciding when to assume the primary role is also complicated.  Too timid and the customer has no connection to the cloud.  Too aggressive and a disastrous condition where both appliances “think” they’re primary can come up.  After months of tinkering, here is how my SDP appliances do it.  The logic is, well, you’ll see…

If the backup appliance cannot see the primary appliance in the private heartbeat network, and cannot see the primary in the  internal network, and cannot see the primary in the external Internet network, but can see the Internet, then and only then assume the primary role.
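Translated into code, the takeover rule looks roughly like the sketch below.  This is a simplified illustration of the idea, not the actual appliance implementation; the ping-based reachability check and the addresses are stand-ins.

```python
import subprocess

def reachable(host):
    """Return True if a single ping to host succeeds (crude reachability test)."""
    result = subprocess.run(["ping", "-c", "1", "-W", "2", host],
                            capture_output=True)
    return result.returncode == 0

def should_assume_primary(primary_heartbeat_ip, primary_internal_ip,
                          primary_external_ip, internet_probe="8.8.8.8"):
    """The backup takes over only when the primary is invisible on every
    network they share, yet the Internet is still reachable -- so the
    problem is the primary, not the backup's own connectivity."""
    primary_gone = (not reachable(primary_heartbeat_ip)
                    and not reachable(primary_internal_ip)
                    and not reachable(primary_external_ip))
    return primary_gone and reachable(internet_probe)
```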

It took months to test and battle-harden that logic and by now I have several in production.  It works and it’s really cool to watch.  That’s redundancy done right.  If you want to find out more, just contact me right here.

(Originally posted on my Infrasupport website, June 4, 2015.  I backdated here to match the original posting.)

The real-life story of an identity theft victim and what she did about it

I have a friend, let’s call her Mandy.  Mandy is an identity theft victim.  Mandy is not her real name because this is a private story and she wants to maintain her privacy.  She’s willing to share it, anonymously, because she read “Bullseye Breach” and she knows what I do for a living.  She’s hopeful that her story might help others in a similar situation.

For anyone who still thinks the law enforcement bureaucracy will help you when you’ve been violated in this manner, Mandy’s story will change your mind.  And hopefully this deeply personal story will help persuade you that IT security is important and you need to take it seriously.

I am privileged to post Mandy’s story, in her own words.

#####

Living in a nice neighborhood can give you a false sense of security. Maybe you know most of your neighbors and don’t think twice about leaving your windows open all day to let in cool air.  Maybe you don’t even lock your doors at night.

I’ve never been that trusting. I grew up in a South Florida neighborhood where it seemed like we were receiving flyers on a weekly basis about break-ins.

They left an impression on me. Once out on my own, I always made sure my doors and windows were locked, but turns out that didn’t matter.

On the morning of Nov. 7, 2005, someone pried open a locked window and got into my home anyway. My husband and I returned from work around the same time that evening to find our home ransacked.

The thief or thieves must have spent a long time inside because everything, and I do mean everything, that was both portable and valuable was gone. Every room in the house had been gone through.

Missing were thousands of dollars worth of electronics, including a laptop computer that contained personal information and a video camera with precious video of my son inside; all of our checkbooks and bills that had been written out but not yet sent; a set of extra keys to our house and one of the cars; and the coin collection I had been building since I was a kid.

May sound hard to believe, but it wouldn’t have been so bad if that was all that had disappeared. What’s ten times more devastating is the fact that my family also fell victim that day to what has become the number one crime in America — identity theft.

Like so many people I know, we had our social security cards and birth certificates in a fire box under the bed. The thief found the key to the box in my underwear drawer and cleaned it out.

I feel stupid for having left the key in such an obvious place, but my husband has convinced me that if they hadn’t found the key, the thieves would have just taken the whole box anyway. I should have hidden it better.

We spent all of November and December worrying about how our information was going to be used, but nothing bad happened. Then the other shoe dropped the night of January 11th.

Because of the fraud alert we put up on our credit reports after the break-in, someone from Dell Computer called our house around 10 o’clock that night. He said he had J. on the other line and was calling to confirm his identity.

My husband was not the man on the line with Dell. We were being violated again.

After hanging up with Dell, we ran our credit report and found out that a few days earlier, someone had tried to secure a home mortgage in our name.

When I got to work the next morning, I looked up our client contact at one of the credit bureaus, called her up and started asking a lot of questions. She couldn’t answer all of them, so she put me in touch with Kevin Barrows, the former FBI agent who is credited with busting up one of the country’s largest identity theft operations in 2002.

He told me, “Because you put the fraud alert up and filed a police report, you will not be liable for anything the identity thief does; but at the same time, you do need to get his inquiries and the false addresses he gave off your credit report as quickly as possible.”

That night, I embarked on another round of letter writing. The next morning it was off to the post office again.

Early on in the process, I had read an article that recommended all communications with the credit bureaus be sent certified with return-receipt. I’ve spent close to $100 sending letters that way so far.

That’s in addition to the thousands of dollars spent installing an alarm system, fixing our broken window, replacing a damaged sliding glass door; rekeying our house and car; replacing stolen documents; etc. Some, but not all of our losses, were covered by insurance.

Just when we thought we had the situation under control, my husband and I started getting calls from credit card companies calling to confirm our identity because of the fraud alert on our accounts. One after another… I lost count around 30… We would tell the people on the other line that no we did not authorize the opening of an account.

Right away after the calls started coming in, I pulled our credit reports again and found mention of multiple inquiries made by creditors we had never heard of, plus a mysterious address in Illinois added to both mine and my husband’s accounts. I called the police department in that city to report that someone at that address was fraudulently using my identity to try and establish credit.

Believe it or not, the detective I spoke with actually told me they had received similar reports from others about that exact address, but there was nothing they could do because it was a federal crime. I was referred to the Postmaster General, I presume because the thieves wanted to get credit cards fraudulently sent to them through the mail.

The person I spoke with took down my information and referred me to the FBI. The agent I spoke with at the FBI told me there are too many cases like mine for them to pursue all of them. They referred me back to the local police dept in the jurisdiction where the theft happened. My hometown police department basically said, “Sorry, there is nothing we can do about a crime being committed across state lines.”

I am sharing my story in hopes that I can help make the recovery process easier for someone else.

Here are the steps I’ve taken since the day of the break-in:

  1. Called the police to file a report. (This is a critical step. You will need that report in order to get extended fraud alerts issued).
  2. Called the credit bureaus. (Work your way through the automated menus until you find the option to get a fraud alert issued. Experian, Equifax and TransUnion are required to share information with each other, but to give yourself peace of mind, contact all three anyway. I did.)

    Equifax: 1-800-525-6285; www.equifax.com; P.O. Box 740241, Atlanta, GA 30374-0241

    Experian: 1-888-EXPERIAN (397-3742); www.experian.com; P.O. Box 9532, Allen, TX 75013

    TransUnion: 1-800-680-7289; www.transunion.com; Fraud Victim Assistance Division, P.O. Box 6790, Fullerton, CA 92834-6790

  3.  Called the banks to get all of my accounts frozen immediately after discovering the theft. Went into the branches I do business with the morning after the break-in to get new account numbers issued; and also secured a safe deposit box to store personal information in from now on.
  4. Cancelled all of my credit cards. The thieves only made off with the two they found in the fire box, but I have no way of knowing if they went through my files to get other numbers too.
  5. Called all my creditors to see which ones had received payment on my accounts. Sent new checks with a letter of explanation for the lack of a stub to the others.
  6. Had my mail stopped so the thief couldn’t return to the house and steal our mail. Went to the post office daily for over a month until I was able to find, purchase and install a secure mailbox.
  7. Went to the Department of Motor Vehicles to get new driver’s licenses issued with new numbers. We have no way of knowing if the thieves came across our old numbers when they went through our file cabinet.
  8. Went to the Social Security office to request new copies of our cards.
  9. Filed a complaint with the Federal Trade Commission (FTC), which shares information about identity theft with law enforcement agencies across the country. You can file a complaint with the FTC using the online complaint form at www.ftc.gov; or call the FTC’s Identity Theft Hotline, toll-free: (877) ID-THEFT (438-4338); or write Identity Theft Clearinghouse, Federal Trade Commission, 600 Pennsylvania Avenue, NW, Washington, DC 20580.
  10. Sent letters to the Department of Vital Statistics in the three states in which our family members were born to get new certified birth certificates. Also had to get a new copy of our marriage certificate.
  11. Once things settled down, called a few alarm companies, took bids, then hired one to install a home burglar alarm for us.
  12. After receiving confirmation of the initial fraud alerts from the three credit bureaus in the mail, sent in letters requesting a 7-year extended alert along with a copy of my police report.
  13. Signed up for 3-in-1 credit monitoring so I’ll know instantly the next time someone fraudulently applies for credit in our name.

#####

If anyone reading this wants to contact Mandy, just contact me and I’ll work on setting it up.

(First published on my Infrasupport website.  I backdated here to match the original posting date.)

How can organizations avoid sensational data breach headlines?

I was in a Barnes and Noble bookstore a few days ago, pitching my new book, “Bullseye Breach,” to one of the folks working behind the counter.  I know all the big decisions are always made at corporate headquarters, but nobody invited me to corporate headquarters and I have to start somewhere.  So I started at this store.

While pitching for all I was worth, a lady who said she works at the Target Corporation Credit Department here in the Twin Cities walked up to the counter.  Many have suggested I patterned my fiction story in “Bullseye Breach” after the real world Target breach – I’ll leave that for readers to judge.  I had a copy of my book with me and she seemed interested.  Which helped my ego tremendously.  Those million book sales start with the first one.

We talked for a while and she said, “It’s a shame we’re all so vulnerable.  No matter how big you are, no matter how much you’re loved in the community, no matter how much good you do, a group of crooks can still break in over the Internet and do this to you.”

That triggered a diatribe from me about believing press releases and people who should have known better not doing their jobs.  I said lots of other things, most of it politically incorrect.  To my surprise, she thanked me for being passionate about this topic and even insisted on buying the copy of my book I had with me on the spot.  I walked away dumbfounded and grateful.

That encounter put a whole series of thoughts in motion.  Since I insisted that organizations can protect themselves, that being a victim to cybercrime is not inevitable, what would I do if somebody actually invited me to corporate headquarters to provide advice and counsel to the CIO?

So here is the advice I would offer.

First is topology.  Retailers, isolate your Point of Sale systems from the rest of your network and keep a whitelist for where they can interact.

Set up automation to notify the right people if those POS systems try to interact with anything outside that whitelist.  Other industries may have similar issues, but retail POS systems are special because untrained store clerks interact with them and they interact with payment processors across the Internet.  Their interactions with the internal network and the rest of the world need to be strictly regulated and monitored.  If the topology had been right, and the right people heeded the warnings, none of the sensational data breach headlines we’ve read about recently would have happened.
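Here is a rough sketch of what that notification automation could look like.  The whitelist entries, addresses, and mail settings are placeholders I made up for illustration; a real deployment would feed this from firewall or netflow logs and use whatever alerting channel the security team actually watches.

```python
# Illustrative sketch only: watch POS connection records and email an alert
# when a register talks to anything outside its whitelist.
import smtplib
from email.message import EmailMessage

POS_WHITELIST = {"payments.example-processor.com", "10.0.0.5"}  # store server

def check_pos_traffic(connections, alert_to="security-team@example.com"):
    """connections is an iterable of (pos_id, destination) pairs."""
    for pos_id, destination in connections:
        if destination not in POS_WHITELIST:
            msg = EmailMessage()
            msg["Subject"] = f"POS {pos_id} contacted unexpected host {destination}"
            msg["From"] = "pos-monitor@example.com"
            msg["To"] = alert_to
            msg.set_content("This destination is outside the POS whitelist. Investigate now.")
            with smtplib.SMTP("localhost") as smtp:   # assumes a local mail relay
                smtp.send_message(msg)
```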

That leads to diligence.  No matter what technology is in place, there is no substitute for human diligence.  People are and always will be the last and best line of defense against attack.  Train end users to stay away from the wrong websites and not to fall prey to phishing schemes.  Run drills.  Do probes.  Test often and discuss results.

But even with the best diligence and awareness training and drills, a company with 1000 employees means 1000 potential attack vectors.  Inbound spam filtering and outbound web filtering can help, but sooner or later, somebody will visit the wrong website or click on the wrong email attachment.  That’s why the right people need to pay attention to the inevitable warning signs and take action when warranted.

Which leads to sharing.  This is counter-intuitive, but the best way to defend against attack is to share how all the defenses work.  In detail.

This comment to a Brian Krebs blog post deconstructing the 2014 Sally Beauty breach is a great example.  It was a gutsy call for Blake Curlovic to publicly share the detailed information about this breach, both in the Krebs article and in subsequent comments, and the information he shared will be invaluable to future IT Departments fighting bad guys.

In cryptography, the algorithms are public.  Everyone knows them.  That’s why we have strong cryptography today – the surviving algorithms have all been peer and public reviewed, attacked, and strengthened.  CIOs should operate similarly.  Openly discuss security measures, expose them to public and peer review, conduct public post mortem incident reviews, publish the results, and adjust the methods where necessary.

Bad guys are already reviewing, discussing, and probing security in the shadows.  Bad guys have a whole supply chain dedicated to improving their ability to plunder, complete with discussion forums and specialists in all sorts of dark endeavors.  The bad guys have unlimited time and creativity and the good guys are outgunned and outmanned.

Against such an adversary, what CIO in their right mind would want to stand alone?

This doesn’t mean CIOs should call press conferences to brag about the latest security tool.  But CIOs should be visible at conferences and should contribute keynotes and other presentations in a running dialog to help continuously improve the state of the art.  They should also be engaged in online forums discussing and refining the latest ideas.  And when it makes sense to appear in front of the written and TV press, they should take the lead and use the forum to educate the public.

Smart good guys should join forces out in the open for the common good.  Contribute to and profit from a thriving marketplace of good ideas and everyone wins.

(Originally published on my Infrasupport website, May 13, 2015.  I backdated here to match the original posting.)

Should government have the power to access encrypted communications?

The short answer is, no.

The pro argument says law enforcement needs this tool to fight crime and terrorism, and we can build appropriate safeguards into any law to prevent abuse.  The con arguments point out the danger in granting more power to the government, suggesting that safeguards have limited value.

I’ve read through the pros and cons and concluded it’s a bad idea to grant the government power to access encrypted communications.   Nobody wants to give terrorists and other bad guys a free ride – but as many have pointed out elsewhere, bad guys will find their own ways to do encryption regardless of any US law.  So if we pass a law essentially crippling encryption technology in the United States, we hurt the good guys and help the bad guys.  Tell me how this makes any sense.  We’re all better off with a level playing field.

With a law granting the government this power, even loaded with safeguards, what’s to stop corrupt individuals from abusing it? Attempted abuses of power are already easy to find. There was a case in Minnesota a few years ago when male law enforcement professionals looked up driver’s license records for a few female troopers, politicians, and news media celebrities.  In another case, the IRS as an institution put up roadblocks to make it unnecessarily difficult for some nonprofit groups to gain tax exempt status because individuals in positions of authority apparently disapproved of these groups.  So if we grant the government even more power, imagine the possibilities for abuse and tyranny on a massive scale. It would be 1984 in the 21st century.

Some have advocated an approach combining new technologies with court approval as a safeguard against such tyranny.  The ideas essentially come down to inventing an electronic lock-box to hold everyone’s decryption keys.  Law enforcement can access the lock-box only with appropriate court orders.  The idea sounds nice, but it’s short-sighted and foolish.  Does anyone seriously believe a determined group of bad guys would have any trouble coming up with an attack against such a lock box?  Does anyone seriously want to trust our cryptographic keys with the same government that brought us healthcare.gov and sensational headlines around NSA break-ins?

But my opinion is not worth the disk space to store it. Don’t believe me? Just look at what happened to US cloud providers shortly after the Snowden revelations. Look at what happened to RSA’s credibility after the stories about RSA and the government being in cahoots started circulating.  Now imagine what would happen to confidence in the entire United States data grid if such a law were to pass.

Why would anyone trust any service provider with anything important if the government can access all of it? My private information is mine, and I choose who sees it. Not the government. And I promise you, if I have information I care enough about to keep private, I’ll find a way to safeguard it regardless of any law.

Carrie Cordero and Marc Zwillinger recently wrote a point/counterpoint article on this topic in the Wall Street Journal, here.  In case that link breaks in the future, I saved a PDF here.

There are other ways to fight back against the bad guys besides granting tyrannical power to the government.  I wrote an educational book about IT security, disguised as an international fiction thriller titled, “Bullseye Breach.” Take a look at the website, right here.

(Originally published on my Infrasupport website, April 21, 2015.  I backdated here to match the original posting date.)

Business Continuity and Disaster Recovery; My Apollo 13 week

[Originally posted December 26, 2014 on my Infrasupport website, right here.  I copied it here on June 21, 2017 and backdated to match the original post.]

I just finished my own disaster recovery operation.  There are still a few loose ends but the major work is done.  I’m fully operational.

Saturday, Dec. 20, 2014 was a bad day.  It could have been worse.  Much worse.  It could have shut my company down forever.  But it didn’t – I recovered, but it wasn’t easy and I made some mistakes.  Here’s a summary of what I did right preparing for it, what I did wrong, how I recovered, and lessons learned.  Hopefully others can learn from my experience.

My business critical systems include a file/print server, an email server, and now my web server.  I also operate an OpenVPN server that lets me connect back here when I’m on the road, but I don’t need it up and running every day.

My file/print server has everything inside it.  My Quickbooks data, copies of every proposal I’ve done, how-to documentation, my “Bullseye Breach” book I’ve been working on for the past year, marketing stuff, copies of customer firewall scripts, thousands of pictures and videos, and the list goes on.  My email server is my conduit to the world.  By now, there are more than 20,000 email messages tucked away in various folders and hundreds of customer notes.  When customers call with questions and I look like a genius with an immediate answer, those notes are my secret sauce.  Without those two servers, I’m not able to operate.  There’s too much information about too much technology with too many customers to keep it all in my head.

And then my web server.  Infrasupport has had different web sites over the years, but none were worth much and I never put significant effort into any of them.  I finally got serious in early 2013 when I committed to Red Hat that I would have a decent website up and running within 3 weeks [edit June 21, 2017 – Infrasupport went dormant when I accepted a full time job offer from Red Hat].  I wasn’t sure how I would get that done, and it took me more like 2 months to learn enough about WordPress to put something together, but I finally got a nice website up and running.  And I’ve gradually added content, including this blog post right now.  The idea was – and still is – the website would become a repository of how-to information and business experience potential customers could use as a tool.  It builds credibility and hopefully a few will call and use me for some projects.  I’ve sent links to “How to spot a phishy email” [link updated to the copy here on my dgregscott.com website] and other articles to dozens of potential customers by now.

Somewhere over the past 22 months, my website also became a business critical system.  But I didn’t realize it until after my disaster.  That cost me significant sleep.  But I’m getting ahead of myself.

All those virtual machines live inside a RHEV (Red Hat Enterprise Virtualization) environment.  One physical system holds all the storage for all the virtual machines, and then a couple of other low cost host systems provide CPU power.  This is not ideal.  The proper way to do this is put the storage in a SAN or something with redundancy.  But, like all customers, I operate on a limited budget, so I took the risk of putting all my eggs in this basket.  I made the choice and given the cost constraints I need to live with, I would make the same choice again.

I have a large removable hard drive inside a PC next to this environment and I use Windows Server Backup every night to back up my servers to directories on this hard drive.  And I have a script that rotates the saveset names and keeps 5 backup copies of each server.
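My rotation script runs on Windows, but the idea is simple enough to sketch in a few lines of Python.  The paths and names below are placeholders rather than my real backup layout: keep five generations per server, drop the oldest, and shift the rest down before each night's run.

```python
# Sketch of a keep-five rotation for nightly backup savesets.
import shutil
from pathlib import Path

KEEP = 5

def rotate(backup_root, server_name):
    root = Path(backup_root)
    oldest = root / f"{server_name}.{KEEP}"
    if oldest.exists():
        shutil.rmtree(oldest)          # drop the oldest generation
    for n in range(KEEP - 1, 0, -1):   # shift name.4 -> name.5, ... name.1 -> name.2
        src = root / f"{server_name}.{n}"
        if src.exists():
            src.rename(root / f"{server_name}.{n + 1}")
    current = root / server_name
    if current.exists():
        current.rename(root / f"{server_name}.1")
    current.mkdir()                    # fresh target for tonight's backup

# rotate("D:/backups", "fileserver") would run just before the nightly backup job.
```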

Ideally, I should also find a way to keep backups offsite in case my house burns down.  I’m still working on that.  Budget constraints again.  For now – hopefully I’ll be home if a fire breaks out and I can grab that PC with all my backups and bring it outside.  And hopefully, no Minnesota tornado or other natural disaster will destroy my house.  I chose to live with that risk, but I’m open to revisiting that choice if an opportunity presents itself.

What Happened?

The picture above summarizes it.  **But see the update added later at the end of this post.**  I walked downstairs around 5:30 PM on Saturday and was shocked to find not one, but two of my 750 GB drives showing amber lights, meaning something was wrong with them.  But I could still survive the failures. The RAID 5 array in the upper shelf with my virtual machine data had a hot spare, so it should have been able to stand up to two failing drives.  At this point only one upper shelf drive was offline, so it should have been rebuilding itself onto the hot spare.  The 750 GB drives in the bottom shelf with the system boot drive were mirrored, so that array should (and did) survive one drive failure.

I needed to do some hot-swapping before anything else went wrong.  I have some spare 750 GB drives, so I hot-swapped the failed drive in the upper shelf.  My plan was to let that RAID set rebuild, then swap the lower drive for the mirror set to rebuild.  And I would run diagnostics on the replaced drives to see what was wrong with them.

I bought the two 2 TB drives in slots 3 and 4 of the lower shelf a few months ago and set them up as a mirror set, but they were not in production yet.

Another note.  This turns out to be significant.  It seems my HP 750 GB hotswap drives have a firmware issue.  Firmware level HPG1 has a known issue where the drives declare themselves offline when they’re really fine.  The cure is to update the firmware to the latest version, HPG6.  I stumbled onto that problem a couple months ago when I brought in some additional 750 GB drives and they kept declaring themselves offline.  I updated all my additional drives, but did not update the drives already in place in the upper shelf – they had been running for 4+ years without a problem.  Don’t fix it if it ain’t broke.  This decision would bite me in a few minutes.

After swapping the drive, I hopped in the car to pick up some takeout food for the family.  I wouldn’t see my own dinner until around midnight.

I came back about 1/2 hour later and was shocked to find the drive in the upper shelf, slot 4 also showing an amber light.  And my storage server was hung.  So were all the virtual machines that depended on it.  Poof – just like that, I was offline.  Everything was dead.

In my fictional “Bullseye Breach” book, one of the characters gets physically sick when he realizes the consequences of a server issue in his company.  That’s how I felt.  My stomach churned, my hands started shaking and I felt dizzy.  Everything was dead.  No choice but to power cycle the system.  After cycling the power, that main system status light glowed red, meaning some kind of overall systemic failure.

That’s how fast an entire IT operation can change from smoothly running to a major mess.  And that’s why good IT people are freaks about redundancy – because nobody likes to experience what I went through Saturday night.

Faced with another ugly choice, I pulled the power plug on that server and cold booted it.  That cleared the red light and it booted.  The drive in upper shelf slot 4 declared itself good again – its problem was that old HPG1 firmware.  So now I had a bootable storage server, but the storage I cared about with all my virtual machine images was a worthless pile of scrambled electronic bits.

I tried every trick in the book to recover that upper shelf array.  Nothing worked, and deep down inside, I already knew it was toast.  Two drives failed.  The controller that was supposed to handle it also failed. **This sentence turns out to be wrong.  See the update added later at the end.**   And one of the two drives in the bottom mirror set was also dead.

Time to face facts.

Recovery

I hot swapped a replacement drive for the failed drive in the bottom shelf.  The failed drive already had the new firmware, so I ran a bunch of diagnostics against it on a different system.  The diagnostics suggested this drive really was bad.  Diagnostics also suggested the original drive in upper slot 2 was bad.   That explained the drive failures.  Why the controller forced me to pull the power plug after the multiple failures is anyone’s guess.

I put my  2 TB mirror set into production and built a brand new virtualization environment on it.  The backups for my file/print and email server virtual machines were good and I had both of those up and running by Sunday afternoon.

The website…

Well, that was a different story.  I never backed it up.  Not once.  Never bothered with it.  What a dork!

I had to rebuild the website from scratch.  To make matters worse, the WordPress theme I used is no longer easily available and no longer supported.  And it had some custom CSS commands to give it the exact look I wanted.  And it was all gone.

Fortunately for me, Brewster Kahle’s mom apparently recorded every news program she could get in front of from sometime in the 1970s until her death.  That inspired Brewster Kahle to build a website named web.archive.org.  I’ve never met Brewster, but I am deeply indebted to him.  His archive had copies of nearly all my web pages and pointers to some supporting pictures and videos.

Is my  website a critical business system?  After my Saturday disaster, an email came in Monday morning from a friend at Red Hat, with subject, “website down.”  If my friend at Red Hat was looking for it, so were others.  So, yeah, it’s critical.

I spent the next 3 days and nights rebuilding and by Christmas eve, Dec. 24, most of the content was back online.   Google’s caches and my memory helped rebuild the rest and by 6 AM Christmas morning, the website was fully functional again.  As of this writing, I am missing only one piece of content.  It was a screen shot supporting a blog post I wrote about the mess at healthcare.gov back in Oct. 2013.   That’s it.  That’s the only missing content.  One screen shot from an old, forgotten blog post.  And the new website has updated plugins for SEO and other functions, so it’s better than the old website.

My headache will no doubt go away soon and my hands don’t shake anymore.  I slept most of last night.  It felt good.

Lessons Learned

Backups are important.  Duh!  I don’t have automated website backups yet, but the pieces are in place and I’ll whip up some scripts soon.  In the meantime, I’ll back up the database and content by hand as soon as I post this blog entry.   And every time I change anything.  I never want to experience the last few days again.  And I don’t want to even think about what shape I would be in without good backups of my file/print and email servers.
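For what it's worth, here is the kind of script I have in mind for the website: dump the WordPress database and archive the content directory.  The database name, paths, and credentials handling are assumptions for illustration, not my finished script.

```python
# Rough sketch of a website backup: database dump plus a tarball of the docroot.
import subprocess
import tarfile
from datetime import date

def backup_site(db_name="wordpress", docroot="/var/www/html",
                dest_dir="/backup/website"):
    stamp = date.today().isoformat()
    sql_file = f"{dest_dir}/{db_name}-{stamp}.sql"
    tar_file = f"{dest_dir}/site-{stamp}.tar.gz"

    # Assumes database credentials live in ~/.my.cnf, so nothing sensitive sits here.
    with open(sql_file, "w") as out:
        subprocess.run(["mysqldump", db_name], stdout=out, check=True)

    with tarfile.open(tar_file, "w:gz") as tar:
        tar.add(docroot, arcname="html")

if __name__ == "__main__":
    backup_site()
```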

Busy business owners should  periodically inventory their systems and update what’s critical and what can wait a few days when disaster strikes.  I messed up here.  I should have realized how important my website has become these past several months, especially since I’m using it to help promote my new book.  Fortunately for me, it’s a simple website and I was able to rebuild it by hand.  If it had been more complex, well, it scares me to think about it.

Finally – disasters come in many shapes.  They don’t have to be fires or tornadoes or terrorist attacks.  This disaster would have been a routine hardware failure in other circumstances and will never make even the back pages of any newspaper.

If this post is helpful and you want to discuss planning for your own business continuity, please contact me in the form on the sidebar and I’ll be glad to talk to you.  I lived through a disaster.  You can too, especially if you plan ahead.

Update from early January, 2015 – I now have a script to automatically backup my website.  I tested a restore from bare virtual metal and it worked – I ended up with an identical website copy.  And I documented the details.

Update several weeks later. After examining the RAID set in that upper shelf in more detail, I found out it was not RAID 5 with a hot spare as I originally thought.  Instead, it was RAID 10, or mirroring with striping.  RAID 10 sets perform better than RAID 5 and can stand up to some cases of multiple drive failures, but if the wrong two drives fail, the whole array is dead.  That’s what happened in this case.  With poor quality 750 GB drives, this setup was an ugly scenario waiting to happen.
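A toy example shows the difference.  With four drives arranged as two mirrored pairs, a RAID 10 array survives any two failures that land in different pairs, but it dies the moment both members of one pair fail, which is exactly what bit me.  The layout below is a simplified illustration, not my actual shelf.

```python
# Toy illustration of RAID 10 failure tolerance: four drives as two mirrored pairs.
from itertools import combinations

MIRROR_PAIRS = [(0, 1), (2, 3)]

def array_survives(failed_drives):
    failed = set(failed_drives)
    # The array dies if any mirror pair loses both of its members.
    return all(not failed.issuperset(pair) for pair in MIRROR_PAIRS)

for failures in combinations(range(4), 2):
    status = "survives" if array_survives(failures) else "DEAD"
    print(f"drives {failures} fail -> array {status}")
```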

We take your privacy seriously. Really?

By now, we’ve all read and digested the news about the December 2013 Target breach.  In the largest breach in history at that time and the first of many sensational headlines to come, somebody stole 40 million credit card numbers from Target POS (point of sale) systems.  We’ll probably never know details, but it doesn’t take a rocket scientist to connect the dots.   Russian criminals stole credentials from an HVAC contractor in Pennsylvania and used those to snoop around the Target IT network.   Why Target failed to isolate a vendor payment system and POS terminals from the rest of its internal network is one of many questions that may never be adequately answered in public.  The criminals eventually planted a memory scraping program onto thousands of Target POS systems and waited in Russia for 40 million credit card numbers to flow in.  And credit card numbers would still be flowing if the banks, liable for fraudulent charges, hadn’t caught on.  Who says crime doesn’t pay?

It gets worse – here are just a few recent breach headlines:

  • Jimmy John’s
  • Dairy Queen
  • Goodwill Industries
  • KMart
  • Sally Beauty
  • Neiman Marcus
  • UPS
  • Michaels
  • Albertsons
  • SuperValu
  • P.F. Chang’s
  • Home Depot

And that’s just the tip of the iceberg.  According to the New York Times:

The Secret Service estimated this summer that 1,000 American merchants were affected by this kind of attack, and that many of them may not even know that they were breached.

Every one of these retail breaches has a unique story.  But one thing they all have in common: somebody was asleep at the switch.

In a few cases, the POS systems apparently had back doors allowing the manufacturer remote access for support functions.  Think about this for a minute.  If a manufacturer can remotely access a POS system at a customer site, that POS system must somehow be exposed directly to the Internet or a telephone line.  Which means anyone, anywhere in the world, can also remotely access it.

Given the state of IT knowledge among small retailers, the only way that can happen is if the manufacturer or somebody who should know better helps set it up.  These so-called “experts” argue that the back doors are obscure and nobody will find them.  Ask the folks at Jimmy John’s and Dairy Queen how well that reasoning worked out.  Security by obscurity was discredited a long time ago, and trying it now is like playing Russian Roulette.

And that triggers a question.  How does anyone in their right mind expose a POS system directly to the Internet?  I want to grab these people by the shoulders and shake as hard as I can and yell, “WAKE UP!!”

The Home Depot story may be the worst.  Talk about the fox guarding the chicken coop!  According to several articles, including this one from the New York Times, the very engineer Home Depot hired to oversee security systems at Home Depot stores was himself a criminal who had sabotaged the servers at his former employer.  You can’t make this stuff up.  Quoting from the article:

In 2012, Home Depot hired Ricky Joe Mitchell, a security engineer, who was swiftly promoted under Jeff Mitchell, a senior director of information technology security, to a job in which he oversaw security systems at Home Depot’s stores. (The men are not related.)

But Ricky Joe Mitchell did not last long at Home Depot. Before joining the company, he was fired by EnerVest Operating, an oil and gas company, and, before he left, he disabled EnerVest’s computers for a month. He was sentenced to four years in federal prison in April.

Somebody spent roughly 6 months inside the Home Depot network and stole 56 million credit card numbers before the banks and law enforcement told Home Depot about it.  And that sums up the sorry state of security today in our corporate IT departments.

I’m picking on retailers only because they’ve generated most of the recent sensational headlines.  But given recent breaches at JP Morgan, the US Postal Service, the US Weather Service, and others, I struggle to find a strong enough word.  FUBAR maybe?  But nothing is beyond repair.

Why is security in such a lousy state?  Home Depot may provide the best answer.  Quoting from the same New York Times article:

Several former Home Depot employees said they were not surprised the company had been hacked. They said that over the years, when they sought new software and training, managers came back with the same response: “We sell hammers.”

Great.  Just great.  What do we do about it?

My answer – go to the top.  It’s up to us IT folks to convince CEOs and boards of directors that IT is an asset, not an expense.  All that data, and all the people and machines that process all that data, are important assets.  Company leaders need to care about its confidentiality, integrity, and availability.

That probably means spending money for education and training.  And equipment.  And professional services for a top to bottom review.  Where’s the ROI?  Just ask some of the companies on the list of shame above about the consequences of ignoring security.  The cost to Target for remediation, lost income, and shareholder lawsuits will run into the billions of dollars.  The CEO and CIO lost their jobs, and shareholders mounted a challenge to replace many board members.

Granted, IT people speak a different language than you.  Guilty as charged.  But so does your mechanic – does that mean you neglect your car?

One final plug.  I wrote a book on this topic.  It’s a fiction story ripped from real headlines, titled “Bullseye Breach.”  You can find more details about it here.

“Bullseye Breach” is the real deal.  Published with Beaver’s Pond Press, it has an interesting story with realistic characters and a great plot.  Readers will stay engaged and come away more aware of security issues.  Use the book as a teaching tool.  Buy a copy for everyone in your company and use it as a basis for group discussions.

(First published Dec. 10, 2014 on my Infrasupport website.  I backdated here to match the original posting.)

Mostly bad telemarketing

I’ve taken thousands of telemarketing cold calls over the years.  Many are deceptive, most are awful.  And a few are good.

Here is a common deception.  Many calls originate in call centers in India or the Philippines.  Callers use IP phones and connect via the Internet to a system in the US, which generates a caller-ID in my local area code.   When my phone rings, I see a caller ID that appears local, which makes me want to answer the phone because I think it might be a potential customer who wants to buy my goods and services.

The conversations generally start something like this:

Hello, this is Greg Scott.  (I always answer my phone this way.)

(long pause, sometimes with a series of clicks)

Yes, hello, I am trying to contact Agreg a Scote, uhm, with Infrasupport-a-tech company?

Well, yes, this is Greg Scott

Ah, good morning, sir Greg.  I am calling  because… (and we’re into the script flowchart).

Why do I dread these calls?  After all, the caller is courteous.  And the company she represents is only trying to find customers.  What’s not to like?

Well, plenty.  First, the call is an interruption.  I have to stop what I’m doing, switch gears, and make a decision whether to answer the phone.  After I answer the phone, I have to focus on the caller’s message instead of what I was working on before the phone rang.   I understand callers are trying to find customers and this is part of business.  I’ve done cold calls myself.  But since callers know they interrupted me, they should respect me and my time.

That leads to the next problem.  From the very first ring, overseas callers with automated dialers and IP phones have already disrespected me and my time.  Why is this company trying to fool me into believing it’s a local call?  How am I supposed to trust a company that tries to deceive me with the very first contact?  Why would I ever consider buying anything from such a company?

Focusing on the caller’s message is also challenging.  I speak English as a native language, my hearing is not what it used to be, and I have a terrible time understanding the thick accent on the other end of the phone.   And I am willing to bet, nobody in Bangalore, India is named Gary.  Or Bob, John, Ted, Mary, or any other common English name.

Here are two questions I want to ask these telemarketing firms – not the callers trying to do their jobs, but the boneheads who manage the callers.  If I tried to speak your native language and you heard my American accent, how much time would you need to figure out your language is not my first language?  If I adopt a telephone name native to your language, would it make a difference?   The obvious answers to those questions are about one second and no.  So why do you think I will believe your caller speaks English as a native language simply because you gave him an American telephone name?

The problem is compounded by poor sound quality.  After the packets containing the sound from these calls bounce around dozens of IP routers before flying across the public telephone network to my phone, the sound is often garbled, muffled, and distorted.  Combined with a thick accent, it is always difficult and often impossible to figure out what callers are saying.

For companies using these services – if you have such a low regard for me as a potential customer, what kind of service can I expect if I buy your product or service?

And it gets worse.

Lately, I’ve taken dozens of calls from machines pretending to be people.  Here is a typical call, synthesized from many:

The phone rings, showing a local caller ID.

Hello, this is Greg Scott

(Long pause – this is always the dead-giveaway.)

Why hello!  This is Nancy and I have an exciting offer for you!

Really – wow, thanks Nancy.  Are you a real person?

(Pause)

(Laughing)  Well of course I’m real, why do you ask?

Well, Nancy, you sounded like a machine.

Oh no, I can assure you, I’m a real person.  I’d like to talk to you about a great line of credit we offer.  If you’re interested, I’ll connect you to my manager and he can cover details with you.

Ah – thanks Nancy.  By the way, who won the baseball World Series last year?

I’m sorry, could you repeat that?

Yes – who won the World Series last year?

I’m sorry, but we’re not allowed to give out personal information.

What’s personal about that?

Thank you.  Goodbye.

Let’s see, what was wrong with this call?  After all, it solved the sound quality problem and the caller’s native language matches mine.  I can visualize a team of misguided engineers,  proud of their creative masterpiece, presenting the slideshow bullet items to a bunch of boneheaded executives in a boardroom and congratulating themselves on solving their telemarketing problem.

Here is my question for the clowns who dream up this stuff.  If your time and money is too valuable to use a real person fluent with my language to make real phone calls, why do you think my time and money is any less valuable?  Do you really think I will buy anything from you when you use a machine to waste my time and lie to me?

So after griping about bad calls, what about somebody who did it right?  Well, it happened one day last year when a nice lady from a training company called me.  Let’s call her Dee and her company, Training Inc.  These are both fictional names.  Dee did everything right.  It was obvious she looked at my website before she made contact because she tailored her pitch to meet my unique circumstances.  She asked me a bunch of questions about how I run my business.  She asked me about my training goals.  She was personable.  She spoke the same native language as me.  And she didn’t try to fool me into thinking she was from this area – her caller-ID showed a different state.

I liked Dee.  We connected.  I don’t have any business for her right now, but when the time comes and I am able to send business her way, I will do so.  In fact, I liked Dee so much, I spent most of a Saturday updating and fixing my broken Exchange Server indexing so I could find her contact information.

If you are a telemarketer and happen to read this blog entry, first, thanks for reading.  If you’re spending money for overseas call centers with cheap IP phones, bad connections, and fake caller IDs, or if you’re trying to use machines pretending to be people, save your money.  Nobody in their right mind will buy anything from you when you approach them this way.  Instead, find somebody like Dee who will represent you properly.

Even in today’s high-tech, 24-hour, overstressed environment, the old-fashioned rules still apply.

(First published on my Infrasupport website, June 7, 2014.  I backdated here to match the original posting date.)

How to do bad customer service and destroy your reputation

I survived one of the worst customer service experiences ever this week.  We can all draw some lessons from this story.

The end user customer operates branch sites across the Midwest USA and uses Infrasupport firewalls to connect to the Internet and to the main office in the St. Paul, MN, area.  This branch site is in southern Illinois and uses a nifty new twist on my network failover system.

The firewall has a wireless LAN (WiLAN) card and several wired network interface cards (NICs). The new twist – connect a wired NIC to a low cost DSL connection. When the DSL modem drops, failover to the WiLAN connection and route through a cell phone carrier. And when the DSL connection comes back, fail back to the DSL wired connection. It works – this is the lowest cost redundancy anyone can buy.
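
For readers who like to see how an idea like that might work, here is a minimal sketch of the failover logic – not the actual Infrasupport code, just an illustration under assumptions.  The gateway addresses, the 30 second check interval, and the use of the Linux `ip` command are all made up for the example.

```python
# Minimal failover sketch (illustration only, not the production firewall code).
# Idea: if the DSL gateway stops answering pings, point the default route at
# the cellular (WiLAN) gateway; when DSL answers again, fail back.
import subprocess
import time

DSL_GW = "203.0.113.1"    # hypothetical DSL gateway address
CELL_GW = "192.168.8.1"   # hypothetical cellular gateway address

def gateway_up(gw):
    """Return True if the gateway answers one ping within two seconds."""
    return subprocess.call(
        ["ping", "-c", "1", "-W", "2", gw],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL) == 0

def set_default_route(gw):
    """Point the default route at the given gateway (needs root)."""
    subprocess.call(["ip", "route", "replace", "default", "via", gw])

while True:
    set_default_route(DSL_GW if gateway_up(DSL_GW) else CELL_GW)
    time.sleep(30)   # re-check every 30 seconds; failback is automatic
```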

This site opened about 6 weeks ago and the DSL connection wasn’t ready.  No problem.  We routed via WiLAN over the cell phone carrier and waited more than a month for Ma Bell Internet to provision the DSL service.

Ma Bell Internet is a fictitious name. As are the names of everyone else in this story other than me. Other than the names, every detail is true. As one person involved said, “you can’t make this stuff up.”

The story starts last Wednesday after I did the appropriate adjustments on the firewall to accommodate the new DSL connection.  Tom, the end user customer at the site, connected the patch cable from the DSL modem to the firewall and nothing worked.  Our troubleshooting pointed to a misconfigured or bad DSL modem.

We logged a service call with Ma Bell Internet, which triggered a week-long comedy of errors. We called on Thursday and talked to someone from tech support – I’ll call her Misty.  Here is a piece of our conversation:

Greg: You have a gateway at this IP Address. We can’t ping it from the site.

Misty: I don’t see why you can’t ping it. I can ping it from here.

Greg: Right – I don’t know why we can’t ping it. That’s why we’re calling. Nobody can route through that DSL modem and we can’t ping it.

Misty: I don’t understand why you can’t ping that modem. I can ping it from here, you should be able to ping it from there.

Greg: Right – we should.  And if it answered it would be even better.

Misty: So how come you can’t ping it?

Greg: Routers have two sides. We’ll call them the inside and outside. You’re on the outside and you can ping it. The site is on the inside and can’t ping it. Seems to me, something is wrong with the inside of that router.

Misty: Well if you can’t ping it, there’s not much I can do.

Greg: Yes there is!  It’s your router.  You need to fix it.

Misty: I tried connecting to it from here remotely to check its configuration but I’m not able to.

Greg: So – doesn’t that suggest to you something is wrong?

Misty: No. We connect those the way we’re ordered to connect them. If the order said no remote diagnostics then we wouldn’t have turned that on.  And I can ping it.

Greg:  So did the order for this one say no remote diagnostics?

Misty:  I don’t know what the order said – I don’t have it.

Greg:  So how do you know the order didn’t call for remote diagnostics?

Misty:  Because I’m not able to connect to it.

Greg: (Exercising patience)  OK, so how do we fix this?

Misty: You can run remote diagnostics from your site. Just hook a real computer up to it and connect to its website.

Greg: The problem with that is, we can’t access the modem. If we could access the modem we wouldn’t need to call you. And the only computer I have onsite is my firewall. Everything else is thin clients.

Misty: If you’re unwilling to run remote diagnostics, I can’t do much to help you.

Greg: Yes you can. Send somebody out there to fix the modem!

Misty: Maybe you can borrow a laptop from somebody and try the diagnostics.

Greg: There are no laptops to borrow at this site. We have my firewall – with no graphics so connecting to a GUI website won’t work anyway – and some thin clients. That’s it.

Misty: I can dispatch a technician, but it will be billable.

Greg: What??? Why is it billable?

Misty: Because you’re unwilling to run remote diagnostics.

Greg: No, not unwilling.  Unable.  We are unable to connect to this modem.  If we can’t connect to it, how do we run diagnostics on it?

Misty: If I dispatch a technician and he finds a building wiring problem or something else not from us, we’ll have to charge you.

Greg: Well of course. The wiring is a 20 foot cat5e patch cable.  That’s it.  That’s the building wiring.  One patch cable.  Send somebody out.

That was on Thursday.  I asked for somebody onsite Friday.  But with nobody available Friday, I had to settle for Monday.  My phone rang on Friday and I confirmed it – send somebody Monday, preferably Monday morning.

Monday morning came and I learned Ma Bell sent somebody to the site on Saturday.  Of course nobody was at the site on Saturday.  After some Monday phone calls, we scheduled another visit for Wednesday – nobody from Ma Bell was available Tuesday.  So now we were a week into this issue.  After opening the support ticket on Thursday, the soonest possible resolution would not come until the following Wednesday.

Wednesday morning about 9:45 my phone rang.  It was the onsite Ma Bell support technician.  Let’s call him BA.  You’ll see why in a minute.

BA told me the customer became angry and sent him upstairs to call me.  I apologized and said there was a systemic problem at Ma Bell Internet and he was the guy onsite who had to hear it all.   BA told me he also got mad at the customer and apparently started throwing boxes around the site.  And that was when emails started pouring into my inbox warning me about this technician and his attitude. Apparently, BA was angry the second he walked in the door and took out his frustrations on Tom, the end user customer.  Tom got tired of the abuse and sent BA upstairs to call me.

BA asked me a bunch of questions I didn’t know how to answer.  He wanted to know how to configure the router password and who to call for onsite tech support.  I asked BA – since he worked for Ma Bell Internet – shouldn’t he know who to call for support?  BA launched into a diatribe about his employer and his frustrations about training and scheduling and his management.  Here is the portion I remember most vividly:

Greg: So how do you know these connections are good?

BA: We set them up dynamic and then connect them to the Internet. If they work, then we give them their static address and we’re done.

Greg: So how do you know they work when they’re static?

BA: Because they worked when they were dynamic.

Greg: Don’t you think you should test them when they’re static?

BA: That’s not what we do.

Greg: Maybe you should give the Ma Bell guys some feedback about…

BA: (Interrupting) They won’t listen.

Greg: How badly do they want to keep customers?

BA: They won’t listen. They’re managers and they don’t care!

Greg: OK. Well for now, we need to fix this modem. I need it to have the static IP address assigned to it, no NAT, no DHCP, and no firewall rules…

BA: (Interrupting again) Hold on there – Maybe those were English words but I have no idea what any of that stuff means. I’m not an IT guy.  I don’t know why customers keep trying to get me to solve their IT problems. I’m not an IT guy!

Greg: Well, that’s OK.  I’m an IT guy so I can cover that part.  Hang on a second – (recalling my notes from last week – fortunately I still had my scrap of paper handy and near the top of my pile of scraps of paper with notes from countless other customer engagements) I have the Ma Bell Internet toll free number right here.

I called the number and tried to conference us all together.  The conference call didn’t work.  I told BA I needed to hang up to clear my messed up conference call and I would call him right back and try again.

We hung up and I called the toll free number again. This time it worked and I pressed the phone buttons to talk to somebody in provisioning.  Provisioning sent me to Tech Support and I talked to a helpful (finally!) lady I’ll call Ingrid.

I told Ingrid that BA was standing by onsite and we needed to configure this modem.  Ingrid said they can’t disturb technicians who are onsite working.  I said he’s waiting for us to call right now.  I put Ingrid on hold, called BA onsite and his phone immediately went to voicemail.  And an email came into my inbox from Tom – BA was on the phone with somebody else.

I came back to Ingrid and explained BA was on the phone with somebody else and his phone went to voicemail.  And then my phone beeped.  It was BA.  I put Ingrid on hold again, told BA to hang up and stand by and we would conference him in.  Back to Ingrid – but BA’s inbound call had tied up my conferencing capability, even after he hung up, so Ingrid would have to set up the conference.  She tried to call BA and reported his phone answered and then disconnected.  She tried again, same problem.

My cell phone beeped again.  It was BA, reporting that somebody tried to call him twice.  He answered but could not hear anyone so he hung up.  I could hear frustration rising in BA’s voice.  So I explained to BA that Ingrid was trying to conference us all together and to stand by.  “Hang in there, we’ll make this happen.  I promise.”

Back to Ingrid.  I told Ingrid what BA told me and Ingrid said she would hang up with me and try to call BA and conference me in.  I told Ingrid what we needed with that modem.  Give it the block of static addresses it’s supposed to have, turn off DHCP, turn off NAT, turn off all firewall rules.  Bless her heart, Ingrid knew what I was talking about.  She repeated it and we hung up.

Whew!

My phone rang a minute later.  It was Pat from Ma Bell Internet.  Pat had BA on the line and wanted to conference me in.  I don’t know what happened to Ingrid.  I said, “yes, absolutely!”

Finally – we had all the right people together on the same call at the same time. Now we could get to work.

Meantime, my email inbox chirped with requests from the customer main office for status updates.  Like most IT people, I can type and talk at the same time.  It’s a skill we all learn sooner or later.  So I updated the customer via email while I talked to Ma Bell Internet on the phone.

Pat asked if I wanted the modem to be a bridge or router. I told her I didn’t care, as long as this site could get to the Internet.  So Pat decided to try setting it up as a bridge.  Pat talked BA through the steps and suddenly, nobody from inside or outside could ping that gateway address anymore.  Woops.

So we had to configure it as a router.  BA groaned – “so that means I have to type in that long password string again?”

“Yes”, I said. “Sorry. I wish there was another way to do it, but you’re there onsite and I need your hands and eyes.”

Pat talked BA through the steps to reset the modem and configure it again.  And after all that – after trying to teach a telephone support technician what routers do, after a week of bungled appointments waiting for Ma Bell Internet to send somebody, after reassuring an onsite technician with a bad attitude, after juggling conference calls that refused to conference, after all that, here was the problem. This was why the site could not route to the Internet over that modem.

BA: And I’m setting the subnet mask to 255.255.255.0, right?

Greg: (Listening passively until now) NO!

Pat: Uhm,  no, change that last octet to 248. So the subnet mask should read 255.255.255.248.

BA: Oh – I always set it to 255.255.255.0. I don’t even know what it means.

Greg: Well, it’s important to get that one right.

Pat: (Didn’t say anything.)
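
If the subnet mask exchange seems like inside baseball, here is what it means: the site had a small static block, a /29, so the mask had to be 255.255.255.248.  A quick sketch with Python’s ipaddress module – using made-up documentation addresses, not the customer’s real block – shows the difference:

```python
# Why 255.255.255.248 instead of 255.255.255.0 (addresses here are made up).
import ipaddress

block = ipaddress.ip_network("203.0.113.40/29")   # mask 255.255.255.248
print(block.netmask)          # 255.255.255.248
print(list(block.hosts()))    # six usable addresses, .41 through .46

wrong = ipaddress.ip_network("203.0.113.0/24")    # mask 255.255.255.0
print(wrong.num_addresses)    # 256 - with this mask the modem treats the
                              # whole /24 as directly connected, so traffic
                              # for addresses outside the real /29 never
                              # goes to the upstream gateway
```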

And voilà – it all worked. I started up an SSH session into my nifty onsite firewall and was ready to thank everyone for getting this up and running when everything dropped again.

After some more troubleshooting, Pat said, “I see some upstream errors here.” I said, “I’m pinging in another window and I notice the ping times zoom up from around 60 ms to more than 200 ms when my SSH session drops.”

Pat said, “Hold on a minute, I’ll bring somebody else in.” Pat put us on hold, leaving BA and me together.

After a few seconds, BA said, “Where did she go? Didn’t she say she was getting somebody?  What’s going on here?”

“Just hang in there.  Give her a minute to come back.”

A few seconds later, a man I’ll call Galen came on the line. Galen and Pat confirmed something unusual was going on.

And then BA asked us all to wait a minute. “I’m taking the modem offline for a minute.”

Pat said, “Oh – now I don’t see the modem anymore!” I said, “yes, BA said he was taking it offline for a minute.”

A few seconds later, the modem came back online.  BA explained he pulled the telecom cabling out of the punchdown block and punched the wires down again.

And after BA punched down the wires again, the connection stayed solid.  After watching for about 2 minutes, BA said, “I’m outta here!” and left.  Galen, Pat, and I stayed together for another few minutes and then I agreed Ma Bell Internet could close this case.

Wow!

What lessons can we extract from this experience? I named the onsite technician BA because he really did have a bad attitude.  He had a bad attitude because his company is dysfunctional and he handled the stress poorly, making a touchy situation even worse.  How do I know Ma Bell Internet is dysfunctional?  Look at the evidence: poorly trained telephone support technicians (Misty), broken scheduling, poor communication, and overburdened, undertrained onsite technicians.  This is fixable if the managers at Ma Bell Internet want to fix it.  If not, plenty of other Internet providers will eat their lunch.

(First published on my Infrasupport website, May 24, 2014.  I backdated here to match the original posting date.)

Spying – The pot calling the kettle black

Sometimes when high tech meets international politics, reality really is stranger than fiction.

First, a few enlightened members of our US Congress accused Chinese telecom equipment giant Huawei of spying for the Chinese government. Here is one of many press articles, this one from October 2012.  Here is another article from 2011.  Apparently, much of the fear on this side of the Pacific about Huawei stems from the fact that Huawei founder and CEO Ren Zhengfei was once a telecom technician in the Chinese People’s Liberation Army.  The company CEO served in his own country’s military years ago.  Therefore, today’s Chinese government will use equipment from his company to spy on the United States.

I wonder how many American CEOs once served in the US military?  Does it follow that their companies therefore spy on China?

This article from July, 2013 might be one of the best.   Quoting the first sentence in the article:

Former Central Intelligence Agency chief Michael Hayden said that at a minimum, Huawei had provided Chinese officials with “intimate and extensive knowledge of the foreign telecommunications systems.”

Farther down, we see this nugget:

Hayden currently serves on the board at Motorola Solutions, and is a principal at security consultancy Chertoff Group.

Yup, that’s the same former Homeland Security Secretary, Michael Chertoff, who oversaw the US Government’s not-so-brilliant response to Hurricane Katrina back in 2005.  Now he runs a consulting company, advising governments and big business on how to keep their infrastructure safe.  And Michael Hayden works for him.

As for Motorola Solutions, here is how that company describes itself, from its own website at http://www.motorolasolutions.com:

Motorola Solutions provides business- and mission-critical communication products and services to enterprises and governments.

I should disclose a few things before going any further with this.  First, I am an American and proud of it.  By an accident of birth, I am blessed to live in the best country in the world.  I want the United States to compete fiercely and win all the competitive battles.  I don’t like Chinese counterfeiting, I don’t like spam relayed from Chinese email relay services, and I don’t want anyone spying on me.

I like to think I’m one of the good guys.  I want my country to also be one of the good guys.

I also like level playing fields.  I regularly go up against entrenched companies – American and foreign – and it frustrates me beyond belief when I offer superior solutions but lose because the entrenched competition successfully introduces FUD with the potential customer.  Introducing FUD – Fear, Uncertainty, and Doubt – is a time honored tradition in the high tech marketplace.  The conversations start something like this:

Mr. Customer, are you sure you want to look at this new solution?  You have a lot riding on this project, and even though this new upstart might offer some advantages and they’re less expensive than we are, is it really worth the risk?  After all, we’ll be adding that capability sometime in the next 20 years so they don’t really have any advantage anyway.  Doesn’t it make more sense to stick with us and what you already know?

And bla bla bla…

FUD is often no more than a line of BS, but fear is a powerful motivator.  FUD works – that’s why entrenched incumbents use it.

So now, along comes Huawei, a Chinese company, and the guy who sits on the board of a direct US competitor accuses Huawei of spying for the Chinese.  And he made his accusations nearly a year after a US Presidential Commission spent 18 months investigating Huawei and found no evidence to support the accusations.  Read the details right here.

What’s really going on here?  Hayden and his boss are spreading FUD, wrapped up in the US flag and national security.   But it’s not really about national security.  It’s about keeping a competitor out of the US marketplace – good old-fashioned protectionism mixed with a 21st century high-tech twist.  It was never about national security; it’s about money.

And now it gets better.

Because the NSA – the organization Hayden used to run – could not keep its own secrets, we find out the NSA hacked into the Huawei internal network and spied on Huawei.  That’s the pot calling the kettle black.

Instead of Huawei spying on us, we spied on Huawei.  And got caught.

In what universe is it possible the Chinese are the good guys in this episode?

(First published on my Infrasupport website, March 26, 2014.  I backdated here to match the original posting date.)