The end user customer operates branch sites across the Midwest USA and uses Infrasupport firewalls to connect to the Internet and to the main office in the St. Paul, MN area. This branch site is in southern Illinois and uses a nifty new twist on my network failover system.
The firewall has a wireless LAN (WiLAN) card and several wired network interface cards (NICs). The new twist – connect a wired NIC to a low-cost DSL connection. When the DSL modem drops, fail over to the WiLAN connection and route through a cell phone carrier. And when the DSL connection comes back, fail back to the DSL wired connection. It works – this is the lowest cost redundancy anyone can buy.
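For readers who want to picture the failover decision, here is a minimal sketch in Python. It is not the actual Infrasupport firewall code. The function name `next_link`, the link labels, and the thresholds `fail_after` and `recover_after` are all hypothetical, and it assumes something else on the firewall probes the DSL path and records each result as up or down.

```python
def next_link(current, dsl_up_history, fail_after=3, recover_after=5):
    """Pick the link to route over: 'dsl' or 'wilan'.

    current         -- the link in use now ('dsl' or 'wilan')
    dsl_up_history  -- recent DSL probe results, newest last (True = up)
    fail_after      -- hypothetical: consecutive failed probes before failing over
    recover_after   -- hypothetical: consecutive good probes before failing back
    """
    if current == "dsl":
        recent = dsl_up_history[-fail_after:]
        # Fail over only after several probes in a row have failed,
        # so a single lost ping does not flip the route.
        if len(recent) == fail_after and not any(recent):
            return "wilan"
        return "dsl"

    recent = dsl_up_history[-recover_after:]
    # Fail back only once the DSL path has been steady for a while.
    if len(recent) == recover_after and all(recent):
        return "dsl"
    return "wilan"


if __name__ == "__main__":
    print(next_link("dsl", [True, False, False, False]))
    print(next_link("wilan", [True, True, True, True, True]))
```

Requiring more good probes to fail back than bad probes to fail over is one way to keep a flaky DSL line from bouncing the route back and forth.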
This site opened about 6 weeks ago and the DSL connection wasn’t ready. No problem. We routed via WiLAN over the cell phone carrier and waited more than a month for Ma Bell Internet to provision the DSL service.
Ma Bell Internet is a fictitious name. As are the names of everyone else in this story other than me. Other than the names, every detail is true. As one person involved said, “you can’t make this stuff up.”
The story starts last Wednesday after I did the appropriate adjustments on the firewall to accommodate the new DSL connection. Tom, the end user customer at the site, connected the patch cable from the DSL modem to the firewall and nothing worked. Our troubleshooting pointed to a misconfigured or bad DSL modem.
We logged a service call with Ma Bell Internet, which triggered a week-long comedy of errors. We called on Thursday and talked to – I’ll call her Misty – from tech support. Here is a piece of our conversation:
Greg: You have a gateway at this IP Address. We can’t ping it from the site.
Misty: I don’t see why you can’t ping it. I can ping it from here.
Greg: Right – I don’t know why we can’t ping it. That’s why we’re calling. Nobody can route through that DSL modem and we can’t ping it.
Misty: I don’t understand why you can’t ping that modem. I can ping it from here, you should be able to ping it from there.
Greg: Right – we should. And if it answered it would be even better.
Misty: So how come you can’t ping it?
Greg: Routers have two sides. We’ll call them the inside and outside. You’re on the outside and you can ping it. The site is on the inside and can’t ping it. Seems to me, something is wrong with the inside of that router.
Misty: Well if you can’t ping it, there’s not much I can do.
Greg: Yes there is! It’s your router. You need to fix it.
Misty: I tried connecting to it from here remotely to check its configuration but I’m not able to.
Greg: So – doesn’t that suggest to you something is wrong?
Misty: No. We connect those the way we’re ordered to connect them. If the order said no remote diagnostics then we wouldn’t have turned that on. And I can ping it.
Greg: So did the order for this one say no remote diagnostics?
Misty: I don’t know what the order said – I don’t have it.
Greg: So how do you know the order didn’t call for remote diagnostics?
Misty: Because I’m not able to connect to it.
Greg: (Exercising patience) OK, so how do we fix this?
Misty: You can run remote diagnostics from your site. Just hook a real computer up to it and connect to its website.
Greg: The problem with that is, we can’t access the modem. If we could access the modem we wouldn’t need to call you. And the only computer I have onsite is my firewall. Everything else is thin clients.
Misty: If you’re unwilling to run remote diagnostics, I can’t do much to help you.
Greg: Yes you can. Send somebody out there to fix the modem!
Misty: Maybe you can borrow a laptop from somebody and try the diagnostics.
Greg: There are no laptops to borrow at this site. We have my firewall – with no graphics so connecting to a GUI website won’t work anyway – and some thin clients. That’s it.
Misty: I can dispatch a technician, but it will be billable.
Greg: What??? Why is it billable?
Misty: Because you’re unwilling to run remote diagnostics.
Greg: No, not unwilling. Unable. We are unable to connect to this modem. If we can’t connect to it, how do we run diagnostics on it?
Misty: If I dispatch a technician and he finds a building wiring problem or something else not from us, we’ll have to charge you.
Greg: Well of course. The wiring is a 20 foot cat5e patch cable. That’s it. That’s the building wiring. One patch cable. Send somebody out.
That was on Thursday. I asked for somebody onsite Friday. But with nobody available Friday, I had to settle for Monday. My phone rang on Friday and I confirmed it – send somebody Monday, preferably Monday morning.
Monday morning came and I learned Ma Bell sent somebody to the site on Saturday. Of course nobody was at the site on Saturday. After some Monday phone calls, we scheduled another visit for Wednesday – nobody from Ma Bell was available Tuesday. So now we were a week into this issue. After opening the support ticket on Thursday, the soonest possible resolution would not happen until the following Wednesday.
Wednesday morning about 9:45 my phone rang. It was the onsite Ma Bell support technician. Let’s call him BA. You’ll see why in a minute.
BA told me the customer became angry and sent him upstairs to call me. I apologized and said there was a systemic problem at Ma Bell Internet and he was the guy onsite who had to hear it all. BA told me he also got mad at the customer and apparently started throwing boxes around the site. And that was when emails started pouring into my inbox warning me about this technician and his attitude. Apparently, BA was angry the second he walked in the door and took out his frustrations on Tom, the end user customer. Tom got tired of the abuse and sent BA upstairs to call me.
BA asked me a bunch of questions I didn’t know how to answer. He wanted to know how to configure the router password and who to call for onsite tech support. I asked BA – since he worked for Ma Bell Internet – if he shouldn’t know who to call for support? BA launched into a diatribe about his employer and his frustrations about training and scheduling and his management. Here is the portion I remember most vividly:
Greg: So how do you know these connections are good?
BA: We set them up dynamic and then connect them to the Internet. If they work, then we give them their static address and we’re done.
Greg: So how do you know they work when they’re static?
BA: Because they worked when they were dynamic.
Greg: Don’t you think you should test them when they’re static?
BA: That’s not what we do.
Greg: Maybe you should give the Ma Bell guys some feedback about…
BA: (Interrupting) They won’t listen.
Greg: How badly do they want to keep customers?
BA: They won’t listen. They’re managers and they don’t care!
Greg: OK. Well for now, we need to fix this modem. I need it to have the static IP Address assigned to it, no NAT, no DHCP, and no firewall rules…
BA: (Interrupting again) Hold on there – Maybe those were English words but I have no idea what any of that stuff means. I’m not an IT guy. I don’t know why customers keep trying to get me to solve their IT problems. I’m not an IT guy!
Greg: Well, that’s OK. I’m an IT guy so I can cover that part. Hang on a second – (recalling my notes from last week – fortunately I still had my scrap of paper handy and near the top of my pile of scraps of paper with notes from countless other customer engagements) I have the Ma Bell Internet toll free number right here.
I called the number and tried to conference us all together. The conference call didn’t work. I told BA I needed to hang up to clear my messed up conference call and I would call him right back and try again.
We hung up and I called the toll free number again. This time it worked and I pressed the phone buttons to talk to somebody in provisioning. Provisioning sent me to Tech Support and I talked to a helpful (finally!) lady I’ll call Ingrid.
I told Ingrid that BA was standing by onsite and we needed to configure this modem. Ingrid said they can’t disturb technicians who are onsite working. I said he’s waiting for us to call right now. I put Ingrid on hold, called BA onsite and his phone immediately went to voicemail. And an email came into my inbox from Tom – BA was on the phone with somebody else.
I came back to Ingrid and explained BA was on the phone with somebody else and his phone went to voicemail. And then my phone beeped. It was BA. I put Ingrid on hold again, told BA to hang up and stand by and we would conference him in. Back to Ingrid – but my cell phone was now tied up because BA’s inbound call tied up my conferencing capability even after BA hung up. Ingrid would have to do it. So Ingrid tried to call BA. Ingrid reported BA’s phone answered and disconnected. She tried again, same problem.
My cell phone beeped again. It was BA, reporting that somebody tried to call him twice. He answered but could not hear anyone so he hung up. I could hear frustration rising in BA’s voice. So I explained to BA that Ingrid was trying to conference us all together and to stand by. “Hang in there, we’ll make this happen. I promise.”
Back to Ingrid. I told Ingrid what BA told me and Ingrid said she would hang up with me and try to call BA and conference me in. I told Ingrid what we needed with that modem. Give it the block of static addresses it’s supposed to have, turn off DHCP, turn off NAT, turn off all firewall rules. Bless her heart, Ingrid knew what I was talking about. She repeated it and we hung up.
My phone rang a minute later. It was Pat from Ma Bell Internet. Pat had BA on the line and wanted to conference me in. I don’t know what happened to Ingrid. I said, “yes, absolutely!”
Finally – we had all the right people together on the same call at the same time. Now we could get to work.
Meantime, my email inbox chirped with requests from the customer main office for status updates. Like most IT people, I can type and talk at the same time. It’s a skill we all learn sooner or later. So I updated the customer via email while I talked to Ma Bell Internet on the phone.
Pat asked if I wanted the modem to be a bridge or router. I told her I didn’t care, as long as this site could get to the Internet. So Pat decided to try setting it up as a bridge. Pat talked BA through the steps and suddenly, nobody from inside or outside could ping that gateway address anymore. Whoops.
So we had to configure it as a router. BA groaned – “so that means I have to type in that long password string again?”
“Yes”, I said. “Sorry. I wish there was another way to do it, but you’re there onsite and I need your hands and eyes.”
Pat talked BA through the steps to reset the modem and configure it again. And after all that – after trying to teach a telephone support technician what routers do, after a week of bungled appointments waiting for Ma Bell Internet to send somebody, after reassuring an onsite technician with a bad attitude, after juggling conference calls that refused to conference, after all that, here was the problem. This was why the site could not route to the Internet over that modem.
BA: And I’m setting the subnet mask to 255.255.255.0, right?
Greg: (Listening passively until now) NO!
Pat: Uhm, no, change that last octet to 248. So the subnet mask should read 255.255.255.248.
BA: Oh – I always set it to 255.255.255.0. I don’t even know what it means.
Greg: Well, it’s important to get that one right.
Pat: (Didn’t say anything.)
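For anyone else who doesn't know what that last octet means, here is a small Python sketch using the standard `ipaddress` module. The addresses are hypothetical, taken from the 203.0.113.0/24 range reserved for documentation, since the site's real block isn't given here. It shows only how differently the two masks describe the same modem address, not the exact way the wrong mask broke routing at this site.

```python
import ipaddress

# Hypothetical modem address from the TEST-NET-3 documentation range;
# the real static block in this story is not shown.
modem_ip = "203.0.113.1"

# The mask BA always typed in, and the mask Pat corrected it to.
wrong = ipaddress.ip_interface(f"{modem_ip}/255.255.255.0")
right = ipaddress.ip_interface(f"{modem_ip}/255.255.255.248")

# An address that shares the first three octets but lies outside the small block.
other_host = ipaddress.ip_address("203.0.113.200")

for label, iface in (("wrong", wrong), ("right", right)):
    net = iface.network
    print(label, net, "addresses:", net.num_addresses,
          "treats", other_host, "as local:", other_host in net)
```

With 255.255.255.0 the modem believes all 256 addresses sharing its first three octets sit on its own local segment. With 255.255.255.248 it knows its block is only 8 addresses, and that everything else has to go through a gateway.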
And voilà – it all worked. I started up an SSH session into my nifty onsite firewall and was ready to thank everyone for getting this up and running when everything dropped again.
After some more troubleshooting, Pat said, “I see some upstream errors here.” I said, “I’m pinging in another window and I notice the ping times zoom up from around 60 ms to more than 200 ms when my SSH session drops.”
Pat said, “Hold on a minute, I’ll bring somebody else in.” Pat put us on hold, leaving BA and me together.
After a few seconds, BA said, “Where did she go? Didn’t she say she was getting somebody? What’s going on here?”
“Just hang in there. Give her a minute to come back.”
A few seconds later, a man I’ll call Galen came on the line. Galen and Pat confirmed something unusual was going on.
And then BA asked us all to wait a minute. “I’m taking the modem offline for a minute.”
Pat said, “Oh – now I don’t see the modem anymore!” I said, “yes, BA said he was taking it offline for a minute.”
A few seconds later, the modem came back online. BA explained he pulled the telecom cabling out of the punchdown block and punched the wires down again.
And after BA punched down the wires again, the connection stayed solid. After watching for about 2 minutes, BA said, “I’m outta here!” and left. Galen, Pat, and I stayed together for another few minutes and then I agreed Ma Bell Internet could close this case.
What lessons can we extract from this experience? I named the onsite technician BA because he really did have a bad attitude. He had a bad attitude because his company is dysfunctional and he handled the stress poorly, making a touchy situation even worse. How do I know Ma Bell Internet is dysfunctional? Look at the evidence: poorly trained telephone support technicians (Misty), broken scheduling, poor communications, and overburdened and undertrained onsite technicians. This is fixable if the managers at Ma Bell Internet want to fix it. If not, plenty of other Internet providers will eat their lunch.
(First published on my Infrasupport website, May 24, 2014. I backdated here to match the original posting date.)