Forbes.com recently featured an article on Target’s involvement with Genesys Works, an organization that partners with companies to move more students into professional careers, creating a more productive and diverse workforce in the process.
Innovation is one of those things that every leader wants but has a difficult time implementing due to competing priorities for time and resources. The benefits of innovation are clear: It can drastically improve a team’s output, it gives teams more tools and ideas to solve problems, and in some cases, innovation can lead to culture and technology shifts that can impact an entire company or industry.
Data science has few best practices that are as widely publicized and agreed upon as those for software development. That ambiguity has created a disconnect with product teams, who are finding it difficult to implement solutions in a timely fashion.
At the beginning of 2017…
Early last month, as part of the quarterly Enterprise Architecture Workshop, the Target technical community presented the case to promote Go (Golang) to recommended status for teams choosing how to build their applications.
The adoption of microservice architecture and containerized workloads at Target continues to grow rapidly across our product teams. This movement has provided our security engineering organization with a number of new challenges as we try to align with modern build processes and ensure that these new dynamic/ephemeral workload structures remain secure. This shift has also provided our group with a number of opportunities to further customize our security tools and create features that provide valuable continuous and self-service capabilities that never existed before.
Target Technology Services (TTS) Operations has a massive responsibility to our guests and team members: enabling great experiences when they use Target technology. Automation drives faster recovery times for incidents, lowers the cost of resolution, and gives engineers more time to perform root-cause analysis with their solutions-level partners, driving even further reliability. The Guest Reliability Engineering Automated Service Engine (GREASE) is Target’s implementation of automation in operations. We’ve already saved 700+ hours of our store team members’ time and 10,000+ hours of IT operations time!
REDstack is now Open Source!
We are officially open sourcing REDstack, our sandbox tool for Big Data development at Target.
K8Guard is officially open source
I am happy to announce that Target has open sourced K8Guard. I have been part of designing and developing it for the past few months, and I’m going to share a little more about it.
Here at Target, we run our own private OpenStack cloud and have never been able to accurately measure the performance of our hardware. This lack of measurement prevents the evaluation of performance improvements of new hardware or alternative technologies running as drivers inside OpenStack. It also prevents us from providing a Service Level Agreement (SLA) to our customers. Recently we have been striving to improve our OpenStack service, which led us to talk to our consumers directly.
One of the strongest benefits of launching an application into the cloud is the pure on-demand scalability that it provides.
It is 7:30 AM on a Monday morning in late October. I am waiting in line at Cafe Donuts to bring my team breakfast for our mandated ‘no work for one hour’.
Hadoop upgrades over the last few years meant long outages where the Big Data platform team would shut down the cluster, perform the upgrade, start services, and then complete validation before notifying users it was OK to resume activity. This approach is a typical pattern for major upgrades even outside Target, and it reduces the complexity and risks associated with the upgrade. While this worked great for the platform team, it was not ideal for the hundreds of users and thousands of jobs that depended on the platform. That is why we decided to shake things up and go all in on rolling maintenance.
Just after the middle of last year, Target expanded beyond its on-prem infrastructure and began deploying portions of target.com to the cloud. The deployment platform was homegrown (codename Houston) and was backed wholly by our public cloud provider. While in some aspects that platform was on par with other prominent continuous deployment offerings, the actual method of deploying code was cumbersome and did not adhere to cloud best practices. These shortcomings led to a brief internal evaluation of various CI/CD platforms, which in turn led us to Spinnaker.
Target’s open source big data platform contains a vast array of clustered technologies, or ecosystems, working together. Troubleshooting an issue within a single ecosystem is a difficult task, let alone an issue that spans several ecosystems.
Win the cloud with Winnaker!
I am happy to announce that we at Target have decided to open source a tool called Winnaker. This tool allows the user to audit Spinnaker from an end user’s point of view.
At Target we aim to make shopping more fun and relevant for our guests through extensive use of data – and believe me, we have lots of data! Tens of millions of guests and hundreds of thousands of items lead to billions of transactions and interactions. We regularly employ a number of different machine learning techniques on such large datasets for dozens of algorithms. We are constantly looking for ways to improve speed and relevance of our algorithms and one such quest brought us to carefully evaluate matrix multiplications at scale – since that forms the bedrock for most algorithms. If we make matrix multiplication more efficient, we can speed up most of our algorithms!
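As a toy illustration of why the multiplication itself is the lever (the sizes and function names here are hypothetical, not our production setup), a naive triple-loop multiply and a BLAS-backed NumPy call compute the same result, with the optimized routine typically far faster at scale:

```python
import numpy as np

def naive_matmul(a, b):
    """Pedagogical O(n^3) triple-loop matrix multiply."""
    n, k = a.shape
    k2, m = b.shape
    assert k == k2, "inner dimensions must match"
    out = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            for p in range(k):
                out[i, j] += a[i, p] * b[p, j]
    return out

a = np.random.rand(50, 50)
b = np.random.rand(50, 50)
# Same mathematical result; a @ b dispatches to an optimized BLAS kernel,
# which is where most of the speedup for downstream algorithms comes from.
assert np.allclose(naive_matmul(a, b), a @ b)
```

At billions-of-interactions scale the work is also distributed across machines, but the same principle applies: the faster the core multiply, the faster every algorithm built on top of it.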
On my first encounter with it, in the early 2010s, I was mystified. It sounded like witchcraft, and I imagined the practitioners to be a coven of witches and wizards, all holding Ph.D.s in the dark art of “Data Science” and respectfully addressed as “Data Scientists.” It was believed they would magically transform haystacks into gold and then ask for your first-born in return as a reward for their service (à la Rumpelstiltskin). There is no denying that the title “Data Scientist” is among the most coveted these days, and it has a nice ring to it. It’s also true that data science has traditionally been a monopoly of mathematicians and statisticians. Obviously, developing statistical models and machine learning algorithms requires years of training and practice to specialize. In my opinion it is more an art form driven by science, and it can easily be mistaken for magic.
When you first think about scaling an on-premises Hadoop cluster, your mind jumps to the process and the teams involved in building the servers, the time needed to configure them, and the stability required while bringing them into the cluster. Here at Target that process used to be measured in months. The story below outlines our journey scaling our Hadoop cluster, taking the process from months to hours and adding hundreds of servers in a couple of weeks.
An enterprise as large as Target generates a lot of data, and on my Big Data platform team we want to make it as easy as possible for our users to get it into Hadoop in real time. I want to discuss how we are starting to approach this problem, what we’ve done so far, and what is still to come.
At Target we’re always looking for ways to move forward in becoming the best omni-channel retailer that we can be. This journey demands that we enable change in nearly every part of our technology organization. Our culture, delivery model, technology selections, working arrangements, and org structures are all levers we can pull to help us be more responsive. A key question that we continually ask ourselves is, “How can we move faster?” Introducing change in a large enterprise can take a long time if we don’t challenge ourselves to be creative around constraints (like not having enough “experts” to go around). Traditional learning approaches might not be fast enough, so we’ve started acting on some innovative ideas to break down those barriers.
In 2014, I worked on a project to build out a system to provide new business capabilities for our Marketing and Digital spaces. This initiative was not unlike other initiatives we undertake at Target, but it provides a great example of the importance of us investing in an effort to modernize the way we operate, the technology stacks we use and the way we think about systems and systems architecture.
One important part of any retailer is its supply chain network. In order to sell a product in our stores, or online at Target.com, we need a well-run, well-maintained network to move all of those products from one place to another. Shipping products from a distribution center to a store may not sound like a complex problem, but as soon as you have multiple vendors and multiple stores (not to mention online orders that go directly to a guest’s home) it becomes increasingly complex. Retailers get products from companies, referred to as vendors, and then distribute the products to stores through central locations, referred to as distribution centers (DCs). To improve the throughput of our supply chain network, we either expand the network by building new distribution centers or increase the efficiency of the current network. A great way to improve the performance of a DC is to put robots and other automation equipment in it. High-tech equipment like this requires highly skilled labor to maintain and manage. These highly automated systems run on servers and other control equipment, and they produce a lot of machine data that can lead to valuable insights.
Recently we launched Project Argus, a 30-day “monitoring challenge” to improve visibility of key performance indicators (KPIs) across our technology stack prior to our peak retail season (in Greek mythology, Argus is a hundred-eyed giant). The effort was structured as a mix between a FlashBuild and agile: we used two-day sprints, twice-daily stand-ups, and feature tracking through Kanban boards. Early in the effort we decided to build our product as monitoring-as-code. Using tools such as GitHub, Chef, Kitchen, Jenkins, and Ruby, we were able to quickly build several monitoring and dashboard solutions for use within our Technology Operations Center. These dashboards, and the iterative process we use to keep delivering more content, have been embraced by our support teams, who now rely on them heavily to proactively detect and resolve issues.
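To give a flavor of the monitoring-as-code idea (this is a minimal sketch with hypothetical metric names and thresholds, not one of our actual Argus checks, which are built in Ruby with Chef), a KPI check can be defined as data and evaluated the same way on every run:

```python
from dataclasses import dataclass

@dataclass
class KpiCheck:
    """A threshold rule kept in version control like any other code."""
    name: str
    warn: float
    crit: float

    def evaluate(self, value: float) -> str:
        if value >= self.crit:
            return "CRITICAL"
        if value >= self.warn:
            return "WARNING"
        return "OK"

# Hypothetical KPIs a dashboard might watch during peak season.
checks = [
    KpiCheck("api_error_rate_pct", warn=1.0, crit=5.0),
    KpiCheck("p99_latency_ms", warn=500, crit=2000),
]

metrics = {"api_error_rate_pct": 0.4, "p99_latency_ms": 750}
for c in checks:
    print(c.name, c.evaluate(metrics[c.name]))
```

Because the checks live in source control, changes are reviewed, tested, and deployed through the same pipeline as application code.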
At a couple of recent conferences around DevOps (MSP DevOps Days and the DevOps Enterprise Summit), Heather Mickman and Ross Clanton from Target explained some high-level concepts and actions we’re taking to break our mold of working on complex problems. When presented with an opportunity to work differently and challenge our normal delivery of IT assets - from plan to decommission - what would that look like?
On Tuesday, October 14, 2015, Target affirmed its commitment to technology and leadership in the retail space by joining the World Wide Web Consortium (W3C). As a company that cares about delivering great experiences for customers, we realize those experiences are now steeped in technology. One way Target can have a significant impact on shaping technology is open engagement with other companies and organizations at the heart of emerging development - the W3C enables us to do just that.
This past week, Target open-sourced our first project, a Chef cookbook for Apache Cassandra. This cookbook is the exact version we use to manage our production Cassandra environment. I want to go into more detail about how the cookbook is used and how we automate our Cassandra deployment.
We have been using Cassandra in production for about a year now. We use it to serve product and inventory information through our Products API. Currently, we have a 12-node Cassandra cluster; each node is a physical server with a 24-core processor, 400 GB of RAM, and 12 SSDs.
Welcome to the new Target Tech blog! At Target we do amazing things with technology every day. This blog is a place for Target team members to tell their stories about how we use technology to make retail awesome.
When our stores open on Thanksgiving day, we see an API traffic spike that is 10x our normal load. To ensure our systems are up to the task of handling the burst, we need to go beyond the typical load tests run as part of our continuous integration pipeline (someday we will write about our continuous performance testing!). Black Friday traffic requires a unique load profile and it requires all of our APIs to be tested simultaneously.
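As a toy sketch of generating a burst at roughly 10x a baseline (the numbers and endpoint are hypothetical; a real Black Friday profile also shapes ramp-up and hits every API simultaneously, as described above), concurrent load can be driven from a thread pool:

```python
from concurrent.futures import ThreadPoolExecutor

BASELINE_REQUESTS = 100
SPIKE_MULTIPLIER = 10  # ~10x normal load, per the Thanksgiving spike

def hit_endpoint(i: int) -> int:
    # Placeholder for a real HTTP call, e.g. urllib.request.urlopen(url);
    # returns the status code we would record for this request.
    return 200

# Fire the burst and collect every response status for later analysis.
with ThreadPoolExecutor(max_workers=50) as pool:
    statuses = list(pool.map(hit_endpoint,
                             range(BASELINE_REQUESTS * SPIKE_MULTIPLIER)))

print(f"{len(statuses)} requests sent, all OK: "
      f"{all(s == 200 for s in statuses)}")
```

Dedicated load tools would normally replace the placeholder call; the point of the sketch is only that the burst profile, not steady-state throughput, is what the test must reproduce.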