Finding Efficiency and Integration

Since going public in 2011, LinkedIn has become the world’s largest professional network with 300 million members in over 200 countries and territories around the globe.

In-House Data Warehouse Can't Keep Up

As a fast-growing company, LinkedIn needed to integrate multiple versions of very rapidly changing planning and forecast data with actual financial records, and they needed to be able to do this without having to refresh the enterprise data warehouse. They also needed the ability to directly enter and edit attributes of financial data without updating the data in the source systems or enterprise data warehouse. As the volume of data grew exponentially, and as the requirements of the Financial Planning and Analysis group became more complex, the in-house solution LinkedIn was relying on began to exhibit substantial performance and usability issues.

Improving Compliance and Performance with Analytics

We were engaged first to ease LinkedIn’s pain points with their existing in-house solution, and then to provide a new solution that would be flexible enough to meet the company’s rapidly growing list of needs and enable them to scale with the growing volume of data.

As we began working on the existing in-house solution, our team found that neither the data repository design nor the in-house ETL tool was sufficient for the volume of data being processed. Once we presented LinkedIn with our analysis of these limitations, we were engaged to design a new data repository and to develop new ETL processes to populate and maintain it.

We studied the data from the perspective of the FP&A group, clarifying and documenting their unique requirements and refining the business rules for specific processes in the existing solution that they wanted to keep. With this information in hand, we used dimensional modeling to design a far more efficient data warehouse on an Oracle platform. Following the Kimball methodology also ensured that the work done in this initial design could be expanded easily in future phases.
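To make the dimensional-modeling idea concrete, here is a minimal sketch of a star schema in which actuals and several forecast versions share one fact table. It is an illustration only, not the design delivered to LinkedIn: the table names (dim_account, dim_scenario, fact_financials), the columns, and the sample figures are all hypothetical, and an in-memory SQLite database stands in for the Oracle platform.

```python
import sqlite3


def build_star_schema(conn):
    """Create two hypothetical dimensions and one fact table."""
    conn.executescript(
        """
        CREATE TABLE dim_account (
            account_key  INTEGER PRIMARY KEY,
            account_name TEXT NOT NULL
        );
        -- Each plan or forecast version is a row here, next to 'Actual'.
        CREATE TABLE dim_scenario (
            scenario_key  INTEGER PRIMARY KEY,
            scenario_name TEXT NOT NULL
        );
        CREATE TABLE fact_financials (
            account_key  INTEGER NOT NULL REFERENCES dim_account(account_key),
            scenario_key INTEGER NOT NULL REFERENCES dim_scenario(scenario_key),
            period       TEXT NOT NULL,
            amount       REAL NOT NULL
        );
        """
    )


def load_sample(conn):
    """Insert made-up rows purely for illustration."""
    conn.executemany(
        "INSERT INTO dim_account VALUES (?, ?)",
        [(1, "Travel"), (2, "Software")],
    )
    conn.executemany(
        "INSERT INTO dim_scenario VALUES (?, ?)",
        [(1, "Actual"), (2, "Forecast v1"), (3, "Forecast v2")],
    )
    conn.executemany(
        "INSERT INTO fact_financials VALUES (?, ?, ?, ?)",
        [
            (1, 1, "2014-01", 100.0),  # Travel, Actual
            (1, 2, "2014-01", 90.0),   # Travel, Forecast v1
            (1, 3, "2014-01", 110.0),  # Travel, Forecast v2
            (2, 1, "2014-01", 50.0),   # Software, Actual
            (2, 3, "2014-01", 55.0),   # Software, Forecast v2
        ],
    )


def variance_vs_actual(conn, scenario_name):
    """Return (account, period, actual minus scenario amount) rows."""
    return conn.execute(
        """
        SELECT a.account_name,
               f.period,
               SUM(CASE WHEN s.scenario_name = 'Actual' THEN f.amount ELSE 0 END)
             - SUM(CASE WHEN s.scenario_name = ? THEN f.amount ELSE 0 END)
        FROM fact_financials f
        JOIN dim_account  a ON a.account_key  = f.account_key
        JOIN dim_scenario s ON s.scenario_key = f.scenario_key
        WHERE s.scenario_name IN ('Actual', ?)
        GROUP BY a.account_name, f.period
        ORDER BY a.account_name, f.period
        """,
        (scenario_name, scenario_name),
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_star_schema(conn)
    load_sample(conn)
    for row in variance_vs_actual(conn, "Forecast v2"):
        print(row)
```

Treating each forecast version as a row in a scenario dimension, rather than as a separate table, is one common way to compare changing plans against actuals with a single query. Whether the delivered warehouse modeled versions this way is an assumption.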

Because LinkedIn already had some familiarity with Informatica through OBIEE, and because Informatica excels at working with some of the many divergent data sources LinkedIn uses, it was selected as the ETL tool. Using Informatica, we then migrated a wealth of hard-coded, often undocumented, and sometimes manually implemented business logic into new ETL processes.

The resulting ETL solution is better able to support LinkedIn’s growing needs. It can handle the large amounts of data being processed, is well-documented, and is easier to maintain and support than the previous in-house solution.

  • We reduced data processing time from about 5 hours to about 40 minutes.
  • The new data repository is significantly more responsive, saving the company time.
  • The team now enjoys end-to-end ETL automation.
  • They can easily maintain and modify business logic.
  • Both the ETL solution and the new data repository are easily expandable and scale well with increasing volumes of data as the company grows.
  • By consolidating multiple data sources into one repository, everything is more efficient.
  • LinkedIn can spend less time troubleshooting problems thanks to a more robust ETL process with session recoverability, notification, monitoring, and logging capabilities.

For LinkedIn, it's crucial to keep pace with changing business needs and to scale with rapidly growing volumes of data.
