Big Data – Page 2

How Open Universities Australia modernized their data platform and significantly reduced their ETL costs with AWS Cloud Development Kit and AWS Step Functions

January 30, 2025 by Dutd

This is a guest post co-authored by Michael Davies from Open Universities Australia. At Open Universities Australia (OUA), we empower students to explore a vast array of degrees from renowned Australian universities, all delivered through online learning. We offer students alternative pathways to achieve their educational aspirations, providing them with the flexibility and accessibility to … Read more

Hybrid big data analytics with Amazon EMR on AWS Outposts

January 29, 2025 by Dutd

Businesses require powerful and flexible tools to manage and analyze vast amounts of information. Amazon EMR has long been the leading solution for processing big data in the cloud, supporting petabyte-scale data processing, interactive analytics, and machine learning using over 20 open source frameworks such as Apache … Read more

How MuleSoft achieved cloud excellence through an event-driven Amazon Redshift lakehouse architecture

January 28, 2025 by Dutd

This post is cowritten with Sean Zou, Terry Quan, and Audrey Yuan from MuleSoft. In our previous thought leadership blog post, "Why a Cloud Operating Model," we defined a COE Framework and showed why MuleSoft implemented it and the benefits they received from it. In this post, we’ll dive into the technical implementation describing how … Read more

OpenSearch Vector Engine is now disk-optimized for low cost, accurate vector search

January 24, 2025 by Dutd

OpenSearch Vector Engine can now run vector search at a third of the cost on OpenSearch 2.17+ domains. You can now configure k-NN (vector) indexes to run in disk mode, optimizing them for memory-constrained environments and enabling low-cost, accurate vector search that responds in low hundreds of milliseconds. Disk mode provides an economical alternative to … Read more
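
As a rough illustration of the disk mode configuration mentioned in this excerpt, the sketch below builds a create-index request body for a k-NN field. It assumes the `mode: on_disk` mapping parameter described in the OpenSearch disk-based vector search documentation; the field name `embedding`, the dimension of 768, and the helper `disk_mode_index_body` are hypothetical and chosen only for this example.

```python
import json


def disk_mode_index_body(field: str, dimension: int) -> dict:
    """Build a create-index body for a k-NN vector field in disk mode.

    Assumption: the target domain runs OpenSearch 2.17 or later, where the
    knn_vector field type accepts a "mode" parameter.
    """
    return {
        # Enable k-NN search on the index.
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                field: {
                    "type": "knn_vector",
                    "dimension": dimension,
                    # Ask the engine to favor low memory use over in-memory speed.
                    "mode": "on_disk",
                }
            }
        },
    }


# Hypothetical field name and dimension, for illustration only.
body = disk_mode_index_body("embedding", 768)
print(json.dumps(body, indent=2))
```

The resulting JSON could be sent as the body of a create-index request with whichever OpenSearch client you already use; the sketch stops short of making the call so that it runs without a cluster.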

Access Amazon S3 Iceberg tables from Databricks using AWS Glue Iceberg Rest Catalog in Amazon SageMaker Lakehouse

January 23, 2025 by Dutd

Amazon SageMaker Lakehouse enables a unified, open, and secure lakehouse platform on your existing data lakes and warehouses. Its unified data architecture supports data analysis, business intelligence, machine learning, and generative AI applications, which can now take advantage of a single authoritative copy of data. With SageMaker Lakehouse, you get the best of both worlds—the … Read more

Generate vector embeddings for your data using AWS Lambda as a processor for Amazon OpenSearch Ingestion

January 21, 2025 by Dutd

On November 22, 2024, Amazon OpenSearch Ingestion launched support for AWS Lambda processors. With this launch, you now have more flexibility in enriching and transforming your logs, metrics, and trace data in an OpenSearch Ingestion pipeline. Some examples include using foundation models (FMs) to generate vector embeddings for your data and looking up external data sources … Read more

Automate topic provisioning and configuration using Terraform with Amazon MSK

January 16, 2025 by Dutd

As organizations deploy Amazon Managed Streaming for Apache Kafka (Amazon MSK) clusters across multiple use cases, the manual management of topic configurations can be challenging. This can lead to several issues: Inefficiency – Manual configuration is time-consuming and error-prone, especially for large deployments. Maintaining consistency across multiple configurations can be difficult. To avoid this, Kafka … Read more

How EUROGATE established a data mesh architecture using Amazon DataZone

January 15, 2025 by Dutd

This post is co-written by Dr. Leonard Heilig and Meliena Zlotos from EUROGATE. For container terminal operators, data-driven decision-making and efficient data sharing are vital to optimizing operations and boosting supply chain efficiency. Internally, making data accessible and fostering cross-departmental processing through advanced analytics and data science enhances information use and decision-making, leading to better … Read more

Juicebox recruits Amazon OpenSearch Service for improved talent search

January 15, 2025 by Dutd

This post is cowritten by Ishan Gupta, Co-Founder and Chief Technology Officer, Juicebox. Juicebox is an AI-powered talent sourcing search engine, using advanced natural language models to help recruiters identify the best candidates from a vast dataset of over 800 million profiles. At the core of this functionality is Amazon OpenSearch Service, which provides the … Read more

Batch data ingestion into Amazon OpenSearch Service using AWS Glue

January 13, 2025 by Dutd

Organizations constantly work to process and analyze vast volumes of data to derive actionable insights. Effective data ingestion and search capabilities have become essential for use cases like log analytics, application search, and enterprise search. These use cases demand a robust pipeline that can handle high data volumes and enable efficient data exploration. Apache Spark, … Read more